Topic: Strings initialising character arrays


Author: schuenem@informatik.tu-muenchen.de (Ulf Schuenemann)
Date: 1995/06/20
Raw View

Not long ago I read about the idea of using \c (?) at the end of a
string literal to denote a character array without a terminating NUL.
Thus

 char hello[5] = "hello\c";

would be ok. [ Whereas  hello[4] = "hello\c"  would still be illegal. ]


How about that, Bob Kline? "..\c" could give you the now missing 'elegant'
notation for character-arrays.
Yes, it still breaks code, but adding \c to the end is much easier
than replacing the string with a list of characters.
The (IMHO important) error-reducing rule that an init string may not be
longer than the array can be preserved.
\c makes explicit that you don't want the terminating NUL,
that it's not an accident but done on purpose.
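The `\c` escape is only a proposal; neither C nor C++ has it. As a rough present-day approximation, assuming a C++14 compiler, a hypothetical helper `without_nul` can turn a string literal into a `std::array` holding every character except the terminating NUL. As with the proposal, the intent is explicit and a literal that is too long for the declared array is a compile-time error:

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical helper (not a standard facility): copies the first N-1
// characters of a string literal, dropping its terminating NUL.
template <std::size_t N, std::size_t... I>
constexpr std::array<char, N - 1>
without_nul_impl(const char (&s)[N], std::index_sequence<I...>)
{
    return {{ s[I]... }};
}

template <std::size_t N>
constexpr std::array<char, N - 1> without_nul(const char (&s)[N])
{
    return without_nul_impl(s, std::make_index_sequence<N - 1>{});
}

// Rough equivalent of the proposed   char hello[5] = "hello\c";
constexpr std::array<char, 5> hello = without_nul("hello");
```

Unlike the proposed `\c`, this also rejects an array that is too large (for example a `std::array<char, 6>` initialised from `without_nul("hello")`), because `std::array<char, 5>` and `std::array<char, 6>` are distinct types.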


Ulf Schuenemann

--------------------------------------------------------------------
Ulf Schünemann
Fakultät für Informatik, Technische Universität München, Germany.
email: schuenem@informatik.tu-muenchen.de





Author: mikec@fred.rchland.ibm.com (Michael Corrigan)
Date: 1995/06/20
Raw View
In article <3rqcc5$t4q@engnews2.Eng.Sun.COM>, clamage@Eng.Sun.COM (Steve Clamage) writes:
|> In article 22731@nlm.nih.gov, bkline@cortex.nlm.nih.gov (Bob Kline Phoenix Contract) writes:

---snip---

|> >  And I can write checks like
|> >
|> >    if (memcmp(block, sig, sizeof sig)) { ....
|> >
|> >which is preferable (if from no other perspective than performance)
|> >to
|> >
|> >    if (memcmp(block, sig, strlen(sig))) { ...
|>
|> But if you use a null-terminated sig, you can just write
|>
|>  if (strcmp(block, sig)) { ...
|>
|> which is at least as elegant, and the runtime code is at least as fast and
|> as small as the memcmp you were using before.
|>

For any decent compiler, strcmp(block, sig) should never be faster
than memcmp(block, sig, sizeof sig).  In fact, on most machines
when the length is a constant, as in this case, memcmp could/should be
significantly faster.

OTOH, one could always use memcmp when the length is known anyway.

---snip---


Mike Corrigan
corrigan@vnet.ibm.com





Author: A.Main@dcs.warwick.ac.uk (Zefram)
Date: 1995/06/17
Raw View
Steve Clamage <clamage@Eng.Sun.COM> wrote:
>Yes, it breaks existing code. Every language feature is disliked by
>someone, whether it breaks existing code or not. Removing this
>incompatibility with C would mean persuading the majority of the C++
>Committee that the amount of broken code (even given the other, much
>worse, incompatibilities that definitely will not be removed)
>outweighs the advantages of preserving the type system.

That paragraph seems to summarise your argument on this matter.  You
seem to be of the opinion that (a) existing ANSI C code that uses
non-terminated strings to initialise character arrays is broken; and
(b) the language feature that allows this compromises type safety.

On the first of these points, I can only say that I do not consider
such code to be broken.  I consider it a matter of style whether one
uses this language feature or not.  It is not the place of a
standardisation committee to dictate coding style.

Leaving aside the matter of the numerous features of C that have been
retained that actually *do* compromise type safety -- such as weak
typedefing -- I disagree with your statement.  We are discussing the
initialisation of a *character array*, not a string.  The fact that a
character array can be initialised with a string is a convenient
holdover from the C string model, i.e. strings being arrays of
characters.  But initialising with what is lexically a string doesn't
turn the array into a string.  It is most definitely an array of
characters.  If it is initialised with a string that needs to be
truncated in order to fit the array, it is still an array of
characters.

As you have pointed out, character arrays can be used as strings for
some purposes, and if an array is used in this way it needs to contain
a NUL, which initialisation with a non-truncated string will
conveniently provide.  But even when used as strings, they are still
really arrays of characters.  *This* is type unsafe, if one wishes to
draw such a strong distinction between the two types in question.  Of
course, because strings (of the character pointer type) and character
arrays can be manipulated in some of the same ways, and are, for many
purposes, interchangeable, there will always be this type unsafety.
This is what string classes are for.
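A minimal sketch of that point in present-day C++, assuming `std::string` (standardised after this thread): a string class stores its own length, so it can be built from a character array that has no NUL at all, and it does not treat an embedded NUL as a terminator. The names `raw`, `hello` and `with_nul` are illustrative only:

```cpp
#include <cassert>
#include <string>

// A five-character array with no terminating NUL, written in the
// brace-list form that the draft still allows.
const char raw[5] = { 'h', 'e', 'l', 'l', 'o' };

// std::string records its own length, so it can be built from the array
// by passing the length explicitly instead of searching for a NUL.
const std::string hello(raw, sizeof raw);

// Because the length is stored, an embedded NUL is an ordinary character.
const std::string with_nul("ab\0cd", 5);
```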

-zefram





Author: bkline@cortex.nlm.nih.gov (Bob Kline Phoenix Contract)
Date: 1995/06/15
Raw View
Steve Clamage (clamage@Eng.Sun.COM) wrote:

: The case under discussion was C++ disallowing a shorthand notation
: for a potentially dangerous and error-prone situation like
:  char hello1[5] = "hello"; /* ok in C, but not in C++ */
: If you really want a 5-character array with no terminating null
: you can still get it:
:  char hello2[5] = { 'h', 'e', 'l', 'l', 'o' };
: C++ just makes it harder to get one accidentally. The judgement (and
: it is only a judgement) is that arrays like hello1 are much more
: likely to be written in error than on purpose.

FWIW, I'd like to add my vote (I'll do so formally as well) to those
who feel that this incompatibility is gratuitous and breaks more
existing code than it is worth (including some of my own :->}).

For example, I have some code which I use in various incarnations
for debugging, and which places a 16-byte signature marker around
blocks of memory to check for memory corruption.  Its definition
looks like this:

    static char sig[16] = "SignatureMarker";

which is much more elegant IMO (and easier to write) than

    static char sig[16] = { 'S', 'i', 'g', 'n', 'a', 't', 'u', 'r',
 'e', 'M', 'a', 'r', 'k', 'e', 'r' };

if you ask me.  And I can write checks like

    if (memcmp(block, sig, sizeof sig)) { ....

which is preferable (if from no other perspective than performance)
to

    if (memcmp(block, sig, strlen(sig))) { ...

which I would have to switch to if the definition of sig has to be

    static char sig[] = "SignatureMarker";

(yes, I know I can switch to 'sizeof sig - 1', but it's still not
as elegant, and I'm not sure I'm guaranteed by the standard that this
will always produce 16; next best would be static char sig[17] = "...";
... sizeof sig - 1).
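For reference, a minimal sketch of the memcmp-based guard check, assuming the guard bytes hold a copy of `sig`; the helper name `guard_intact` is hypothetical. Counting the literal as written, "SignatureMarker" has 15 characters, so with its terminating NUL it occupies exactly 16 elements, and `sizeof sig` is 16 whether the bound is written as `[16]` or deduced with `[]`:

```cpp
#include <cassert>
#include <cstring>

// "SignatureMarker" has 15 characters, so together with its terminating
// NUL it fills all 16 elements: this declaration is valid in C and C++.
static const char sig[16] = "SignatureMarker";

// Hypothetical helper: true while the 16 guard bytes at 'guard' still
// match the signature. sizeof sig is the constant 16, so memcmp compares
// exactly that many bytes, including the final NUL.
bool guard_intact(const void *guard)
{
    return std::memcmp(guard, sig, sizeof sig) == 0;
}
```

`sizeof` applied to a `char` array always yields its element count, since `sizeof(char)` is 1 by definition.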

Ignoring all arguments in each direction about which of the available
approaches is "better", the bottom line is that the incompatibility
breaks existing code.

I understand the concern for protecting unsuspecting programmers from
their own mistakes when a language feature presents an especially nasty
and insidious trap, and I also understand the need for balance between
this concern and the requirement to break as little existing code (and
introduce as few incompatibilities with C) as possible.  In most cases
the committee (and Dr. Stroustrup) appear to have made the best choice.
I just don't believe that the right call was made in this case.

--
/*----------------------------------------------------------------------*/
/* Bob Kline                                       Stream International */
/* bob_kline@stream.com               formerly Corporate Software, Inc. */
/* voice: (703) 522-0820 x-311                      fax: (703) 522-5407 */
/*----------------------------------------------------------------------*/





Author: clamage@Eng.Sun.COM (Steve Clamage)
Date: 1995/06/15
Raw View
In article 22731@nlm.nih.gov, bkline@cortex.nlm.nih.gov (Bob Kline Phoenix Contract) writes:
>Steve Clamage (clamage@Eng.Sun.COM) wrote:
>
>: The case under discussion was C++ disallowing a shorthand notation
>: for a potentially dangerous and error-prone situation like
>:  char hello1[5] = "hello"; /* ok in C, but not in C++ */
>: If you really want a 5-character array with no terminating null
>: you can still get it:
>:  char hello2[5] = { 'h', 'e', 'l', 'l', 'o' };
>: C++ just makes it harder to get one accidentally. The judgement (and
>: it is only a judgement) is that arrays like hello1 are much more
>: likely to be written in error than on purpose.
>
>FWIW, I'd like to add my vote (I'll do so formally as well) to those
>who feel that this incompatibility is gratuitous and breaks more
>existing code than it is worth (including some of my own :->}).
>
>For example, I have some code which I use in various incarnations
>for debugging, and which places a 16-byte signature marker around
>blocks of memory to check for memory corruption.  Its definition
>looks like this:
>
>    static char sig[16] = "SignatureMarker";
>
>which is much more elegant IMO (and easier to write) than
>
>    static char sig[16] = { 'S', 'i', 'g', 'n', 'a', 't', 'u', 'r',
> 'e', 'M', 'a', 'r', 'k', 'e', 'r' };
>
>if you ask me.

No question it is easier to write. Anyone reading your code has to wonder
whether you intended to have a terminating null. You can then add a
comment saying "We don't need no steenking null" or words to that effect,
but I don't think that adds to the elegance.

>  And I can write checks like
>
>    if (memcmp(block, sig, sizeof sig)) { ....
>
>which is preferable (if from no other perspective than performance)
>to
>
>    if (memcmp(block, sig, strlen(sig))) { ...

But if you use a null-terminated sig, you can just write

 if (strcmp(block, sig)) { ...

which is at least as elegant, and the runtime code is at least as fast and
as small as the memcmp you were using before.

Yes, it breaks existing code. Every language feature is disliked by
someone, whether it breaks existing code or not. Removing this
incompatibility with C would mean persuading the majority of the C++
Committee that the amount of broken code (even given the other, much
worse, incompatibilities that definitely will not be removed)
outweighs the advantages of preserving the type system.
---
Steve Clamage, stephen.clamage@eng.sun.com







Author: "Ronald F. Guilmette" <rfg@rahul.net>
Date: 1995/06/11
Raw View
In article <1995Jun8.094259.17052@dcs.warwick.ac.uk>,
Zefram <A.Main@dcs.warwick.ac.uk> wrote:
>Steve Clamage <clamage@Eng.Sun.COM> wrote:
>>There is a third choice:
>>
>>const char digits[] = "01234567890XE";

Speaking of array initialization, could someone please check the draft
and let me know if the following declarations are considered valid?

 char two_dimensional[2][5] = "123456789";

 int two_dim[][] = { { 1, 2, 3 }, { 4, 5, 6 }, {7, 8, 9 } };

If either or both of the above are not valid in C++, quotations from
the draft standard which prove that would be appreciated.
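Not a quotation from the draft, but for comparison, here are declarations of the same shapes that are well-formed in both C and C++: each row of a two-dimensional `char` array gets its own literal, and only the leftmost bound of a multidimensional array may be omitted:

```cpp
// Each row gets its own literal; "1234" plus its NUL exactly fills a row.
char two_dimensional[2][5] = { "1234", "5678" };

// Only the leftmost bound may be left out; here it is deduced as 3.
int two_dim[][3] = { { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 } };
```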
--

-- Ron Guilmette, Sunnyvale, CA ---------- RG Consulting -------------------
---- E-mail: rfg@segfault.us.com ----------- Purveyors of Compiler Test ----
---- finger: rfg@rahul.net ----------------- Suites and Bullet-Proof Shoes -





Author: maney@MCS.COM (Martin Maney)
Date: 1995/06/09
Raw View
Steve Clamage (clamage@Eng.Sun.COM) wrote:
> In article nqg@Venus.mcs.com, maney@MCS.COM (Martin Maney) writes:
> >Steve Clamage (clamage@Eng.Sun.COM) wrote:
> >> In article 17052@dcs.warwick.ac.uk, A.Main@dcs.warwick.ac.uk (Zefram) writes:
> >> >NUL byte.  In the knowledge that this syntax is available, it seems fairly
> >> >clear that
> >> >
> >> >const char digits[12] = "01234567890XE";
> >> >
> >> >is not intended to have the NUL.
> >
> >> Thank you, thank you, for proving my point. Your literal string does not
> >> have 12 characters, which only reinforces my comment that you should
> >> let the computer do the counting.
> >
> >While I am really in agreement with you (and the draft <g>), Steve, I
> >can't agree that this typo proves your point.  In fact, in this case the
> >compiler would have caught the error iff the literal string initializer
> >worked as (I understand) Zefram would like it to; using digits[] as you
> >suggested would not catch this error.

> The compiler catches that particular typographical error. It does not
> catch the similar typographical error
>  const char digits[12] = "012346789XE";
> which does result in a terminating null, which is claimed to be unwanted.
> (The C and C++ rules are the same for this case.)

Yes, so?  I never said that the behavior that Zefram preferred would catch
all possible errors, I was merely bemused at your statement that a
typographical error that his preferred scheme *would* happen to have
caught demonstrated that he "should let the computer do the counting".

> The posting I commented on also contained the claim that it was "obvious"
> that a terminating null wasn't wanted. I was pointing out that it was
> far from obvious, since a human did the original counting, and anyone
> looking at the code has to repeat the counting, and both humans must
> not make any counting errors.

Well, if you intended to say that perhaps you should have said it.  What I
have seen you saying in this thread is "don't worry, who cares if the NUL
is there, use digits[] and let the compiler count your initializer, or use
a different form."  All of which I agree with, but none of which appears
to be reinforced by Zefram's typo in that example.


Lest this thread should die a deserved death quite so soon, I'd like to
partially reverse myself.  The more I think about this, the more it looks
like an incompatibility with C that has little justification - the errors
it avoids seem to me to be pretty uncommon, but maybe I just don't
appreciate how commonly folks write definitions for character arrays with
explicit sizes and use the string initializer form.  Considering how many
more useful (IMO) improvements have been forgone in the interests of C
compatibility, this seems a little odd.  Comments?






Author: A.Main@dcs.warwick.ac.uk (Zefram)
Date: 1995/06/08
Raw View
Steve Clamage <clamage@Eng.Sun.COM> wrote:
>There is a third choice:
>
>const char digits[] = "01234567890XE";
>
>In other words, let the computer do the counting. Now there is no
>question about counting correctly or whether the string ends in
>a null. How big is the array? sizeof(digits), including the null.

That type of initialisation looks as though one intends to have the terminating
NUL byte.  In the knowledge that this syntax is available, it seems fairly
clear that

const char digits[12] = "01234567890XE";

is not intended to have the NUL.  I'm drawing a distinction here between an
initialised *string* (char digits[]) and an initialised *fixed-size array*
(char digits[12]).

-zefram





Author: clamage@Eng.Sun.COM (Steve Clamage)
Date: 1995/06/08
Raw View
In article 17052@dcs.warwick.ac.uk, A.Main@dcs.warwick.ac.uk (Zefram) writes:
>Steve Clamage <clamage@Eng.Sun.COM> wrote:
>>There is a third choice:
>>
>>const char digits[] = "01234567890XE";
>>
>>In other words, let the computer do the counting. Now there is no
>>question about counting correctly or whether the string ends in
>>a null. How big is the array? sizeof(digits), including the null.
>
>That type of initialisation looks as though one intends to have the terminating
>NUL byte.  In the knowledge that this syntax is available, it seems fairly
>clear that
>
>const char digits[12] = "01234567890XE";
>
>is not intended to have the NUL.

Thank you, thank you, for proving my point. Your literal string does not
have 12 characters, which only reinforces my comment that you should
let the computer do the counting.

Once again: Is it really the case that you cannot allow an extra null in
the array? Don't forget, you can still get an initialization without a null.
We are only discussing the availability of a particular shorthand version
of initialization, a version which violates the type system.

---
Steve Clamage, stephen.clamage@eng.sun.com







Author: maney@MCS.COM (Martin Maney)
Date: 1995/06/08
Raw View
Steve Clamage (clamage@Eng.Sun.COM) wrote:
> In article 17052@dcs.warwick.ac.uk, A.Main@dcs.warwick.ac.uk (Zefram) writes:
> >NUL byte.  In the knowledge that this syntax is available, it seems fairly
> >clear that
> >
> >const char digits[12] = "01234567890XE";
> >
> >is not intended to have the NUL.

> Thank you, thank you, for proving my point. Your literal string does not
> have 12 characters, which only reinforces my comment that you should
> let the computer do the counting.

While I am really in agreement with you (and the draft <g>), Steve, I
can't agree that this typo proves your point.  In fact, in this case the
compiler would have caught the error iff the literal string initializer
worked as (I understand) Zefram would like it to; using digits[] as you
suggested would not catch this error.





Author: clamage@Eng.Sun.COM (Steve Clamage)
Date: 1995/06/08
Raw View
In article nqg@Venus.mcs.com, maney@MCS.COM (Martin Maney) writes:
>Steve Clamage (clamage@Eng.Sun.COM) wrote:
>> In article 17052@dcs.warwick.ac.uk, A.Main@dcs.warwick.ac.uk (Zefram) writes:
>> >NUL byte.  In the knowledge that this syntax is available, it seems fairly
>> >clear that
>> >
>> >const char digits[12] = "01234567890XE";
>> >
>> >is not intended to have the NUL.
>
>> Thank you, thank you, for proving my point. Your literal string does not
>> have 12 characters, which only reinforces my comment that you should
>> let the computer do the counting.
>
>While I am really in agreement with you (and the draft <g>), Steve, I
>can't agree that this typo proves your point.  In fact, in this case the
>compiler would have caught the error iff the literal string initializer
>worked as (I understand) Zefram would like it to; using digits[] as you
>suggested would not catch this error.

The compiler catches that particular typographical error. It does not
catch the similar typographical error
 const char digits[12] = "012346789XE";
which does result in a terminating null, which is claimed to be unwanted.
(The C and C++ rules are the same for this case.)
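One way to let the compiler do the counting and still catch a typo like this, assuming a C++11 compiler (`static_assert` did not exist in 1995), is to deduce the bound and then state the intended digit count separately:

```cpp
// Let the compiler count the characters, then state the intended count
// explicitly, so a miscounted or mistyped literal fails to compile.
const char digits[] = "0123456789XE";
static_assert(sizeof digits - 1 == 12,
              "duodecimal digit table must hold exactly 12 digits");

// With the typo above, "012346789XE" has only 11 characters, so the
// same check would be rejected at compile time:
//   const char bad[] = "012346789XE";
//   static_assert(sizeof bad - 1 == 12, "...");   // error
```

Here the terminating NUL is kept (`sizeof digits` is 13), as Steve's suggestion accepts; the assertion only guards the number of real digits.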

The posting I commented on also contained the claim that it was "obvious"
that a terminating null wasn't wanted. I was pointing out that it was
far from obvious, since a human did the orginal counting, and anyone
looking at the code has to repeat the counting, and both humans must
not make any counting errors.
---
Steve Clamage, stephen.clamage@eng.sun.com







Author: clamage@Eng.Sun.COM (Steve Clamage)
Date: 1995/06/03
Raw View
johnw@jove.acs.unt.edu (John Robert Williams) writes:

>After reading followups saying that this feature is dangerous, I wonder
>why the committee did not do what is traditional for things that are
>usually wrong but not always: make it cause a warning! This seems to me
>like it would satisfy both points of view.

It may be traditional in some arenas, but not in language
specifications. The C++ standard specifies what constitutes a well-
formed program and the meaning of well-formed programs.

Some rule violations require a diagnostic message. The language
standard does not attempt to specify the format of such messages
or how they are emitted. The concept of "warning" is not part of
the standard. A program is well-formed or not, requires diagnostics
or not. Otherwise we get into the business of trying to provide
a standardized grading system (as in schoolwork assignments).

(Implementations may emit any sort of informational messages, helpful
or annoying, that the implementor wishes.)

The case under discussion was C++ disallowing a shorthand notation
for a potentially dangerous and error-prone situation like
 char hello1[5] = "hello"; /* ok in C, but not in C++ */
If you really want a 5-character array with no terminating null
you can still get it:
 char hello2[5] = { 'h', 'e', 'l', 'l', 'o' };
C++ just makes it harder to get one accidentally. The judgement (and
it is only a judgement) is that arrays like hello1 are much more
likely to be written in error than on purpose.
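The cases side by side, as a sketch; the C-only form is left commented out because a C++ compiler rejects it. The literal `"hello"` is an array of six characters (five letters plus the NUL), which is why it does not fit in a `char[5]`:

```cpp
// Valid in C, rejected by C++: the literal needs six elements, not five.
// char hello1[5] = "hello";

// Valid in both: an explicit list of exactly five characters, no NUL.
char hello2[5] = { 'h', 'e', 'l', 'l', 'o' };

// Valid in both: room for the NUL, so hello3 is a proper string.
char hello3[6] = "hello";

// Also valid in both: the bound is deduced from the literal (6).
char hello4[] = "hello";
```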
--
Steve Clamage, stephen.clamage@eng.sun.com





Author: johnw@jove.acs.unt.edu (John Robert Williams)
Date: 1995/06/03
Raw View
Zefram (A.Main@dcs.warwick.ac.uk) wrote:
> The draft ANSI C++ standard specifically disallows a common usage for the
> initialisation of character arrays (specifically, initialising to a string
> that gets truncated to the correct length).  Can anyone enlighten me as to the
> reason for this change, bearing in mind that it will break code without
> gaining any benefit?

After reading followups saying that this feature is dangerous, I wonder
why the committee did not do what is traditional for things that are
usually wrong but not always: make it cause a warning! This seems to me
like it would satisfy both points of view.

--
John Williams  <johnw@jove.acs.unt.edu>
This is your fortune.





Author: A.Main@dcs.warwick.ac.uk (Zefram)
Date: 1995/06/03
Raw View
Marc Shepherd <shepherd@debussy.sbi.com> wrote:
>We're talking about probabilities here.  99% of the time, when someone
>writes code like this:
>
> const char foo[5] = "Hello";
>
>the INTENT is to initialize a string, even though the programmer has
>failed to allow enough space for the terminating NUL.

Surely if one wants a string it would be better to write

char foo[] = "Hello";

(Or, if as above you want it to be const,

const char *foo = "Hello";

.)

>The poster observed that, on rare occasions, someone might really
>WANT an array of five characters with no null terminator.  But that's
>the problem--it's rare--and I concur with the standards-writers that
>the language should not support an unsafe feature because, 1% of the
>time, it's what someone really wants.

I suspect that the [] usage is more common when one actually wants a modifiable
string.  Actually specifying the dimensions of the array indicates that one
wants an array of the specified number of characters, rather than a string.
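The difference between the two forms, sketched in C++: the array form makes a modifiable copy whose size is deduced from the literal, while the pointer form only points at the literal itself, which must not be modified. (Present-day C++ requires the `const` on the pointer; the implicit conversion of a literal to plain `char *` was removed in C++11.)

```cpp
// Array form: a modifiable copy of the literal; the bound is deduced as 6.
char foo_array[] = "Hello";

// Pointer form: points at the literal itself, so it must be const.
const char *foo_ptr = "Hello";
```

Note that `sizeof foo_array` is the array size (6), whereas `sizeof foo_ptr` is merely the size of a pointer.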

[On case fall-through being the default behaviour]

That's an interesting point.  I think it comes under the heading of "no hidden
costs", though only to a minor extent.  If one were redesigning C from scratch,
or designing C++ without starting from C, it would make perfect sense to have
an implicit break by default.  Of course, we would then need to have a better
way to specify multiple values for the same case than the present ugly multiple
case labels.  One would also have to be careful to ensure that gotos are
possible between cases, for those occasions where one does want fall-through.
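For comparison, the two styles under discussion in later C++: stacked case labels sharing one body, each group ending with an explicit `return`, and a deliberate fall-through marked with the `[[fallthrough]]` attribute (added in C++17, long after this thread). Both functions are hypothetical examples; `digit_value` reuses the X/E duodecimal notation from this thread:

```cpp
// Stacked case labels: several values share one body, and each group
// ends with an explicit return, so nothing falls through by accident.
constexpr int digit_value(char c)
{
    switch (c) {
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        return c - '0';
    case 'X':
        return 10;
    case 'E':
        return 11;
    default:
        return -1;
    }
}

// A deliberate fall-through, marked so that readers (and compilers that
// warn about implicit fall-through) can see it is intentional.
// Hypothetical example: count the steps left from a given stage.
constexpr int steps_left(int stage)
{
    int n = 0;
    switch (stage) {
    case 1:
        ++n;
        [[fallthrough]];
    case 2:
        ++n;
        [[fallthrough]];
    case 3:
        ++n;
        break;
    default:
        break;
    }
    return n;
}
```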

That's a rather different matter from character array initialisation, though,
in that an equally neat (or IMO neater) syntax is available.  In the case of
character arrays, there is really no neater way to initialise than by using the
string syntax.

>The best way to initialize such arrays is like this:
>
> const char digits[] = "0123456789XE";
>
>When you leave out the subscript, the compiler figures it out on your
>behalf, eliminating any chance of error.  In this case, you'd get an
>array of 13 characters (including the terminating NUL).  Even though
>the NUL is not needed in your duodecimal conversion routine, it doesn't
>appear to do any harm, either.

That's ugly, though.  Stylistically, I'd rather tell the compiler that I want
an array of 12 characters (which is what I want) than that I want an array of
whatever size is needed to contain this string.  Also, the 13th element of the
array is wasting space.  Not much space, but it's terribly inelegant, and might
make one wonder later whether the array really needs to have 13 elements or
not.

If one were starting from scratch, it might be possible to have the present C++
rule, if there were also a syntax to represent a string without a terminating
NUL.  For example:

const char digits[12] = "0123456789XE"0;

However, we are not starting from scratch, and the present C++ rule will break
existing code.

-zefram





Author: shepherd@debussy.sbi.com (Marc Shepherd)
Date: 1995/06/02
Raw View
In article 15465@dcs.warwick.ac.uk, A.Main@dcs.warwick.ac.uk (Zefram) writes:
>The draft ANSI C++ standard specifically disallows a common usage for the
>initialisation of character arrays (specifically, initialising to a string
>that gets truncated to the correct length).  Can anyone enlighten me as to the
>reason for this change, bearing in mind that it will break code without
>gaining any benefit?

We're talking about probabilities here.  99% of the time, when someone
writes code like this:

 const char foo[5] = "Hello";

the INTENT is to initialize a string, even though the programmer has
failed to allow enough space for the terminating NUL.  Traditional C
compilers accepted such declarations, and the programmer would get a
core dump (or worse) the first time the string was used.

The poster observed that, on rare occasions, someone might really
WANT an array of five characters with no null terminator.  But that's
the problem--it's rare--and I concur with the standards-writers that
the language should not support an unsafe feature because, 1% of the
time, it's what someone really wants.

(A diversion: in the recent book "Expert C Programming - Deep C Secrets,"
the author mentioned that he did a survey of all the 'switch' statements
in the SunOS operating system.  Hundreds of such statements were found.
In all the switch statements, only 3% of the case labels fell through
to the next label.  97% of the cases were terminated with a break,
a return, an exit(), or the like.  The author opined, and I agree, that
any language feature that provides the wrong default behavior 97% of
the time is a wrongly-designed feature.  C's policy of truncating a
statically-initialized character array--at the expense of the terminating
NUL--is a similarly foolish feature, and I endorse the policy of fixing
it.)

>
>Finally, the comment about `poor coding style' seems odd.  Suppose I'm writing
>a function that needs to convert numbers to, let's say, duodecimal, using `X'
>to represent ten and `E' to represent eleven.  (This notation has been widely
>suggested in the past for a duodecimal system.)  My lookup table for the
>digits is going to be a character array.  I could initialise it in one of two
>reasonable ways:
>
>const char digits[12] = "0123456789XE";
>
>const char digits[12] = {'0','1','2','3','4','5','6','7','8','9','X','E'};
>
>I think most people would agree that the first is more readable than the
>second.
>

The best way to initialize such arrays is like this:

 const char digits[] = "0123456789XE";

When you leave out the subscript, the compiler figures it out on your
behalf, eliminating any chance of error.  In this case, you'd get an
array of 13 characters (including the terminating NUL).  Even though
the NUL is not needed in your duodecimal conversion routine, it doesn't
appear to do any harm, either.
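A sketch of the duodecimal conversion under discussion, using the table exactly as suggested above; the function name `to_duodecimal` is hypothetical. Because the routine only ever indexes `digits[0]` through `digits[11]`, the trailing NUL is present but never read:

```cpp
#include <cassert>
#include <string>

// Lookup table as suggested: the bound is deduced as 13 because the
// literal's terminating NUL is kept, but only indices 0..11 are used.
const char digits[] = "0123456789XE";

// Hypothetical routine: base-12 text for n, with X for ten, E for eleven.
std::string to_duodecimal(unsigned long n)
{
    std::string out;
    do {
        out.insert(out.begin(), digits[n % 12]); // prepend the lowest digit
        n /= 12;
    } while (n != 0);
    return out;
}
```

For example, `to_duodecimal(143)` yields `"EE"` (11 × 12 + 11) and `to_duodecimal(144)` yields `"100"`.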


---
Marc Shepherd
Salomon Brothers Inc
shepherd@schubert.sbi.com The opinions I express are no one's but mine!






Author: A.Main@dcs.warwick.ac.uk (Zefram)
Date: 1995/05/31
Raw View
The draft ANSI C++ standard specifically disallows a common usage for the
initialisation of character arrays (specifically, initialising to a string
that gets truncated to the correct length).  Can anyone enlighten me as to the
reason for this change, bearing in mind that it will break code without
gaining any benefit?

For reference, the "changes from C" appendix says:

   Change:
          In C++, when initializing an array of character with a
   string, the number of characters in the string (including
   the terminating '\0') must not exceed the number of
   elements in the array.  In C, an array can be initialised
   with a string even if the array is not large enough to
   contain the string terminating '\0'.
   ...
   Rationale:
   When these non-terminated arrays are manipulated by
   standard string routines, there is potential for a major
   catastrophe.
   ...
   How widely used:
   Seldom.  This style of array initialization is seen as
   poor coding style.

Personally, I find that explanation inadequate.  Arrays initialised this way
are generally not being treated as normal strings, but as character arrays.
(It can easily be more convenient to initialise using a string than by using a
list of characters, which might take up four times as much space in the
source.)  Hence, the comment about the use of standard string routines is
irrelevant, as these string routines would normally never be used on such an
array.  (And initialising it with a list of characters makes it no safer with
respect to the string functions.)
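If an array initialised this way ever does have to be handled as text, one defensive pattern is to bound every read by the array size instead of relying on a NUL. A minimal sketch with hypothetical helper names; `std::memchr` is used because, unlike `std::strlen`, it never looks past the length it is given:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: length of the text held in a char array of 'size'
// elements that may or may not contain a NUL. It never reads beyond the
// array; if no NUL is found, the whole array counts as text.
std::size_t bounded_length(const char *arr, std::size_t size)
{
    const void *nul = std::memchr(arr, '\0', size);
    return nul ? static_cast<std::size_t>(static_cast<const char *>(nul) - arr)
               : size;
}

// Hypothetical helper: copy the meaningful characters into a std::string,
// which stores its own length and so needs no terminator.
std::string array_to_string(const char *arr, std::size_t size)
{
    return std::string(arr, bounded_length(arr, size));
}
```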

Finally, the comment about `poor coding style' seems odd.  Suppose I'm writing
a function that needs to convert numbers to, let's say, duodecimal, using `X'
to represent ten and `E' to represent eleven.  (This notation has been widely
suggested in the past for a duodecimal system.)  My lookup table for the
digits is going to be a character array.  I could initialise it in one of two
reasonable ways:

const char digits[12] = "0123456789XE";

const char digits[12] = {'0','1','2','3','4','5','6','7','8','9','X','E'};

I think most people would agree that the first is more readable than the
second.

Comments?  Is there any chance of this being fixed in the final standard?

-zefram





Author: clamage@Eng.Sun.COM (Steve Clamage)
Date: 1995/05/31
Raw View
A.Main@dcs.warwick.ac.uk (Zefram) writes:

>The draft ANSI C++ standard specifically disallows a common usage for the
>initialisation of character arrays (specifically, initialising to a string
>that gets truncated to the correct length).  Can anyone enlighten me as to the
>reason for this change, bearing in mind that it will break code without
>gaining any benefit?

>For reference, the "changes from C" appendix says:

>   Change:
>          In C++, when initializing an array of character with a
>   string, the number of characters in the string (including
>   the terminating '\0') must not exceed the number of
>   elements in the array.  In C, an array can be initialised
>   with a string even if the array is not large enough to
>   contain the string terminating '\0'.
>   ...
>   Rationale:
>   When these non-terminated arrays are manipulated by
>   standard string routines, there is potential for a major
>   catastrophe.
>   ...
>   How widely used:
>   Seldom.  This style of array initialization is seen as
>   poor coding style.

>Personally, I find that explanation inadequate.

Well, that's the explanation. Not everyone likes the reasons behind
every rule.


>Arrays initialised this way
>are generally not being treated as normal strings, but as character arrays.
>(It can easily be more convenient to initialise using a string than by using a
>list of characters, which might take up four times as much space in the
>source.)

The question is how common is it to require exactly that many characters
and not be able to allow room for a trailing null. Evidently you feel
this is so common as to outweigh the uncertain semantics of what C
allows for char arrays (and only for char arrays). Consider this:
 char hello[5] = "hello";
In C, this is well-formed, whether or not you intended to have a
trailing null. Is that a good idea?


>Finally, the comment about `poor coding style' seems odd.  Suppose I'm writing
>a function that needs to convert numbers to, let's say, duodecimal, using `X'
>to represent ten and `E' to represent eleven.  (This notation has been widely
>suggested in the past for a duodecimal system.)  My lookup table for the
>digits is going to be a character array.  I could initialise it in one of two
>reasonable ways:

>const char digits[12] = "0123456789XE";

>const char digits[12] = {'0','1','2','3','4','5','6','7','8','9','X','E'};

>I think most people would agree that the first is more readable than the
>second.

Maybe. The first leaves me with the question of whether you intended
for the string to end with a null. I would have to examine every use
of "digits" in the program to determine whether you miscounted or used
a size of 12 intentionally.

There is a third choice:

const char digits[] = "01234567890XE";

In other words, let the computer do the counting. Now there is no
question about counting correctly or whether the string ends in
a null. How big is the array? sizeof(digits), including the null.

Is it so terrible to waste a char for the extra null? (OK, on many
systems you actually waste a whole 4-byte word due to alignment of
adjoining data. How many of these are there in the program? What is
the total amount of wasted space?)

--
Steve Clamage, stephen.clamage@eng.sun.com





Author: kanze@lts.sel.alcatel.de (James Kanze US/ESC 60/3/141 #40763)
Date: 1995/05/31
Raw View
In article <1995May31.002154.15465@dcs.warwick.ac.uk>
A.Main@dcs.warwick.ac.uk (Zefram) writes:

|> The draft ANSI C++ standard specifically disallows a common usage for the
|> initialisation of character arrays (specifically, initialising to a string
|> that gets truncated to the correct length).  Can anyone enlighten me as to the
|> reason for this change, bearing in mind that it will break code without
|> gaining any benefit?

|> For reference, the "changes from C" appendix says:

|>    Change:
|>           In C++, when initializing an array of character with a
|>    string, the number of characters in the string (including
|>    the terminating '\0') must not exceed the number of
|>    elements in the array.  In C, an array can be initialised
|>    with a string even if the array is not large enough to
|>    contain the string terminating '\0'.
|>    ...
|>    Rationale:
|>    When these non-terminated arrays are manipulated by
|>    standard string routines, there is potential for a major
|>    catastrophe.
|>    ...
|>    How widely used:
|>    Seldom.  This style of array initialization is seen as
|>    poor coding style.

|> Personally, I find that explanation inadequate.  Arrays initialised this way
|> are generally not being treated as normal strings, but as character arrays.
|> (It can easily be more convenient to initialise using a string than by using a
|> list of characters, which might take up four times as much space in the
|> source.)  Hence, the comment about the use of standard string routines is
|> irrelevant, as these string routines would normally never be used on such an
|> array.  (And initialising it with a list of characters makes it no safer with
|> respect to the string functions.)

|> Finally, the comment about `poor coding style' seems odd.  Suppose I'm writing
|> a function that needs to convert numbers to, let's say, duodecimal, using `X'
|> to represent ten and `E' to represent eleven.  (This notation has been widely
|> suggested in the past for a duodecimal system.)  My lookup table for the
|> digits is going to be a character array.  I could initialise it in one of two
|> reasonable ways:

|> const char digits[12] = "0123456789XE";

|> const char digits[12] = {'0','1','2','3','4','5','6','7','8','9','X','E'};

|> I think most people would agree that the first is more readable than the
|> second.

|> Comments?  Is there any chance of this being fixed in the final standard?

In most cases, having to count the characters is poor style, and a
frequent case of errors.  In fact, even though I don't need the
trailing '\0' in the above example, I would probably just write:

 char const digits[] = "0123456789XE" ;

This still works in C++:-).
--
James Kanze         Tel.: (+33) 88 14 49 00        email: kanze@gabi-soft.fr
GABI Software, Sarl., 8 rue des Francs-Bourgeois, F-67000 Strasbourg, France
Conseils en informatique industrielle --
                              -- Beratung in industrieller Datenverarbeitung