Thread

Topic: signedness of plain char

Author: jsa@edg.com (J. Stephen Adamczyk)
Date: 1997/01/28

In article <E4oAny.Hqs@research.att.com> ark@research.att.com
(Andrew Koenig) writes:
>A quick check of the C standard reveals that isalpha is required
>to return a well-defined value for any int you might possibly give
>it as an argument.

Um, I don't think so.

ANSI/ISO C 7.3: "The header <ctype.h> declares several functions useful
for testing and mapping characters.  In all cases the argument is an int,
the value of which shall be representable as an unsigned char or shall equal
the value of the macro EOF.  If the argument has any other value, the
behavior is undefined."

I am indebted to P.J. Plauger's excellent book "The Standard C Library" for
the tip that for maximum portability in using the character classification
functions, the right approach is to cast the argument to unsigned char.
If EOF is a possibility, one must test for it separately before calling
the character classification function.

I've gotten to the point where a call to isalpha only looks right to me
in the form "isalpha((unsigned char)c)"; a simple "isalpha(c)" looks
suspect.
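
A minimal sketch of the idiom described above, assuming the character
comes either from a plain char or from the int returned by getc().  The
helper names is_alpha_char and is_alpha_from_getc are hypothetical and
are not taken from Plauger's book.

```cpp
#include <cassert>
#include <cctype>   // std::isalpha
#include <climits>  // UCHAR_MAX
#include <cstdio>   // EOF

// Hypothetical helper: classify a character held in a plain char.
// The cast maps every char value into 0..UCHAR_MAX, which is the
// domain the <cctype> functions are defined over.
inline bool is_alpha_char(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: classify the int returned by getc()/getchar().
// Such a value is already EOF or an unsigned char value, so it can be
// passed through unchanged.
inline bool is_alpha_from_getc(int c)
{
    // Precondition taken from the quoted 7.3 text.
    assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));
    return std::isalpha(c) != 0;
}
```

With these, is_alpha_char(*s) is safe for any byte in a char array,
whether plain char is signed or unsigned.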

Steve Adamczyk
Edison Design Group
---
[ comp.std.c++ is moderated.  To submit articles: try just posting with      ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu         ]
[ FAQ:      http://reality.sgi.com/employees/austern_mti/std-c++/faq.html    ]
[ Policy:   http://reality.sgi.com/employees/austern_mti/std-c++/policy.html ]
[ Comments? mailto:std-c++-request@ncar.ucar.edu                             ]

Author: d96-mst@nada.kth.se (Mikael Ståldal)
Date: 1997/01/29

In article <E4oAny.Hqs@research.att.com>,
ark@research.att.com (Andrew Koenig) wrote:
>> >Sort of like, if you naively write something stupid like:
>
>> >    if ( isalpha( c ) ) ...
>
>> >where "c" has type character.
>
>A quick check of the C standard reveals that isalpha is required
>to return a well-defined value for any int you might possibly give
>it as an argument.

Yes, but storing characters in ints makes no sense in C++ so the
character classification functions in <cctype> should be overloaded for
type char.

Author: Matt Austern <austern@isolde.mti.sgi.com>
Date: 1997/01/29

d96-mst@nada.kth.se (Mikael Ståldal) writes:

> Yes, but storing characters in ints makes no sense in C++ so the
> character classification functions in <cctype> should be overloaded for
> type char.

In fact, the C standard library requires storing characters in ints.
This is described in detail in Kernighan and Ritchie.  (See section
2.7 of the first edition.  I don't know the section number in the
second edition.)

The point is that getchar() must be able to return every possible
valid character, and also must be able to signal end-of-file.  The C
standard library does this by allowing getchar() to return a value
EOF, where EOF is not a valid character value.  This means that
getchar()'s return type can't, in general, be char, but must be a type
that can store every value that a char can represent, and also one
additional value that a char can't represent.

The C++ standard library has other methods of indicating end-of-file,
but to some extent it has retained this mechanism from C.  If you look
at char_traits (section 21.1 of the draft standard), you'll see that
C++ retains the idea of a special eof value.  The mechanism is quite
general, and also rather complicated, but the basic idea is still the
same as in C: for every character type X there exists a special value
E of some *other* type Y, with the property that, for every x of type
X, Y(x) != E.

The bottom line: storing characters in ints is common in C, and it's
not going to go away in C++ either.
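
The getchar() argument can be sketched as a read loop.  This is only an
illustration; read_all is a hypothetical helper name.  The variable that
receives the result of getc() is an int, and it is narrowed to char only
after the comparison with EOF.

```cpp
#include <cassert>
#include <cstdio>   // std::FILE, std::getc, EOF
#include <string>

// Hypothetical helper: read the rest of a stream into a string.
inline std::string read_all(std::FILE* in)
{
    assert(in != nullptr);
    std::string text;
    int c;  // int, not char: it must hold every character value and EOF
    while ((c = std::getc(in)) != EOF)
        text.push_back(static_cast<char>(c));  // narrowed after the EOF test
    return text;
}
```

If c were declared as char, a byte with value 0xFF could compare equal
to EOF where plain char is signed, and the loop might never terminate
where plain char is unsigned.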


Author: James Kanze <james-albert.kanze@vx.cit.alcatel.fr>
Date: 1997/01/30

jsa@edg.com (J. Stephen Adamczyk) writes:

|>  I am indebted to P.J. Plauger's excellent book "The Standard C Library" for
|>  the tip that for maximum portability in using the character classification
|>  functions, the right approach is to cast the argument to unsigned char.
|>  If EOF is a possibility, one must test for it separately before calling
|>  the character classification function.

Plauger's book is excellent, but many of us over here learned this
before reading it.  We kept getting wrong results if we didn't:-).  Of
course, as long as your input just consists of US ASCII...

|>  I've gotten to the point where a call to isalpha only looks right to me
|>  in the form "isalpha((unsigned char)c)"; a simple "isalpha(c)" looks
|>  suspect.

Yes and no.  The isxxx functions/macros are defined over the legal
return values of getc/getchar.  So if you store these values in the
correct return type (an int), you can use it without the cast.  The cast
is principally necessary when you test for EOF somewhere immediately on
input, and then store the characters in an array of char.

In practice, of course, the latter scenario is about the only one I've
ever really seen in code, so using the functions without the cast is,
as you say, almost certainly a sign of an error.  I'm willing to bet,
however, that it is one of the most frequent errors (although I have
seen one application that rigorously used:

    #define isAlpha( c )   (isalpha( (unsigned char)( c ) ) != 0)

This also fixed the problem that the results are not guaranteed to be a
legal boolean value.)

Anyway, I've pretty much decided that once all of my platforms support
it, I will exclusively use the versions in <locale>.  (I presume that it
is only for reasons of compatibility with <cctype> that these functions
do not have the global locale as a default.)

--
James Kanze      home:     kanze@gabi-soft.fr        +33 (0)1 39 55 85 62
                 office:   kanze@vx.cit.alcatel.fr   +33 (0)1 69 63 14 54
GABI Software, Sarl., 22 rue Jacques-Lemercier, F-78000 Versailles France
     -- Conseils en informatique industrielle --

Author: Michael Hudson <sorry.no.email@nowhere.com>
Date: 1997/01/30

Matt Austern wrote:
>
> In fact, the C standard library requires storing characters in ints.
> This is described in detail in Kernighan and Ritchie.  (See section
> 2.7 of the first edition.  I don't know the section number in the
> second edition.)
>
> The point is that getchar() must be able to return every possible
> valid character, and also must be able to signal end-of-file.  The C
> standard library does this by allowing getchar() to return a value
> EOF, where EOF is not a valid character value.  This means that
> getchar()'s return type can't, in general, be char, but must be a type
> that can store every value that a char can represent, and also one
> additional value that a char can't represent.
>

I understand that I will often receive chars in ints, but I don't store
them in int arrays for two reasons.

One is space. Why should I store strings in an array that takes up twice
(or four times if we're talking four-byte ints) the memory it needs?

The other is (or are) the functions declared in string.h (or rather
cstring).  These take pointers to char, and passing arrays of int will
produce results very far from the expected ones.

Dealing with wchar_t's is a whole other kettle of fish of course.

--
Regards,
    Michael Hudson

Please don't email this address - it's not mine.

Author: ark@research.att.com (Andrew Koenig)
Date: 1997/01/27

In article <ynP5yEpfh6eH092yn@nada.kth.se> d96-mst@nada.kth.se (Mikael Ståldal) writes:

> James Kanze <james-albert.kanze@vx.cit.alcatel.fr> wrote:
> >Sort of like, if you naively write something stupid like:

> >    if ( isalpha( c ) ) ...

> >where "c" has type character.

> >I think that Steve's use of "naively" suggests that this is something
> >that will only happen to beginners.  I also think that the above example
> >pretty much shows that it is a trap which is incredibly easy to fall
> >into, even if you aren't a beginner.  On most implementations using
> >signed plain char's, the above will result in undefined behavior.

> That is a VERY good reason for making plain chars unsigned by default.
> Or overload all <cctype> functions for char.

A quick check of the C standard reveals that isalpha is required
to return a well-defined value for any int you might possibly give
it as an argument.

C++ inherits the behavior of isalpha from C.
--
    --Andrew Koenig
      ark@research.att.com
      http://www.research.att.com/info/ark

Author: James Kanze <james-albert.kanze@vx.cit.alcatel.fr>
Date: 1997/01/28

ark@research.att.com (Andrew Koenig) writes:

|>  In article <ynP5yEpfh6eH092yn@nada.kth.se> d96-mst@nada.kth.se
|>  (Mikael Ståldal) writes:
|>
|>  > James Kanze <james-albert.kanze@vx.cit.alcatel.fr> wrote:
|>  > >Sort of like, if you naively write something stupid like:
|>
|>  > >    if ( isalpha( c ) ) ...
|>
|>  > >where "c" has type character.
|>
|>  > >I think that Steve's use of "naively" suggests that this is something
|>  > >that will only happen to beginners.  I also think that the above example
|>  > >pretty much shows that it is a trap which is incredibly easy to fall
|>  > >into, even if you aren't a beginner.  On most implementations using
|>  > >signed plain char's, the above will result in undefined behavior.
|>
|>  > That is a VERY good reason for making plain chars unsigned by default.
|>  > Or overload all <cctype> functions for char.
|>
|>  A quick check of the C standard reveals that isalpha is required
|>  to return a well-defined value for any int you might possibly give
|>  it as an argument.

Is this a correction, or a change from the original?  I don't have my
copy here to verify, but I'm 99.9% sure that the original C standard
specified undefined behavior except for EOF and the range
[0..UCHAR_MAX].  This is also what the implementations I'm familiar with
(HP, Sun and the one in Plauger's "The Standard C Library") do (and they
all claim standards conformance).  Passing a value outside of this range
results in accessing an array out of bounds.
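
To make the out-of-bounds access concrete, here is a toy model of a
table-driven classifier.  The table layout (one slot for EOF followed by
one slot per unsigned char value) and the names table_index and
lookup_in_bounds are assumptions for illustration, not the layout of the
HP, Sun or Plauger libraries.  The sketch only computes the index and
never performs the bad access.

```cpp
#include <cassert>
#include <climits>  // UCHAR_MAX
#include <cstdio>   // EOF

// Assumed table size: slot 0 for EOF, slots 1..UCHAR_MAX+1 for characters.
const int table_size = UCHAR_MAX + 2;

// Index a lookup macro of this style would compute for argument c.
inline int table_index(int c)
{
    assert(EOF == -1);  // the sketch assumes the common value of EOF
    return c + 1;
}

// True when the lookup would stay inside the table.
inline bool lookup_in_bounds(int c)
{
    const int i = table_index(c);
    return i >= 0 && i < table_size;
}
```

A signed char holding -23 (the byte 0xE9) yields index -22, which is
before the start of the table; after a cast to unsigned char the same
byte yields index 234, which is inside it.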

The other question would be: what is the well-defined value for a
negative argument?

|>  C++ inherits the behavior of isalpha from C.

For the isalpha in <cctype>, of course.  The one in <locale> is new, and
IMHO, is the only one a C++ programmer should use (once it appears in
his compiler, of course).

--
James Kanze      home:     kanze@gabi-soft.fr        +33 (0)1 39 55 85 62
                 office:   kanze@vx.cit.alcatel.fr   +33 (0)1 69 63 14 54
GABI Software, Sarl., 22 rue Jacques-Lemercier, F-78000 Versailles France
     -- Conseils en informatique industrielle --

Author: stephen.clamage@Eng.Sun.COM (Steve Clamage)
Date: 1997/01/28

In article <E4oAny.Hqs@research.att.com>, ark@research.att.com (Andrew Koenig) writes:
>In article <ynP5yEpfh6eH092yn@nada.kth.se> d96-mst@nada.kth.se (Mikael Ståldal) writes:
>
>> James Kanze <james-albert.kanze@vx.cit.alcatel.fr> wrote:
>> >Sort of like, if you naively write something stupid like:
>
>> >    if ( isalpha( c ) ) ...
>
>> >where "c" has type character.
>
>> >I think that Steve's use of "naively" suggests that this is something
>> >that will only happen to beginners.  I also think that the above example
>> >pretty much shows that it is a trap which is incredibly easy to fall
>> >into, even if you aren't a beginner.  On most implementations using
>> >signed plain char's, the above will result in undefined behavior.
>
>> That is a VERY good reason for making plain chars unsigned by default.
>> Or overload all <cctype> functions for char.
>
>A quick check of the C standard reveals that isalpha is required
>to return a well-defined value for any int you might possibly give
>it as an argument.
>
>C++ inherits the behavior of isalpha from C.

My copy of the ISO C standard says in 7.3:

"In all cases the argument is an int, the value of which shall be
representable as an unsigned char or shall equal the value of the
macro EOF. If the argument has any other value, the behavior
is undefined."

---
Steve Clamage, stephen.clamage@eng.sun.com

Author: d96-mst@nada.kth.se (Mikael Ståldal)
Date: 1997/01/22

In article <rf54tgc8umh.fsf@vx.cit.alcatel.fr>,
James Kanze <james-albert.kanze@vx.cit.alcatel.fr> wrote:
>Sort of like, if you naively write something stupid like:
>
>    if ( isalpha( c ) ) ...
>
>where "c" has type character.
>
>I think that Steve's use of "naively" suggests that this is something
>that will only happen to beginners.  I also think that the above example
>pretty much shows that it is a trap which is incredibly easy to fall
>into, even if you aren't a beginner.  On most implementations using
>signed plain char's, the above will result in undefined behavior.

That is a VERY good reason for making plain chars unsigned by default.
Or overload all <cctype> functions for char.

How can you accomplish that task in a conforming manner now?



Author: Marcelo Cantos <marcelo@mds.rmit.edu.au>
Date: 1997/01/22

James Kanze wrote:
>
> stephen.clamage@eng.sun.com (Steve Clamage) writes:
> |>  I believe in practice that you only get into trouble if you
> |>  naively promote a char to int and depend on the sign of the result.
>
> Sort of like, if you naively write something stupid like:
>
>     if ( isalpha( c ) ) ...
>
> where "c" has type character.

Actually the problem could be even more subtle:

    if ( isalpha( *s ) )

I habitually use int's when dealing with char's so the problem never
really surfaces.  But when dealing with char *'s you have no choice
but to promote and the issue *must* be addressed!
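
One way the char * case might be handled is sketched here.  The helper
name count_alpha is hypothetical.  The point is that *s is converted to
unsigned char before it is promoted to int.

```cpp
#include <cassert>
#include <cctype>   // std::isalpha
#include <cstddef>  // std::size_t

// Hypothetical helper: count the alphabetic characters in a C string.
inline std::size_t count_alpha(const char* s)
{
    assert(s != nullptr);
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        // *s is a plain char and may be negative; the cast brings it
        // into 0..UCHAR_MAX before the promotion to int.
        if (std::isalpha(static_cast<unsigned char>(*s)))
            ++n;
    }
    return n;
}
```

Whether a byte such as 0xE9 counts as alphabetic still depends on the
current C locale; the cast only guarantees that the call is well defined.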


Marcelo Cantos
__________________________________________________
Multimedia Database Systems Group, RMIT, Australia

Author: James Kanze <james-albert.kanze@vx.cit.alcatel.fr>
Date: 1997/01/23

d96-mst@nada.kth.se (Mikael Ståldal) writes:

|>  In article <rf54tgc8umh.fsf@vx.cit.alcatel.fr>,
|>  James Kanze <james-albert.kanze@vx.cit.alcatel.fr> wrote:
|>  >Sort of like, if you naively write something stupid like:
|>  >
|>  >    if ( isalpha( c ) ) ...
|>  >
|>  >where "c" has type character.
|>  >
|>  >I think that Steve's use of "naively" suggests that this is something
|>  >that will only happen to beginners.  I also think that the above example
|>  >pretty much shows that it is a trap which is incredibly easy to fall
|>  >into, even if you aren't a beginner.  On most implementations using
|>  >signed plain char's, the above will result in undefined behavior.
|>
|>  That is a VERY good reason for making plain chars unsigned by default.
|>  Or overload all <cctype> functions for char.
|>
|>  How can you accomplish that task in a conforming manner now?

Use <locale>, and not <cctype>.  Presumably, the isalpha, etc. functions
in <locale> are defined over all possible values of charT.
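
A short sketch of the <locale> form, assuming a compiler that provides
it.  The helper name count_alpha_in_locale is hypothetical.  The
two-argument std::isalpha is a template on the character type, so a
plain char is passed directly with no cast.

```cpp
#include <cassert>
#include <cstddef>  // std::size_t
#include <locale>   // std::locale, std::isalpha(charT, const locale&)
#include <string>

// Hypothetical helper: count alphabetic characters according to loc.
inline std::size_t count_alpha_in_locale(const std::string& s,
                                         const std::locale& loc)
{
    // std::isalpha(c, loc) relies on the ctype<char> facet of loc.
    assert(std::has_facet<std::ctype<char>>(loc));
    std::size_t n = 0;
    for (char c : s) {
        if (std::isalpha(c, loc))  // takes char directly, no int involved
            ++n;
    }
    return n;
}
```

Passing std::locale::classic() gives the "C" locale rules; passing
std::locale() uses the current global locale.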

Note that a good implementation could make <cctype> work even with 8 bit
signed characters.  This would entail defining EOF as -129, and
accepting character values in the range of -128...-1 (as well as
128...255).  According to the standard, accessing one of the ctype
functions with a value other than 0...UCHAR_MAX or EOF is undefined
behavior, so an implementation is free to do something intelligent.

The problem here is that there are probably more (broken) programs
assuming EOF == -1 than there are assuming plain char is signed.  So the
same motivations for making plain char signed (on a new architecture)
tend to force EOF to -1.

For a real hack, that would, however, work in the real world, and not
expose existing programs as broken: leave EOF as -1, and define isalpha,
etc., over the range -128...255, with -1 acting as EOF, rather than 0xff,
but all of the other negative values "as if" they were positive.  In ISO
8859-1, 0xff corresponds to a y with two dots over it.  Testing this
character as a plain char would give wrong results (because it would be
indistinguishable from EOF), but since this character isn't used in any
known language, the "bug" is probably tolerable (especially if
documented).

Note that this hack would be a minor change in the existing library,
would have no effect on correct programs (since it "defines" undefined
behavior), and in fact, would have no effect on any program that seems
to work now.  So I expect to see it in the next release of all
compilers:-).
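
The hack can be approximated from outside the library.  The sketch
below is a wrapper over the existing function rather than a change to
the library's table, and it assumes 8-bit char and EOF == -1.  The name
hack_isalpha is hypothetical.

```cpp
#include <cassert>
#include <cctype>   // std::isalpha
#include <climits>  // SCHAR_MIN, UCHAR_MAX
#include <cstdio>   // EOF

// Accepts -128..255.  -1 is taken to be EOF; every other negative value
// is treated as if it were the corresponding unsigned char value.
inline int hack_isalpha(int c)
{
    assert(c >= SCHAR_MIN && c <= UCHAR_MAX);
    if (c == EOF)
        return 0;            // also swallows 0xFF stored in a signed char
    if (c < 0)
        c += UCHAR_MAX + 1;  // -128..-2 become 128..254
    return std::isalpha(c);
}
```

As the post says, the one casualty is the byte 0xFF held in a signed
plain char, which cannot be told apart from EOF.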

--
James Kanze      home:     kanze@gabi-soft.fr        +33 (0)1 39 55 85 62
                 office:   kanze@vx.cit.alcatel.fr   +33 (0)1 69 63 14 54
GABI Software, Sarl., 22 rue Jacques-Lemercier, F-78000 Versailles France
     -- Conseils en informatique industrielle --



Author: James Kanze <james-albert.kanze@vx.cit.alcatel.fr>
Date: 1997/01/20

stephen.clamage@eng.sun.com (Steve Clamage) writes:

|>  d96-mst@nada.kth.se (Mikael Ståldal) writes:
|>
|>  >In article <199701080252.SAA11693@cornerstone.Eng.Sun.COM>,
|>  >clamage@sabretooth-142.Eng.Sun.COM (Steve Clamage) wrote:
|>  >>If C++ mandated a signedness or implementation for type char, it would
|>  >>break C compatibility. In addition, it would require an inefficient
|>  >>implementation of char on some systems.
|>
|>  >But does (will) the standard allow valid printable characters in the
|>  >execution character set to have negative values in plain char?
|>
|>  Yes, but portable code can't make any assumptions. The draft says
|>  "The value of a character literal is implementation-defined if it
|>  falls outside of the implementation-defined range defined for char
|>  (for ordinary literals) or wchar_t (for wide literals)."
|>
|>  If we have 8-bit bytes and chars, and the MSB of the encoding of a
|>  member of the execution character set is not zero, the value of
|>  a char holding that character is implementation-defined.
|>
|>  I believe in practice that you only get into trouble if you
|>  naively promote a char to int and depend on the sign of the result.

Sort of like, if you naively write something stupid like:

    if ( isalpha( c ) ) ...

where "c" has type character.

I think that Steve's use of "naively" suggests that this is something
that will only happen to beginners.  I also think that the above example
pretty much shows that it is a trap which is incredibly easy to fall
into, even if you aren't a beginner.  On most implementations using
signed plain char's, the above will result in undefined behavior.
Which, in this case, is a way of saying that it will pass all of your
test cases beautifully, but will fail in strange ways at the customer
site.  (I don't particularly consider myself a "naive beginner", and I
think that I am probably more sensitized than most people here to the
problem, given that I actually use characters with the 8th bit set on a
daily basis.  But I still have to regularly double check that I've not
done this.)

--
James Kanze      home:     kanze@gabi-soft.fr        +33 (0)1 39 55 85 62
                 office:   kanze@vx.cit.alcatel.fr   +33 (0)1 69 63 14 54
GABI Software, Sarl., 22 rue Jacques-Lemercier, F-78000 Versailles France
     -- Conseils en informatique industrielle --



Author: James Kanze <james-albert.kanze@vx.cit.alcatel.fr>
Date: 1997/01/15

d96-mst@nada.kth.se (Mikael Ståldal) writes:

|>  The signedness of plain char is not defined in the standard. What are the
|>  reasons for not mandating it to be unsigned? I think that most
|>  implementations currently have it signed, but only for historical
|>  reasons, it would make more sense to have it unsigned. Can there be any
|>  sensible reasons, except historical and backwards compatibility, for
|>  having char signed?

See Steve Clamage's response.

|>  For example Borland C++ use to have plain char signed by default, but
|>  char is 8-bit and there are more than 128 characters in the execution
|>  character set so some of them have negative values. What about
|>  mandating that all characters in both source and execution charsets
|>  have to be non-negative, effectively mandating char to be unsigned on
|>  many platforms.

This is one of my pet peeves, too.  I can accept that to write a program
for several languages (my definition of internationalization) requires
special effort, but I cannot even write it for the local language (which
on my platforms, uses the execution character set ISO 8859-1) without
jumping through hoops.

As far as I can tell, it was the ISO standardization committee which
introduced the distinction that only the required characters had to be
positive.  In K&R(I), there is no such distinction, and if memory serves
me correctly (my books are packed up due to moving), there is even some
mention that an implementation having character codes greater than 127
(EBCDIC was mentioned, I think) would have to treat char as unsigned if
it only had 8 bits.

Regretfully, most later implementers decided to maintain compatibility
with the PDP-11/VAX implementations, using 8 bit signed char's and 7 bit
ASCII.  Then, when they shifted to ISO 8859-1, they only changed the
character set, and nothing else in the implementation.

It's probably worth noting, as well, that for most implementors,
internationalization means 8859-1 for the Americas and Western Europe,
some special hacks for Japan, and a requirement to learn English for the
rest of the world.  (In fairness, of course, it should also be pointed
out that this IS most of the market, and that newer systems do tend
toward Unicode, although the main reason may be simply to be able to
support Japanese from the standard system, without extra hacks.)

--
James Kanze      home:     kanze@gabi-soft.fr        +33 (0)1 39 55 85 62
                 office:   kanze@vx.cit.alcatel.fr   +33 (0)1 69 63 14 54
GABI Software, Sarl., 22 rue Jacques-Lemercier, F-78000 Versailles France
     -- Conseils en informatique industrielle --



Author: David R Tribble <david.tribble@central.beasys.com>
Date: 1997/01/13

> But does (will) the standard allow valid printable characters in the
> execution character set to have negative values in plain char?

Yes, it already does.  Whether a compiler chooses to implement 'plain char'
as implicitly 'signed char' or 'unsigned char' is implementation-defined.
(AIX compilers, for example, use unsigned char.)  If your execution character
set is 8-bit Latin-1 (ISO-8859-1) (which is the standard character
set adopted by HTML, for example), and your compiler chooses 'signed char'
for plain char, then, yes, you will have printable characters that have
negative values.

One thing you might check on your favorite compiler is whether isprint()
et al handle negative 'printable' values correctly.

    isprint('\xA9')     - Should be true, if Latin-1 char set is used.
    isprint(-1)         - Should be false, if EOF == -1.
    isprint('\xFF')     - Should be true, if Latin-1 is used.

-- David R. Tribble, david.tribble@central.beasys.com --

Author: d96-mst@nada.kth.se (Mikael Ståldal)
Date: 1997/01/12

In article <199701080252.SAA11693@cornerstone.Eng.Sun.COM>,
clamage@sabretooth-142.Eng.Sun.COM (Steve Clamage) wrote:
>If C++ mandated a signedness or implementation for type char, it would
>break C compatibility. In addition, it would require an inefficient
>implementation of char on some systems.

But does (will) the standard allow valid printable characters in the
execution character set to have negative values in plain char?



Author: clamage@sabretooth-142.Eng.Sun.COM (Steve Clamage)
Date: 1997/01/07

In article NRm0yEpfhaZF090yn@nada.kth.se, d96-mst@nada.kth.se (Mikael Ståldal) writes:
>The signedness of plain char is not defined in the standard. What are the
>reasons for not mandating it to be unsigned?

From the beginning in C, the signedness of 'char' was left up to the
implementation. The reason was that extending a char to an int ought
to be an efficient operation, and computers varied (and still vary)
in whether unsigned or signed extension was more efficient. On some
machines it makes no difference. On others, the "wrong" kind of
extension takes 3 instructions. When you consider that C extends
chars to ints all over the place, particularly for most standard library
functions, that is an important consideration.
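
The two kinds of extension can be observed directly.  This is a small
sketch assuming 8-bit char; the function names are hypothetical.  The
explicitly signed and unsigned types always behave the same way, while
plain char follows whichever one the implementation chose.

```cpp
#include <cassert>
#include <limits>   // std::numeric_limits

// Reports whether plain char is signed on this implementation.
inline bool plain_char_is_signed()
{
    return std::numeric_limits<char>::is_signed;
}

inline int widen_signed(signed char c)     { return c; }  // sign extension
inline int widen_unsigned(unsigned char c) { return c; }  // zero extension

// Plain char widens like one of the two explicit types.
inline int widen_plain(char c)
{
    const int r = c;
    assert(r == (plain_char_is_signed()
                     ? widen_signed(static_cast<signed char>(c))
                     : widen_unsigned(static_cast<unsigned char>(c))));
    return r;
}
```

For the byte 0xE9, widen_signed gives -23 and widen_unsigned gives 233;
widen_plain gives one or the other depending on the compiler.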

Although some implementations of C allowed it earlier, Standard C
allows you to specify a character type as specifically signed or
unsigned.

Digression: IMHO, the whole thing was ill-considered. A 'char' ought to
be a character in the character set, and not a "tiny integer". If you want
the language to have a "tiny integer" or a "byte" type, 'char' should
not be overloaded for those purposes. If that principle had been adopted
in C (as it was in Pascal 8 years before K&R1), we would not need to
have these interminable discussions about behavior and implementation
of type char. End of digression.

If C++ mandated a signedness or implementation for type char, it would
break C compatibility. In addition, it would require an inefficient
implementation of char on some systems. IMHO, this is one of many
features of C++ that must remain suboptimal (or broken) in C++
because compatibility is considered more important than abstract
(or concrete) notions of good language design.

---
Steve Clamage, stephen.clamage@eng.sun.com