Topic: char signedness
Author: richard.herring@baesystems.com
Date: Thu, 7 Feb 2002 17:51:36 GMT
In message <d6651fb6.0201290500.68badf46@posting.google.com>, James
Kanze <kanze@gabi-soft.de> writes
>Martin von Loewis <loewis@informatik.hu-berlin.de> wrote in message
>news:<j4ofjgriu6.fsf@informatik.hu-berlin.de>...
>> Michiel.Salters@cmg.nl (Michiel Salters) writes:
>
>> > I'm not that familiar with ISO 8859-1, but 'ÿ' probably is the
>> > Dutch vowel commonly written as 'ij' ('ÿ' isn't on most
>> > keyboards).
>
>> I don't think this is true. Instead, I'd rather assume that this is
>> meant to be used with Welsh, see
>
>> http://www.ask-group.co.uk/publish/welshalph.htm
>
I don't understand that page - it shows acute and grave accents, which
AFAIK Welsh doesn't use at all. Somehow I don't think you can take it as
an authoritative alphabet.
--
Richard Herring
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: kanze@gabi-soft.de (James Kanze)
Date: Tue, 29 Jan 2002 14:32:47 CST
Martin von Loewis <loewis@informatik.hu-berlin.de> wrote in message
news:<j4ofjgriu6.fsf@informatik.hu-berlin.de>...
> Michiel.Salters@cmg.nl (Michiel Salters) writes:
> > I'm not that familiar with ISO 8859-1, but 'ÿ' probably is the
> > Dutch vowel commonly written as 'ij' ('ÿ' isn't on most
> > keyboards).
> I don't think this is true. Instead, I'd rather assume that this is
> meant to be used with Welsh, see
> http://www.ask-group.co.uk/publish/welshalph.htm
I'm not sure about Welsh; I'm not even sure if ISO 8859-1 is intended
to support Welsh. The letter is used in French, however, although
only in a few proper names.
> > And that's a very common vowel; it's close to impossible to write
> > a paragraph of Dutch text not using 'ij'.
> In Unicode, this ligature has a different character, U+0132 (LATIN
> CAPITAL LIGATURE IJ), which is clearly distinct from U+00DD (LATIN
> CAPITAL LETTER Y WITH ACUTE); likewise for the small letters.
Right. It occasionally occurs that people do use ÿ for the Dutch ij
ligature, because this character is also missing in ISO 8859-1. Not
being Dutch, I would hesitate to pronounce on the correctness of this;
if it really is a ligature, it would normally be preferable to use
the two letters instead of a single letter which only vaguely looks
like the ligature.
--
James Kanze mailto:kanze@gabi-soft.de
Beratung in objektorientierer Datenverarbeitung --
-- Conseils en informatique orientée objet
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany, Tél.: +49 (0)69 19 86 27
---
Author: "Geert-Jan Giezeman" <geert@cs.uu.nl>
Date: Wed, 30 Jan 2002 11:08:52 CST
"James Kanze" <kanze@gabi-soft.de> wrote in message
news:d6651fb6.0201290500.68badf46@posting.google.com...
> Martin von Loewis <loewis@informatik.hu-berlin.de> wrote in message
> news:<j4ofjgriu6.fsf@informatik.hu-berlin.de>...
> > Michiel.Salters@cmg.nl (Michiel Salters) writes:
>
> > > I'm not that familiar with ISO 8859-1, but 'ÿ' probably is the
> > > Dutch vowel commonly written as 'ij' ('ÿ' isn't on most
> > > keyboards).
>
> > I don't think this is true. Instead, I'd rather assume that this is
> > meant to be used with Welsh, see
>
> > http://www.ask-group.co.uk/publish/welshalph.htm
>
> I'm not sure about Welsh; I'm not even sure if ISO 8859-1 is intended
> to support Welsh. The letter is used in French, however, although
> only in a few proper names.
On http://www.unicode.org/charts/ in 'Latin-1 Supplement' it says:
U+00FF Latin small letter y with diaeresis; French.
You can look up how the character looks there too.
>
> > > And that's a very common vowel; it's close to impossible to write
> > > a paragraph of Dutch text not using 'ij'.
>
> > In Unicode, this ligature has a different character, U+0132 (LATIN
> > CAPITAL LIGATURE IJ), which is clearly distinct from U+00DD (LATIN
> > CAPITAL LETTER Y WITH ACUTE); likewise for the small letters.
>
> Right. It occasionally occurs that people do use ÿ for the Dutch ij
> ligature, because this character is also missing in ISO 8859-1. Not
> being Dutch, I would hesitate to pronounce on the correctness of this;
> if it really is a ligature, it would normally be preferable to use
> the two letters instead of a single letter which only vaguely looks
> like the ligature.
Indeed, in the absence of U+0132 (IJ ligature) and U+0133 (ij ligature)
you should use the two letters. Rijkaard, not Rÿkaard.
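
A minimal sketch of that substitution, assuming the text is held as UTF-32
in a std::u32string (C++11 or later). The function name expand_ij_ligature
is hypothetical and only serves the illustration:

```cpp
#include <string>

// Hypothetical helper: expands U+0132 / U+0133 into the letter pairs
// "IJ" / "ij" in a UTF-32 string; all other code points pass through.
inline std::u32string expand_ij_ligature(const std::u32string& text)
{
    const char32_t capital_ij = 0x0132;  // LATIN CAPITAL LIGATURE IJ
    const char32_t small_ij   = 0x0133;  // LATIN SMALL LIGATURE IJ

    std::u32string out;
    out.reserve(text.size());
    for (char32_t ch : text) {
        if (ch == capital_ij) {
            out += U"IJ";
        } else if (ch == small_ij) {
            out += U"ij";
        } else {
            out += ch;
        }
    }
    return out;
}
```

Run this before converting to ISO 8859-1, so that the ligature never has
to be approximated by ÿ.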
Geert-Jan
---
Author: Michiel.Salters@cmg.nl (Michiel Salters)
Date: Wed, 30 Jan 2002 11:18:06 CST
Francis Glassborow <francis.glassborow@ntlworld.com> wrote in message news:<ctpy8mc6HZU8EwY+@robinton.ntlworld.com>...
> In article <cefd6cde.0201250138.5021e3f5@posting.google.com>, Michiel
> Salters <Michiel.Salters@cmg.nl> writes
> >If you care about this, please note that when capitalizing 'ij' or
> >'ÿ' in Dutch words, the result is 'IJ'. Yet another of the reasons
> >C++ can't just offer a bunch of locale-independent case-insensitive
> >string functions :(
>
> You know it really was most inconsiderate of our ancestors to be so
> atrociously bad at developing writing and the symbols required for that.
> It is so much better now we have standards to help us. :-)
Don't get me started on standards here. Some French group ( CEN/TC304 )
submitted a European standard requiring IJ to be written as Ij in
Dutch, never mind the fact that we've used IJ for the last
600 years. I can't blame you English for being suspicious about
European rules & regulations :(
BTW, it seems that I was mistaken about the ÿ being Dutch; the
Windows Dutch keyboard layout produced it where it should
produce 'ij'. Seems I'm not the only one surprised that you can't
write proper Dutch in ISO 8859.
( I ought to set the followup to comp.std.broken :) )
Regards,
--
Michiel Salters
---
Author: Michiel.Salters@cmg.nl (Michiel Salters)
Date: Fri, 25 Jan 2002 15:17:12 GMT
kanze@gabi-soft.de (James Kanze) wrote in message news:<d6651fb6.0201230700.371fb108@posting.google.com>...
> "P.J. Plauger" <pjp@dinkumware.com> wrote in message
> news:<3c4dc359$0$19813$4c41069e@reader0.ash.ops.us.uu.net>...
> > "Dave Leimbach" <leimbacd@bellsouth.net> wrote in message
> > news:E6j38.44988$Ee7.3775826@e3500-atl1.usenetserver.com...
[SNIP]
>, some
> implementations, like Sun, write their code so that using plain char
> as input to the functions in <cctype> almost works. Almost, because
> it fails when the character value is -1 (0xff, or 'ÿ' in ISO 8859-1).
> Since this is such an obvious border case, however, I can't conceive
> of anyone not having thoroughly tested it. (Although the character is
> extremely rare; the only language I know of which uses it is French,
> and then only in a very few proper nouns. Which means that if you
> don't explicitly test for it, there's a good chance that random test
> data won't reveal the problem.)
I'm not that familiar with ISO 8859-1, but 'ÿ' probably is the Dutch
vowel commonly written as 'ij' ('ÿ' isn't on most keyboards).
And that's a very common vowel; it's close to impossible to write
a paragraph of Dutch text not using 'ij'.
If you care about this, please note that when capitalizing 'ij' or
'ÿ' in Dutch words, the result is 'IJ'. Yet another of the reasons
C++ can't just offer a bunch of locale-independent case-insensitive
string functions :(
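
A minimal sketch of why a character-by-character toupper is not enough
here. It assumes plain ASCII input and, as a simplification, treats every
leading "ij" as the Dutch digraph. The name capitalize_dutch is
hypothetical:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: capitalizes the first letter of a Dutch word held in
// a plain ASCII std::string, treating a leading "ij" as one letter ("IJ").
inline std::string capitalize_dutch(std::string word)
{
    if (word.size() >= 2 && word[0] == 'i' && word[1] == 'j') {
        word[0] = 'I';
        word[1] = 'J';
    } else if (!word.empty()) {
        // Cast first so that std::toupper never sees a negative value.
        word[0] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(word[0])));
    }
    return word;
}
```

The rule depends on the language of the text, not on the characters alone,
so no locale-independent function can apply it correctly.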
Regards,
--
Michiel Salters
---
Author: Francis Glassborow <francis.glassborow@ntlworld.com>
Date: Fri, 25 Jan 2002 17:26:47 GMT
In article <cefd6cde.0201250138.5021e3f5@posting.google.com>, Michiel
Salters <Michiel.Salters@cmg.nl> writes
>If you care about this, please note that when capitalizing 'ij' or
>'ÿ' in Dutch words, the result is 'IJ'. Yet another of the reasons
>C++ can't just offer a bunch of locale-independent case-insensitive
>string functions :(
You know it really was most inconsiderate of our ancestors to be so
atrociously bad at developing writing and the symbols required for that.
It is so much better now we have standards to help us. :-)
--
Francis Glassborow
Check out the ACCU Spring Conference 2002
4 Days, 4 tracks, 4+ languages, World class speakers
For details see: http://www.accu.org/events/public/accu0204.htm
---
Author: Martin von Loewis <loewis@informatik.hu-berlin.de>
Date: Sat, 26 Jan 2002 22:38:38 GMT
Michiel.Salters@cmg.nl (Michiel Salters) writes:
> I'm not that familiar with ISO 8859-1, but 'ÿ' probably is the Dutch
> vowel commonly written as 'ij' ('ÿ' isn't on most keyboards).
I don't think this is true. Instead, I'd rather assume that this is
meant to be used with Welsh, see
http://www.ask-group.co.uk/publish/welshalph.htm
> And that's a very common vowel; it's close to impossible to write
> a paragraph of Dutch text not using 'ij'.
In Unicode, this ligature has a different character, U+0132 (LATIN
CAPITAL LIGATURE IJ), which is clearly distinct from U+00DD (LATIN
CAPITAL LETTER Y WITH ACUTE); likewise for the small letters.
Regards,
Martin
---
Author: kanze@gabi-soft.de (James Kanze)
Date: Wed, 23 Jan 2002 10:43:40 CST
"P.J. Plauger" <pjp@dinkumware.com> wrote in message
news:<3c4dc359$0$19813$4c41069e@reader0.ash.ops.us.uu.net>...
> "Dave Leimbach" <leimbacd@bellsouth.net> wrote in message
> news:E6j38.44988$Ee7.3775826@e3500-atl1.usenetserver.com...
> > Is the datatype "char" specified in the standard [ C or C++ ] to
> > be signed by default? I ask because I noticed that the default
> > signedness of char on PowerPC architecture Linux is unsigned and
> > seemingly signed on non-PPC architectures.
> > I don't know how badly this can hurt portability of code.
> We've lived with a mixture of unsigned (actually non-negative) and
> signed chars for nearly three decades now. A minor saving grace is
> that printable characters are supposed to have codes that convert to
> non-negative values when promoted to int. But you still have to be
> careful.
Where is that guarantee? All that I can find is in 2.2/3: "For each
basic execution character set, the values of the members shall be
non-negative and distinct from one another. The execution character
set and the execution wide-character set are supersets of the basic
execution character set and the basic execution wide-character set,
respectively. The values of the execution character sets are
implementation-defined, and any additional members are
locale-specific." The only guarantee I see there concerns the roughly
100 members of the BASIC execution character set. In practice, of
course (at least where I live), there are a number of printable
characters outside of the basic execution character set, and these
characters can have, and generally do have, negative values if char is
an eight bit signed quantity.
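
The effect is easy to observe. A minimal sketch, assuming an 8-bit char
and a two's-complement machine; the helper name byte_as_int is
hypothetical:

```cpp
// Hypothetical helper: the int value that plain char yields for a byte.
// Converting an out-of-range value to a signed char is
// implementation-defined before C++20; the usual result is wrap-around.
inline int byte_as_int(unsigned char byte)
{
    char c = static_cast<char>(byte);
    return c;   // negative for bytes >= 0x80 where plain char is signed
}
```

With ISO 8859-1 input, byte_as_int(0xFF) (the character 'ÿ') would give
-1 where plain char is signed and 255 where it is unsigned.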
In practice, I've not found this to be a great problem. The main
thing to remember is to cast to unsigned char before invoking any of
the functions in <cctype>; once <locale> becomes universal, even this
problem should go away. Since on most of the machines I work on,
plain char is signed anyway, this has become a second reflex to me,
but if you develop on a machine where plain char is unsigned, I can
see that this could lead to portability problems. Also, some
implementations, like Sun, write their code so that using plain char
as input to the functions in <cctype> almost works. Almost, because
it fails when the character value is -1 (0xff, or 'ÿ' in ISO 8859-1).
Since this is such an obvious border case, however, I can't conceive
of anyone not having thoroughly tested it. (Although the character is
extremely rare; the only language I know of which uses it is French,
and then only in a very few proper nouns. Which means that if you
don't explicitly test for it, there's a good chance that random test
data won't reveal the problem.)
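
A minimal sketch of the cast, with hypothetical wrapper names is_alpha and
to_upper. The <cctype> functions take an int that must be either EOF or
representable as unsigned char; any other negative value is undefined
behavior, and a signed char holding 0xff usually compares equal to EOF:

```cpp
#include <cctype>

// Hypothetical wrapper: converts to unsigned char first, so the argument
// passed to std::isalpha is always in the range 0..UCHAR_MAX.
inline bool is_alpha(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical wrapper for std::toupper with the same cast.
inline char to_upper(char c)
{
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}
```

Routing every call through wrappers like these means the cast cannot be
forgotten at an individual call site.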
The other important rule is not to use char if you want a small,
signed int. That's what signed char is for. Leave char for
characters.
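
A minimal sketch of that rule; the function apply_offset is hypothetical.
A negative value stored in signed char stays negative on every
implementation, which plain char does not promise:

```cpp
// Hypothetical example: a small signed offset stored in one byte.
// signed char is signed everywhere; plain char may be unsigned, in which
// case a stored -3 would typically read back as 253.
inline int apply_offset(int base, signed char offset)
{
    return base + offset;
}
```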
--
James Kanze mailto:kanze@gabi-soft.de
Beratung in objektorientierer Datenverarbeitung --
-- Conseils en informatique orientée objet
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany, Tél.: +49 (0)69 19 86 27
---
Author: "P.J. Plauger" <pjp@dinkumware.com>
Date: Wed, 23 Jan 2002 16:22:34 CST
"James Kanze" <kanze@gabi-soft.de> wrote in message news:d6651fb6.0201230700.371fb108@posting.google.com...
> > We've lived with a mixture of unsigned (actually non-negative) and
> > signed chars for nearly three decades now. A minor saving grace is
> > that printable characters are supposed to have codes that convert to
> > non-negative values when promoted to int. But you still have to be
> > careful.
>
> Where is that guarantee? All that I can find is in 2.2/3: "For each
> basic execution character set, the values of the members shall be
> non-negative and distinct from one another. The execution character
> set and the execution wide-character set are supersets of the basic
> execution character set and the basic execution wide-character set,
> respectively. The values of the execution character sets are
> implementation-defined, and any additional members are
> locale-specific." The only guarantee I see there concerns the roughly
> 100 members of the BASIC execution character set. In practice, of
> course (at least where I live), there are a number of printable
> characters outside of the basic execution character set, and these
> characters can have, and generally do have, negative values if char is
> an eight bit signed quantity.
You're right. I overstated the case, weak as it was to begin with.
> In practice, I've not found this to be a great problem. The main
> thing to remember is to cast to unsigned char before invoking any of
> the functions in <cctype>; once <locale> becomes universal, even this
> problem should go away. Since on most of the machines I work on,
> plain char is signed anyway, this has become a second reflex to me,
> but if you develop on a machine where plain char is unsigned, I can
> see that this could lead to portability problems. Also, some
> implementations, like Sun, write their code so that using plain char
> as input to the functions in <cctype> almost works. Almost, because
> it fails when the character value is -1 (0xff, or 'ÿ' in ISO 8859-1).
> Since this is such an obvious border case, however, I can't conceive
> of anyone not having thoroughly tested it. (Although the character is
> extremely rare; the only language I know of which uses it is French,
> and then only in a very few proper nouns. Which means that if you
> don't explicitly test for it, there's a good chance that random test
> data won't reveal the problem.)
>
> The other important rule is not to use char if you want a small,
> signed int. That's what signed char is for. Leave char for
> characters.
Good coding guidelines, all.
P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
---
Author: Dave Leimbach <leimbacd@bellsouth.net>
Date: Tue, 22 Jan 2002 19:49:07 GMT
Is the datatype "char" specified in the standard [ C or C++ ] to be signed
by default? I ask because I noticed that the default signedness of char on
PowerPC architecture Linux is unsigned and seemingly signed on non-PPC
architectures.
I don't know how badly this can hurt portability of code.
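
A minimal sketch of how the difference can be checked on a given platform,
assuming C++11 or later for static_assert. The helper name
plain_char_is_signed is hypothetical:

```cpp
#include <climits>
#include <limits>

// Both constants are fixed at compile time for a given implementation.
static_assert((CHAR_MIN < 0) == std::numeric_limits<char>::is_signed,
              "CHAR_MIN and numeric_limits should agree on char signedness");

// Hypothetical helper: true where plain char behaves as a signed type.
inline bool plain_char_is_signed()
{
    return std::numeric_limits<char>::is_signed;
}
```

GCC and Clang also accept -fsigned-char and -funsigned-char to override
the platform default, which can help when testing code for assumptions
about the sign.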
Dave
---
Author: "P.J. Plauger" <pjp@dinkumware.com>
Date: Tue, 22 Jan 2002 20:29:33 GMT
"Dave Leimbach" <leimbacd@bellsouth.net> wrote in message news:E6j38.44988$Ee7.3775826@e3500-atl1.usenetserver.com...
> Is the datatype "char" specified in the standard [ C or C++ ] to be signed
> by default? I ask because I noticed that the default signedness of char on
> PowerPC architecture Linux is unsigned and seemingly signed on non-PPC
> architectures.
>
> I don't know how badly this can hurt portability of code.
We've lived with a mixture of unsigned (actually non-negative) and signed
chars for nearly three decades now. A minor saving grace is that printable
characters are supposed to have codes that convert to non-negative values
when promoted to int. But you still have to be careful.
P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
---
Author: Pete Becker <petebecker@acm.org>
Date: Tue, 22 Jan 2002 20:30:10 GMT
Dave Leimbach wrote:
>
> Is the datatype "char" specified in the standard [ C or C++ ] to be signed
> by default?
No.
> I ask because I noticed that the default signedness of char on
> PowerPC architecture Linux is unsigned and seemingly signed on non-PPC
> architectures.
>
> I don't know how badly this can hurt portability of code.
>
It doesn't hurt it at all if you don't make assumptions about the sign.
<g>
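
A minimal sketch of code that makes no such assumption, assuming an 8-bit
char; the function count_bytes is hypothetical. Each character is
converted to unsigned char before it is used as an array index, so the
result is the same whichever way plain char is signed:

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: counts occurrences of each byte value. Indexing via
// unsigned char gives 0..UCHAR_MAX whether plain char is signed or not.
inline std::array<std::size_t, 256> count_bytes(const std::string& text)
{
    std::array<std::size_t, 256> counts{};   // zero-initialized
    for (char c : text) {
        ++counts[static_cast<unsigned char>(c)];
    }
    return counts;
}
```

Indexing with c directly would read before the start of the array for
bytes of 0x80 and above wherever plain char is signed.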
--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)
---