Topic: ANSI/C++ string class not correct for non English languages ?????


Author: jkanze@otelo.ibmmail.com
Date: 1998/03/11
In article <35058106.2681@central.beasys.com>,
  david.tribble@noSPAM.central.beasys.com wrote:
>
> I, David R Tribble <david.tribble@central.beasys.com> wrote:
> >> It should behave correctly, but it doesn't.  Consider an
> >> implementation with signed chars that defines EOF to be -1:
> >>
> >>     #include <stdio.h>
> >>
> > >>     if (isprint('\xFF')) ...  // True, FF = Latin-1 y+diaeresis
> >>     if (isprint(EOF)) ...     // False, EOF is not printable
>
> Mea culpa, that should be:
>     #include <ctype.h>
>
> Achim Gratz wrote:
> > That would be a bug in your program, since EOF is not a char, but an
> > integer in this case.  It must be distinguishable from any char and
> > thus it cannot be a char itself, signed or not.
>
> No, my program is quite legal and bug-free.  In C, '\xFF' and EOF
> are both integer constants.  In C++, '\xFF' will convert to an
> 'int' constant, either through sign-extension (if 'char' is signed)
> or zero-extension (if 'char' is unsigned).

Correct so far.

> The C++ library is based on the C library, for which isprint() et al
> take an 'int' argument.  EOF is a perfectly legal argument.
> See below.

PARTS of the C++ library are based on the C library.  The functions in
<locale> are NOT.

> I also wrote:
> >> Consider an
> >> implementation with signed chars that defines EOF to be -1:
>
> jkanze@otelo.ibmmail.com wrote:
> > Irrelevant: EOF is NOT a legal input for the is.... functions in
> > locale.
>
> I beg to differ, quoting from the C89 standard, upon which the
> C++ library is based:

Your quote concerns the functions in <ctype.h> (or <cctype>), not the
functions in <locale>.  The header files <ctype.h> and <cctype> are
mainly present for C compatibility, I would imagine.  At any rate,
C++ code should be using <locale>, except that most compilers don't
support it yet.


> > Here, you are probably getting the old isxxx functions from ctype.h.
> > The new ones *require* a locale as the second argument.  It would be
> > nice if the second argument could default to the global locale, but
> > as someone (Nathan Myers?) pointed out, the ambiguities with the
> > older functions would create more problems than it was worth.
>
> Perhaps the situation is better in the C++ library.  Is EOF not
> allowed as an argument to isprint() in the C++ library?

Maybe.  As is to be expected :-), the new functions are templates over
the type of the character.  Of course, they only make sense for types
over which they are specialized, but "isprint( 2.4 , locale() )" is
not forbidden (although its meaning would be implementation
defined).

An implementation is required to provide specializations only for char
and wchar_t, so using one of the functions with any other type is
unportable.  I've not been able to find a concrete statement concerning
legal input values, but without any stated restrictions, I would suppose
any value the instantiation type can hold would be legal and defined.

And to your question: if char's are signed, and EOF is an int with the
value -1, the value of EOF will fit in a char, and so can be passed to
the new functions.
In this context, however, its value will not be interpreted as EOF, but
as whatever character happens to have this code (or bit pattern, since
most implementations do put ISO 8859-1 on a signed eight bit char, and
interpret the char's using the underlying bit pattern, rather than the
numeric value).
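
A minimal sketch of the <locale> interface discussed above, assuming an
implementation that provides the std::isprint( charT , locale ) template.
The helper names printable_in and printable_in_classic are made up for
this illustration.  Note that the argument is a char, not an int, so
there is no way to pass "EOF" as such; whatever value arrives is treated
as a character code.

```cpp
#include <locale>

// Classify a plain char with the <locale> template, which takes the
// character itself (not an int) plus an explicit locale argument.
bool printable_in(char c, const std::locale& loc)
{
    return std::isprint(c, loc);
}

// Convenience wrapper (hypothetical name) using the classic "C" locale,
// which every implementation has to provide.
bool printable_in_classic(char c)
{
    return printable_in(c, std::locale::classic());
}
```

With a named locale (for example std::locale("de_DE.ISO8859-1"), if that
name happens to be installed), the same call should classify the
accented characters of that character set; the available names are
implementation defined.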

> But there's still the inherent problem, on systems where 'char' is
> signed and EOF is -1, that '\xFF' == EOF is true.  Character '\xFF'
> is a valid Latin-1 character code, and EOF is a code that's supposed
> to be distinct from all other valid character codes, so they should
> compare unequal.  This can be fixed by assuming that plain 'char' is
> unsigned, and/or by defining EOF to be something like -256.

As in C, all of the functions in C++ which can return an EOF return
an int, with values between 0 and UCHAR_MAX, or EOF.
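
To illustrate why this works out in practice, here is a small sketch
using the C++ stream buffer interface; the same reasoning applies to
getc() in C.  The names stream_value and count_bytes are invented for
the example.  The important point is that the result is kept in an int
and compared with EOF before it is ever narrowed to char.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

typedef std::char_traits<char> Traits;

// The value a stream hands back for a given char: converted as if
// through unsigned char, so it is never negative.
int stream_value(char c)
{
    return Traits::to_int_type(c);
}

// Count the bytes in `text` by reading them through a stream buffer.
// sbumpc() returns an int: 0..UCHAR_MAX for a byte, or Traits::eof()
// at end of input, so a 0xFF byte does not stop the loop.
std::size_t count_bytes(const std::string& text)
{
    std::istringstream in(text);
    std::streambuf* buf = in.rdbuf();
    std::size_t n = 0;
    for (int ch = buf->sbumpc(); ch != Traits::eof(); ch = buf->sbumpc())
        ++n;
    return n;
}
```

The bug only appears if the result is first stored in a plain char and
then compared with EOF; on a signed-char implementation with EOF == -1,
the byte 0xFF would then look like end of file.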

Technically, you're correct, but in practice, it isn't a problem.

> >> I've suggested a couple of times in the past that plain 'char' should
> >> be unsigned in order to simplify (i.e., to correct) such things, but
> >> to no avail.
>
> > I've suggested this too.  The problem is that there really is a large
> > body of code that uses char (and not signed char) for small signed
> > int's.
> > Such code isn't portable; you can even argue that it isn't correct.
> > But I can well understand a vendor not wanting to break it.
>
> If it really is incorrect code, then breaking it is acceptable for
> the new language standard - the new standards never claimed they
> wouldn't break bad code.  And this kind of code can be fixed, by
> substituting 'signed char'.

The standard can say what it likes, but if you require char to be unsigned,
then every vendor who currently has a signed char will continue to
support it, probably by default.  Which means that compiler flags will
affect the semantics of your program.

Frankly, I'm not sure what the correct solution is.  For me, the problem
is simple.  They can and should declare that char must be unsigned.
It won't break any of my code, and if it does, I'll be glad of it; it
will make me fix an error I wasn't aware of.  Not all programmers have
the same attitude as I do, here, and vendors are in the business to sell
as many compilers as they can, not to be right at all cost.  If conforming
to the (new) standard costs sales, they won't conform, or they will only
conform with special flags that nobody uses.  In the end, the results
will probably be worse, even for me, than the current situation.

> > There are also machines on which an unsigned char is slower than a
> > signed one.
>
> Indeed.  But if you agree that 'char' should not be signed, then
> efficiency is not a reasonable argument for keeping it signed.  The
> question is, should 'char' mean "character code" or "small integer"?
> (Java has a nice answer to this question.)

It's a nice answer for a new language which doesn't have to worry about
backwards compatibility or existing code, although I wonder how good it
will seem when they realize that 16 bits isn't enough.  And I think that
most people agree that failing to distinguish between "small integer"
and "character" is probably an error in language design.  But it's too
late to change any of this in C or C++ today.

--
James Kanze    +33 (0)1 39 23 84 71    mailto: kanze@gabi-soft.fr
        +49 (0)69 66 45 33 10    mailto: jkanze@otelo.ibmmail.com
GABI Software, 22 rue Jacques-Lemercier, 78000 Versailles, France
Conseils en informatique orientée objet --
              -- Beratung in objektorientierter Datenverarbeitung

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/   Now offering spam-free web-based newsreading
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]





Author: jkanze@otelo.ibmmail.com
Date: 1998/03/11
In article <clnujasdc.fsf@ite127.inf.tu-dresden.de>,
  Achim Gratz <gratz@ite.inf.tu-dresden.de> wrote:
>
> David R Tribble <david.tribble@central.beasys.com> writes:
>
> > It should behave correctly, but it doesn't.  Consider an
> > implementation with signed chars that defines EOF to be -1:
> >
> >     #include <stdio.h>
> >
> >     if (isprint('\xFF')) ...    // True, FF = Latin-1 y+diaeresis (ÿ)
> >     if (isprint(EOF)) ...       // False, EOF is not printable
>
> That would be a bug in your program, since EOF is not a char, but an
> integer in this case.  It must be distinguishable from any char and
> thus it cannot be a char itself, signed or not.

It's not a bug in his program, since his program invokes the isprint
function declared in <cctype>, which is defined for all values between
0 and UCHAR_MAX inclusive, plus EOF.

It's just not relevant to my comments, which concerned the isxxx
functions in <locale>.  These are not defined for EOF, but for all
possible values of the instantiation type.  (I suppose.  I would feel
happier if I could find a sentence in the standard which said this
explicitly.)

--
James Kanze    +33 (0)1 39 23 84 71    mailto: kanze@gabi-soft.fr
        +49 (0)69 66 45 33 10    mailto: jkanze@otelo.ibmmail.com
GABI Software, 22 rue Jacques-Lemercier, 78000 Versailles, France
Conseils en informatique orientée objet --
              -- Beratung in objektorientierter Datenverarbeitung






Author: James Kuyper <kuyper@wizard.net>
Date: 1998/03/09
Jerry Coffin wrote:
>
> In article <35002C69.3BF853D4@acm.org>, petebecker@acm.org says...
> > Jerry Coffin wrote:
> > >
> > >
> > > As we speak, it's getting close to 8 years since the C standard was
> > > ratified, but the majority of C compilers seem to have limited (or
> > > nonexistent) support for any but one locale: the C locale that's been
> > > required since the beginning.  The problem here isn't the standard,
> > > it's the vendors...
> >
> > Well, maybe. But remember that vendors respond to customers. So maybe
> > the problem is lack of demand.
>
> I'll bow to your superior personal knowledge of that.  I have to agree
> that if demand was strong, the vendors would undoubtedly meet it.  I
> think to some extent, it's a bit of a chicken and egg problem: if one
> vendor were to come out with really good locale support, demand for it
> would likely grow.  (Hint hint...)

Well, the SGI compiler we use at my company supports the following
locales:
C             en_AU         fr_BE         nl            sv
cs            en_CA         fr_CA         nl_BE         tr
da            en_US         fr_CH         no            zh_CN.ugb
de            es            is            pl            zh_TW.uncs
de_AT         es_AR         it            pt
de_CH         es_MX         it_CH         pt_BR
el            fi            ja_JP.EUC     ru
en            fr                          sk

In addition, the man pages for virtually every IRIX 6.2 system utility
indicate support of locales, based upon environment variables with the
same names as the setlocale() macros. I don't know how well locales are
actually supported; my work just barely requires support for English, so
I haven't had to use the other locales. Still, if a single vendor with
decent support were all that was needed, that problem has already been
solved.

I think one of the key barriers to greater use of locales is the
difficulty of writing portable interoperable C code that makes use of
them. The fact that so many aspects of locales are implementation
defined means that most code which uses them must contain significant
sections which need to be re-written for each implementation. This
problem could be greatly reduced by:

Requiring an implementation to document through a macro (WC_LOCALE?)
defined in the locale.h the name of the locale used to convert wide
character constants to the execution wide character set.

Adding a library function

 const char *getlocale(unsigned int n);

which retrieves a pointer to the name of the n'th supported locale,
returning NULL if and only if n > N, for some implementation-defined
value of N.

Mandating that if the name of any of the properties recognised by
towctrans() is a locale name, then that property implements a
transformation from the current locale to the specified locale.

Finally, the current standard requires the development of an
implementation-defined conversion from UCN's to the execution wide
character set, to be used in translation phase 5. It seems to me that it
wouldn't require much additional work by the implementor if the standard
required that the same conversion be made available to programs through
a function such as

 wchar_t UCNtowc(uint_least32_t ucn);

That conversion needn't be invertible, but a semi-inverse wctoUCN() can
and should be made available, satisfying the requirement that

 UCNtowc(wctoUCN(a))==a

where 'a' is any legal wide character constant. The return type of
wctoUCN could be smaller than uint_least32_t, depending on the value of
the largest UCN with a non-trivial conversion to wchar_t. An
implementation which makes wchar_t identical with char could make do
with 'unsigned char' as the return type.

"string" oriented versions of these functions would be even more useful.





Author: jkanze@otelo.ibmmail.com
Date: 1998/03/10
In article <MPG.f6bbf9625c607bf98988e@news.rmi.net>,
  jcoffin@taeus.com (Jerry Coffin) wrote:
>
> In article <35002C69.3BF853D4@acm.org>, petebecker@acm.org says...
> > Jerry Coffin wrote:
> > >
> > >
> > > As we speak, it's getting close to 8 years since the C standard was
> > > ratified, but the majority of C compilers seem to have limited (or
> > > nonexistent) support for any but one locale: the C locale that's been
> > > required since the beginning.  The problem here isn't the standard,
> > > it's the vendors...
> >
> > Well, maybe. But remember that vendors respond to customers. So maybe
> > the problem is lack of demand.
>
> I'll bow to your superior personal knowledge of that.  I have to agree
> that if demand was strong, the vendors would undoubtedly meet it.  I
> think to some extent, it's a bit of a chicken and egg problem: if one
> vendor were to come out with really good locale support, demand for it
> would likely grow.  (Hint hint...)

I'm not so sure.  The vendors, or at least some, already have correct
support for internationalization.  But the only programs I see that use
what's there, on Sun at least, are those from Sun itself.  Third party
software seems to ignore it completely.  The same thing under Windows:
the OS and Word are German, but almost everything else is US English.

--
James Kanze    +33 (0)1 39 23 84 71    mailto: kanze@gabi-soft.fr
        +49 (0)69 66 45 33 10    mailto: jkanze@otelo.ibmmail.com
GABI Software, 22 rue Jacques-Lemercier, 78000 Versailles, France
Conseils en informatique orientée objet --
              -- Beratung in objektorientierter Datenverarbeitung






Author: jkanze@otelo.ibmmail.com
Date: 1998/03/10
In article <MPG.f6bc0e38cc4927698988f@news.rmi.net>,
  jcoffin@taeus.com (Jerry Coffin) wrote:

> It's certainly true that any locale can include extra characters if it
> chooses to do so.  However, I wouldn't particularly expect to see
> sensible support for, say, Swedish characters unless I selected a
> Swedish locale.  Unfortunately, with most compilers, I simply can't do
> that...

Are you sure?  Perhaps it is simply because your sysadmin hasn't installed
it.  Or maybe the delivered versions of the programs are different in
the US.  I've been able to get both French and German support with all
compilers tried under Solaris, HP-UX and AIX. And I suspect that it
would work under Windows; the documentation says so, but I haven't tried
it yet.

In all cases, however, it is dependent on support for the locale having
been installed in the OS.  Here, I can only speak for Solaris, but it
IS a choice in the installation procedure of the OS.  On my home machine,
I only bothered with French, German, Italian and English, because I don't
normally use any other languages.  And my case is a bit special: my wife's
Italian, and I lived for many years directly on the French-German border.
I would guess that most sysadmins in France only bother with French and
English, and that most in America only install English.  (This may
also be a question of packaging.  I know that Sun ships machines to
Japan, with support for Japanese, but I didn't have that option on the
Solaris I purchased in France.  Perhaps in the US, they only ship the
added support as a special option.)

The point is: the support is potentially there.  On all of the compilers
I've tried, setlocale works.  For all of the installed locales.
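
A quick way to check this on a given machine is to ask setlocale()
directly.  The sketch below assumes nothing beyond <clocale>; the
function name locale_installed is invented here, and the locale names
you pass in (other than "C") are implementation defined.

```cpp
#include <clocale>
#include <string>

// Returns true if setlocale() accepts `name` on this system.
// The locale in effect on entry is restored before returning.
bool locale_installed(const char* name)
{
    // A null second argument only queries the current setting.
    const char* current = std::setlocale(LC_ALL, nullptr);
    const std::string saved = current ? current : "C";

    const bool ok = std::setlocale(LC_ALL, name) != nullptr;

    std::setlocale(LC_ALL, saved.c_str());
    return ok;
}
```

For example, locale_installed("fr") or locale_installed("de") could be
tried on a system that uses those names; a false result usually means
the locale was not selected when the OS was installed, not that the
compiler lacks support.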

The other thing I've noticed, however, is that setting the locale environment
variables doesn't do anything for most programs.  It's not the OS, and
it's not the compiler, because it works with my programs, and most of
those delivered by Sun.  So I can only conclude that the problem is with
most programmers.  (Of course, this may be conditioned by other facts.
I develop programs, and all of the third party software I have is related
to this activity.  And people who develop C++ programs do know English,
regardless of what their native language is, and generally prefer to
use an American keyboard when developing.  So the pressure to
internationalize is probably considerably less than in other application
areas.)

--
James Kanze    +33 (0)1 39 23 84 71    mailto: kanze@gabi-soft.fr
        +49 (0)69 66 45 33 10    mailto: jkanze@otelo.ibmmail.com
GABI Software, 22 rue Jacques-Lemercier, 78000 Versailles, France
Conseils en informatique orientée objet --
              -- Beratung in objektorientierter Datenverarbeitung






Author: jkanze@otelo.ibmmail.com
Date: 1998/03/10
In article <3505e5fe.48537757@news>,
  menright@cts.cog wrote:
>
> jcoffin@taeus.com (Jerry Coffin) wrote:
>
> >Sorry to follow up to my own post, but I should amend the statement
> >above a bit: calling this decision "stupid" is overly harsh, and
> >basically incorrect; requiring char to be unsigned would have made
> >some internationalization simpler, but would have caused other
> >problems, particularly breaking substantial amounts of existing code.
> >In the end, there was no easy way out, and the one they took was
> >reasonable, even though it'd be nice if the circumstances had been
> >enough different that they could have made a different decision.
>
> Actually, in the past some widely used compilers did some things
> loosely enough that American programmers, pre-standard, were able to
> use unsigned characters pretty freely. We looked forward to the
> standard, but we got bit by it because it restricted the use of
> unsigned characters to basically small unsigned quantities that the
> library doesn't want to use for text. Substantial existing code WAS
> broken by the standard. And the recovery process tended to make the
> code harder to internationalize.

I'm curious, but what code was broken by the standard in this case?

The signedness of char has always been undefined, since the earliest
days (PDP was signed, Interdata -- the second port, I think -- unsigned).
Portable programs couldn't count on it being one way or the other, and
didn't.

It was signed on a lot of implementations, and a lot of non-portable
programs did use this fact, either consciously or not.  Although it is
(and probably was) quite clear that technically, requiring char to be
unsigned would be a good thing, I can well understand why implementors
who had been doing it signed before would oppose a change which would
break their users' code.  Such implementors would, of course, keep it
signed, so nothing would change for their users.

I'll admit that I don't fully understand your comment, but the problem
with internationalization is NOT caused by any recovery process involving
code broken by the standard.  The problem with internationalization is
caused by the (understandable) refusal of a large number of implementors
to break existing non-portable code, by being required to change the
signedness of THEIR char.

In the end, the problem with internationalization is caused by programmers
who preferred to just "try it and see", rather than reading K&R.

--
James Kanze    +33 (0)1 39 23 84 71    mailto: kanze@gabi-soft.fr
        +49 (0)69 66 45 33 10    mailto: jkanze@otelo.ibmmail.com
GABI Software, 22 rue Jacques-Lemercier, 78000 Versailles, France
Conseils en informatique orientée objet --
              -- Beratung in objektorientierter Datenverarbeitung






Author: jkanze@otelo.ibmmail.com
Date: 1998/03/10
In article <35014A4B.C7D60A9C@acm.org>,
  Pete Becker <petebecker@acm.org> wrote:

> > Basically put, the standard allows (but doesn't mandate) reasonable
> > behavior, though there were a few stupid things they did that made
> > life much more difficult for most Europeans, such as allowing char to
> > be signed.
>
> I'm not sure what the point here is. The string class is designed to be
> used with more or less arbitrary character-like types. If you don't want
> to use signed characters in a string, don't use them.

In theory, a "character" is not a numerical value, so it is neither
signed nor unsigned.  In practice, the *only* time I've had problems
with the signedness of char's is in using the functions defined in ctype.h,
where an expression with type char must be cast to unsigned char before
being passed to any of the functions declared in the header.
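
The cast in question can be wrapped once and forgotten.  A minimal
sketch, with the wrapper names is_print_c and to_upper_c chosen just
for this example:

```cpp
#include <cctype>

// The <cctype> functions take an int that must be representable as an
// unsigned char (or equal EOF).  A negative plain char is neither, so
// convert through unsigned char before the call.
bool is_print_c(char c)
{
    return std::isprint(static_cast<unsigned char>(c)) != 0;
}

char to_upper_c(char c)
{
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}
```

Without the cast, passing a char such as '\xFF' on a signed-char
implementation hands the function a negative value, which is undefined
behavior unless it happens to equal EOF.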

This is, or shortly will be, history.  The C++ standard defines the
same functions, more or less, in locale.  Given that the functions
in locale take a char, however, and not an int, and that there are
no restrictions on the value of this char in the standard, I suppose
that calling the function with a char will work--if char's are signed,
it's the implementation's problem to ensure that I still get the correct
response for the character in the character set defined by the locale
(which is the second argument).

So about the only problem left is when you need to define a ctype-like
array of your own.  But then, I've always done this with an access function
(or macro, in the old days) which subtracts CHAR_MIN from the input value
anyway.  (The old ctype.h functions didn't do this because they also
had to support EOF.)
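
Something along these lines, as a sketch only: the table contents (a set
of ASCII vowels) and the names VowelTable and is_vowel are made up for
the illustration.  The array has one slot for every value a plain char
can hold, and the access function subtracts CHAR_MIN so that the index
is never negative, whatever the signedness of char.

```cpp
#include <climits>

// Hypothetical ctype-like table: marks the ASCII vowels.
class VowelTable {
public:
    VowelTable() : flags_()          // all entries start out false
    {
        const char vowels[] = "aeiouAEIOU";
        for (const char* p = vowels; *p != '\0'; ++p)
            flags_[index(*p)] = true;
    }

    bool contains(char c) const { return flags_[index(c)]; }

private:
    // Access function: shift the value so that CHAR_MIN maps to 0.
    static int index(char c) { return c - CHAR_MIN; }

    bool flags_[CHAR_MAX - CHAR_MIN + 1];
};

bool is_vowel(char c)
{
    static const VowelTable table;   // built on first use
    return table.contains(c);
}
```

No cast is needed at the call site, and a character like '\xFF' simply
lands in its own slot instead of indexing before the start of the array.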

--
James Kanze    +33 (0)1 39 23 84 71    mailto: kanze@gabi-soft.fr
        +49 (0)69 66 45 33 10    mailto: jkanze@otelo.ibmmail.com
GABI Software, 22 rue Jacques-Lemercier, 78000 Versailles, France
Conseils en informatique orientée objet --
              -- Beratung in objektorientierter Datenverarbeitung






Author: David R Tribble <david.tribble@central.beasys.com>
Date: 1998/03/10
jcoffin@taeus.com (Jerry Coffin) wrote:
>> Basically put, the standard allows (but doesn't mandate) reasonable
>> behavior, though there were a few stupid things they did that made
>> life much more difficult for most Europeans, such as allowing char
>> to be signed.

jkanze@otelo.ibmmail.com wrote:
> Actually, C++ has solved the main problem with signedness of char, I
> think.  In locale, std::isxxx( char , locale ) should behave correctly
> for all possible characters, even if the numerical value is negative.
> And this was the major problem with char's possibly having negative
> values.

It should behave correctly, but it doesn't.  Consider an
implementation with signed chars that defines EOF to be -1:

    #include <stdio.h>

    if (isprint('\xFF')) ...    // True, FF = Latin-1 y+diaeresis (ÿ)
    if (isprint(EOF)) ...       // False, EOF is not printable

Only one of these should be true.

I've suggested a couple of times in the past that plain 'char' should
be unsigned in order to simplify (i.e., to correct) such things, but
to no avail.  The usual response is to quote the ISO (C) standard,
where it states that the isxxx() functions are well defined only
for unsigned (non-negative) character arguments.  In other words,
the first example above is considered incorrect, and instead should
be:
    if (isprint((unsigned char) '\xFF')) ...

This of course appears to contradict the existence of signed 'char',
but some have argued that this is really not a contradiction.  Such
arguments fail to convince me.  What a pain.


-- David R. Tribble, david.tribble@noSPAM.central.beasys.com --





Author: Achim Gratz <gratz@ite.inf.tu-dresden.de>
Date: 1998/03/10
David R Tribble <david.tribble@central.beasys.com> writes:

> It should behave correctly, but it doesn't.  Consider an
> implementation with signed chars that defines EOF to be -1:
>
>     #include <stdio.h>
>
>     if (isprint('\xFF')) ...    // True, FF = Latin-1 y+diaeresis (ÿ)
>     if (isprint(EOF)) ...       // False, EOF is not printable

That would be a bug in your program, since EOF is not a char, but an
integer in this case.  It must be distinguishable from any char and
thus it cannot be a char itself, signed or not.


Achim Gratz.

--+<[ It's the small pleasures that make life so miserable. ]>+--
WWW:    http://www.inf.tu-dresden.de/~ag7/{english/}
E-Mail: gratz@ite.inf.tu-dresden.de
Phone:  +49 351 463 - 8325





Author: jkanze@otelo.ibmmail.com
Date: 1998/03/10
In article <35045CC2.7F4@central.beasys.com>,
  david.tribble@noSPAM.central.beasys.com wrote:
>
> jcoffin@taeus.com (Jerry Coffin) wrote:
> >> Basically put, the standard allows (but doesn't mandate) reasonable
> >> behavior, though there were a few stupid things they did that made
> >> life much more difficult for most Europeans, such as allowing char
> >> to be signed.
>
> jkanze@otelo.ibmmail.com wrote:
> > Actually, C++ has solved the main problem with signedness of char, I
> > think.  In locale, std::isxxx( char , locale ) should behave correctly
> > for all possible characters, even if the numerical value is negative.
> > And this was the major problem with char's possibly having negative
> > values.
>
> It should behave correctly, but it doesn't.

The standard requires it to behave correctly.  As to what happens in
actual implementations, none of the compilers I have fully support
locale yet anyway, so I am unable to test.

> Consider an
> implementation with signed chars that defines EOF to be -1:

Irrelevant: EOF is NOT a legal input for the is.... functions in locale.


>     #include <stdio.h>
>
>     if (isprint('\xFF')) ...    // True, FF = Latin-1 y+diaeresis (ÿ)
>     if (isprint(EOF)) ...       // False, EOF is not printable

Here, you are probably getting the old isxxx functions from ctype.h.
The new ones *require* a locale as the second argument.  It would be
nice if the second argument could default to the global locale, but
as someone (Nathan Myers?) pointed out, the ambiguities with the
older functions would create more problems than it was worth.  (Perhaps
the new functions should have been named is_xxx, or some such.  But
it's too late for that now.)

> Only one of these should be true.
>
> I've suggested a couple of times in the past that plain 'char' should
> be unsigned in order to simplify (i.e., to correct) such things, but
> to no avail.

I've suggested this too.  The problem is that there really is a large
body of code that uses char (and not signed char) for small signed int's.
Such code isn't portable; you can even argue that it isn't correct.  But
I can well understand a vendor not wanting to break it.  (Personally,
I'd have broken it anyway, but I can understand the vendor's arguments,
even if I don't agree with them.)

There are also machines on which an unsigned char is slower than a signed
one.  (The Vax is one, I think.  The PDP certainly was, which is why signed
char's ever got in there in the first place.)

--
James Kanze    +33 (0)1 39 23 84 71    mailto: kanze@gabi-soft.fr
        +49 (0)69 66 45 33 10    mailto: jkanze@otelo.ibmmail.com
GABI Software, 22 rue Jacques-Lemercier, 78000 Versailles, France
Conseils en informatique orientée objet --
              -- Beratung in objektorientierter Datenverarbeitung







Author: David R Tribble <david.tribble@central.beasys.com>
Date: 1998/03/10
I, David R Tribble <david.tribble@central.beasys.com> wrote:
>> It should behave correctly, but it doesn't.  Consider an
>> implementation with signed chars that defines EOF to be -1:
>>
>>     #include <stdio.h>
>>
>>     if (isprint('\xFF')) ...  // True, FF = Latin-1 y+diaeresis
>>     if (isprint(EOF)) ...     // False, EOF is not printable

Mea culpa, that should be:
    #include <ctype.h>

Achim Gratz wrote:
> That would be a bug in your program, since EOF is not a char, but an
> integer in this case.  It must be distinguishable from any char and
> thus it cannot be a char itself, signed or not.

No, my program is quite legal and bug-free.  In C, '\xFF' and EOF
are both integer constants.  In C++, '\xFF' will convert to an
'int' constant, either through sign-extension (if 'char' is signed)
or zero-extension (if 'char' is unsigned).

The C++ library is based on the C library, for which isprint() et al
take an 'int' argument.  EOF is a perfectly legal argument.
See below.

I also wrote:
>> Consider an
>> implementation with signed chars that defines EOF to be -1:

jkanze@otelo.ibmmail.com wrote:
> Irrelevant: EOF is NOT a legal input for the is.... functions in
> locale.

I beg to differ, quoting from the C89 standard, upon which the
C++ library is based:

  7.3  Character handling <ctype.h>

  The header <ctype.h> declares several functions useful for
  testing and mapping characters.  In all cases the argument is an
  'int', the value of which shall be representable as an 'unsigned
  char' or shall equal the value of the macro 'EOF'.  If the
  argument has any other value, the behavior is undefined.

>>     #include <stdio.h>
>>
>>     if (isprint('\xFF')) ...    // True, FF = Latin-1 y+diaeresis
>>     if (isprint(EOF)) ...       // False, EOF is not printable

> Here, you are probably getting the old isxxx functions from ctype.h.
> The new ones *require* a locale as the second argument.  It would be
> nice if the second argument could default to the global locale, but
> as someone (Nathan Myers?) pointed out, the ambiguities with the
> older functions would create more problems than it was worth.

Perhaps the situation is better in the C++ library.  Is EOF not
allowed as an argument to isprint() in the C++ library?

But there's still the inherent problem, on systems where 'char' is
signed and EOF is -1, that '\xFF' == EOF is true.  Character '\xFF'
is a valid Latin-1 character code, and EOF is a code that's supposed
to be distinct from all other valid character codes, so they should
compare unequal.  This can be fixed by assuming that plain 'char' is
unsigned, and/or by defining EOF to be something like -256.
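
The collision described above can be sketched as follows.  This is only
an illustration: the helper names (widen_signed, widen_unsigned,
collides_with_eof, is_print_char) are invented for the example, 'signed
char' is spelled out so the result does not depend on what the compiler
chooses for plain 'char', and the -1 result assumes a two's complement
machine.

```cpp
#include <cctype>
#include <cstdio>

// Value an is...() function receives when plain char is signed:
// '\xFF' sign-extends to a negative int.
inline int widen_signed(char c) {
    return static_cast<int>(static_cast<signed char>(c));
}

// Value it receives when the caller converts through unsigned char
// first: always in the range 0..UCHAR_MAX.
inline int widen_unsigned(char c) {
    return static_cast<int>(static_cast<unsigned char>(c));
}

// True when an already-widened value cannot be told apart from EOF.
inline bool collides_with_eof(int widened) {
    return widened == EOF;
}

// Hypothetical wrapper: never hands <cctype> a negative character value.
inline bool is_print_char(char c) {
    return std::isprint(static_cast<unsigned char>(c)) != 0;
}
```

On an implementation where EOF is -1, collides_with_eof(widen_signed('\xFF'))
is true, while the value produced by widen_unsigned can never collide,
because EOF is required to be negative.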

>> I've suggested a couple of times in the past that plain 'char' should
>> be unsigned in order to simplify (i.e., to correct) such things, but
>> to no avail.

> I've suggested this too.  The problem is that there really is a large
> body of code that uses char (and not signed char) for small signed
> int's.
> Such code isn't portable; you can even argue that it isn't correct.
> But I can well understand a vendor not wanting to break it.

If it really is incorrect code, then breaking it is acceptable for
the new language standard - the new standards never claimed they
wouldn't break bad code.  And this kind of code can be fixed, by
substituting 'signed char'.

> There are also machines on which an unsigned char is slower than a
> signed one.

Indeed.  But if you agree that 'char' should not be signed, then
efficiency is not a reasonable argument for keeping it signed.  The
question is, should 'char' mean "character code" or "small integer"?
(Java has a nice answer to this question.)


-- David R. Tribble, david.tribble@noSPAM.central.beasys.com --





Author: dHarrison@worldnet.att.net (Doug Harrison)
Date: 1998/03/11
Raw View
On 10 Mar 98 05:04:18 GMT, David R Tribble
<david.tribble@central.beasys.com> wrote:

>I've suggested a couple of times in the past that plain 'char' should
>be unsigned in order to simplify (i.e., to correct) such things, but
>to no avail.

It's historical baggage that will likely never be dropped.

>the first example above is considered incorrect, and instead should
>be:
>    if (isprint((unsigned char) '\xFF')) ...
>
>This of course appears to contradict the existence of signed 'char',
>but some have argued that this is really not a contradiction.  Such
>arguments fail to convince me.  What a pain.

I agree. Subtle requirements such as the above truly set one up to
fail. std::locale at least addresses that specific problem.

Speaking of subtleties, the common implementation of
char_traits<char>::compare in terms of memcmp is wrong, if plain chars
are signed, because the former is defined in terms of
char_traits<char>::lt and eq, which are defined as the built-in < and
== operators. memcmp, however, is defined to compare unsigned chars! I
think char_traits<char>::lt and eq should be defined to compare
unsigned chars as well, and then memcmp would be acceptable, as I
believe is the intent. After all, what is a std::string, a sequence of
small integers, or a sequence of characters? Unless "negative
characters" make sense in some context unknown to me, comparing
unsigned chars seems the right approach.
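
A small sketch of the disagreement, for readers who want to see it.  The
function names are made up for this example, and the casts to 'signed
char' stand in for an implementation where plain 'char' is signed; this
is not how any particular library implements char_traits.

```cpp
#include <cstddef>
#include <cstring>

// Lexicographic compare using the built-in < on signed char values,
// i.e. what lt() amounts to when plain char is signed.
inline int compare_signed(const char* a, const char* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const signed char x = static_cast<signed char>(a[i]);
        const signed char y = static_cast<signed char>(b[i]);
        if (x < y) return -1;
        if (y < x) return 1;
    }
    return 0;
}

// The same comparison through memcmp, which interprets every byte
// as unsigned char.
inline int compare_memcmp(const char* a, const char* b, std::size_t n) {
    const int r = std::memcmp(a, b, n);
    return (r > 0) - (r < 0);  // normalise to -1, 0 or +1
}
```

For "a" against "\xE9" (Latin-1 e-acute) the two functions return
opposite signs: 97 is greater than -23 as signed values, but less than
233 as unsigned ones.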

--
Doug Harrison
dHarrison@worldnet.att.net





Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1998/03/11
Raw View
Doug Harrison<dHarrison@worldnet.att.net> wrote:
>
>Speaking of subtleties, the common implementation of
>char_traits<char>::compare in terms of memcmp is wrong, if plain chars
>are signed, because the former is defined in terms of
>char_traits<char>::lt and eq, which are defined as the built-in < and
>== operators. memcmp, however, is defined to compare unsigned chars! I
>think char_traits<char>::lt and eq should be defined to compare
>unsigned chars as well, and then memcmp would be acceptable, as I
>believe is the intent. After all, what is a std::string, a sequence of
>small integers, or a sequence of characters? Unless "negative
>characters" make sense in some context unknown to me, comparing
>unsigned chars seems the right approach.

I agree this is a bug.  Certainly char_traits<char>::lt and
char_traits<char>::compare should give the same result!

However, whether that is based on an unsigned or signed compare
doesn't really matter; the result is not really meaningful for
any human language anyway.  It just needs to be deterministic for
the purposes of retrieval from sorted containers.

--
Nathan Myers
ncm@nospam.cantrip.org  http://www.cantrip.org/





Author: jcoffin@taeus.com (Jerry Coffin)
Date: 1998/03/11
Raw View
In article <6e1fgp$cq$1@nnrp1.dejanews.com>, jkanze@otelo.ibmmail.com
says...

[ I said: ]
> > It's certainly true that any locale can include extra characters if it
> > chooses to do so.  However, I wouldn't particularly expect to see
> > sensible support for, say, Swedish characters unless I selected a
> > Swedish locale.  Unfortunately, with most compilers, I simply can't do
> > that...
>
> Are you sure?  Perhaps it is simply because your sysadmin hasn't installed
> it.

Under the circumstances, the sysadmin would be me, to the extent there
is one at all.

> Or maybe the delivered versions of the programs are different in
> the US.  I've been able to get both French and German support with all
> compilers tried under Solaris, HP-UX and AIX. And I suspect that it
> would work under Windows; the documentation says so, but I haven't tried
> it yet.

Hmm...well, I mostly use the more popular compilers for Win32 and I
run them on Windows NT.  I haven't loaded any internationalized
versions of NT, but given that internally it uses UNICODE almost
exclusively, I was under the impression that internationalized
versions mostly just had the text of system messages and such
translated.  I might have to install another version to see exactly
what happens then.  It _may_ also be that when I installed the
compiler I didn't include the appropriate options -- I don't remember
asking that such things be left out, but given the amount of
programming I do for anything but an English speaking audience, I
might have left such a thing out without remembering it.

> I know that Sun ships machines to
> Japan, with support for Japanese, but I didn't have that option on the
> Solaris I purchased in France.  Perhaps in the US, they only ship the
> added support as a special option.)

That might well be, or as I mentioned above, it could simply be that
I've left the support out without remembering that I did so.

[ ... ]

> (Of course, this may be conditioned by other facts.
> I develop programs, and all of the third party software I have is related
> to this activity.  And people who develop C++ programs do know English,
> regardless of what their native language is, and generally prefer to
> use an American keyboard when developing.  So the pressure to
> internationalize is probably considerably less than in other application
> areas.)

Quite likely true.  I had to port a program to work with Japanese
characters a few years ago -- what I saw of Input Method Editors makes
me cringe at the idea of doing programming using such things.

(For those who aren't familiar with the terminology, an Input Method
Editor allows you to use a keyboard containing only a hundred or so
keys to work with languages that contain thousands of characters.  The
only real comfort is that in these languages, a single character is
roughly equivalent to what most of us think of as a word, so using
several keystrokes to enter a character may not be as much of a loss
as it initially seems.)

--
    Later,
    Jerry.

The Universe is a figment of its own imagination.





Author: James Kuyper <kuyper@wizard.net>
Date: 1998/03/11
Raw View
David R Tribble wrote:
...
> Indeed.  But if you agree that 'char' should not be signed, then
> efficiency is not a reasonable argument for keeping it signed.  The
> question is, should 'char' mean "character code" or "small integer"?
> (Java has a nice answer to this question.)

The importance of efficiency is different for each application. The
standard should not be designed based upon sweeping generalizations about
the importance of efficiency compared with other issues.
If plain 'char' were either prohibited, or mandated to be unsigned, then
there should also be some way for a program to identify whether
'unsigned char' or 'signed char' is faster. A typedef would be
appropriate; it could be called 'fast_char'. Of course, once you've done
that, you're essentially back where you started.

The real problem is not the signedness of plain char, but the existence
of standard library functions which require a specific signedness to
work properly. The new functions templated by character type go a long
way toward eliminating that problem.





Author: James Kuyper <kuyper@wizard.net>
Date: 1998/03/07
Raw View
Jerry Coffin wrote:
>
> In article <888969589.372271@nn1>, lars.rosenberg@mbox302.swipnet.se
> says...
> > Does the standardisation for the string class tell if it shall convert
> > European characters from and to uppercase characters.
>
> If memory serves, the exact effect of a case conversion is locale
> dependent -- non-English characters aren't included in the default
> locale, so it's left up to the individual locales to implement the
> case conversion correctly.

The standard specifies very few details about the "C" locale, and
nothing at all about the default "" locale. In particular, it says
nothing to prohibit non-English characters. The LC_CTYPE "C" locale must
have the basic character set, but is not prohibited from containing
other characters as well.





Author: Pete Becker <petebecker@acm.org>
Date: 1998/03/07
Raw View
Jerry Coffin wrote:
>
>
> As we speak, it's getting close to 8 years since the C standard was
> ratified, but the majority of C compilers seem to have limited (or
> nonexistent) support for any but one locale: the C locale that's been
> required since the beginning.  The problem here isn't the standard,
> it's the vendors...

Well, maybe. But remember that vendors respond to customers. So maybe
the problem is lack of demand.
 -- Pete





Author: jcoffin@taeus.com (Jerry Coffin)
Date: 1998/03/07
Raw View
In article <MPG.f67fc6bd93ba088989878@news.rmi.net>, jcoffin@taeus.com
says...

[ ... ]

> Basically put, the standard allows (but doesn't mandate) reasonable
> behavior, though there were a few stupid things they did that made
> life much more difficult for most Europeans, such as allowing char to
> be signed.

Sorry to follow up to my own post, but I should amend the statement
above a bit: calling this decision "stupid" is overly harsh, and
basically incorrect; requiring char to be unsigned would have made
some internationalization simpler, but would have caused other
problems, particularly breaking substantial amounts of existing code.
In the end, there was no easy way out, and the one they took was
reasonable, even though it'd be nice if the circumstances had been
enough different that they could have made a different decision.

--
    Later,
    Jerry.

The Universe is a figment of its own imagination.





Author: jkanze@otelo.ibmmail.com
Date: 1998/03/07
Raw View
In article <MPG.f67fc6bd93ba088989878@news.rmi.net>,
  jcoffin@taeus.com (Jerry Coffin) wrote:

> Basically put, the standard allows (but doesn't mandate) reasonable
> behavior, though there were a few stupid things they did that made
> life much more difficult for most Europeans, such as allowing char to
> be signed.

Actually, C++ has solved the main problem with signedness of char, I
think.  In locale, std::isxxx( char , locale ) should behave correctly
for all possible characters, even if the numerical value is negative.
And this was the major problem with char's possibly having negative
values.
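
As a rough sketch of what that looks like in code (the wrapper names
is_alpha_in and is_print_in are invented here; std::isalpha and
std::isprint taking a locale as second argument are the function
templates declared in <locale>):

```cpp
#include <locale>

// Classify a plain char with the two-argument overloads from <locale>.
// No cast to unsigned char is needed, whatever the sign of c.
inline bool is_alpha_in(char c, const std::locale& loc) {
    return std::isalpha(c, loc);
}

inline bool is_print_in(char c, const std::locale& loc) {
    return std::isprint(c, loc);
}
```

A call such as is_print_in('\xFF', std::locale::classic()) is well
defined even where plain char is signed; what it returns depends on the
locale that is passed in.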

> As we speak, it's getting close to 8 years since the C standard was
> ratified, but the majority of C compilers seem to have limited (or
> nonexistent) support for any but one locale: the C locale that's been
> required since the beginning.  The problem here isn't the standard,
> it's the vendors...

Is that true?  All of the compilers I use support multiple locales, most
going beyond the standard in terms of support (locale specific message
generation, etc.)  (None of them support the new <locale> stuff yet,
but setlocale works for France or Germany for all of them.)

--
James Kanze    +33 (0)1 39 23 84 71    mailto: kanze@gabi-soft.fr
        +49 (0)69 66 45 33 10    mailto: jkanze@otelo.ibmmail.com
GABI Software, 22 rue Jacques-Lemercier, 78000 Versailles, France
Conseils en informatique orientée objet --
              -- Beratung in objektorientierter Datenverarbeitung






Author: Pete Becker <petebecker@acm.org>
Date: 1998/03/08
Raw View
Jerry Coffin wrote:
>
> In article <888969589.372271@nn1>, lars.rosenberg@mbox302.swipnet.se
> says...
> > Does the standardisation for the string class tell if it shall convert
> > European characters from and to uppercase characters.
>
> If memory serves, the exact effect of a case conversion is locale
> dependent -- non-English characters aren't included in the default
> locale, so it's left up to the individual locales to implement the
> case conversion correctly.

To clarify: the definition of the string class says nothing at all about
case conversions. Any problems with conversions are in locales, not
strings.

>
> Basically put, the standard allows (but doesn't mandate) reasonable
> behavior, though there were a few stupid things they did that made
> life much more difficult for most Europeans, such as allowing char to
> be signed.

I'm not sure what the point here is. The string class is designed to be
used with more or less arbitrary character-like types. If you don't want
to use signed characters in a string, don't use them.
 -- Pete





Author: menright@cts.cog (Mike Enright)
Date: 1998/03/08
Raw View
jcoffin@taeus.com (Jerry Coffin) wrote:

>Sorry to follow up to my own post, but I should amend the statement
>above a bit: calling this decision "stupid" is overly harsh, and
>basically incorrect; requiring char to be unsigned would have made
>some internationalization simpler, but would have caused other
>problems, particularly breaking substantial amounts of existing code.
>In the end, there was no easy way out, and the one they took was
>reasonable, even though it'd be nice if the circumstances had been
>enough different that they could have made a different decision.

Actually, in the past some widely used compilers did some things
loosely enough that American programmers, pre-standard, were able to
use unsigned characters pretty freely. We looked forward to the
standard, but we got bit by it because it restricted the use of
unsigned characters to basically small unsigned quantities that the
library doesn't want to use for text. Substantial existing code WAS
broken by the standard. And the recovery process tended to make the
code harder to internationalize.


--
Mike Enright
menright@cts.com (Email replies cheerfully ignored, use the news group)
http://www.users.cts.com/sd/m/menright/
Cardiff-by-the-Sea, California, USA





Author: jcoffin@taeus.com (Jerry Coffin)
Date: 1998/03/08
Raw View
In article <35002C69.3BF853D4@acm.org>, petebecker@acm.org says...
> Jerry Coffin wrote:
> >
> >
> > As we speak, it's getting close to 8 years since the C standard was
> > ratified, but the majority of C compilers seem to have limited (or
> > nonexistent) support for any but one locale: the C locale that's been
> > required since the beginning.  The problem here isn't the standard,
> > it's the vendors...
>
> Well, maybe. But remember that vendors respond to customers. So maybe
> the problem is lack of demand.

I'll bow to your superior personal knowledge of that.  I have to agree
that if demand was strong, the vendors would undoubtedly meet it.  I
think to some extent, it's a bit of a chicken and egg problem: if one
vendor were to come out with really good locale support, demand for it
would likely grow.  (Hint hint...)

--
    Later,
    Jerry.

The Universe is a figment of its own imagination.





Author: jcoffin@taeus.com (Jerry Coffin)
Date: 1998/03/08
Raw View
In article <3500185B.41C6@wizard.net>, kuyper@wizard.net says...

[ ... ]

> > If memory serves, the exact effect of a case conversion is locale
> > dependent -- non-English characters aren't included in the default
> > locale, so it's left up to the individual locales to implement the
> > case conversion correctly.
>
> The standard specifies very few details about the "C" locale, and
> nothing at all about the default "" locale. In particular, it says
> nothing to prohibit non-English characters. The LC_CTYPE "C" locale must
> have the basic character set, but is not prohibited from containing
> other characters as well.

Sorry I should have been more careful with my wording.  When I said
"default" I meant the "C" locale.  While I know the "" locale is
referred to as the default locale, it's not the default.  The "C"
locale is the default, and the default locale must be specifically
selected to be used...

It's certainly true that any locale can include extra characters if it
chooses to do so.  However, I wouldn't particularly expect to see
sensible support for, say, Swedish characters unless I selected a
Swedish locale.  Unfortunately, with most compilers, I simply can't do
that...
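
For what it's worth, an attempt to select such a locale might look like
the sketch below.  The helper name locale_or_classic is made up for the
example, and locale names such as "sv_SE" are platform-specific
assumptions (the standard only guarantees "C" and ""), so treat this as
an illustration rather than a recipe.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Try to construct the named locale; fall back to the classic "C"
// locale if the implementation does not know the name.
// std::locale's constructor reports an unknown name by throwing
// std::runtime_error.
inline std::locale locale_or_classic(const char* name) {
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}
```

A program could then call std::locale::global(locale_or_classic("sv_SE"))
at startup and compare the resulting name() with "C" to find out whether
the Swedish locale was actually available.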

--
    Later,
    Jerry.

The Universe is a figment of its own imagination.





Author: "Lars Rosenberg" <lars.rosenberg@mbox302.swipnet.se>
Date: 1998/03/04
Raw View
Does the standardisation for the string class tell if it shall convert
European characters from and to uppercase characters.

Lars Rosenberg
Consultant in CS and programming
Karlstad Sweden.

P:S  When shall you English talking people realize there are more languages
than English and design the tools for many languages.  D.S.










Author: jkanze@otelo.ibmmail.com
Date: 1998/03/04
Raw View
In article <888969589.372271@nn1>,
  "Lars Rosenberg" <lars.rosenberg@mbox302.swipnet.se> wrote:
>
> Does the standardisation for the string class tell if it shall convert
> European characters from and to uppercase characters.

The standard string class doesn't support conversion between upper and
lower for any locale, European or other.  The locale class does have
such support, and as you would imagine from the name, it is supposed
to do this in a locale dependent way.

In practice, of course, it doesn't work, because it supposes a one to one
mapping between upper and lower case, which doesn't reflect the reality.
(For that matter, just speaking of upper and lower case doesn't reflect
the reality: there are at least three cases without leaving Europe, and
alphabets based on the Roman alphabet.)
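
To make the one to one limitation concrete, here is a minimal sketch of
uppercasing a string through <locale>.  The function name to_upper_copy
is invented for the example; std::toupper with a locale argument is the
function template from <locale>.

```cpp
#include <locale>
#include <string>

// Uppercase a narrow string one char at a time.  Each input char yields
// exactly one output char, so the length can never change.
inline std::string to_upper_copy(std::string s, const std::locale& loc) {
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        s[i] = std::toupper(s[i], loc);
    }
    return s;
}
```

Because the interface maps one char to one char, a conversion such as
German sharp s to "SS" cannot be expressed this way, whatever locale is
passed in.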

--
James Kanze    +33 (0)1 39 23 84 71    mailto: kanze@gabi-soft.fr
        +49 (0)69 66 45 33 10    mailto: jkanze@otelo.ibmmail.com
GABI Software, 22 rue Jacques-Lemercier, 78000 Versailles, France
Conseils en informatique orientée objet --
              -- Beratung in objektorientierter Datenverarbeitung






Author: jcoffin@taeus.com (Jerry Coffin)
Date: 1998/03/06
Raw View
In article <888969589.372271@nn1>, lars.rosenberg@mbox302.swipnet.se
says...
> Does the standardisation for the string class tell if it shall convert
> European characters from and to uppercase characters.

If memory serves, the exact effect of a case conversion is locale
dependent -- non-English characters aren't included in the default
locale, so it's left up to the individual locales to implement the
case conversion correctly.

Basically put, the standard allows (but doesn't mandate) reasonable
behavior, though there were a few stupid things they did that made
life much more difficult for most Europeans, such as allowing char to
be signed.

> P:S  When shall you English talking people realize there are more languages
> than English and design the tools for many languages.  D.S.

Keep in mind that a large part of the standardization was done by an
ISO committee -- there was ample representation of people who speak a
variety of languages.  The major problem has been less with the
standard than with simply getting vendors to actually implement what's
needed to do the job.

As we speak, it's getting close to 8 years since the C standard was
ratified, but the majority of C compilers seem to have limited (or
nonexistent) support for any but one locale: the C locale that's been
required since the beginning.  The problem here isn't the standard,
it's the vendors...

--
    Later,
    Jerry.

The Universe is a figment of its own imagination.