Topic: internationalization


Author: kanze@gabi-soft.de (James Kanze)
Date: Fri, 13 Dec 2002 21:24:37 +0000 (UTC)
"P.J. Plauger" <pjp@dinkumware.com> wrote in message
news:<3df5f446$0$13780$724ebb72@reader2.ash.ops.us.uu.net>...
> "James Kuyper Jr." <kuyper@wizard.net> wrote in message
> news:3DF577BE.2090700@wizard.net...

> > You're right, of course. I was thinking C99, which does not
> > explicitly require wchar_t to be a distinct type.

> In fact, C99 *requires* that wchar_t be a synonym for one of the basic
> integer types.

> > >>It's perfectly legal for wchar_t to be a typedef for an 8-bit
> > >>char,

> > > No.

> > Correct. However, the underlying type for wchar_t could be char.

> Yes. And to be perfectly clear, we should distinguish between shared
> representation and same type. It is required that char share the same
> representation as either signed char or unsigned char; but the three
> are always distinct types. In C++, wchar_t probably has the same
> representation as one of the basic integer types, but is always a
> distinct type. In C, wchar_t is always the same type as one of the
> basic integer types.

In C++, does wchar_t only probably have the same representation, or must
it have the same representation?  It must have the same size, signedness and
alignment as one of the basic integer types.  That doesn't leave much
room for variation.  (Could wchar_t use one's complement, while the
other integer types use two's complement?  Or could it have some don't
care or must be zero bits which the underlying type doesn't have?)
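
A minimal sketch of the three properties named above, assuming a C++11 or
later compiler (static_assert, alignof and <type_traits> all postdate this
thread). The helper names matches_wchar_t and has_candidate_underlying_type
are hypothetical. The check can only confirm that some integer type agrees
with wchar_t in size, signedness and alignment; it cannot see the differences
the parenthetical asks about, such as one's complement or padding bits.

```cpp
#include <cassert>
#include <type_traits>

// True if T agrees with wchar_t in the three properties the standard
// requires of the underlying type: size, signedness and alignment.
template <typename T>
constexpr bool matches_wchar_t() {
    return sizeof(T) == sizeof(wchar_t)
        && alignof(T) == alignof(wchar_t)
        && std::is_signed<T>::value == std::is_signed<wchar_t>::value;
}

// True if at least one of the standard integer types is a candidate for
// the underlying type of wchar_t on this implementation.
constexpr bool has_candidate_underlying_type() {
    return matches_wchar_t<char>()
        || matches_wchar_t<signed char>()    || matches_wchar_t<unsigned char>()
        || matches_wchar_t<short>()          || matches_wchar_t<unsigned short>()
        || matches_wchar_t<int>()            || matches_wchar_t<unsigned int>()
        || matches_wchar_t<long>()           || matches_wchar_t<unsigned long>()
        || matches_wchar_t<long long>()      || matches_wchar_t<unsigned long long>();
}

// A conforming implementation should always pass this check.
static_assert(has_candidate_underlying_type(),
              "no integer type matches wchar_t in size, signedness and alignment");

// Runtime form of the same check, for use from ordinary code.
inline void demo_underlying_type_check() {
    assert(has_candidate_underlying_type());
}
```

Several types may match at once (for example int and long on a platform where
both are 32 bits), so the sketch reports only that a candidate exists, not
which type the implementation actually chose.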

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: kuyper@wizard.net ("James Kuyper Jr.")
Date: Sat, 14 Dec 2002 09:50:20 +0000 (UTC)
James Kanze wrote:
....
> In C++, wchar_t probably has the same representation, or must have the
> same representation.  It must have the same size, signedness and
> alignment as one of the basic integer types.  That doesn't leave much
> room for variation.  (Could wchar_t use one's complement, while the
> other integer types use two's complement?  Or could it have some don't
> care or must be zero bits which the underlying type doesn't have?)

Yes, and yes. I don't see much point in allowing those possibilities;
3.9.1p5 should probably have included "same representation". I don't see
how it would help to have all of those other features the same, if the
actual representation is allowed to be different.







Author: kanze@gabi-soft.de (James Kanze)
Date: Mon, 16 Dec 2002 16:52:20 +0000 (UTC)
kuyper@wizard.net ("James Kuyper Jr.") wrote in message
news:<3DFA8DD3.70306@wizard.net>...
> James Kanze wrote:
> ....
> > In C++, wchar_t probably has the same representation, or must have
> > the same representation. It must have the same size, signedness and
> > alignment as one of the basic integer types. That doesn't leave much
> > room for variation. (Could wchar_t use one's complement, while the
> > other integer types use two's complement? Or could it have some
> > don't care or must be zero bits which the underlying type doesn't
> > have?)

> Yes, and yes. I don't see much point in allowing those possibilities;
> 3.9.1p5 should probably have included "same representation". I don't
> see how it would help to have all of those other features the same, if
> the actual representation is allowed to be different.

Or we take the other direction, and allow it to be a different size as
well.

Of course, that's another hole in C compatibility.  In this case, the C
compatibility is probably important, since it is quite reasonable to
expect a C++ program to want to use an interface defined in C which uses
wchar_t.

Suppose that a C++ implementation did use one of the options I mention
above.  In that case, of course, the type cannot be compatible with C
(since in C, wchar_t is a typedef).  So what is the following supposed
to mean:

    extern "C" void f( wchar_t ) ;

Is this enough of a problem to warrant a defect report?

In practice, there is no problem, because wchar_t always does have the
same representation as one of the basic integer types.  If only for
reasons of C compatibility.  On the other hand, if the implementation
uses 32 bit words for Unicode characters, a debugging implementation
which traps if the 32 bit word contains anything larger than 0x10FFFF,
might be useful.
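
The trapping behaviour described above would live in the compiler. As a rough
library-level approximation only, the sketch below wraps a 32 bit value in a
hypothetical class, CheckedCodePoint, that rejects anything larger than
0x10FFFF. It assumes C++11 for <cstdint>; the class and function names are
not from any standard library.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// True if v lies within the range mentioned in the post (0 .. 0x10FFFF).
inline bool is_in_unicode_range(std::uint32_t v) {
    return v <= 0x10FFFFu;
}

// Holds a 32 bit code point and refuses out-of-range values, imitating in
// library code what a debugging implementation might trap on in hardware
// or in generated code.
class CheckedCodePoint {
public:
    explicit CheckedCodePoint(std::uint32_t v) : value_(validate(v)) {}
    std::uint32_t value() const { return value_; }

private:
    static std::uint32_t validate(std::uint32_t v) {
        if (!is_in_unicode_range(v)) {
            throw std::out_of_range("code point larger than 0x10FFFF");
        }
        return v;
    }
    std::uint32_t value_;
};

// Small usage demonstration: the largest valid value is accepted.
inline void demo_checked_code_point() {
    CheckedCodePoint last(0x10FFFFu);
    assert(last.value() == 0x10FFFFu);
}
```

A wrapper like this changes the type seen by callers, which is exactly what a
compiler-level check would avoid; it is shown only to make the idea concrete.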

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: allan_w@my-dejanews.com (Allan W)
Date: Wed, 11 Dec 2002 03:40:44 +0000 (UTC)
kuyper@wizard.net ("James Kuyper Jr.") wrote
> >>The standard recognizes four character types: signed char, char,
> >>unsigned char, and wchar_t. The latter is a typedef.
>
> I'm surprised that no one caught me on this one.

It's like when your teacher solves a math problem on the blackboard,
showing that 6+6=11. Most of the class just sits in stunned silence,
hoping the teacher will notice without help, not wanting to be the
first to point out an error. Yeah, that's probably it.  :-)






Author: "P.J. Plauger" <pjp@dinkumware.com>
Date: Wed, 11 Dec 2002 10:07:03 CST
"James Kuyper Jr." <kuyper@wizard.net> wrote in message news:3DF577BE.2090700@wizard.net...

> You're right, of course. I was thinking C99, which does not explicitly
> require wchar_t to be a distinct type.

In fact, C99 *requires* that wchar_t be a synonym for one of the basic
integer types.

> >>It's perfectly legal for wchar_t to be a typedef for an 8-bit char,
> >
> >
> > No.
>
> Correct. However, the underlying type for wchar_t could be char.

Yes. And to be perfectly clear, we should distinguish between shared
representation and same type. It is required that char share the same
representation as either signed char or unsigned char; but the three
are always distinct types. In C++, wchar_t probably has the same
representation as one of the basic integer types, but is always a
distinct type. In C, wchar_t is always the same type as one of the
basic integer types.
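
A minimal sketch of the "distinct type" half of that distinction, assuming a
C++11 or later compiler (enum class and <type_traits> postdate this thread).
The names Kind, kind and demo_distinct_types are hypothetical. The point is
that the overload set below only compiles because, in C++, wchar_t is not the
same type as any of the integer types; with a C-style typedef two of the
declarations would collide.

```cpp
#include <cassert>
#include <type_traits>

enum class Kind {
    Char, SignedChar, UnsignedChar, WideChar,
    Short, UnsignedShort, Int, UnsignedInt, Long, UnsignedLong
};

// One overload per type. If wchar_t were merely another name for one of
// the integer types, its overload would be a redefinition and this would
// not compile.
inline Kind kind(char)           { return Kind::Char; }
inline Kind kind(signed char)    { return Kind::SignedChar; }
inline Kind kind(unsigned char)  { return Kind::UnsignedChar; }
inline Kind kind(wchar_t)        { return Kind::WideChar; }
inline Kind kind(short)          { return Kind::Short; }
inline Kind kind(unsigned short) { return Kind::UnsignedShort; }
inline Kind kind(int)            { return Kind::Int; }
inline Kind kind(unsigned int)   { return Kind::UnsignedInt; }
inline Kind kind(long)           { return Kind::Long; }
inline Kind kind(unsigned long)  { return Kind::UnsignedLong; }

// char shares its representation with signed char or unsigned char, yet
// all three remain distinct types.
static_assert(!std::is_same<char, signed char>::value, "char is distinct");
static_assert(!std::is_same<char, unsigned char>::value, "char is distinct");

// Overload resolution picks the wchar_t overload for a wide literal and
// the char overload for a narrow one.
inline void demo_distinct_types() {
    assert(kind(L'x') == Kind::WideChar);
    assert(kind('x') == Kind::Char);
}
```

Compiling the same overload set as C is not possible, since C has no
overloading; the sketch illustrates only the C++ side of the comparison.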

And you wonder why beginning C/C++ programmers get confused sometimes.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com







Author: alanw@itee.uq.edu.au ("alan whiteside")
Date: Mon, 9 Dec 2002 01:15:28 +0000 (UTC)
I am looking at writing a library I want to use in different locales.

how well supported is the std::wstring lib?  (I gather this is the C++ way
of doing international  chars).  What is the standard char type ie. wchar ?

What support functions exist for working with wstring (ie. converting from
string to wstring etc) ?

Alan.







Author: kanze@gabi-soft.de (James Kanze)
Date: Mon, 9 Dec 2002 18:13:49 +0000 (UTC)
alanw@itee.uq.edu.au ("alan whiteside") wrote in message
news:<1039391881.119001@bottle.itee.uq.edu.au>...
> I am looking at writing a library I want to use in different locales.

> how well supported is the std::wstring lib?  (I gather this is the C++
> way of doing international chars).

It should be present in any standard conformant compiler.  In fact, if
you can restrict yourself to the most recent versions of the compiler
used, you will probably find it.

> What is the standard char type ie. wchar ?

wchar_t.

> What support functions exist for working with wstring (ie. converting
> from string to wstring etc) ?

Not much.  Basically, the idea is to work only in wchar_t within the
application, and do all of the conversion when reading and writing
files.  (Which, of course, is of limited utility if you happen to be
writing sockets.)  The actual logic of the conversion is in the facet
codecvt.  From what I understand, it isn't the easiest to use, nor the
best specified.  And it only takes pointers -- no iterators allowed.
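
As a rough illustration of the pointer-based interface mentioned above, here
is a minimal sketch that drives std::codecvt<wchar_t, char, std::mbstate_t>
directly. The function name widen is hypothetical. It assumes that the
conversion never produces more wide characters than there are input bytes,
and it treats anything other than a complete, successful conversion as an
error.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Converts a narrow string to a wide string using the codecvt facet of
// the given locale (the classic "C" locale by default).
inline std::wstring widen(const std::string& narrow,
                          const std::locale& loc = std::locale::classic()) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Facet;
    const Facet& facet = std::use_facet<Facet>(loc);

    if (narrow.empty()) {
        return std::wstring();
    }

    // Assumption: at most one wchar_t is produced per input byte.
    std::wstring wide(narrow.size(), L'\0');

    std::mbstate_t state = std::mbstate_t();
    const char* from      = narrow.data();
    const char* from_end  = from + narrow.size();
    const char* from_next = from;
    wchar_t* to      = &wide[0];
    wchar_t* to_end  = to + wide.size();
    wchar_t* to_next = to;

    // The facet works on raw pointer ranges only; no iterators.
    Facet::result r = facet.in(state, from, from_end, from_next,
                               to, to_end, to_next);
    if (r != Facet::ok) {
        throw std::runtime_error("codecvt::in did not convert the whole input");
    }

    assert(to_next >= to);
    wide.resize(static_cast<std::size_t>(to_next - to));
    return wide;
}
```

The seven-argument call and the separate "next" pointers are what make the
facet awkward to use from code that otherwise works with std::string and
iterators.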

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: kuyper@wizard.net ("James Kuyper Jr.")
Date: Mon, 9 Dec 2002 18:31:25 +0000 (UTC)
alan whiteside wrote:
> I am looking at writing a library I want to use in different locales.
>
> how well supported is the std::wstring lib?  (I gather this is the C++ way
> of doing international  chars).  What is the standard char type ie. wchar ?

The standard recognizes four character types: signed char, char,
unsigned char, and wchar_t. The latter is a typedef.

The standard provides a great deal of support for wchar_t. Almost every
string and I/O function that uses 'char' has a parallel function that
works with wchar_t. One major exception is the names of files, which
must always be provided as char strings.
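
A minimal sketch of that parallelism: because the string and stream classes
are templates on the character type, one function template can serve both
char and wchar_t. The name format_number is hypothetical; the library pieces
it uses (std::basic_string, std::basic_ostringstream) are standard.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Formats an integer as text in either the narrow or the wide character
// type, using the matching string stream specialization.
template <typename CharT>
std::basic_string<CharT> format_number(int n) {
    std::basic_ostringstream<CharT> out;
    out << n;
    return out.str();
}

// The same template yields a std::string or a std::wstring on request.
inline void demo_parallel_streams() {
    std::string  narrow = format_number<char>(42);
    std::wstring wide   = format_number<wchar_t>(42);
    assert(narrow.size() == wide.size());
}
```

The file-name exception shows up in the wide file streams as well: in the
standard of that time, std::wofstream still takes its file name as a char
string.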

However, the standard doesn't provide a great deal of support for
internationalization. It's perfectly legal for wchar_t to be a typedef
for an 8-bit char, interpreted only as an ASCII string. The only locales
required by the standard are the "C" locale and the default locale
(which might be the same locale).
On the flip side, it's also legal for 'char' to be a 32-bit type with
Unicode representation.
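
Since the widths of char and wchar_t are left to the implementation, portable
code can query them rather than assume them. A minimal sketch, with
hypothetical function names; the figures include any padding bits, so they
describe storage size rather than the range of values.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Number of bits in a char on this implementation (at least 8).
inline std::size_t bits_in_char() {
    return CHAR_BIT;
}

// Number of storage bits in a wchar_t on this implementation.
inline std::size_t bits_in_wchar_t() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// wchar_t occupies a whole number of chars, so it is never narrower.
inline void demo_character_widths() {
    assert(bits_in_wchar_t() >= bits_in_char());
}
```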


> What support functions exist for working with wstring (ie. converting
> from string to wstring etc) ?

Conversion from string to wstring can occur indirectly by working
through char and wchar_t arrays, using the C standard library functions
that have been borrowed into the C++ standard library. The relevant ones
are mbstowcs() and wcstombs(). I thought there was a C++ interface that
did the same thing for strings, but I can't find it.
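
A minimal sketch of the indirect route described above, going through
temporary arrays with mbstowcs() and wcstombs(). The wrapper names to_wide
and to_narrow are hypothetical. Both functions use the current C locale, and
both stop at the first embedded null character, so the sketch is only
suitable for strings without embedded nulls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Narrow to wide via std::mbstowcs, using the current C locale.
inline std::wstring to_wide(const std::string& narrow) {
    // One wide character per input byte is the most that can be produced.
    std::vector<wchar_t> buffer(narrow.size() + 1);
    std::size_t n = std::mbstowcs(&buffer[0], narrow.c_str(), buffer.size());
    if (n == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("invalid multibyte sequence");
    }
    return std::wstring(&buffer[0], n);
}

// Wide to narrow via std::wcstombs, using the current C locale.
inline std::string to_narrow(const std::wstring& wide) {
    // Each wide character needs at most MB_CUR_MAX bytes.
    std::vector<char> buffer(wide.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(&buffer[0], wide.c_str(), buffer.size());
    if (n == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("wide character not representable");
    }
    return std::string(&buffer[0], n);
}

// Round trip for plain ASCII text, which every locale should handle.
inline void demo_round_trip() {
    assert(to_narrow(to_wide("abc")) == "abc");
}
```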






Author: hyrosen@mail.com (Hyman Rosen)
Date: Mon, 9 Dec 2002 20:10:14 +0000 (UTC)
James Kuyper Jr. wrote:

> The standard recognizes four character types: signed char, char,
> unsigned char, and wchar_t. The latter is a typedef.


Incorrect. 3.9.1/5 states that wchar_t is a distinct type.






Author: homer@cqg.com ("Homer Meyer")
Date: Mon, 9 Dec 2002 22:14:40 +0000 (UTC)
"James Kuyper Jr." <kuyper@wizard.net> wrote in message
news:3DF4B653.6040209@wizard.net...
> alan whiteside wrote:
> > I am looking at writing a library I want to use in different locales.
> >
> > how well supported is the std::wstring lib?  (I gather this is the C++
> > way of doing international chars).  What is the standard char type ie.
> > wchar ?
>
> The standard recognizes four character types: signed char, char,
> unsigned char, and wchar_t. The latter is a typedef.

How can a C++ compiler have wchar_t as a typedef and still conform to
3.9.1/5 in the standard?

"Type wchar_t is a distinct type whose values can represent distinct codes
for all members of the largest extended character set specified among the
supported locales (22.1.1). Type wchar_t shall have the same size,
signedness, and alignment requirements (3.9) as one of the other integral
types, called its underlying type."








Author: news@evo6.com (Andy Sawyer)
Date: Mon, 9 Dec 2002 23:28:09 +0000 (UTC)
In article <3DF4B653.6040209@wizard.net>,
 on Mon, 9 Dec 2002 18:31:25 +0000 (UTC),
 kuyper@wizard.net ("James Kuyper Jr.") wrote:

> alan whiteside wrote:
> > I am looking at writing a library I want to use in different locales.
> > how well supported is the std::wstring lib?  (I gather this is the
> > C++ way of doing international chars).  What is the standard char
> > type ie. wchar ?
>
> The standard recognizes four character types: signed char, char,
> unsigned char, and wchar_t. The latter is a typedef.

Not in C++ it isn't - wchar_t is a fundamental type. 3.9.1
(Fundamental types), p5 states:
,----
| Type wchar_t is a distinct type whose values can represent distinct
| codes for all members of the largest extended character set specified
| among the supported locales (22.1.1). Type wchar_t shall have the same
| size, signedness, and alignment requirements (3.9) as one of the other
| integral types, called its underlying type.
`----

> It's perfectly legal for wchar_t to be a typedef for an 8-bit char,

No.

Regards,
 Andy S.
--
"Light thinks it travels faster than anything but it is wrong. No matter
 how fast light travels it finds the darkness has always got there first,
 and is waiting for it."                  -- Terry Pratchett, Reaper Man






Author: kuyper@wizard.net ("James Kuyper Jr.")
Date: Tue, 10 Dec 2002 05:20:42 +0000 (UTC)
Hyman Rosen wrote:
> James Kuyper Jr. wrote:
>
>> The standard recognizes four character types: signed char, char,
>> unsigned char, and wchar_t. The latter is a typedef.
>
>
>
> Incorrect. 3.9.1/5 states that wchar_t is a distinct type.

Sorry - that's another one of my errors from switching between
comp.std.c and comp.std.c++.






Author: kuyper@wizard.net ("James Kuyper Jr.")
Date: Tue, 10 Dec 2002 05:20:43 +0000 (UTC)
Andy Sawyer wrote:
> In article <3DF4B653.6040209@wizard.net>,
>  on Mon, 9 Dec 2002 18:31:25 +0000 (UTC),
>  kuyper@wizard.net ("James Kuyper Jr.") wrote:
>
>
>>alan whiteside wrote:
>>
>>>I am looking at writing a library I want to use in different locales.
>>>how well supported is the std::wstring lib?  (I gather this is the
>>>C++ way of doing international chars).  What is the standard char
>>>type ie. wchar ?
>>
>>The standard recognizes four character types: signed char, char,
>>unsigned char, and wchar_t. The latter is a typedef.

I'm surprised that no one caught me on this one. I was mixing up two
different concepts of "character type". Technically, "signed char",
"char" and "unsigned char" are the only true character types as
described in 3.9.1p1, while wchar_t is a character type in the more
general sense described in 17.1.2.

Incidentally, in the nominally more general sense described in 17.1.2,
neither 'signed char' nor 'unsigned char' is guaranteed to be a
character type: the section 21, 22, and 27 templates aren't required to
have been specialized for those types.

> Not in C++ it isn't - wchar_t is a fundamental type. 3.9.1
> (Fundamental types), p5 states:
> ,----
> | Type wchar_t is a distinct type whose values can represent distinct
> | codes for all members of the largest extended character set specified
> | among the supported locales (22.1.1). Type wchar_t shall have the same
> | size, signedness, and alignment requirements (3.9) as one of the other
> | integral types, called its underlying type.
> `----

You're right, of course. I was thinking C99, which does not explicitly
require wchar_t to be a distinct type.

>>It's perfectly legal for wchar_t to be a typedef for an 8-bit char,
>
>
> No.

Correct. However, the underlying type for wchar_t could be char.
