Topic: Locale, time, etc...
Author: Hubert HOLIN <hh@ArtQuest.fr>
Date: 1998/07/08
Paris (E.U.), 8 July 1998
[NOTE: this is not an attempt to fan the flames of related threads]
Finally looking at the "locale" facilities, I am under the impression
that we are victims of some ill-considered carry-overs from C.
First, there is the problem of "std::toupper" which returns just *one*
char (or wchar_t). As someone pointed out, for the German Eszett (ß),
this is inconvenient as the uppercase is *two* "S" characters. We would
have avoided this problem if we had a corresponding standard function
taking a string as an argument, and returning a string as a result.
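A minimal sketch of what such a string-to-string function might look like. The name `to_upper_string` is hypothetical and nothing like it exists in the standard library; the Eszett expansion is hard-coded here purely for illustration, whereas a real design would presumably draw such mappings from locale data.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper, not part of the standard library: upper-cases a whole
// string so that one input character may expand to several output characters.
std::wstring to_upper_string(const std::wstring& in,
                             const std::locale& loc = std::locale::classic())
{
    const std::ctype<wchar_t>& ct = std::use_facet<std::ctype<wchar_t> >(loc);
    std::wstring out;
    out.reserve(in.size());
    for (std::wstring::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == L'\u00DF') {
            // Assumption: Eszett maps to "SS". This special case is
            // hard-coded only to show the one-to-many expansion.
            out += L"SS";
        } else {
            // Everything else goes through the per-character facet call.
            out += ct.toupper(in[i]);
        }
    }
    assert(out.size() >= in.size());  // the result never shrinks
    return out;
}
```

Because the result is a new string, its length is free to differ from the input's, which is exactly what a function returning a single character cannot offer.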
Second, locale is used to find, for instance, the time and date
formatting. There is a formatting function "strftime" whose first
argument (the destination) is a "char *". But if you have a Chinese
locale (simplified or traditional), then you are out of luck as there is
no (to my knowledge) overload which takes a "basic_string<wchar_t>" as
destination argument. We might use an NTMBS (null-terminated multibyte
string), but then it rules out using Unicode (I think...).
I know that "everything is frozen", for the time being, and that
Unicode is another standardization effort (as is Posix), but I believe
the need for some standard taking into account the above problems is
real. The question then is, what can we do?
Hubert Holin
Hubert.Holin@Bigfoot.com
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html ]
Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1998/07/08
Hubert HOLIN <Hubert.Holin@Bigfoot.com> wrote:
> First, there is the problem of "std::toupper" which returns just *one*
>char (or wchar_t). As someone pointed out, for the German Eszett (ß),
>this is inconvenient as the uppercase is *two* "S" characters. We would
>have avoided this problem if we had a corresponding standard function
>taking a string as an argument, and returning a string as a result.
std::toupper is there purely for compatibility with C, so changing
that interface would have been pointless.
The corresponding std::ctype<>::toupper might have had a different
interface and supported German and similar semantics, but the design
criterion for those facets was to apply the information already
available in standard locale files, which doesn't include any
information about variable-length conversions, nor many of the
predicates defined for Unicode. It would have been silly to specify
semantics that couldn't actually be implemented on real systems
because no locale file represents them.
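To illustrate the fixed-length constraint being discussed, here is a small sketch using the range form of std::ctype<>::toupper, which overwrites the characters between two pointers. The helper name `upper_in_place` is an assumption made for this example.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Upper-cases `s` in place with the range form of std::ctype<>::toupper.
// The facet overwrites [begin, end) element by element, so the length of
// the string is fixed before the call and cannot change.
void upper_in_place(std::wstring& s,
                    const std::locale& loc = std::locale::classic())
{
    if (s.empty()) return;
    const std::wstring::size_type before = s.size();
    const std::ctype<wchar_t>& ct = std::use_facet<std::ctype<wchar_t> >(loc);
    ct.toupper(&s[0], &s[0] + s.size());
    assert(s.size() == before);  // one character in, one character out
}
```

Whatever the locale does with an individual character such as the Eszett, the interface leaves no room for a one-to-two mapping.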
Probably the next standard will need another facet with
an interface appropriate for locale needs recognized since C
was standardized in 1989. (Things happen with glacial slowness
in the internationalization business.) It would be better
to let POSIX or somebody else define such facets themselves, and
leave ISO out of it.
The standard locale has other omissions. Probably the most glaring
is good support for time zones. Unfortunately the C committee seems
to be breaking C's library further, in that area.
> Second, locale is used to find, for instance, the time and date
>formatting. There is a formatting function "strftime" whose first
>argument (the destination) is a "char *". But if you have a Chinese
>locale (simplified or traditional), then you are out of luck as there is
>no (to my knowledge) overload which takes a "basic_string<wchar_t>" as
>destination argument. We might use an NTMBS (null-terminated multibyte
>string), but then it rules out using Unicode (I think...).
Again, strftime is a C library function. The locale facet
std::time_put<wchar_t> sounds like what you want. See the August
issue of Dr. Dobb's Journal, or my web page
http://www.cantrip.org/locale.html
for detailed examples.
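For readers without access to those examples, a rough sketch of formatting a time into a wide string with std::time_put<wchar_t> follows. The helper name `format_time_wide` and its parameters are assumptions made for this illustration, and the classic locale is used only so the sketch does not depend on any named locale being installed.

```cpp
#include <cassert>
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: formats `t` into a wide string using the
// time_put<wchar_t> facet of `loc`. `pattern` uses strftime-style
// conversion specifiers such as %Y, %m and %d.
std::wstring format_time_wide(const std::tm& t, const std::wstring& pattern,
                              const std::locale& loc = std::locale::classic())
{
    assert(t.tm_mon >= 0 && t.tm_mon <= 11);  // basic sanity check on input

    typedef std::time_put<wchar_t> facet_type;
    std::wostringstream out;
    out.imbue(loc);  // put() consults the stream's locale while parsing

    const facet_type& tp = std::use_facet<facet_type>(loc);
    // The facet writes wchar_t elements through an output iterator;
    // here they land in the string stream's buffer.
    tp.put(std::ostreambuf_iterator<wchar_t>(out), out, L' ', &t,
           pattern.data(), pattern.data() + pattern.size());
    return out.str();
}
```

With a locale that carries Chinese date conventions, the same call would produce wide characters directly, with no "char *" destination buffer involved.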
> I know that "everything is frozen", for the time being, and that
>Unicode is another standardization effort (as is Posix), but I believe
>the need for some standard taking into account the above problems is
>real. The question then is, what can we do?
One way is to persuade the other standardization groups to recognize
the existence of C++, and specify or propose facets of their own.
The C++ locale is deliberately extensible, so there is little reason
for the ISO C++ committee to be in the business of designing facets
not needed by the rest of its own library.
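As a sketch of the extensibility mentioned here, the following defines a user-written facet and installs it in a locale. The facet `string_case` and the helper functions are hypothetical names invented for this example; the case mapping is deliberately crude (ASCII letters plus a hard-coded Eszett rule) and is only meant to show the mechanism, not a usable conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical user-defined facet: a string-level case converter.
// Nothing here is part of the standard library.
class string_case : public std::locale::facet {
public:
    static std::locale::id id;  // every facet type needs its own id

    explicit string_case(std::size_t refs = 0) : std::locale::facet(refs) {}

    std::wstring toupper(const std::wstring& in) const
    {
        return do_toupper(in);
    }

protected:
    // Virtual so that a derived facet could supply richer mappings.
    virtual std::wstring do_toupper(const std::wstring& in) const
    {
        std::wstring out;
        for (std::wstring::size_type i = 0; i < in.size(); ++i) {
            const wchar_t c = in[i];
            if (c == L'\u00DF')
                out += L"SS";  // Eszett expands to two letters
            else if (c >= L'a' && c <= L'z')
                out += static_cast<wchar_t>(c - L'a' + L'A');
            else
                out += c;
        }
        return out;
    }
};

std::locale::id string_case::id;

// Returns a copy of `base` with the extra facet installed.
std::locale with_string_case(const std::locale& base)
{
    return std::locale(base, new string_case);
}

// Looks the facet up in `loc` and applies it.
std::wstring upper_via_locale(const std::wstring& s, const std::locale& loc)
{
    assert(std::has_facet<string_case>(loc));
    return std::use_facet<string_case>(loc).toupper(s);
}
```

The locale takes ownership of the facet passed to its constructor, and code that knows about `string_case` can retrieve it with std::use_facet just as it would a standard facet.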
--
Nathan Myers
ncm@nospam.cantrip.org http://www.cantrip.org/
Author: "P.J. Plauger" <pjp@dinkumware.com>
Date: 1998/07/08
Hubert HOLIN <hh@ArtQuest.fr> wrote in article <35A391D2.13B550A4@ArtQuest.fr>...
> Finally looking at the "locale" facilities, I am under the impression
> that we are victims of some ill-considered carry-overs from C.
C++ locales are indeed based on C locales.
> First, there is the problem of "std::toupper" which returns just *one*
> char (or wchar_t). As someone pointed out, for the German Eszett (ß),
> this is inconvenient as the uppercase is *two* "S" characters. We would
> have avoided this problem if we had a corresponding standard function
> taking a string as an argument, and returning a string as a result.
True enough. The concept of upper/lower case is a much simpler one in English
than in practically any other language. Java adds title case (to handle a grand
total of four special cases), but it is otherwise not much wiser.
> Second, locale is used to find, for instance, the time and date
> formatting. There is a formatting function "strftime" whose first
> argument (the destination) is a "char *". But if you have a Chinese
> locale (simplified or traditional), then you are out of luck as there is
> no (to my knowledge) overload which takes a "basic_string<wchar_t>" as
> destination argument. We might use an NTMBS (null-terminated multibyte
> string), but then it rules out using Unicode (I think...).
In C++ you should use the facet time_put<wchar_t>, which generates
a sequence of wchar_t elements to format a time. How it does so using
strftime (or whatever) is up to the implementor.
> I know that "everything is frozen", for the time being, and that
> Unicode is another standardization effort (as is Posix), but I believe
> the need for some standard taking into account the above problems is
> real. The question then is, what can we do?
a) Use what's there, which ain't bad.
b) Don't hope that Standard C or C++ will handle all your I18N problems
for you.
c) Don't put too much faith in Unicode.
d) Complain to Sun. They still haven't submitted the final Java standard.
P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1998/07/09
P.J. Plauger<pjp@dinkumware.com> wrote:
>Hubert HOLIN <hh@ArtQuest.fr> wrote in article <35A391D2.13B550A4@ArtQuest.fr>...
>> Finally looking at the "locale" facilities, I am under the impression
>> that we are victims of some ill-considered carry-overs from C.
>
>C++ locales are indeed based on C locales.
This statement is not correct.
C++ locales are in fact *not* based on C locales.
The C++ locale implements the semantics supportable with locale
description files commonly available, and standardized by POSIX.
The standard was affected by the C locale requirements; this is
the only connection between the C++ locale and the C locale.
--
Nathan Myers
ncm@nospam.cantrip.org http://www.cantrip.org/
Author: Hubert HOLIN <hh@ArtQuest.fr>
Date: 1998/07/09
Paris (E.U.), 9 July 1998
P.J. Plauger wrote:
>
> Hubert HOLIN <hh@ArtQuest.fr> wrote in article <35A391D2.13B550A4@ArtQuest.fr>...
[SNIP]
> In C++ you should use the facet time_put<wchar_t>, which generates
> a sequence of wchar_t elements to format a time. How it does so using
> strftime (or whatever) is up to the implementor.
That's good to know. I overlooked this one.
[SNIP]
> a) Use what's there, which ain't bad.
...unless you get caught by something which "what's there" can't
handle.
> b) Don't hope that Standard C or C++ will handle all your I18N problems
> for you.
That's perfectly reasonable! I just hope that the correct (standard)
hooks are in place.
> c) Don't put too much faith in Unicode.
Why, specifically? Unicode did make some suboptimal decisions,
especially concerning Chinese, but some things do get solved. And it is
a standard which is gaining (IMHO) acceptance, if measured by the number
of platforms which support or have announced they are in the process of
supporting it.
> d) Complain to Sun. They still haven't submitted the final Java standard.
I couldn't care less about Java...
Hubert Holin
Hubert.Holin@Bigfoot.com