Topic: locale-specific std::transform / supported locale list


Author: "Alexis Christoforides" <alexischr@yahoo.com>
Date: Mon, 22 Apr 2002 19:31:23 GMT
What is the *proper* (or best) way to std::transform an international string
using the Standard C++ Library?
I understand that calling tolower on every char through an iterator will not
yield correct results.
I've tried std::transform in this form:

transform(sbuf.begin(),sbuf.end(),sbuf.begin(),tolower);

which would compile, but is not locale-dependent. Then I tried:

 transform(sbuf.begin(),sbuf.end(),sbuf.begin(),bind2nd(tolower,state.loc));

(where state.loc is a std::locale)

which wouldn't compile, no matter what (complains about the first argument).
I am using MSVC++ 7 and the program should also compile on g++.

Is this the right way to do it? How correctly will it handle idiosyncrasies
in European languages?
Is there a better way to do it? I'm open to suggestions.

Can anyone also supply (or point to) a list of standard locales for
MSVC++/Win32 and g++/linux ? Are there any
common ones (besides "C")?


--
Alexis Christoforides                 | alexischr@
<http://alexischr.port5.com>   |  softhome.net




---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: James Kanze <kanze@alex.gabi-soft.de>
Date: Tue, 23 Apr 2002 08:29:06 GMT
"Alexis Christoforides" <alexischr@yahoo.com> writes:

|>  What is the *proper* (or best) way to std::transform an
|>  international string using the Standard C++ Library ?

Transform in what way?

|>  I understand the tolowering every char with an iterator will not
|>  yield correct results.

There is no way that tolower (or at least toupper) can yield correct
results.

|>  I've tried std::transform which in this form:

|>  transform(sbuf.begin(),sbuf.end(),sbuf.begin(),tolower);

|>  which would compile, but is not locale-dependent.

If it is not locale-dependent, the implementation is broken.

Whether it compiles or not depends on which headers are included.  If
<locale> is included, it shouldn't compile.  And since you have
obviously included <algorithm>, you don't know whether <locale> is
included or not.

|>  Then I tried:

|>   transform(sbuf.begin(),sbuf.end(),sbuf.begin(),bind2nd(tolower,state.loc));

|>  (where state.loc is a std::locale)

|>  which wouldn't compile, no matter what (complains about the first
|>  argument).

The first argument to what?  (bind2nd, I hope, and not transform.)

The first problem is that bind2nd can't be used with a pointer to a
function.  You need an adaptor, normally created with
ptr_fun.  However, this won't work either, because tolower is
overloaded, and the compiler doesn't know which one to choose.  You can
force a choice by means of a static_cast, but frankly, for a one-time
use, I'd just use a for loop.  If the situation occurs often, it might
be worth creating your own functional object.
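
A minimal sketch of the last two suggestions (the plain loop and a
hand-written functional object), assuming the two-argument std::tolower
template from <locale>.  The names ToLowerIn, lower_loop and
lower_transform are hypothetical, chosen only for illustration, and the
character-by-character limitation discussed in this thread still applies:

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical functor: stores a locale so that std::transform can apply
// the two-argument std::tolower (from <locale>) to each character.
class ToLowerIn {
public:
    explicit ToLowerIn(const std::locale& loc) : loc_(loc) {}
    char operator()(char c) const { return std::tolower(c, loc_); }
private:
    std::locale loc_;
};

// Option 1: a plain for loop, no adaptors needed.
std::string lower_loop(std::string s, const std::locale& loc) {
    for (std::string::size_type i = 0; i < s.size(); ++i)
        s[i] = std::tolower(s[i], loc);
    return s;
}

// Option 2: std::transform with the functor, which sidesteps the
// overload ambiguity of passing the bare name tolower.
std::string lower_transform(std::string s, const std::locale& loc) {
    std::transform(s.begin(), s.end(), s.begin(), ToLowerIn(loc));
    return s;
}
```

Either function could be called as lower_loop(sbuf, state.loc), with
state.loc being the std::locale from the original question.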

You might also want to consider the toupper and tolower functions in
the ctype facet.  Except that they only work on char*, and not on
iterators into containers.
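
As a rough illustration of the facet route, the range overload of
ctype<char>::tolower can be pointed at a std::string's own buffer,
assuming the string stores its characters contiguously (guaranteed since
C++11, and true of common implementations before that).  The function
name lower_with_facet is hypothetical:

```cpp
#include <locale>
#include <string>

// Lower-cases s in place through the ctype<char> facet of loc.
// The facet's range overload takes char pointers, so we hand it the
// first character and one past the last character of the buffer.
std::string lower_with_facet(std::string s, const std::locale& loc) {
    if (!s.empty()) {
        const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
        ct.tolower(&s[0], &s[0] + s.size());
    }
    return s;
}
```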

Of course, none of this solves the fundamental problem that toupper
and tolower don't work, and cannot be made to work with their current
signatures.

|>  I am using MSVC++ 7 and the program should also compile on g++.

|>  Is this the right way to do it? How correctly will it handle
|>  idiosyncrasies in European languages?

Incorrectly.

The basic problem is that converting a character to upper case may
result in more than one character.  The classic example is the German
'ß', which in upper case should be "SS".  Except where this would
result in an ambiguity with another word, which contains "ss"; in such
a case (rare), the upper case should be "SZ".  (A lot of software, and
a lot of Germans, ignore this distinction, and always use "SS".)

The result is that any attempt to change case "in place" is bound to
fail, as is any function which takes a char and returns a char.
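
To make the point concrete, here is a sketch of what a string-to-string
interface might look like.  It assumes the text is encoded in ISO 8859-1
(where 'ß' is the single byte 0xDF), always expands to "SS", and ignores
the rare "SZ" case; upper_latin1_german is a hypothetical name, not a
library function:

```cpp
#include <locale>
#include <string>

// Returns an upper-case copy of in.  The result can be longer than the
// input, which is why neither an in-place conversion nor a char -> char
// function can do this job.
std::string upper_latin1_german(const std::string& in, const std::locale& loc) {
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '\xDF')          // Latin-1 'ß' (assumed encoding)
            out += "SS";              // one character becomes two
        else
            out += std::toupper(in[i], loc);
    }
    return out;
}
```

With a six-byte Latin-1 input spelling "straße", the result is the
seven-byte "STRASSE".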

|>  Is there a better way to do it? I'm open to suggestion.

About the only real solution is to keep everything in mixed case.

|>  Can anyone also supply (or point to) a list of standard locales
|>  for MSVC++/Win32 and g++/linux ? Are there any common ones
|>  (besides "C")?

The standard only requires "C".  At least in g++ version 3.0, that was
also the only locale available -- earlier versions had more, but they
only supported the C style locales.

I know that Dinkumware (the authors of the VC++ library) have a large
number of locales, but I don't know how many are delivered with the
compiler.
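
Since locale names other than "C" are implementation-defined, one way to
find out what a given installation offers is simply to try constructing
the locale: std::locale's constructor throws std::runtime_error for a
name it does not recognise.  A small sketch, with locale_available as a
hypothetical helper name:

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Returns true if the implementation can construct a locale with the
// given name, false if the constructor rejects it.
bool locale_available(const std::string& name) {
    try {
        std::locale loc(name.c_str());
        (void)loc;  // only construction matters here
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Candidate names to probe are platform-specific and have to be taken from
the platform's own documentation; only "C" is portable.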

-- 
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
