Topic: Why no case conversion in basic_string<>


Author: Dietmar Kuehl <dietmar.kuehl@claas-solutions.de>
Date: 1999/05/31
Hi,
In article <7iu4lk$j62$1@bunyip.cc.uq.edu.au>,
  "Rover" <Rover@pobox.com> wrote:
> Does anyone know why there is an omission of case conversion functions
> in the STL's basic_string<>?

I can't tell for sure but here are some possible reasons:
- It wasn't in the proposal which was finally adopted and it wasn't
  there because the interface of 'basic_string' is already big enough.
- Case conversion needs some context, namely the locale in which the
  conversion is to take place. Which locale should be chosen?
- The C/C++ model of case conversion is broken anyway: most characters
  have only one case (namely all the Chinese and Japanese characters),
  for some languages there are apparently more than two cases (I heard
  this from someone whom I trust in this respect but I haven't verified
  it and I can't name one), for some characters it is not possible to
  convert them from upper to lower case but only in the opposite
  direction (e.g. accented French letters), and some letters become
  more than one letter as the result of a case conversion (e.g. the
  German "sz", i.e. ß).
- It is already possible to change the case of the string using several
  different methods with the current standard library.

BTW, 'basic_string' is *NOT* part of the STL, although it is part of the
C++ standard library: the STL is just one part of the C++ standard
library, namely the stuff described in the chapters about containers,
algorithms, and iterators, plus a few other things (e.g. the allocators
and the functional stuff).

> How does one perform case conversion in STL basic_string<>?

Here are two approaches to capitalize a string:

  // the STLified C approach (using the global C locale object)
  std::transform(str.begin(), str.end(), str.begin(), toupper);

  // the C++ approach using the global C++ locale object
  std::ctype<char> const& ct =
    std::use_facet<std::ctype<char> >(std::locale());
  for (std::string::iterator it = str.begin(); it != str.end(); ++it)
    *it = ct.toupper(*it);

It should be possible to use 'transform()' in the second case, too, but
I haven't figured this one out yet... Potentially, some extra adaptors
which take an object and a member function to form a function are
necessary.
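
For what it's worth, here is a sketch of how the facet version might be
written without a hand-rolled loop. 'std::ctype' has a range overload,
'toupper(low, high)', which converts a whole buffer in place, and with
C++11 lambdas (which postdate this post) 'transform()' can wrap the facet
directly. The helper names 'upper_range' and 'upper_transform' are made
up for illustration, and the locale is passed in explicitly rather than
taken from the global one.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical helper: uses the range overload ctype<char>::toupper(low, high),
// which upper-cases the characters in [low, high) in place.
std::string upper_range(std::string str, const std::locale& loc)
{
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
    if (!str.empty())
        ct.toupper(&str[0], &str[0] + str.size());
    return str;
}

// Hypothetical helper: transform() with a lambda wrapping the facet (C++11).
std::string upper_transform(std::string str, const std::locale& loc)
{
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
    std::transform(str.begin(), str.end(), str.begin(),
                   [&ct](char c) { return ct.toupper(c); });
    return str;
}
```

Both helpers still convert one character at a time, so they share the
limits of the loop above; they only change how the loop is spelled.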
--
<mailto:dietmar.kuehl@claas-solutions.de>
homepage: <http://www.informatik.uni-konstanz.de/~kuehl>


Sent via Deja.com http://www.deja.com/
Share what you know. Learn what you don't.


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: James.Kanze@dresdner-bank.com
Date: 1999/06/01
In article <7iudn8$i1v$1@nnrp1.deja.com>,
  Dietmar Kuehl <dietmar.kuehl@claas-solutions.de> wrote:
>
> Hi,
> In article <7iu4lk$j62$1@bunyip.cc.uq.edu.au>,
>   "Rover" <Rover@pobox.com> wrote:
> > Does anyone know why there is an omission of case conversion functions
> > in the STL's basic_string<>?
>
> I can't tell for sure but here are some possible reasons:
> - It wasn't in the proposal which was finally adopted and it wasn't
>   there because the interface of 'basic_string' is already big enough.
> - Case conversion needs some context, namely the locale in which the
>   conversion has to appear. Which locale should be chosen?
> - The C/C++ model of case conversion is broken anyway: For most
>   characters there is only one case anyway (namely all the Chinese and
>   Japanese letters have just one case), for some languages there are
>   apparently more than two cases (I heard this from someone whom I
>   trust in this respect but I haven't verified it and I can't name
>   one), for some characters it is not possible to convert them from
>   upper to lower case but only in the opposite direction (e.g. accented
>   French letters), and some letters become more than one letter as the
>   result of a case conversion (e.g. the German "sz").

It may have been from me that you heard it.  I've been trying
unsuccessfully to track down my reference; the basic idea is that there
are letters which only exist as small letters, and are replaced by a two
letter sequence as capitals.  Sort of like the Swiss German:

    small letters:      über
    title case:         Ueber
    all caps:           UEBER

The obvious problem is the translation of ü.  Some languages will
(unlike German) consider that the result of this translation is still a
single letter, although it may appear as two to us.  (Note that
TeX/LaTeX define a single letter SS, to be used as the result of
converting ß to caps.)

I don't know if the languages involved actually define formally three
cases, or if it is just a computer typesetter's convenience.

Whatever the actual status, of course, it does show why the C/C++ model
is broken.  Any conversion routine must treat strings, and not
individual letters.
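
To make the point concrete, here is a small sketch of a conversion that
works on the whole string, so that the result is allowed to grow.  It
assumes ISO 8859-1 (Latin-1) encoded text and handles only the one
special case mentioned above, ß (byte 0xDF) becoming "SS"; it follows
the German convention of ü becoming Ü rather than the Swiss "UE".  The
function name upper_latin1 is made up, and a real implementation would
need complete, language-specific tables.

```cpp
#include <string>

// Hypothetical sketch: upper-case a Latin-1 string, letting the result grow.
// Assumes ISO 8859-1: handles ASCII a-z and the accented range 0xE0-0xFE
// (except 0xF7, the division sign), and maps 0xDF (sharp s) to "SS".
std::string upper_latin1(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i != in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c == 0xDF) {
            out += "SS";                   // one letter becomes two
        } else if ((c >= 'a' && c <= 'z') ||
                   (c >= 0xE0 && c <= 0xFE && c != 0xF7)) {
            out += static_cast<char>(c - 0x20);
        } else {
            out += static_cast<char>(c);   // unchanged; this includes 0xFF,
                                           // which has no capital in Latin-1
        }
    }
    return out;
}
```

Note that the signature returns a new string instead of modifying
characters in place; an interface built on char toupper( char ) cannot
express this.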

> - It is already possible to change the case of the string using several
>   different methods with the current standard library.
>
> BTW, 'basic_string' is *NOT* part of the STL, although it is part of the
> C++ standard library: STL is just a part of the C++ Standard Library,
> namely the stuff described in the chapters about containers, algorithms,
> and iterator plus a few other things (like eg. the allocators and the
> functional stuff).
>
> > How does one perform case conversion in STL basic_string<>?
>
> Here are two approaches to capitalize a string:
>
>   // the STLified C approach (using the global C locale object)
>   std::transform(str.begin(), str.end(), str.begin(), toupper);

Don't do this; it invokes undefined behavior.  The C toupper takes an
int whose value must be representable as unsigned char (or be EOF), and
where plain char is signed, any character outside US ASCII arrives as a
negative value.  With most implementations, it won't format your hard
disk, but it *will* generally give random results when presented with
anything other than US ASCII.  The last parameter *must* be something
along the lines of bind_2nd( toupper , locale() ).  Except that as
written, the "toupper" is ambiguous; you need a cast or something to
specify which function you want.  Or alternatively, your own toupper:

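    #include <cctype>

    char
    toupper( char ch )
    {
        // Convert via unsigned char so that a negative char value
        // never reaches the C function.
        return static_cast< char >(
            std::toupper( static_cast< unsigned char >( ch ) ) ) ;
    }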

Even then, this solution is only valid for a limited number of
languages.  Even Dietmar's own German isn't one.
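
Put together, a version of the transform() call that avoids both
problems might look like the following sketch.  The wrapper name
upper_char is made up here (a distinct name sidesteps the overload
ambiguity), and the second variant calls the two-argument std::toupper
from <locale> through a C++11 lambda, which was not available when this
was written.  Both still convert one character at a time, so the
limitation just mentioned applies to them as well.

```cpp
#include <algorithm>
#include <cctype>
#include <locale>
#include <string>

// Hypothetical wrapper: go through unsigned char so that negative char
// values are never passed to the C function.
char upper_char(char ch)
{
    return static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
}

// Uses the global C locale via the wrapper above.
std::string upper_c(std::string str)
{
    std::transform(str.begin(), str.end(), str.begin(), upper_char);
    return str;
}

// Uses an explicit C++ locale via std::toupper(charT, const locale&).
std::string upper_loc(std::string str, const std::locale& loc)
{
    std::transform(str.begin(), str.end(), str.begin(),
                   [&loc](char c) { return std::toupper(c, loc); });
    return str;
}
```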

>   // the C++ approach using the global C++ locale object
>   std::ctype<char> const& ct =
>     std::use_facet<std::ctype<char> >(std::locale());
>   for (std::string::iterator it = str.begin(); it != str.end(); ++it)
>     *it = ct.toupper(*it);
>
> It should be possible to use 'transform()' in the second case, too, but
> I haven't figured this one out yet... Potentially, some extra adaptors
> which take an object and a member function to form a function are
> necessary.

This solution suffers from the same problem as the previous: it tries to
convert one character at a time, which just isn't possible.

--
James Kanze                    mailto:James.Kanze@dresdner-bank.com
Conseils en informatique orientée objet/
                Beratung in objekt orientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany  Tel. +49 (069) 63 19 86 27








Author: "Rover" <Rover@pobox.com>
Date: 1999/05/31
Hi,

Does anyone know why there is an omission of case conversion functions in the
STL's basic_string<>?

How does one perform case conversion in STL basic_string<>?

Rover.
---