Topic: std::tolower - which one should be visible?


Author: Dave Thompson <david.thompson1@worldnet.att.net>
Date: Mon, 1 Aug 2005 01:27:34 CST
On Mon, 25 Jul 2005 16:06:40 CST, Maciej Sobczak <no.spam@no.spam.com>
wrote:

> kanze@gabi-soft.fr wrote:
>
> > Except that calling the standard C version of tolower with a
> > char argument is undefined behavior, of course.
>
> I don't understand this assertion. The argument of C version of tolower
> has to be representable by unsigned char or equal to EOF. Staying within
> the ASCII range [0..127], all character codes seem to fit this requirement.
>
> What's the real problem?

ASCII, or slightly more generally 0..127, is OK, but there was nothing
above AFAICS saying the data is so restricted, nor is it in general.
"Plain" char in C and C++ can be signed or unsigned at the
implementation's option; _if_ it is signed _and_ any actual value
stored is negative (and != EOF), that's the problem: such a value is
neither representable as unsigned char nor equal to EOF, so passing it
to tolower is undefined behavior.
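
The precondition can be written down as a small check. This is only a
sketch for illustration; the name valid_ctype_arg is hypothetical and
appears nowhere in the standard library or in this thread:

```cpp
#include <climits>
#include <cstdio>

// Hypothetical helper: true if v is an acceptable argument for the
// <cctype> functions, i.e. it equals EOF or lies in 0..UCHAR_MAX.
inline bool valid_ctype_arg(int v)
{
    return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}
```

Where plain char is signed, a char holding a code above 127 converts to
a negative int, and the check fails for every such value other than one
that happens to equal EOF.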

The conventional solution on comp.lang.c is to cast the char value to
unsigned char, or to use a pointer to unsigned char. The latter is
less convenient with std::string, so the best solution is probably

inline char my_tolower (char x) /* or scoped:: as you wish */
{ return std::tolower( static_cast<unsigned char>(x) ); }
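
A minimal sketch of how such a wrapper might be used with
std::transform. The helper lower_copy is a hypothetical name introduced
here, and the explicit cast on the return value is an addition that
only makes the int-to-char conversion visible:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Wrapper as above: go through unsigned char so that a negative plain
// char is never handed to the C function.
inline char my_tolower(char x)
{
    return static_cast<char>(std::tolower(static_cast<unsigned char>(x)));
}

// Hypothetical helper: return a lower-cased copy of s.
inline std::string lower_copy(std::string s)
{
    // my_tolower is not overloaded, so its name is unambiguous here.
    std::transform(s.begin(), s.end(), s.begin(), my_tolower);
    return s;
}
```

Because the wrapper is an ordinary non-overloaded function, passing it
to transform also sidesteps the question of which std::tolower is
visible.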


- David.Thompson1 at worldnet.att.net

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: kanze@gabi-soft.fr
Date: Mon, 1 Aug 2005 12:24:31 CST
Maciej Sobczak wrote:
> kanze@gabi-soft.fr wrote:

> > Except that calling the standard C version of tolower with a
> > char argument is undefined behavior, of course.

> I don't understand this assertion. The argument of C version
> of tolower has to be representable by unsigned char or equal
> to EOF. Staying within the ASCII range [0..127], all character
> codes seem to fit this requirement.

> What's the real problem?

The problem is that nothing guarantees that a char holds a value in
the ASCII range.  In practice the data almost never stays within it,
even when reading supposedly text data.

The usual solution in C is to cast the char to unsigned char
before calling any of the functions in ctype.h.  The preferred
solution in C++ is, of course, to use one of the functions in
<locale>.
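
One possible shape of the <locale> approach, as a sketch only: it uses
the ctype facet's range overload of tolower, and the helper name
lower_in_place is hypothetical.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: lower-case s in place using the ctype<char>
// facet of loc (the global locale by default).
inline void lower_in_place(std::string& s,
                           const std::locale& loc = std::locale())
{
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
    if (!s.empty())
        ct.tolower(&s[0], &s[0] + s.size());  // converts [low, high)
}
```

The facet takes char directly, so no cast to unsigned char is needed,
and the locale is an explicit argument rather than hidden global state.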

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


Author: no.spam@no.spam.com (Maciej Sobczak)
Date: Fri, 22 Jul 2005 15:42:30 GMT
Hello,

#include <string>
#include <algorithm>
#include <cctype>

// later:

string a, b;
// ...
transform(a.begin(), a.end(), b.begin(), std::tolower);

Taking into account what was #included, the std::tolower should name a
single-parameter function from the standard C library, so that the above
code should work.

On the other hand, the <locale> header is allowed to be *implicitly*
included (for example by <string>), meaning that std::tolower could
possibly name also the two-parameter version that is locale-aware,
rendering the above ambiguous and ill-formed.

Is this considered to be a problem?

It can be solved by casting to the requested function type, but the
issue is whether the above code is *guaranteed* to work or not.
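
For reference, the cast in question might look like the sketch below.
The helper name lower_ascii is hypothetical, and the input is assumed
to contain only values in 0..127, since the cast selects an overload
but does nothing about negative char values:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: lower-case a copy of s, assuming every char in
// s is in the range 0..127.
inline std::string lower_ascii(std::string s)
{
    // The cast names the int(int) function from <cctype>, so a
    // two-parameter template from <locale> cannot make the name
    // ambiguous even if that header was included implicitly.
    std::transform(s.begin(), s.end(), s.begin(),
                   static_cast<int (*)(int)>(std::tolower));
    return s;
}
```

As far as I know, later revisions of the standard (C++20) also restrict
taking the address of most standard library functions, so a small
user-written wrapper is likely the more robust choice.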


--
Maciej Sobczak : http://www.msobczak.com/
Programming    : http://www.msobczak.com/prog/

Author: kanze@gabi-soft.fr
Date: 25 Jul 2005 19:10:21 GMT
Maciej Sobczak wrote:

> #include <string>
> #include <algorithm>
> #include <cctype>

> // later:

> string a, b;
> // ...
> transform(a.begin(), a.end(), b.begin(), std::tolower);

> Taking into account what was #included, the std::tolower
> should name a single-parameter function from the standard C
> library, so that the above code should work.

Except that calling the standard C version of tolower with a
char argument is undefined behavior, of course.

> On the other hand, the <locale> header is allowed to be
> *implicitly* included (for example by <string>), meaning that
> std::tolower could possibly name also the two-parameter
> version that is locale-aware, rendering the above ambiguous
> and ill-formed.

> Is this considered to be a problem?

I imagine that it depends on who you ask.  Given that neither
the C tolower nor those in <locale> can be used directly, it
probably doesn't matter.

> It can be solved by casting to the requested function type,
> but the issue is whether the above code is *guaranteed* to
> work or not.

It is certainly not guaranteed to work, or even compile.

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


Author: Maciej Sobczak <no.spam@no.spam.com>
Date: Mon, 25 Jul 2005 16:06:40 CST
kanze@gabi-soft.fr wrote:

> Except that calling the standard C version of tolower with a
> char argument is undefined behavior, of course.

I don't understand this assertion. The argument of C version of tolower
has to be representable by unsigned char or equal to EOF. Staying within
the ASCII range [0..127], all character codes seem to fit this requirement.

What's the real problem?


--
Maciej Sobczak : http://www.msobczak.com/
Programming    : http://www.msobczak.com/prog/
