Topic: Traits::compare vs signed char type vs memcmp


Author: gdr@cs.tamu.edu (Gabriel Dos Reis)
Date: Mon, 17 May 2004 04:04:53 +0000 (UTC)
AlbertoBarbati@libero.it (Alberto Barbati) writes:

| Gabriel Dos Reis wrote:
| > Approaching this signedness issue with a list of special cases,
| > inconsistency, and recommendation for unnecessary ugly cast use is
| > one of the most curious solutions I've found proposed to this
| > problem.
|
| When did I propose a "list" of special cases, to address the
| signedness issue?

Did I name *you*?  No.

On the technical point.  The signedness of "char" is needlessly
causing more troubles than it should.

  (1) Proper use of classification functions in <cctype> needs cast to
      unsigned char.

  (2) It makes std::vector<char> an ineffective replacement for
      std::string when implementors decide to implement the latter in
      a very interesting way that makes its use more inefficient than
      it should be.  Proposed solution?  (Use an explicit comparator.)
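
To make point (1) concrete, here is a minimal sketch of the cast that user code has to write when char may be signed. The helper name is_alpha is hypothetical (it is not from this thread or from any library); it only wraps the standard std::isalpha call.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical helper (an assumption for illustration): lets callers pass a
// plain char without repeating the cast at every call site.
inline bool is_alpha(char c)
{
    // The <cctype> functions take an int whose value must be representable
    // as unsigned char (or equal EOF). A negative plain char does not meet
    // that requirement, so convert to unsigned char first.
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}
```

With such a wrapper, is_alpha('A') is true and is_alpha('1') is false; without the cast, a call such as std::isalpha('\xa0') has undefined behaviour on an implementation where char is signed.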


| What I proposed is just a change in the
| (over-)specification of char_traits<char>::lt. A very precise and
| delimited point, which is indeed just a library issue. I am not trying
| to address the dilemma of char signedness, which is a core language
| issue instead.

Yes, that is in the tradition of solving a fundamental issue with a
list of special-case "recommendations".

[...]

| About the casts, yes, I am recommending them because I do care about
| consistency with the C library. However, if you really don't like
| casts, there is an even simpler way:
|
| Replace in 21.1.3.1/6 the sentence:
|
| "The two-argument members assign, eq, and lt are defined identically
| to the built-in operators =, ==, and < respectively."
|
| with
|
| "The two-argument members assign and eq are defined identically to the
| built-in operators = and == respectively."

You missed the point.  The use of casts I'm talking about is not just in
implementors' files (which does not bother me because they are doing
unusual things anyway).  I'm concerned with *user* code.  Because of
the fuzziness of the sign of char, users have to use casts just to
safely use classification functions.  That is turning logic upside
down.

| So simple!

No.  See above.

| In fact, who cares about how lt() is implemented? If you
| are writing generic code, you shouldn't. Period. In that case, what's
| important is table 37 and the consistency with compare(). If you are
| writing code specialized for char_traits<char> you shouldn't either,
| as you could use < in the first place.

When the focus is only on lt(), one fails to appreciate the root of
the issue.  Once the view is broadened to understand the core issue, it
appears that the above does not make much sense.

| To Gabriel, from your reply it seems that you are aware of other
| proposed solutions to this issue. I am interested in reading
| them. Could you state them more explicitly or post a pointer to them?

I believe I've done so:  Effectively implement char as unsigned.

--
                                                        Gabriel Dos Reis
                                                         gdr@cs.tamu.edu
  Texas A&M University -- Computer Science Department
 301, Bright Building -- College Station, TX 77843-3112

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: gdr@cs.tamu.edu (Gabriel Dos Reis)
Date: Fri, 14 May 2004 08:05:11 +0000 (UTC)
llewelly.at@xmission.dot.com (llewelly) writes:

[...]

| It seems there is no really good way out. Except for an implementation
|     to make char unsigned?

Personally, I think that is a better and more robust way out.
Remember the last time you explained to people why, when calling a
classifying function from <cctype>, they should cast the argument to
unsigned char?

--
                                                        Gabriel Dos Reis
                                                         gdr@cs.tamu.edu
  Texas A&M University -- Computer Science Department
 301, Bright Building -- College Station, TX 77843-3112






Author: gdr@cs.tamu.edu (Gabriel Dos Reis)
Date: Fri, 14 May 2004 15:01:43 +0000 (UTC)
mckelvey@maskull.com ("James W. McKelvey") writes:

[...]

| This is the way I approach it: C++ is trying to be better than C; it

That is at least the way I see it.  But the community is large and
some people think that C++ should just carry over errors from the old
days and aggravate them :-(

I think the C++ programmer who really wants to use memcmp() on
std::string can just do that.  That should not prevent us from
correcting past errors.  The proposed resolution does not seem to me
to be an improvement over existing errors.
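
As a sketch of what "just do that" could look like, the following compares two std::string objects with std::memcmp, independently of how char_traits<char>::compare is specified. The function name memcmp_compare is hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (an assumption for illustration): three-way comparison
// of two strings using memcmp on the common prefix, then the lengths.
inline int memcmp_compare(const std::string& a, const std::string& b)
{
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    // memcmp orders the bytes as unsigned char, whatever the sign of char.
    const int r = std::memcmp(a.data(), b.data(), n);
    if (r != 0) return r;
    if (a.size() < b.size()) return -1;  // a is a proper prefix of b
    if (a.size() > b.size()) return 1;   // b is a proper prefix of a
    return 0;
}
```

Because memcmp interprets each byte as unsigned char, memcmp_compare("\xa0", "A") is positive even where char is signed.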

--
                                                        Gabriel Dos Reis
                                                         gdr@cs.tamu.edu
  Texas A&M University -- Computer Science Department
 301, Bright Building -- College Station, TX 77843-3112






Author: gdr@cs.tamu.edu (Gabriel Dos Reis)
Date: Fri, 14 May 2004 15:02:12 +0000 (UTC)
pcarlini@suse.de (Paolo Carlini) writes:

| This is all very reasonable. My personal point of view is very
| similar. However, remember that C++ was born to be as compatible as
| possible
| with C and that there is an ongoing effort to *improve* that.

If the point was to have C-strings, then they were already there.

| Also, the established behavior in this area is very strong and
| pervasive. Not considering the performance side of the issue...

The "performance argument" is a fallacy:  A "signed compare()" can
also be crafted in assembly if needed.

--
                                                        Gabriel Dos Reis
                                                         gdr@cs.tamu.edu
  Texas A&M University -- Computer Science Department
 301, Bright Building -- College Station, TX 77843-3112






Author: gdr@cs.tamu.edu (Gabriel Dos Reis)
Date: Fri, 14 May 2004 16:44:35 +0000 (UTC)
mckelvey@maskull.com ("James W. McKelvey") writes:

[...]

| > Any comment?
|
| Yes, your DR should be rejected. But if logical consistency has to be
| jettisoned, it's probably the best way to do it.
|
| Accepting it will give people one more reason to laugh at C++.

I entirely agree with the general sentiment you expressed, even though
I would not have put it that way.
Approaching this signedness issue with a list of special cases,
inconsistency, and recommendation for unnecessary ugly cast use is
one of the most curious solutions I've found proposed to this problem.

--
                                                        Gabriel Dos Reis
                                                         gdr@cs.tamu.edu
  Texas A&M University -- Computer Science Department
 301, Bright Building -- College Station, TX 77843-3112






Author: llewelly <llewelly.at@xmission.dot.com>
Date: Fri, 14 May 2004 12:32:34 CST
gdr@cs.tamu.edu (Gabriel Dos Reis) writes:

> llewelly.at@xmission.dot.com (llewelly) writes:
>
> [...]
>
> | It seems there is no really good way out. Except for an implementation
> |     to make char unsigned?
>
> Personally, I think that is a better and more robust way out.
> Remember the last time you explained to people why, when calling a
> classifying function from <cctype>, they should cast the argument to
> unsigned char?
[snip]

I think the result was confusion. The fact that we were using an
    implementation with signed char, but all the classifying
    functions returned the same value for [SCHAR_MIN,0) whether cast
    to unsigned char or not, didn't help. I didn't think of trying a
    different locale until later.







Author: gdr@cs.tamu.edu (Gabriel Dos Reis)
Date: Fri, 14 May 2004 22:51:28 +0000 (UTC)
AlbertoBarbati@libero.it (Alberto Barbati) writes:

[...]

| 3) because of point 2 and efficiency issues, at least three library

I think "efficiency" as used in this thread is a fallacy.  A
"signed  memcmp" can be efficiently implemented as required by the
standard, too.
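
For illustration only, a portable sketch of such a "signed memcmp" follows. The name signed_compare is hypothetical, and the plain loop stands in for whatever hand-tuned or assembly version an implementation might actually provide; it is not a claim about how any library does it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical function (an assumption for illustration): compares two char
// ranges element by element, ordering each element as signed char.
inline int signed_compare(const char* s1, const char* s2, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        // Where char is unsigned, the conversion of values above SCHAR_MAX
        // is implementation-defined before C++20 (modular in practice).
        const signed char a = static_cast<signed char>(s1[i]);
        const signed char b = static_cast<signed char>(s2[i]);
        if (a < b) return -1;
        if (b < a) return 1;
    }
    return 0;
}
```

With 8-bit chars, signed_compare orders the byte 0xA0 before 'A', which is the opposite of what memcmp reports.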

I believe a more consistent and better approach to this issue, in terms
of implementation, is to effectively implement plain char as
unsigned.  That has the benefit of removing unnecessary ugly casts
when you call classifying functions in <cctype>, for example.
It also makes the current advice to use std::vector<char> effective
where some implementations have implemented std::string in an
interesting way that makes its use much less attractive than it should
be (e.g. COW).

But I can see why people would like to resolve this issue with a long
list of special cases, instead of a simple rule <g>.

[...]

| Any comment?

See above.

--
                                                        Gabriel Dos Reis
                                                         gdr@cs.tamu.edu
  Texas A&M University -- Computer Science Department
 301, Bright Building -- College Station, TX 77843-3112






Author: gdr@cs.tamu.edu (Gabriel Dos Reis)
Date: Fri, 14 May 2004 22:51:29 +0000 (UTC)
mckelvey@maskull.com ("James W. McKelvey") writes:

| Paolo Carlini wrote:
| >
| > Alberto Barbati wrote:
| > > However, 21.1.3.1/6 explicitly states that lt() must be identical to the
| > > built-in operator "<" (without casts) :-(
| >
| > Exactly, :-(
| >
| > > OTOH, a DR could be issued with one of two possible solutions:
| > >
| > > 1) relax the semantic of compare() (at least for the char and wchar_t
| > > specialization)
| > > 2) remove the restriction on lt() for char and wchar_t specialization
| >
| > In my personal opinion a DR is in order: personally, I find unreasonable
| > being forced by the C++ standard to compare /the very same/ char type as
| > signed whereas all the C standard functions compare it as unsigned.
|
| Why then allow char to be signed at all, if the signs are to be ignored?

Good question.  And this is not the only case where we get that char
signedness issue.  I would rather the problem be addressed by having
implementations effectively implement char as unsigned.

This fuzziness about char does not seem to bring real benefits.  Only
problems.  Solving related issues with a list of special rules would be
curious.

--
                                                        Gabriel Dos Reis
                                                         gdr@cs.tamu.edu
  Texas A&M University -- Computer Science Department
 301, Bright Building -- College Station, TX 77843-3112






Author: llewelly.at@xmission.dot.com (llewelly)
Date: Sat, 15 May 2004 03:10:14 +0000 (UTC)
gdr@cs.tamu.edu (Gabriel Dos Reis) writes:

> llewelly.at@xmission.dot.com (llewelly) writes:
>
> [...]
>
> | It seems there is no really good way out. Except for an implementation
> |     to make char unsigned?
>
> Personally, I think that is a better and more robust way out.
> Remember the last time you explained to people why, when calling a
> classifying function from <cctype>, they should cast the argument to
> unsigned char?

I think confusion was the result. It didn't help that though we were
    using an implementation where char was signed, all the classifying
    functions returned the same value for all of [SCHAR_MIN,0),
    whether it was cast to unsigned char or not. We only had the C
    locale installed, so I couldn't change the locale to one where
    problems arose.






Author: AlbertoBarbati@libero.it (Alberto Barbati)
Date: Sat, 15 May 2004 03:12:46 +0000 (UTC)
Gabriel Dos Reis wrote:
>
> Approaching this signedness issue with a list of special cases,
> inconsistency, and recommendation for unnecessary ugly cast use is
> one of the most curious solutions I've found proposed to this problem.
>

When did I propose a "list" of special cases, to address the signedness
issue? What I proposed is just a change in the (over-)specification of
char_traits<char>::lt. A very precise and delimited point, which is
indeed just a library issue. I am not trying to address the dilemma of
char signedness, which is a core language issue instead.

I just stressed an *existing practice* of library implementors to favor
consistency between char_traits<char>::compare and strcmp/memcmp even if
that means violating the semantics of table 37. They clearly must have a
good reason to do that. Is there any library implementor reading who
could comment on this?

BTW, I did a grep on those three implementations and found that function
char_traits<>::lt is never used in any of the three. I bet the people
who would find my proposal "one more reason to laugh at C++" have never
used it either. Anyway, I expect serious library implementors to be much
more pragmatic and to have a different reaction.

About the casts, yes, I am recommending them because I do care about
consistency with the C library. However, if you really don't like casts,
there is an even simpler way:

Replace in 21.1.3.1/6 the sentence:

"The two-argument members assign, eq, and lt are defined identically to
the built-in operators =, ==, and < respectively."

with

"The two-argument members assign and eq are defined identically to the
built-in operators = and == respectively."

So simple! In fact, who cares about how lt() is implemented? If you are
writing generic code, you shouldn't. Period. In that case, what's
important is table 37 and the consistency with compare(). If you are
writing code specialized for char_traits<char> you shouldn't either, as
you could use < in the first place.

To Gabriel, from your reply it seems that you are aware of other
proposed solutions to this issue. I am interested in reading them. Could
you state them more explicitly or post a pointer to them?

Alberto






Author: llewelly.at@xmission.dot.com (llewelly)
Date: Fri, 7 May 2004 16:24:55 +0000 (UTC)
pcarlini@suse.de (Paolo Carlini) writes:

> Alberto Barbati wrote:
>> However, 21.1.3.1/6 explicitly states that lt() must be identical to
>> the built-in operator "<" (without casts) :-(
>
> Exactly, :-(
>
>> OTOH, a DR could be issued with one of two possible solutions:
>> 1) relax the semantic of compare() (at least for the char and
>> wchar_t specialization)
>> 2) remove the restriction on lt() for char and wchar_t specialization
>
> In my personal opinion a DR is in order: personally, I find unreasonable
> being forced by the C++ standard to compare /the very same/ char type as
> signed whereas all the C standard functions compare it as unsigned.
[snip]

But from a user perspective it is unreasonable that the standard
    allow lt() and compare() to be inconsistent.






Author: mckelvey@maskull.com ("James W. McKelvey")
Date: Fri, 7 May 2004 16:24:56 +0000 (UTC)
Paolo Carlini wrote:
>
> Alberto Barbati wrote:
> > However, 21.1.3.1/6 explicitly states that lt() must be identical to the
> > built-in operator "<" (without casts) :-(
>
> Exactly, :-(
>
> > OTOH, a DR could be issued with one of two possible solutions:
> >
> > 1) relax the semantic of compare() (at least for the char and wchar_t
> > specialization)
> > 2) remove the restriction on lt() for char and wchar_t specialization
>
> In my personal opinion a DR is in order: personally, I find unreasonable
> being forced by the C++ standard to compare /the very same/ char type as
> signed whereas all the C standard functions compare it as unsigned.

Why then allow char to be signed at all, if the signs are to be ignored?
The C standard functions are irrelevant here anyway.

--
What portion in the world can the artist have
Who has awakened from the common dream
But dissipation and despair?
-- William Butler Yeats






Author: kuyper@wizard.net (James Kuyper)
Date: Fri, 7 May 2004 16:52:49 +0000 (UTC)
AlbertoBarbati@libero.it (Alberto Barbati) wrote in message news:<dXzmc.41690$Qc.1633658@twister1.libero.it>...
> James Kuyper wrote:
> > AlbertoBarbati@libero.it (Alberto Barbati) wrote in message news:<u3gmc.170690$Kc3.5429610@twister2.libero.it>...
> >
> >>According to my interpretation of table 37 alone, it would be perfectly
> >>reasonable to define char_traits<char>::lt like this:
> >>
> >>bool char_traits<char>::lt(char c, char d)
> >>{
> >>   return static_cast<unsigned char>(c) < static_cast<unsigned char>(d);
> >>}
> >
> > With that definition, if c is negative and d is not, then
> > std::char_traits<char>::lt(c,d) returns false. How would you argue that
> > this counts as a conforming implementation of the specification that
> > it "yields: whether c is to be treated as less than d"?
>
> The fact is that the numeric values of the characters have nothing to do
> with the notion of "order". For example, it is perfectly reasonable to
> consider character U+A0 NO-BREAK SPACE "greater" than U+41 LATIN CAPITAL
> LETTER A, despite the fact that the first is stored with negative value
> and the second with a positive one (assuming char is 8-bit and signed).

What you're talking about is a locale-specific feature that is covered
by the collate locale category. That's not what the 'lt' member of
char_traits is for.
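
For readers unfamiliar with the collate category, a minimal sketch of a locale-aware comparison follows. The helper name collate_compare is hypothetical; it only forwards to the standard std::collate<char>::compare member, and it defaults to the classic "C" locale because which other locales are installed is an assumption that cannot be made here.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper (an assumption for illustration): compares two strings
// using the collate facet of the given locale instead of char_traits.
inline int collate_compare(const std::string& a, const std::string& b,
                           const std::locale& loc = std::locale::classic())
{
    const std::collate<char>& coll = std::use_facet<std::collate<char> >(loc);
    // std::collate::compare returns -1, 0 or 1.
    return coll.compare(a.data(), a.data() + a.size(),
                        b.data(), b.data() + b.size());
}
```

Passing a different std::locale object selects that locale's collation rules, which is where an ordering such as NO-BREAK SPACE versus LATIN CAPITAL LETTER A would be decided.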






Author: pcarlini@suse.de (Paolo Carlini)
Date: Fri, 7 May 2004 16:53:08 +0000 (UTC)
llewelly wrote:
> But from a user perspective it is unreasonable that the standard
>     allow lt() and compare() to be inconsistent.

Quite to the contrary, the goal is consistency between lt() and
compare(): it could be easily achieved in many ways, for instance
as sketched by Alberto.

Paolo.






Author: pcarlini@suse.de (Paolo Carlini)
Date: Fri, 7 May 2004 16:53:20 +0000 (UTC)
James W. McKelvey wrote:
> Why then allow char to be signed at all, if the signs are to be ignored?

The same could be asked in the case of the C standard, however.

> The C standard functions are irrelevant here anyway.

Well, do you really think so? Do you like comparing two char arrays
via memcmp, then passing the arrays to a basic_string constructor and,
"voila'", starting to compare /the very same/ data in a different way?

Paolo.






Author: pcarlini@suse.de (Paolo Carlini)
Date: Fri, 7 May 2004 18:06:55 +0000 (UTC)
llewelly wrote:
> But from a user perspective it is unreasonable that the standard
>     allow lt() and compare() to be inconsistent.

Quite to the contrary, the goal is consistency between lt() and
compare(): it could be easily achieved in many ways, for instance
as sketched by Alberto.

Paolo.






Author: llewelly.at@xmission.dot.com (llewelly)
Date: Fri, 7 May 2004 21:06:55 +0000 (UTC)
pcarlini@suse.de (Paolo Carlini) writes:

> llewelly wrote:
>> But from a user perspective it is unreasonable that the standard
>>     allow lt() and compare() to be inconsistent.
>
> Quite to the contrary,

Not contrary at all.

> the goal is consistency between lt() and
> compare():

This is what I want to hear, thank you.

> it could be easily achieved in many ways, for instance
> as sketched by Alberto.

Alberto made two suggestions. Only the second seems to guarantee
    consistency between lt() and compare().






Author: pcarlini@suse.de (Paolo Carlini)
Date: Fri, 7 May 2004 21:37:31 +0000 (UTC)
llewelly wrote:
> Alberto made two suggestions. Only the second seems to guarantee
>     consistency between lt() and compare().

Indeed, you are right. Actually, the plain char case is the one
really troublesome, since, for wchar_t, C99, 7.24.4.4,p1 holds.

Paolo.






Author: mckelvey@maskull.com ("James W. McKelvey")
Date: Sun, 9 May 2004 03:29:17 +0000 (UTC)
Paolo Carlini wrote:
>
> James W. McKelvey wrote:
> > Why then allow char to be signed at all, if the signs are to be ignored?
>
> The same could be asked in the case of the C standard, however.
>
> > The C standard functions are irrelevant here anyway.
>
> Well, do you really think so? Do you like comparing two char arrays
> via memcmp, then passing the arrays to a basic_string constructor and,
> "voila'", starting to compare /the very same/ data in a different way?
>

Suppose I use memcmp to compare SIGNED char arrays; the same arrays
passed to a basic_string constructor will compare the very same data in
a different way. Are you saying that you want all char types to compare
as if unsigned?

But memcmp doesn't work with chars anyway. It works with unsigned bytes;
so if I had to use it, I would be very careful and would know the
limitations.
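
The limitation mentioned above can be made concrete with a small sketch. Both helper names (less_as_memcmp, less_as_signed) are hypothetical and exist only for this illustration; the first uses the real std::memcmp, the second mimics the built-in < on an implementation where plain char is signed.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: orders two bytes the way memcmp does, that is,
// with each byte interpreted as unsigned char.
inline bool less_as_memcmp(char a, char b)
{
    return std::memcmp(&a, &b, 1) < 0;
}

// Hypothetical helper: orders two bytes as signed char, which matches the
// built-in < on plain char where char is signed.
inline bool less_as_signed(char a, char b)
{
    return static_cast<signed char>(a) < static_cast<signed char>(b);
}
```

For ordinary ASCII letters the two orderings agree; for a byte such as 0xA0 compared with 'A' they disagree, which is the inconsistency this thread is about.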

std::string and char_traits are not C, but C++, and can do whatever they
are defined to do. If there is a requirement that they follow C in some
manner, I don't see it in the standard.

memcmp and strcmp are C functions that do something similar to C++
functionality; they are not exact replacements, and must be used with
care.

This is the way I approach it: C++ is trying to be better than C; it
can't do that if everything has to be done the way C does things.

Jim


--
What portion in the world can the artist have
Who has awakened from the common dream
But dissipation and despair?
-- William Butler Yeats






Author: pcarlini@suse.de (Paolo Carlini)
Date: Sun, 9 May 2004 18:43:04 +0000 (UTC)
Hi,

James W. McKelvey wrote:

> But memcmp doesn't work with chars anyway. It works with unsigned bytes;
> so if I had to use it, I would be very careful and would know the
> limitations.

This one *could* be a very good point ;)

Unfortunately, the C Standard *always* talks about *characters* in
7.21.4...

> std::string and char_traits are not C, but C++, and can do whatever they
> are defined to do. If there is a requirement that they follow C in some
> manner, I don't see it in the standard.

Interestingly for the historian ;) the 1995 draft had the requirement
that memcmp be used in the implementation of compare.

> memcmp and strcmp are C functions that do something similar to C++
> functionality; they are not exact replacements, and must be used with
> care.
>
> This is the way I approach it: C++ is trying to be better than C; it
> can't do that if everything has to be done the way C does things.

This is all very reasonable. My personal point of view is very similar.
However, remember that C++ was born to be as compatible as possible
with C and that there is an ongoing effort to *improve* that.

Also, the established behavior in this area is very strong and
pervasive. Not considering the performance side of the issue...

Paolo.






Author: mckelvey@maskull.com ("James W. McKelvey")
Date: Mon, 10 May 2004 07:18:40 +0000 (UTC)
Paolo Carlini wrote:
>
> > But memcmp doesn't work with chars anyway. It works with unsigned bytes;
> > so if I had to use it, I would be very careful and would know the
> > limitations.
>
> This one *could* be a very good point ;)
>
> Unfortunately, the C Standard *always* talks about *characters* in
> 7.21.4...

OK, but memcmp is DEFINED as taking void *, not char *.

>
> > std::string and char_traits are not C, but C++, and can do whatever they
> > are defined to do. If there is a requirement that they follow C in some
> > manner, I don't see it in the standard.
>
> Interestingly for the historian ;) the 1995 draft had the requirement
> that memcmp be used in the implementation of compare.

Interesting also that that requirement was taken out!

>
> > memcmp and strcmp are C functions that do something similar to C++
> > functionality; they are not exact replacements, and must be used with
> > care.
> >
> > This is the way I approach it: C++ is trying to be better than C; it
> > can't do that if everything has to be done the way C does things.
>
> This is all very reasonable. My personal point of view is very similar.
> However, remember that C++ was born to be as compatible as possible
> with C and that there is an ongoing effort to *improve* that.
>
> Also, the established behavior in this area is very strong and
> pervasive. Not considering the performance side of the issue...
>

Performance could be reclaimed by adding a "smemcmp" that compares bytes
as signed.

--
What portion in the world can the artist have
Who has awakened from the common dream
But dissipation and despair?
-- William Butler Yeats






Author: AlbertoBarbati@libero.it (Alberto Barbati)
Date: Mon, 10 May 2004 23:20:29 +0000 (UTC)
Paolo Carlini wrote:
> llewelly wrote:
>
>> Alberto made two suggestions. Only the second seems to guarantee
>>     consistency between lt() and compare().
>
>
> Indeed, you are right. Actually, the plain char case is the one
> really troublesome, since, for wchar_t, C99, 7.24.4.4,p1 holds.
>

Correct. Let's re-cap:

1) the issue applies only to char (not wchar_t) and the root of all evil
is C99 7.21.4/1 that states explicitly that for memcmp and strcmp the
values of characters have to be "interpreted as unsigned char".

2) it is reasonable for the user to expect that s1 < s2 is equivalent to
strcmp(s1.c_str(), s2.c_str()) < 0 (this seems to be the most
controversial point, however)

3) because of point 2 and efficiency issues, at least three library
implementations are already implementing char_traits<char>::compare() by
means of memcmp()

4) implementing char_traits<char>::compare() in this way makes it
impossible to abide by 21.1.3.1/6 while respecting the semantics of
table 37. In particular:

   char a = '\xa0';
   char b = '\x41';
   std::cout
     << std::char_traits<char>::lt(a, b)
     << (std::char_traits<char>::compare(&a, &b, 1) < 0)
     << "\n";

produces the output 1 0, while it should be either 0 0 or 1 1, according
to table 37.

That said, I propose to file a DR with the following resolution:

Replace in 21.1.3.1/6 the sentence:

"The two-argument members assign, eq, and lt are defined identically to
the built-in operators =, ==, and < respectively."

with

"The two-argument members assign and eq are defined identically to the
built-in operators = and == respectively. The two-argument member lt is
defined identically to the built-in operator < applied after
interpreting the argument values as unsigned char."
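
A sketch of what the proposed wording would amount to, written as a user-defined traits class so that it does not depend on any particular library's char_traits<char>. The class name unsigned_order_traits is hypothetical; it is an illustration of the proposal, not the text of any implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical traits class (an assumption for illustration): lt() compares
// after converting to unsigned char, and compare() uses memcmp, so the two
// members order characters the same way.
struct unsigned_order_traits : std::char_traits<char>
{
    static bool lt(char c, char d)
    {
        return static_cast<unsigned char>(c) < static_cast<unsigned char>(d);
    }

    static int compare(const char* s1, const char* s2, std::size_t n)
    {
        // memcmp interprets each byte as unsigned char (C99 7.21.4/1).
        return n == 0 ? 0 : std::memcmp(s1, s2, n);
    }
};
```

With these definitions, lt(a, b) and compare(&a, &b, 1) < 0 give the same answer for a = '\xa0' and b = '\x41' (both false), so the two results no longer disagree. The class can also be used as the traits argument of std::basic_string<char, unsigned_order_traits>.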

Any comment?

Alberto

---





Author: llewelly.at@xmission.dot.com (llewelly)
Date: Tue, 11 May 2004 03:46:38 +0000 (UTC)
AlbertoBarbati@libero.it (Alberto Barbati) writes:

> Paolo Carlini wrote:
>> llewelly wrote:
>>
>>> Alberto made two suggestions. Only the second seems to guarantee
>>>     consistency between lt() and compare().
>> Indeed, you are right. Actually, the plain char case is the one
>> really troublesome, since, for wchar_t, C99, 7.24.4.4,p1 holds.
>>
>
> Correct. Let's re-cap:
>
> 1) the issue applies only to char (not wchar_t) and the root of all
>    evil is C99 7.21.4/1 that states explicitly that for memcmp and
>    strcmp the values of characters have to be "interpreted as unsigned
>    char".
>
> 2) it is reasonable for the user to expect that s1 < s2 is equivalent
>    to strcmp(s1.c_str(), s2.c_str()) < 0 (this seems to be the most
>    controversial point, however)
>
> 3) because of point 2 and efficiency issues, at least three library
>    implementations are already implementing
>    char_traits<char>::compare() by means of memcmp()
>
> 4) implementing char_traits<char>::compare() in this way makes it
>    impossible to abide by 21.1.3.1/6 while respecting the semantics of
>    table 37. In particular:
>
>    char a = '\xa0';
>    char b = '\x41';
>    std::cout
>      << std::char_traits<char>::lt(a, b)
>      << (std::char_traits<char>::compare(&a, &b, 1) < 0)
>      << "\n";
>
> produces the output 1 0, while it should be either 0 0 or 1 1,
> according to table 37.
>
> That said, I propose to file a DR with the following resolution:
>
> Replace in 21.1.3.1/6 the sentence:
>
> "The two-argument members assign, eq, and lt are defined identically
> to the built-in operators =, ==, and < respectively."
>
> with
>
> "The two-argument members assign and eq are defined identically to the
> built-in operators = and == respectively. The two-argument member lt is
> defined identically to the built-in operator < applied after
> interpreting the argument values as unsigned char."
>
> Any comment?
[snip]

It seems there is no really good way out, except for an implementation
    to make char unsigned. But for some implementations that would break
    too much code.

That said, I think your proposal is an improvement, and you should
    submit it.

---





Author: mckelvey@maskull.com ("James W. McKelvey")
Date: Tue, 11 May 2004 06:07:11 +0000 (UTC)
Alberto Barbati wrote:
>
> Paolo Carlini wrote:
> > llewelly wrote:
> >
> >> Alberto made two suggestions. Only the second seems to guarantee
> >>     consistency between lt() and compare().
> >
> >
> > Indeed, you are right. Actually, the plain char case is the one
> > really troublesome, since, for wchar_t, C99, 7.24.4.4,p1 holds.
> >
>
> Correct. Let's re-cap:
>
> 1) the issue applies only to char (not wchar_t) and the root of all evil
> is C99 7.21.4/1 that states explicitly that for memcmp and strcmp the
> values of characters have to be "interpreted as unsigned char".
>
> 2) it is reasonable for the user to expect that s1 < s2 is equivalent to
> strcmp(s1.c_str(), s2.c_str()) < 0 (this seems to be the most
> controversial point, however)
>
> 3) because of point 2 and efficiency issues, at least three library
> implementations are already implementing char_traits<char>::compare() by
> means of memcmp()
>
> 4) implementing char_traits<char>::compare() in this way makes it
> impossible to abide by 21.1.3.1/6 while respecting the semantics of
> table 37. In particular:
>
>    char a = '\xa0';
>    char b = '\x41';
>    std::cout
>      << std::char_traits<char>::lt(a, b)
>      << (std::char_traits<char>::compare(&a, &b, 1) < 0)
>      << "\n";
>
> produces the output 1 0, while it should be either 0 0 or 1 1, according
> to table 37.
>
> That said, I propose to file a DR with the following resolution:
>
> Replace in 21.1.3.1/6 the sentence:
>
> "The two-argument members assign, eq, and lt are defined identically to
> the built-in operators =, ==, and < respectively."
>
> with
>
> "The two-argument members assign and eq are defined identically to the
> built-in operators = and == respectively. The two-argument member lt is
> defined identically to the built-in operator < applied after
> interpreting the argument values as unsigned char."
>
> Any comment?

Yes, your DR should be rejected. But if logical consistency has to be
jettisoned, it's probably the best way to do it.

Accepting it will give people one more reason to laugh at C++.


--
What portion in the world can the artist have
Who has awakened from the common dream
But dissipation and despair?
-- William Butler Yeats

---





Author: pcarlini@suse.de (Paolo Carlini)
Date: Wed, 5 May 2004 17:05:05 +0000 (UTC)
Hi,

in the GNU libstdc++-v3 project we have this PR:

   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15276

which basically points out that, on machines characterized by a signed
char type, implementing traits<char>::compare in terms of memcmp is
incorrect on the face of Table 37.

The letter of the standard appears to support this, but, on the other
hand, all the implementations that I have checked, actually rely on
memcmp (and wmemcmp) for the specializations.

What do you think?

Thanks,
Paolo.

---





Author: AlbertoBarbati@libero.it (Alberto Barbati)
Date: Thu, 6 May 2004 04:30:20 +0000 (UTC)
Paolo Carlini wrote:
> Hi,
>
> in the GNU libstdc++-v3 project we have this PR:
>
>   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15276
>
> which basically points out that, on machines characterized by a signed
> char type, implementing traits<char>::compare in terms of memcmp is
> incorrect on the face of Table 37.
>
> The letter of the standard appears to support this, but, on the other
> hand, all the implementations that I have checked, actually rely on
> memcmp (and wmemcmp) for the specializations.
>
> What do you think?

According to my interpretation of table 37 alone, it would be perfectly
reasonable to define char_traits<char>::lt like this:

bool char_traits<char>::lt(char c, char d)
{
   return static_cast<unsigned char>(c) < static_cast<unsigned char>(d);
}

With this definition, implementing char_traits<char>::compare in terms
of memcmp would be perfectly consistent.

However, 21.1.3.1/6 explicitly states that lt() must be identical to the
built-in operator "<" (without casts) :-(

With this constraint, it's clearly impossible to implement compare() in
terms of memcmp() without violating the semantics of table 37, if char is
signed. In order to avoid the explicit loop, a conforming implementation
should use a different function, hopefully resolving to a compiler
intrinsic if available.
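
For comparison, the explicit loop that such an implementation would
otherwise need might look like the following sketch. The name
compare_builtin is hypothetical; signed char is used for the parameters
so that the result does not depend on the signedness of plain char on the
machine where the code is compiled.

```cpp
#include <cstddef>

// Hypothetical loop-based compare using the built-in < on signed char,
// i.e. the ordering lt() must have under 21.1.3.1/6 where char is signed.
inline int compare_builtin(const signed char* s1, const signed char* s2,
                           std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (s1[i] < s2[i]) return -1;
        if (s2[i] < s1[i]) return 1;
    }
    return 0;
}
```

With s1 holding the byte 0xA0 (value -96 as an 8-bit signed char) and s2
holding 0x41, compare_builtin returns -1, whereas memcmp on the same
bytes returns a positive value.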

OTOH, a DR could be issued with one of two possible solutions:

1) relax the semantic of compare() (at least for the char and wchar_t
specialization)
2) remove the restriction on lt() for char and wchar_t specialization

Alberto

---





Author: pcarlini@suse.de (Paolo Carlini)
Date: Thu, 6 May 2004 17:02:29 +0000 (UTC)
Alberto Barbati wrote:
> However, 21.1.3.1/6 explicitly states that lt() must be identical to the
> built-in operator "<" (without casts) :-(

Exactly, :-(

> OTOH, a DR could be issued with one of two possible solutions:
>
> 1) relax the semantic of compare() (at least for the char and wchar_t
> specialization)
> 2) remove the restriction on lt() for char and wchar_t specialization

In my opinion a DR is in order: personally, I find it unreasonable
to be forced by the C++ standard to compare /the very same/ char type as
signed whereas all the C standard functions compare it as unsigned.

Paolo.

---





Author: kuyper@wizard.net (James Kuyper)
Date: Thu, 6 May 2004 17:06:17 +0000 (UTC)
AlbertoBarbati@libero.it (Alberto Barbati) wrote in message news:<u3gmc.170690$Kc3.5429610@twister2.libero.it>...
..
> According to my interpretation of table 37 alone, it would be perfectly
> reasonable to define char_traits<char>::lt like this:
>
> bool char_traits<char>::lt(char c, char d)
> {
>    return static_cast<unsigned char>(c) < static_cast<unsigned char>(d);
> }

With that definition, if c is negative and d is not, then
std::char_traits<char>::lt(c,d) returns false. How would you argue that
this counts as a conforming implementation of the specification that
it "yields: whether c is to be treated as less than d"?

---





Author: AlbertoBarbati@libero.it (Alberto Barbati)
Date: Fri, 7 May 2004 00:42:17 +0000 (UTC)
James Kuyper wrote:
> AlbertoBarbati@libero.it (Alberto Barbati) wrote in message news:<u3gmc.170690$Kc3.5429610@twister2.libero.it>...
>
>>According to my interpretation of table 37 alone, it would be perfectly
>>reasonable to define char_traits<char>::lt like this:
>>
>>bool char_traits<char>::lt(char c, char d)
>>{
>>   return static_cast<unsigned char>(c) < static_cast<unsigned char>(d);
>>}
>
> With that definition, if c is negative and d is not, then
> std::char_traits<char>::lt(c,d) returns false. How would you argue that
> this counts as a conforming implementation of the specification that
> it "yields: whether c is to be treated as less than d"?

The fact is that the numeric values of the characters have nothing to do
with the notion of "order". For example, it is perfectly reasonable to
consider character U+A0 NO-BREAK SPACE "greater" than U+41 LATIN CAPITAL
LETTER A, despite the fact that the first is stored with negative value
and the second with a positive one (assuming char is 8-bit and signed).

That is also consistent with the C library, as, for example,
strcmp("\xa0", "\x41") returns a positive value.
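
A small sketch of that C library behaviour follows. The helper name
strcmp_sign is hypothetical; it normalizes the result because the C
standard specifies only the sign of the value strcmp returns, not its
magnitude.

```cpp
#include <cstring>

// Hypothetical helper: reduces strcmp's result to -1, 0, or 1.
inline int strcmp_sign(const char* s1, const char* s2) {
    const int r = std::strcmp(s1, s2);
    return (r > 0) - (r < 0);
}
```

Here strcmp_sign("\xa0", "\x41") yields 1 whether plain char is signed or
unsigned, because strcmp compares the characters as unsigned char.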

Alberto

---