Topic: What's the preferred manner of creating case-insensitive basic_string<>?


Author: William Thornhill <box@flash.net>
Date: 1996/10/03
An easy implementation of case insensitive compare... make a temp copy of the data and
OR each char with 0x20 (32 decimal) which will convert upper to lower without changing
lower.
--
William Thornhill                                    Thornhill Enterprises
Contact Info:                                    Computer Systems/Software
Voice Mail @ Dallas-Fort Worth Metro 214 593 9215     E-Mail box@flash.net
Snail Mail @ 1317 ValleyView, Mesquite, Texas  75149


[ comp.std.c++ is moderated.  To submit articles: try just posting with      ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu         ]
[ FAQ:      http://reality.sgi.com/employees/austern_mti/std-c++/faq.html    ]
[ Policy:   http://reality.sgi.com/employees/austern_mti/std-c++/policy.html ]
[ Comments? mailto:std-c++-request@ncar.ucar.edu                             ]





Author: Chelly Green <chelly@eden.com>
Date: 1996/10/03
William Thornhill wrote:
>
> An easy implementation of case insensitive compare... make a temp copy of
> the data and OR each char with 0x20 (32 decimal) which will convert upper
> to lower without changing lower.

This, in comp.std.c++?!?

Assuming the machine uses ASCII (plenty don't), this will only work if
the strings contain only letters. If they have any non-letters in them,
it will ignore some differences! Maybe the above should be worded:

An easy implementation of case insensitive compare... make a temp copy of
the data (both strings) and call tolower() on each char, which will
convert upper to lower without changing lower.

or just

Compare corresponding characters in each string, both filtered through
tolower() (or toupper()) first.

Not exactly rocket science here.
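
To make the objection concrete, here is a minimal sketch of the OR-with-0x20 comparison. The function names are invented for illustration and an ASCII character set is assumed.

```cpp
#include <cstddef>
#include <string>

// The fold from the original suggestion. Assumes an ASCII character set.
inline char or_fold(char c) { return static_cast<char>(c | 0x20); }

// Hypothetical: the proposed comparison, applied character by character
// instead of through temporary copies.
inline bool or_fold_equal(const std::string& a, const std::string& b)
{
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (or_fold(a[i]) != or_fold(b[i])) return false;
    return true;
}
```

On an ASCII machine or_fold_equal("Hello", "hELLO") is true as intended, but so is or_fold_equal("a@b", "a`b"), because 0x40 | 0x20 == 0x60; likewise '[' folds onto '{'. Replacing or_fold with tolower() (applied to the value converted to unsigned char) removes those collisions.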

--
Chelly Green | chelly@eden.com | C++ - http://www.eden.com/~chelly







Author: kanze@gabi-soft.fr (J. Kanze)
Date: 1996/10/04
Chelly Green <chelly@eden.com> writes:

> William Thornhill wrote:
> >
> > An easy implementation of case insensitive compare... make a temp copy of
> > the data and OR each char with 0x20 (32 decimal) which will convert upper
> > to lower without changing lower.
>
> This, in comp.std.c++?!?

I believe that the comp.std.c++ moderators use the same standard we use
in comp.lang.c++.moderated; in particular, we will never reject a
posting on the grounds that (we think) it is technically incorrect.

> Assuming the machine uses ASCII (plenty don't), this will only work if
> the strings contain only letters. If they have any non-letters in them,
> it will ignore some differences! Maybe the above should be worded:
>
> An easy implementation of case insensitive compare... make a temp copy of
> the data (both strings) and call tolower() on each char, which will
> convert upper to lower without changing lower.
>
> or just
>
> Compare corresponding characters in each string, both filtered through
> tolower() (or toupper()) first.

Some recent private correspondence following up to this thread has made
me realize that it isn't as simple as it seems.  The question is: why do
you want the case insensitivity?  The functions tolower or toupper are
locale dependent; for someone working in the Middle East or North
Africa, for example, they are likely to be no-ops, since the local
languages (Arabic, Hebrew or Persian) don't have the notion of case.

If you are writing, e.g., a compiler, this might be a bit of a bother.
You probably don't want your language definition to change according to
the current locale.  So either you play games with setlocale (setting
LC_CTYPE to "C", for example), or you eschew all locale specific
functions (which means everything in ctype.h).  (I've usually used table
driven solutions in compilers, which means that there was no need for
ctype.h, but you get the idea.)
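
A table-driven fold of the kind mentioned above might look roughly like this. It is a sketch only: FoldTable and same_identifier are invented names, an 8-bit char is assumed, and only the 26 unaccented letters are folded.

```cpp
#include <cstddef>
#include <string>

// Hypothetical fold table: maps 'A'..'Z' to 'a'..'z' and leaves every other
// byte alone. It never consults the current locale. The letters are listed
// explicitly so the table does not assume they are contiguous in the
// execution character set.
class FoldTable {
public:
    FoldTable()
    {
        for (int i = 0; i < 256; ++i)
            map_[i] = static_cast<unsigned char>(i);
        const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        const char lower[] = "abcdefghijklmnopqrstuvwxyz";
        for (std::size_t i = 0; upper[i] != '\0'; ++i)
            map_[static_cast<unsigned char>(upper[i])] =
                static_cast<unsigned char>(lower[i]);
    }
    unsigned char operator()(char c) const
    {
        return map_[static_cast<unsigned char>(c)];
    }
private:
    unsigned char map_[256];
};

// Compares two identifiers the way a case-insensitive language might,
// independently of whatever setlocale() has been told.
inline bool same_identifier(const std::string& a, const std::string& b)
{
    static const FoldTable fold;
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (fold(a[i]) != fold(b[i])) return false;
    return true;
}
```

Because the table is fixed when it is built, the language definition cannot change when the user's locale does.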





Author: kanze@lts.sel.alcatel.de (James Kanze US/ESC 60/3/141 #40763)
Date: 1996/10/01
In article <324C48C1.70@delta.com> sean@delta.com (Sean L. Palmer)
writes:

|> > >>But case-sensitivity is bound to depend on the current locale.  If the
|> > >>current locale is set for French, for example, accented characters
|> > >>compare equal, where for other locales they might not.
|> > >
|> > >I agree that case-insensitive comparisons depend on locale.
|> > >
|> > >What I was driving at is that the same program will want to compare
|> > >the same strings case-sensitively one time and case-insensitively
|> > [etc.]
|> >
|> > Actually, for clarity, I should probably be saying "exact-match
|> > comparisons vs. case-insensitive comparisons".  It only just occurred
|> > to me that James may have meant that there's yet a third category of
|> > comparisons, 'case sensitive that's not exact-match' because of
|> > treating some accented characters as equal.
|> >
|> > Still, the same program will want to compare the same strings
|> > different ways at different times (just here's yet another way to
|> > compare that requires its own function).

|> I made up a string comparison function the other day using strcoll(),
|> and it doesn't work anyway. strcoll() doesn't ignore case (my original
|> intent) and is slow.

What locale did you set?  I'm not sure what strcoll's behavior should be
for the C (the default) locale.  For other locales, of course, the
behavior is implementation defined (as is all locale-specific
behavior), but the intent is definitely to do "dictionary ordering".

With regards to performance, doing things like folding case will cause
the routine to run slower.  If a locale doesn't have case, or is case
insensitive, however, the routine should run as fast as strcmp.  The
usual implementation of changing a locale is to link in a different DLL,
I think, so the case insensitive locales should simply pick up strcmp
under another name.

|> At least it doesn't ignore case in Borland's
|> implementation.  strlwr() doesn't work on strings (must make a temporary
|> buffer for the data).  stricmp() doesn't handle locale.  The function
|> (forgot the name) that creates a modified string which can compare
|> correctly using stricmp() loses the original string information (can't
|> print it back out).

I've never heard of strlwr or stricmp, but I know that they have added
some functions (particularly for wchar_t) to C since my standard was
printed.

Strxfrm should not change the original string, but create a new copy.
If, however, strcoll is case sensitive, then the transformed string must
also maintain case information.  If information is lost, of course
(because e.g. the current locale is case insensitive), then there is no
way to restore it from the transformed string.  The intent is that you
keep the original string.
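
A minimal sketch of "keep the original, compare the transformed copy" follows. The names collation_key, Entry and make_entry are invented for illustration; only strxfrm itself is a standard function.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: returns the strxfrm() image of s under the current
// LC_COLLATE locale. The original string is left untouched.
inline std::string collation_key(const std::string& s)
{
    // With a null buffer and size 0, strxfrm reports the length it needs.
    const std::size_t n = std::strxfrm(nullptr, s.c_str(), 0);
    std::vector<char> buf(n + 1);
    std::strxfrm(buf.data(), s.c_str(), buf.size());
    return std::string(buf.data(), n);
}

struct Entry {
    std::string text;  // what you print
    std::string key;   // what you compare
};

inline Entry make_entry(const std::string& text)
{
    Entry e = { text, collation_key(text) };
    return e;
}
```

Comparing two key members byte by byte gives the same ordering as calling strcoll on the two text members, which is the point of strxfrm. How much case or accent information the key retains depends entirely on the locale in effect.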

|> There just isn't an easy way to do what I need, I'm afraid.

IMHO, strcoll should do it.  Although I'm not 100% sure what you are
actually trying to do.  (To get case insensitivity, and nothing else,
I'm not sure what locale you should use.  In locale US, for example, I
think that Mc and Mac should compare equal when at the start of a word.)

|> I'd need a stricoll() function at least. For now I'm just using strlwr()
|> and the standard C locale.

Voila the problem (probably).  I think that in the standard C locale,
strcoll and strcmp should be the same.  (I could easily be wrong here,
though, since I don't have my C standard handy to verify, and I've never
used the default locale.)

Check your documentation to find out what locales are supported.  Locale
names are implementation defined, but one typical pattern is to build
them out of the ISO abbreviations for country and for the language,
e.g. us_en for the United States.  (It would be nice if something like
this were standardized, but I'm not holding my breath.)
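
Since the names are implementation defined, one way to find out what is supported is simply to try. This is a sketch using the C++ std::locale class; locale_available is an invented name.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: reports whether this implementation knows a locale
// by the given name. The std::locale constructor throws std::runtime_error
// for names it does not recognise.
inline bool locale_available(const std::string& name)
{
    try {
        std::locale probe(name.c_str());
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

The name "C" is always accepted; anything else has to be checked against the documentation or probed like this.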
--
James Kanze         Tel.: (+33) 88 14 49 00        email: kanze@gabi-soft.fr
GABI Software, Sarl., 8 rue des Francs-Bourgeois, F-67000 Strasbourg, France
Conseils, études et réalisations en logiciel orienté objet --
                -- A la recherche d'une activité dans une region francophone








Author: kanze@lts.sel.alcatel.de (James Kanze US/ESC 60/3/141 #40763)
Date: 1996/10/01
In article <324ca812.170004527@news.interlog.com> willer@interlog.com
(Steve Willer) writes:

|> kanze@lts.sel.alcatel.de (James Kanze US/ESC 60/3/141 #40763) wrote:

|> >To be frank, I can't quite see the use of this.  Either the string
|> >contains text (which requires case insensitivity) or it contains
|> >something else (say C++ source code) which doesn't.  However, I can see
|> >no reason to forbid it.

|> In the product I work on, an event detector, each event is defined in terms of
|> input text that matches certain patterns and possibly other requirements as
|> well. The pattern matching and extracted-variable matching can conceivably be
|> done case-sensitively or insensitively, and the product in fact allows this
|> 'sensitivity' to be configurable for each event.

|> If we were to use string and istring or whatever, then pattern matches against
|> input text might get pretty hairy. If it's case-insensitive, do we use code
|> like:

|>  if (event.case_sensitive) {
|>     retval = (GetNextMessage() == event.pattern);
|>  }
|>  else {
|>     retval = (istring(GetNextMessage()) == istring(event.pattern));
|>     // GetNextMessage returns a string, so we have to cast it for the
|>     // comparison
|>  }
|>  return retval;

|> Hopefully, you don't think this is a viable option for a lot of string
|> operations in different functions.

So what is a viable option?  The above is not a viable option, but in
this case, my suggestion of switching locales would be even worse.

FWIW: in the past, in such cases, I've generally built up DFAs to do
the matching.  The distinction in case sensitivity is only relevant when
building the DFA: if case insensitive, each letter defines two possible
transition conditions.  Done correctly, this allows testing for several
events simultaneously, returning an identifier for the event matched.
(My regular expression parser worked like this.  When you create a
regular expression, you specify the return value for a match, and you
can OR several regular expressions together; a match on the OR'ed REs
returned the return code for the matched RE.)
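
As a rough illustration of the build-time idea only, here is a linear DFA for a single literal pattern. LiteralMatcher is an invented name, an 8-bit char is assumed, tolower/toupper are used for brevity, and merging several patterns into one automaton is not shown.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical class: a linear DFA that accepts exactly one literal pattern.
// State i means "the first i characters have matched"; -1 is the dead state.
class LiteralMatcher {
public:
    LiteralMatcher(const std::string& pattern, bool ignore_case)
        : table_(pattern.size() + 1)
    {
        for (std::size_t s = 0; s < table_.size(); ++s)
            table_[s].fill(kDead);
        for (std::size_t i = 0; i < pattern.size(); ++i) {
            const int next = static_cast<int>(i + 1);
            const unsigned char c = static_cast<unsigned char>(pattern[i]);
            table_[i][c] = next;
            if (ignore_case) {
                // Case only matters here, while the table is built:
                // a letter contributes two transitions instead of one.
                table_[i][static_cast<unsigned char>(std::tolower(c))] = next;
                table_[i][static_cast<unsigned char>(std::toupper(c))] = next;
            }
        }
    }

    bool matches(const std::string& input) const
    {
        int state = 0;
        for (std::size_t i = 0; i < input.size(); ++i) {
            state = table_[state][static_cast<unsigned char>(input[i])];
            if (state == kDead) return false;
        }
        // Accept only in the final state, i.e. the whole pattern was seen.
        return state == static_cast<int>(table_.size()) - 1;
    }

private:
    enum { kDead = -1 };
    std::vector<std::array<int, 256> > table_;
};
```

The matching loop is identical for both modes; it never asks whether the pattern is case sensitive.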
--
James Kanze         Tel.: (+33) 88 14 49 00        email: kanze@gabi-soft.fr
GABI Software, Sarl., 8 rue des Francs-Bourgeois, F-67000 Strasbourg, France
Conseils, études et réalisations en logiciel orienté objet --
                -- A la recherche d'une activité dans une region francophone








Author: Bo Persson <bo.persson@mbox3.swipnet.se>
Date: 1996/10/01
Steve Willer wrote:
 >
 > kanze@lts.sel.alcatel.de (James Kanze US/ESC 60/3/141 #40763) wrote:
 >
 > >To be frank, I can't quite see the use of this.  Either the string
 > >contains text (which requires case insensitivity) or it contains
 > >something else (say C++ source code) which doesn't.  However, I can
 > >see no reason to forbid it.
 >
 > In the product I work on, an event detector, each event is defined in
 > terms of input text that matches certain patterns and possibly other
 > requirements as well. The pattern matching and extracted-variable
 > matching can conceivably be done case-sensitively or insensitively,
 > and the product in fact allows this 'sensitivity' to be configurable
 > for each event.
 >
 > If we were to use string and istring or whatever, then pattern
 > matches against input text might get pretty hairy. If it's
 > case-insensitive, do we use code like:
 >
 >  if (event.case_sensitive) {
 >     retval = (GetNextMessage() == event.pattern);
 >  }
 >  else {
 >     retval = (istring(GetNextMessage()) == istring(event.pattern));
 >     // GetNextMessage returns a string, so we have to cast it for
 >     // the comparison
 >  }
 >  return retval;
 >
 > Hopefully, you don't think this is a viable option for a lot of string
 > operations in different functions.
 >

It is the *event* that is case sensitive or not. The string is just
a representation of the input. Let the event decide how to detect
a match, like:

   retval = event.matches(GetNextMessage());
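
A minimal sketch of such an Event follows. The member names echo the snippets in this thread, but the class itself and its matching rule (whole-string equality) are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical Event type: the event owns both the pattern and the
// decision about case, so callers never branch on case sensitivity.
class Event {
public:
    Event(const std::string& pattern, bool case_sensitive)
        : pattern_(pattern), case_sensitive_(case_sensitive) {}

    bool matches(const std::string& message) const
    {
        if (case_sensitive_) return message == pattern_;
        return message.size() == pattern_.size()
            && std::equal(message.begin(), message.end(),
                          pattern_.begin(), same_letter);
    }

private:
    static bool same_letter(char a, char b)
    {
        return std::tolower(static_cast<unsigned char>(a))
            == std::tolower(static_cast<unsigned char>(b));
    }

    std::string pattern_;
    bool case_sensitive_;
};
```

The strings stay plain std::string throughout; no istring type is needed.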


Bo Persson
bo.persson@mbox3.swipnet.se







Author: kanze@gabi-soft.fr (J. Kanze)
Date: 1996/09/26
herbs@cntc.com (Herb Sutter) writes:

> On 24 Sep 1996 12:25:30 PDT, herbs@cntc.com (Herb Sutter) wrote:
> >>But case-sensitivity is bound to depend on the current locale.  If the
> >>current locale is set for French, for example, accented characters
> >>compare equal, where for other locales they might not.
> >
> >I agree that case-insensitive comparisons depend on locale.
> >
> >What I was driving at is that the same program will want to compare
> >the same strings case-sensitively one time and case-insensitively
> [etc.]
>
> Actually, for clarity, I should probably be saying "exact-match
> comparisons vs. case-insensitive comparisons".  It only just occurred
> to me that James may have meant that there's yet a third category of
> comparisons, 'case sensitive that's not exact-match' because of
> treating some accented characters as equal.

It's not a third category.  The very notion of case is locale-dependent.
(Japanese and Arabic, for example, have no case.)  The classical way of
doing a case-insensitive comparison is to convert everything to the same
case, then compare.  In French, this will give different results
depending on whether you convert to upper or to lower, since converting
an accented character to upper normally returns the unaccented versions.
And of course, there are cases where the conversion fails, e.g. the
German "es-zet" (which only exists in lower case, and is replaced by two
s's in upper case).  What do you do then?  (Standards related question:
can a function in ctype throw an exception?)

On the other hand, you can create a number of new categories depending
on how you treat white space.  (I would hope that a lexical ordering
would treat a '\t' and a ' ' the same, and any sequence of more than one
of them as if it were just one, for example.)
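
The white space category could be sketched as follows. The function names are invented; only spaces and tabs are treated as blanks, and case is left alone.

```cpp
#include <string>

// Hypothetical helper: rewrites every run of spaces and tabs as one space.
inline std::string collapse_blanks(const std::string& s)
{
    std::string out;
    bool in_run = false;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        const bool blank = (s[i] == ' ' || s[i] == '\t');
        if (blank) {
            if (!in_run) out += ' ';  // first blank of a run
        } else {
            out += s[i];
        }
        in_run = blank;
    }
    return out;
}

// Orders two strings after normalising their blanks.
inline bool blank_insensitive_less(const std::string& a, const std::string& b)
{
    return collapse_blanks(a) < collapse_blanks(b);
}
```

This is one more comparison function over the same string type, alongside the exact and case-insensitive ones.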

--
James Kanze           (+33) 88 14 49 00          email: kanze@gabi-soft.fr
GABI Software, Sarl., 8 rue des Francs Bourgeois, 67000 Strasbourg, France
Conseils en informatique industrielle --
                            -- Beratung in industrieller Datenverarbeitung





Author: sean@delta.com (Sean L. Palmer)
Date: 1996/09/27

> >>But case-sensitivity is bound to depend on the current locale.  If the
> >>current locale is set for French, for example, accented characters
> >>compare equal, where for other locales they might not.
> >
> >I agree that case-insensitive comparisons depend on locale.
> >
> >What I was driving at is that the same program will want to compare
> >the same strings case-sensitively one time and case-insensitively
> [etc.]
>
> Actually, for clarity, I should probably be saying "exact-match
> comparisons vs. case-insensitive comparisons".  It only just occurred
> to me that James may have meant that there's yet a third category of
> comparisons, 'case sensitive that's not exact-match' because of
> treating some accented characters as equal.
>
> Still, the same program will want to compare the same strings
> different ways at different times (just here's yet another way to
> compare that requires its own function).

I made up a string comparison function the other day using strcoll(),
and it doesn't work anyway. strcoll() doesn't ignore case (my original
intent) and is slow.  At least it doesn't ignore case in Borland's
implementation.  strlwr() doesn't work on strings (must make a temporary
buffer for the data).  stricmp() doesn't handle locale.  The function
(forgot the name) that creates a modified string which can compare
correctly using stricmp() loses the original string information (can't
print it back out).

There just isn't an easy way to do what I need, I'm afraid.

I'd need a stricoll() function at least. For now I'm just using strlwr()
and the standard C locale.








Author: willer@interlog.com (Steve Willer)
Date: 1996/09/28
kanze@lts.sel.alcatel.de (James Kanze US/ESC 60/3/141 #40763) wrote:

>To be frank, I can't quite see the use of this.  Either the string
>contains text (which requires case insensitivity) or it contains
>something else (say C++ source code) which doesn't.  However, I can see
>no reason to forbid it.

In the product I work on, an event detector, each event is defined in terms of
input text that matches certain patterns and possibly other requirements as
well. The pattern matching and extracted-variable matching can conceivably be
done case-sensitively or insensitively, and the product in fact allows this
'sensitivity' to be configurable for each event.

If we were to use string and istring or whatever, then pattern matches against
input text might get pretty hairy. If it's case-insensitive, do we use code
like:

 if (event.case_sensitive) {
    retval = (GetNextMessage() == event.pattern);
 }
 else {
    retval = (istring(GetNextMessage()) == istring(event.pattern));
    // GetNextMessage returns a string, so we have to cast it for the
    // comparison
 }
 return retval;

Hopefully, you don't think this is a viable option for a lot of string
operations in different functions.







Author: kanze@gabi-soft.fr (J. Kanze)
Date: 1996/09/23
herbs@cntc.com (Herb Sutter) writes:

> 1. Case-sensitivity should be a property of the comparison function,
> not of the string object.  Sometimes you want to compare the same
> strings case-sensitively and case-insensitively.

    [...]
> It came up a month or two ago in this group, but my cursory reading of
> the thread wasn't enough to get a good feel for what the committee
> wanted.  There was discussion about case comparisons being
> locale-driven (which I submit is probably just as bad as (1.) above,
> since you want case-sensitivity to be a property of the comparison
> function and not the string object itself or the current locale).

But case-sensitivity is bound to depend on the current locale.  If the
current locale is set for French, for example, accented characters
compare equal, where for other locales they might not.

--
James Kanze           (+33) 88 14 49 00          email: kanze@gabi-soft.fr
GABI Software, Sarl., 8 rue des Francs Bourgeois, 67000 Strasbourg, France
Conseils en informatique industrielle --
                            -- Beratung in industrieller Datenverarbeitung







Author: herbs@cntc.com (Herb Sutter)
Date: 1996/09/24
On 23 Sep 1996 14:44:13 GMT, kanze@gabi-soft.fr (J. Kanze) wrote:
>herbs@cntc.com (Herb Sutter) writes:
>> 1. Case-sensitivity should be a property of the comparison function,
>> not of the string object.  Sometimes you want to compare the same
>> strings case-sensitively and case-insensitively.
>    [...]
>> It came up a month or two ago in this group, but my cursory reading of
>> the thread wasn't enough to get a good feel for what the committee
>> wanted.  There was discussion about case comparisons being
>> locale-driven (which I submit is probably just as bad as (1.) above,
>> since you want case-sensitivity to be a property of the comparison
>> function and not the string object itself or the current locale).
>
>But case-sensitivity is bound to depend on the current locale.  If the
>current locale is set for French, for example, accented characters
>compare equal, where for other locales they might not.

I agree that case-insensitive comparisons depend on locale.

What I was driving at is that the same program will want to compare
the same strings case-sensitively one time and case-insensitively
another time.  Hence the case sensitivity must not be a property of
the string object; it has to be a property of the comparison function.

Perhaps an example would help.  With plain-C char-array strings, you
can do this easily:

    void f( char* sz )
    {
        if( strcmp( sz, szAnotherString ) != 0 )
            // do something

        // ... later ...
        if( stricmp( sz, szYetAnotherString ) == 0 )
            // do something else
    }

With standard C++ strings:

    void f( string s )
    {
        if( s != anotherString )
            // do something

        // ... later ...
        if( s ?? yetAnotherString )
            // do something else
    }

What do I write in for "??"?  It can't be "==" because that's
case-sensitive.  All I can do is fall back on C-isms:

        if( stricmp( s.c_str(), yetAnotherString.c_str() ) == 0 )

(Note incidentally that this is not really a perfect solution, since
string objects may contain nulls whereas C-style strings may not.
However, it's good enough for most uses.)
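
For comparison, here is a sketch of a free function that stays within the standard library and does not stop at the first null. The names are invented, and tolower() is used, so the result follows the current C locale.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical predicate: two chars are the same letter if they agree
// after tolower(). The cast avoids passing negative values to tolower().
inline bool same_char_ignoring_case(char a, char b)
{
    return std::tolower(static_cast<unsigned char>(a))
        == std::tolower(static_cast<unsigned char>(b));
}

// Hypothetical free function that could stand in for the "??".
inline bool equal_nocase(const std::string& a, const std::string& b)
{
    // Compares over size(), not up to the first '\0',
    // so embedded nulls are handled.
    return a.size() == b.size()
        && std::equal(a.begin(), a.end(), b.begin(),
                      same_char_ignoring_case);
}
```

The call site becomes if( equal_nocase( s, yetAnotherString ) ), and s remains an ordinary string that can still be compared exactly with == elsewhere.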

That's what I meant when I pointed out that making string s be of a
new string type (basic_string instantiated with a 'traits' that's
case-insensitive) doesn't help... in one place, s must be compared
case-sensitively, and in another, it must not.  Hence:

>> 1. Case-sensitivity should be a property of the comparison function,
>> not of the string object.  Sometimes you want to compare the same
>> strings case-sensitively and case-insensitively.

If anyone can show me a better recourse than 'stricmp(s1.c_str(),
s2.c_str())' I'd be overjoyed to hear about it.  I can't think of a
better one, even assuming the liberty of making changes to the
language. :-(


---
Herb Sutter (herbs@cntc.com)

Current Network Technologies Corp.
3100 Ridgeway, Suite 42, Mississauga ON Canada L5L 5M5
Tel 416-805-9088  Fax 905-608-2611





Author: herbs@cntc.com (Herb Sutter)
Date: 1996/09/25
On 24 Sep 1996 12:25:30 PDT, herbs@cntc.com (Herb Sutter) wrote:
>>But case-sensitivity is bound to depend on the current locale.  If the
>>current locale is set for French, for example, accented characters
>>compare equal, where for other locales they might not.
>
>I agree that case-insensitive comparisons depend on locale.
>
>What I was driving at is that the same program will want to compare
>the same strings case-sensitively one time and case-insensitively
[etc.]

Actually, for clarity, I should probably be saying "exact-match
comparisons vs. case-insensitive comparisons".  It only just occurred
to me that James may have meant that there's yet a third category of
comparisons, 'case sensitive that's not exact-match' because of
treating some accented characters as equal.

Still, the same program will want to compare the same strings
different ways at different times (just here's yet another way to
compare that requires its own function).

---
Herb Sutter (herbs@cntc.com)

Current Network Technologies Corp.
3100 Ridgeway, Suite 42, Mississauga ON Canada L5L 5M5
Tel 416-805-9088  Fax 905-608-2611







Author: sean@delta.com (Sean L. Palmer)
Date: 1996/09/20

Has anyone done this? Isn't it a PITA?

Besides all the type incompatibilities between istring (case-insensitive)
and string that result, you have to define many functions to completely
replace everything that must check case and you still have all the overhead
of checking each time it's needed to worry about.
Either that or you can uppercase or lowercase each character or string as
it's inserted, which forces you to define more functions.

Can it be done by just making a replacement for string_char_traits or do
you have to make a new char type which is case-insensitive?

I've run into many problems with this and wondered if anyone knows what the
intent of the committee was for dealing with doing this. (i.e. the
'correct' way)









Author: herbs@cntc.com (Herb Sutter)
Date: 1996/09/20
<disclaimer> Please note this is a quick off-the-cuff post, and I
haven't proofread it.  There are probably errors.  </disclaimer>

On 20 Sep 1996 14:28:03 GMT, sean@delta.com (Sean L. Palmer) wrote:
>Besides all the type incompatibilities between istring (case-insensitive)
>and string that result, you have to define many functions to completely
>replace everything that must check case and you still have all the overhead
>of checking each time it's needed to worry about.
>Either that or you can uppercase or lowercase each character or string as
>it's inserted, which forces you to define more functions.
>
>Can it be done by just making a replacement for string_char_traits or do
>you have to make a new char type which is case-insensitive?

Yeah, we did it by providing a string_char_traits with toupper()'d
implementations for .eq(), .ne(), and .lt().  It worked as we
expected, which is why we decided not to use it :-), for several
reasons:

1. Case-sensitivity should be a property of the comparison function,
not of the string object.  Sometimes you want to compare the same
strings case-sensitively and case-insensitively.

2. Several of our functions return strings (e.g., string
IntToStr(int)), and we'd have to templatise some to get the right kind
of return value.  That might also force client code to specify the
desired instantiation (e.g., istring s = IntToStr<istring>(10)) which
was deemed deplorably user-unfriendly.

3. To make istrings as easy to use as strings (and work
interoperably), we would have to define a global conversion from
istring to string, which isn't always what you want and can break code
besides (e.g., when it would add a second level of user-defined
conversions).  Even if you did so, note that you couldn't just define
conversions both ways since this would often cause ambiguities.

4. You can't have a case-sensitive and case-insensitive version of
operator==(string,string) anyway, which means that we'd still be stuck
with a global comparison function (e.g.,
StrCompareInsensitive(s1,s2)).  That means you can't get the intuitive
feel of "s1 == s2" anyway and so you're really no better off than
using stricmp directly.  It's also another name for programmers to
learn and use.
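
For reference, the traits approach described above might be sketched like this. It is not the code we used: it is written against std::char_traits (the name the traits template has in the final standard, where the draft said string_char_traits), and ci_char_traits, fold and istring are illustrative names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical traits class: every comparison goes through toupper().
struct ci_char_traits : std::char_traits<char> {
    static unsigned char fold(char c)
    {
        return static_cast<unsigned char>(
            std::toupper(static_cast<unsigned char>(c)));
    }
    static bool eq(char a, char b) { return fold(a) == fold(b); }
    static bool lt(char a, char b) { return fold(a) < fold(b); }

    static int compare(const char* a, const char* b, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char c)
    {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return nullptr;
    }
};

// A distinct type from std::string: the traits are part of the type.
typedef std::basic_string<char, ci_char_traits> istring;
```

With this, istring("Hello") == istring("hELLO") holds, but an istring cannot be assigned to or compared with a std::string without an explicit conversion, which is where points 2 to 4 come from.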

What did we end up using?  Simple:

    stricmp( s1.c_str(), s2.c_str() );

Works fine, and is intuitive.  I always mandate that bounds-checked
versions of C library calls must be used (in this case, that would
mean strnicmp) unless the code is provably correct in the unchecked
case; here string objects' .c_str() results must be always safely
null-term'd (unless you've already tromped on your heap, in which case
you're about to die anyway) so stricmp is acceptable.

If you like, there's syntactic sugar:

    inline int stricmp( const string& s1, const string& s2 )
        { return stricmp( s1.c_str(), s2.c_str() ); }

Of course, if you provide this, you should also provide the following
for efficiency to prevent expensive char*-to-string conversions (and I
think you really should prevent them altogether as shown below since
you should force programmers to use the strnicmp versions if bald
char*'s are involved):

    // in the cpp where the client code can't see it
    class unsafe_operation {};

    inline int stricmp( const char* s1, const string& s2 )
        { throw unsafe_operation(); /* or as above */ }

    inline int stricmp( const string& s1, const char* s2 )
        { throw unsafe_operation(); /* or as above */ }

But that's another discussion.

>I've run into many problems with this and wondered if anyone knows what the
>intent of the committee was for dealing with doing this. (i.e. the
>'correct' way)

It came up a month or two ago in this group, but my cursory reading of
the thread wasn't enough to get a good feel for what the committee
wanted.  There was discussion about case comparisons being
locale-driven (which I submit is probably just as bad as (1.) above,
since you want case-sensitivity to be a property of the comparison
function and not the string object itself or the current locale).

---
Herb Sutter (herbs@cntc.com)

Current Network Technologies Corp.
3100 Ridgeway, Suite 42, Mississauga ON Canada L5L 5M5
Tel 416-805-9088  Fax 905-608-2611







Author: kanze@lts.sel.alcatel.de (James Kanze US/ESC 60/3/141 #40763)
Date: 1996/09/20
In article <01bba686$4a80bcc0$0a2920cc@landspeeder.delta.com>
sean@delta.com (Sean L. Palmer) writes:

|> Has anyone done this? Isn't it a PITA?

|> Besides all the type incompatibilities between istring (case-insensitive)
|> and string that result, you have to define many functions to completely
|> replace everything that must check case and you still have all the overhead
|> of checking each time it's needed to worry about.
|> Either that or you can uppercase or lowercase each character or string as
|> it's inserted, which forces you to define more functions.

|> Can it be done by just making a replacement for string_char_traits or do
|> you have to make a new char type which is case-insensitive?

I'm not sure what the intent of the committee is in this regard, but I
think I would look into the locale facets.  I know that in C, the way to
do this was to use strcoll, instead of strcmp, which depends on what
locale is set.

This said, what do you really mean by "a case-insensitive" string?
String is, in itself, just a container for the characters you put in it.
It is neither case-sensitive nor case-insensitive.  From my point of
view (and I think that this corresponds to what the standard says,
though from a completely different point of view), operators == and !=
compare two strings for exact equality; i.e.: they contain exactly the
same characters, and the operators <, >, <= and >= define an ordering
that is at least partially arbitrary, but that may be useful when
putting strings in associative containers like map and set (which
require a complete ordering of the contained objects).
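
To illustrate the associative-container point, here is a sketch in which the ordering belongs to the container's comparison object rather than to the string type. NoCaseLess and the two typedefs are invented names, and tolower() is used, so the current C locale applies.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical comparator: a strict weak ordering that ignores case.
struct NoCaseLess {
    static bool char_less(char a, char b)
    {
        return std::tolower(static_cast<unsigned char>(a))
             < std::tolower(static_cast<unsigned char>(b));
    }
    bool operator()(const std::string& a, const std::string& b) const
    {
        return std::lexicographical_compare(a.begin(), a.end(),
                                            b.begin(), b.end(), char_less);
    }
};

typedef std::map<std::string, int>             ExactMap;   // uses operator<
typedef std::map<std::string, int, NoCaseLess> NoCaseMap;  // same key type
```

Both maps hold plain strings; "Key" and "KEY" are two entries in an ExactMap and one entry in a NoCaseMap.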

If you are concerned about processing strings as text, rather than as a
simple container of char's, then you almost certainly need to provide
locale specific comparison functions.  (In French, for example, accented
characters compare equal to the unaccented ones, but in the Scandinavian
languages, they don't.)  This should be in <locale>.  (Although I've not
looked, knowing some of the people who worked on this, I'd be very
surprised if it wasn't.)
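
In the library as eventually standardized, the facility in question is the std::collate facet. A minimal sketch follows; collate_compare is an invented wrapper name.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: three-way comparison of a and b under the collation
// rules of loc. Returns a negative value, zero, or a positive value.
inline int collate_compare(const std::string& a, const std::string& b,
                           const std::locale& loc)
{
    const std::collate<char>& coll =
        std::use_facet<std::collate<char> >(loc);
    return coll.compare(a.data(), a.data() + a.size(),
                        b.data(), b.data() + b.size());
}
```

Passing std::locale::classic() gives plain character-code ordering; a named locale, where the implementation provides one, gives that locale's ordering. A std::locale object can also be used directly as a comparison object, since it has an operator() taking two strings.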
--
James Kanze         Tel.: (+33) 88 14 49 00        email: kanze@gabi-soft.fr
GABI Software, Sarl., 8 rue des Francs-Bourgeois, F-67000 Strasbourg, France
Conseils, études et réalisations en logiciel orienté objet --
                -- A la recherche d'une activité dans une region francophone


