Topic: String class proposed by ANSI committee


Author: bill_law@taligent.com (William A. Law)
Date: Thu, 2 Mar 1995 18:40:14 GMT
In article <D4q1p8.5uy@lcpd2.SanDiegoCA.ATTGIS.COM>,
swf@elsegundoca.ncr.com wrote:

> True, but this convenience has been found to cause more bugs than it
> is worth.

Can you offer some concrete evidence in support of this?  It is my
impression that most string classes in fact provide such a conversion
operator.  That would seem to contradict your statement.  What percentage
of the string classes in existence don't have a conversion operator?

Thanks for your concern, but I'll take the convenience.

Bill Law




Author: Greg Wilkins <gregw@ind.tansu.com.au>
Date: 28 Feb 1995 00:17:05 GMT
fjh@munta.cs.mu.OZ.AU (Fergus Henderson) wrote:
>
> stevep@silcom.com (Steven L. Pearson) writes:
>
> [Re: converting `string' to `const char *']
>
> >Anyone know why they call an explicit member function instead of overloading
> >const char*? It's a lot more convenient to pass a string directly to functions
> >expecting a const char* than having to call a member function. TIA.
>
> Basically because an implicit conversion of that nature is an
> invitation to make subtle but dangerous mistakes.
>
> The new lifetime-of-temporaries rule fixes that particular problem,
> but similar ones exist:
>
>  string foo() { return "foo"; }
>  main() {
>   Set<const char *> s;
>   s.insert(foo());
>   if (s.contains("foo")) { // undefined behaviour
>    // ...
>   }
>  }
>
> Requiring you to use `c_str()' doesn't prevent these problems, but it
> does make them visible, and so the programmer is much more likely to
> think about them.  Here the problem is with
>
>   s.insert(foo().c_str());
>
> because foo() is a temporary, and so the pointer returned by c_str()
> becomes invalid at the end of the expression.

But the problem you describe is not a problem of conversion, it is
a problem of returning temporaries, and taking references/pointers
to them.  Code such as:

 string& fooRef = foo();
 string* fooPtr = &foo();
 void bar(const string&);
 bar(foo());

is equally problematic and does not involve a conversion. And while
such code is fairly obvious, the same problem can be hidden as
follows:

 string foo() { return "foo"; }
 main()
 {
  Set<string> s;
  s.insert(foo()); // Error with some Set implementations
 }

If the Set implementation takes references to the inserted items,
then the above is an error.

I think the whole idea of returning temporaries is a bad one, unless
your string class implements reference counting and lazy copies.

However, this does not answer the original question about explicit
conversion of the standard string to a const char*.
I agree with the original poster that the conversion operator
is a more natural way of achieving conversion.

-------------------------------------------------------------------------------
Greg Wilkins:Consultant for Object Oriented Pty. Ltd. (OOPL)|You're not Dorothy
       Site @Telecom, Intelligent Network Development       |I'm not Toto!
       Snail:P.O. Box 1826,North Sydney,NSW.2089, Australia |And this
       Email:gregw@ind.tansu.com.au (gregw@oose.com.au)     |definitely is
       Fax  :(+61 2) 3953225 or OOPL Office:(+61 2) 9565089 |not Kansas!
       Phone:(+61 2) 3953461 or OOPL Office:(+61 2) 9571092 |  -Fleischman
-------------------------------------------------------------------------------





Author: swf@elsegundoca.ncr.com (Stan Friesen)
Date: Tue, 28 Feb 1995 17:53:31 GMT
In article <stevep.5.004336C2@silcom.com>, stevep@silcom.com (Steven L. Pearson) writes:
|>
|> >It is still correct.  The member function is called `c_str()'.
|>
|> Anyone know why they call an explicit member function instead of overloading
|> const char*?

My, how this goes in circles.  This whole thread was *started* by that
issue.

The reason is that it was deemed too error-prone to allow *implicit*
conversion of Strings to 'char *'s.  That is, the compiler tends to
apply the implicit conversion in places that the programmer finds
non-intuitive, and thus does not expect.
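A minimal sketch of the kind of non-intuitive conversion meant here. `LegacyString`, `surprisingPlus` and `surprisingTruth` are hypothetical names used only for illustration; the point is that once an implicit `operator const char*()` exists, expressions that read like string operations quietly compile as pointer operations.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical string class (illustration only) with an IMPLICIT
// conversion to const char*.
class LegacyString {
public:
    explicit LegacyString(const char* s) : rep_(s) {}
    // The compiler may apply this wherever a const char*, or anything a
    // const char* converts to (such as bool), is accepted.
    operator const char*() const { return rep_.c_str(); }
private:
    std::string rep_;
};

// Reads like "append 1", but compiles as pointer arithmetic: s is
// converted to const char* and the result points one character in.
inline const char* surprisingPlus(const LegacyString& s) {
    return s + 1;
}

// Reads like an emptiness test, but tests the pointer, which is never
// null here, so the result is true even for an empty string.
inline bool surprisingTruth(const LegacyString& s) {
    return s;
}
```

Neither function is rejected by the compiler, which is exactly why such mistakes are hard to spot in review.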

Also, experience in C++ has shown that loops in the conversion graph (an
implicit conversion from A to B together with one from B back to A) can
cause problems of various sorts, notably ambiguous overload resolution.
Thus, to be stable, one has to choose between the constructor
"String(const char *)" and the conversion operator "operator const char *()".

|> It's a lot more convenient to pass a string directly to functions
|> expecting a const char* than having to call a member function. TIA.
|>
True, but this convenience has been found to cause more bugs than it
is worth.

There are more detailed discussions of some of the issues with implicit
conversion in "C++ Strategies and Tactics" by Robert Murray and in
"Effective C++" by Scott Meyers.

Both strongly recommend caution and restraint in defining implicit conversions.
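The choice described earlier in this post (keep the converting constructor, drop the conversion operator) can be sketched as below, assuming a hypothetical class `Str` with a comparison operator; `Str` and `equalsAbc` are illustrative names only. With conversions in one direction only, `s == "abc"` has a single viable candidate. If `Str` also declared `operator const char*() const`, the built-in pointer comparison would become viable as well (converting `s` instead of `"abc"`), and a compiler should then reject the comparison as ambiguous.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical string class (illustration only): implicit conversion
// FROM const char*, no implicit conversion back; the pointer is only
// available through an explicitly called member function.
class Str {
public:
    Str(const char* s) : rep_(s) {}   // implicit: const char* -> Str
    const char* c_str() const { return rep_.c_str(); }
private:
    std::string rep_;
};

inline bool operator==(const Str& a, const Str& b) {
    return std::strcmp(a.c_str(), b.c_str()) == 0;
}

// Unambiguous: "abc" is converted to Str, so operator==(const Str&,
// const Str&) is the only viable candidate.
inline bool equalsAbc(const Str& s) {
    return s == "abc";
}
```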

--
swf@elsegundoca.attgis.com  sarima@netcom.com

The peace of God be with you.




Author: stevep@silcom.com (Steven L. Pearson)
Date: Tue, 28 Feb 1995 21:52:14 GMT
In article <9505704.23189@mulga.cs.mu.OZ.AU> fjh@munta.cs.mu.OZ.AU (Fergus Henderson) writes:
>From: fjh@munta.cs.mu.OZ.AU (Fergus Henderson)
>Subject: Re: String class proposed by ANSI committee
>Date: Sat, 25 Feb 1995 17:14:18 GMT

>stevep@silcom.com (Steven L. Pearson) writes:

>[Re: converting `string' to `const char *']

>>Anyone know why they call an explicit member function instead of overloading
>>const char*? It's a lot more convenient to pass a string directly to functions
>>expecting a const char* than having to call a member function. TIA.

>Basically because an implicit conversion of that nature is an
>invitation to make subtle but dangerous mistakes.

>With the old lifetime-of-temporaries rule it was particularly dangerous:

>        string foo() { return "foo"; }
>        main() { cout << foo(); }
>                // undefined behaviour
>                // sometimes dumps core on some implementations

>The new lifetime-of-temporaries rule fixes that particular problem,
>but similar ones exist:

>        string foo() { return "foo"; }
>        main() {
>                Set<const char *> s;
>                s.insert(foo());
>                if (s.contains("foo")) {        // undefined behaviour
>                        // ...
>                }
>        }
>
>Requiring you to use `c_str()' doesn't prevent these problems, but it
>does make them visible, and so the programmer is much more likely to
>think about them.  Here the problem is with

>                s.insert(foo().c_str());

>because foo() is a temporary, and so the pointer returned by c_str()
>becomes invalid at the end of the expression.

Sorry for the long include.

I guess I don't see what this has to do with implicit casts. I mean,
foo().c_str() doesn't tell me any more about its temporary-ness than does
foo().

It sounds like there's general agreement on this point. I'm going to move this
thread over to comp.lang.c++, and see if someone can explain it to me in baby
talk <G>. I welcome your further comments. Thanks.

Regards,

Steve





Author: fjh@munta.cs.mu.OZ.AU (Fergus Henderson)
Date: Tue, 28 Feb 1995 14:49:34 GMT
Greg Wilkins <gregw@ind.tansu.com.au> writes:

>fjh@munta.cs.mu.OZ.AU (Fergus Henderson) wrote:
>>
>> stevep@silcom.com (Steven L. Pearson) writes:
>>
>> [Re: converting `string' to `const char *']
>> >Anyone know why they call an explicit member function instead of overloading
>> >const char*?
>>
>> Basically because an implicit conversion of that nature is an
>> invitation to make subtle but dangerous mistakes.
[...]
>But the problem you describe is not a problem of conversion, it is
>a problem of returning temporaries, and taking references/pointers
>to them.  Code such as:
>
> string& fooRef = foo();
> string* fooPtr = &foo();
> void bar(const string&);
> bar(foo());
>
>is equally problematic and does not involve a conversion.

No, that code is not equally problematic.
Let's go through it one step at a time:

 string& fooRef = foo();

Here, the lifetime of the temporary object is the same as the lifetime
of `fooRef', so there's no danger here.

 string* fooPtr = &foo();

This one is simply illegal - you can't take the address of a non-lvalue.
Since the error is detected at compile time, this is not dangerous.

 void bar(const string&);
 bar(foo());

That one works fine, presuming bar() doesn't try to squirrel away a pointer
or reference to its argument in some global variable.
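A small sketch of the lifetime rules in play here, written against std::string as eventually standardized. In standard C++ the lifetime extension applies when the temporary is bound to a reference to const (or, since C++11, an rvalue reference); a plain `string&` cannot bind to a temporary at all, so the `fooRef` line only compiles once the reference is made `const`. The helper `lengthViaConstRef` is a hypothetical name for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

std::string foo() { return "foo"; }

// Binding the returned temporary to a reference to const extends the
// temporary's lifetime to that of the reference.
inline std::size_t lengthViaConstRef() {
    const std::string& ref = foo();   // OK: lifetime extended
    // std::string& bad = foo();      // ill-formed: a non-const lvalue
    //                                // reference cannot bind to a temporary
    return ref.size();
}

// Passing a temporary to a const std::string& parameter is safe for the
// duration of the call, provided the callee keeps no pointer or
// reference to it afterwards.
inline std::size_t bar(const std::string& s) { return s.size(); }
```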

>And while
>such code is fairly obvious, the same problem can be hidden as
>follows:
>
> string foo() { return "foo"; }
> main()
> {
>  Set<string> s;
>  s.insert(foo()); // Error with some Set implementations
> }
>
>If the Set implementation takes references to the inserted items,
>then the above is an error.

I agree that a set implementation whose insert() member takes
a reference and saves another reference/pointer to that object
without making a copy is dangerous.  Requiring an explicit
conversion from `string' to `const char *' doesn't eliminate _all_
possible dangers.  But it does eliminate a good number of them.
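For a container that really does keep references, later C++ (C++11) offers a way to turn `s.insert(foo())` into a compile-time error instead of a dangling reference: delete the overload that accepts rvalues. This is the same trick the standard library uses for `std::ref`. `RefSet` below is a hypothetical container for illustration, not the `Set` from this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container (illustration only) that, like the Set
// described above, stores pointers to inserted objects, not copies.
class RefSet {
public:
    void insert(const std::string& s) { items_.push_back(&s); }
    // Deleted rvalue overload: insert(foo()) now fails to compile
    // instead of leaving a dangling pointer behind.
    void insert(const std::string&&) = delete;

    bool contains(const std::string& s) const {
        for (const std::string* p : items_)
            if (*p == s) return true;
        return false;
    }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<const std::string*> items_;
};
```

Lvalues still bind to the `const std::string&` overload, so inserting a named string that outlives the set works as before.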

--
Fergus Henderson - fjh@munta.cs.mu.oz.au
all [L] (programming_language(L), L \= "Mercury") => better("Mercury", L) ;-)




Author: stevep@silcom.com (Steven L. Pearson)
Date: Sat, 25 Feb 1995 05:22:28 GMT
In article <D4IAJA.5B6@research.att.com> ark@research.att.com (Andrew Koenig) writes:
>From: ark@research.att.com (Andrew Koenig)
>Subject: Re: String class proposed by ANSI committee
>Date: Fri, 24 Feb 1995 13:23:34 GMT

>In article <stevep.5.004336C2@silcom.com> stevep@silcom.com (Steven L. Pearson) writes:

>> Anyone know why they call an explicit member function instead of overloading
>> const char*? It's a lot more convenient to pass a string directly to functions
>> expecting a const char* than having to call a member function. TIA.

>Converting a string to a character pointer is intrinsically an
>unsafe operation because the memory to which the pointer points
>will be freed at some point as a side effect of doing something
>to the string.  As a general rule, it is wise not to allow
>unsafe operations as implicit conversions.

Umm... it looks like there are two separate issues here. My question was, given that
the member function and the cast accomplish the exact same thing, i.e., expose
the internal representation of the string, why bother with a member function?
This is a question of convenience in usage. Am I wrong about them doing the
same thing?

As to allowing implicit conversions: I thought the whole point of overloading
cast operators was to allow the programmer to specify safe conversions.
Perhaps you could direct me to a good textbook treatment of this idea, if you
know of one. Thanks.
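A later language change (C++11) addresses this question directly: a conversion function can be declared `explicit`, so it is still a cast operator but the compiler never applies it implicitly. A minimal sketch, assuming a hypothetical class `Text` and helper `lengthOf`; std::string itself still provides `c_str()` rather than such an operator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical string class (illustration only) with an explicit
// conversion operator (C++11 and later).
class Text {
public:
    explicit Text(const char* s) : rep_(s) {}
    explicit operator const char*() const { return rep_.c_str(); }
private:
    std::string rep_;
};

inline std::size_t lengthOf(const Text& t) {
    // const char* p = t;                        // error: conversion is explicit
    const char* p = static_cast<const char*>(t); // OK: the cast is visible
    return std::strlen(p);
}
```

As with `c_str()`, the conversion is still possible, but it has to be written out where it happens.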

Regards,

Steve








Author: fjh@munta.cs.mu.OZ.AU (Fergus Henderson)
Date: Sat, 25 Feb 1995 17:14:18 GMT
stevep@silcom.com (Steven L. Pearson) writes:

[Re: converting `string' to `const char *']

>Anyone know why they call an explicit member function instead of overloading
>const char*? It's a lot more convenient to pass a string directly to functions
>expecting a const char* than having to call a member function. TIA.

Basically because an implicit conversion of that nature is an
invitation to make subtle but dangerous mistakes.

With the old lifetime-of-temporaries rule it was particularly dangerous:

 string foo() { return "foo"; }
 main() { cout << foo(); }
  // undefined behaviour
  // sometimes dumps core on some implementations

The new lifetime-of-temporaries rule fixes that particular problem,
but similar ones exist:

 string foo() { return "foo"; }
 main() {
  Set<const char *> s;
  s.insert(foo());
  if (s.contains("foo")) { // undefined behaviour
   // ...
  }
 }

Requiring you to use `c_str()' doesn't prevent these problems, but it
does make them visible, and so the programmer is much more likely to
think about them.  Here the problem is with

  s.insert(foo().c_str());

because foo() is a temporary, and so the pointer returned by c_str()
becomes invalid at the end of the expression.
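A sketch of the temporary-lifetime behaviour described above, using std::set as a stand-in for the `Set` template in the post (whose exact interface the thread does not give). Under the current rule the temporary returned by `foo()` lives until the end of the full-expression, so using `c_str()` within one expression is fine; storing the pointer is not. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <set>
#include <string>

std::string foo() { return "foo"; }

// Safe: the temporary lives until the end of the full-expression, so the
// pointer from c_str() is valid while strlen runs.
inline std::size_t lengthOfFoo() {
    return std::strlen(foo().c_str());
}

// Unsafe pattern (not executed here): the pointer outlives the temporary.
//     std::set<const char*> bad;
//     bad.insert(foo().c_str());   // dangles after this statement
// Note also that std::set<const char*> orders by pointer value, not by
// the characters pointed to.

// Safer alternative: store the strings themselves, so the set owns copies.
inline bool setContainsFoo() {
    std::set<std::string> s;
    s.insert(foo());
    return s.count("foo") == 1;
}
```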

--
Fergus Henderson - fjh@munta.cs.mu.oz.au
all [L] (programming_language(L), L \= "Mercury") => better("Mercury", L) ;-)




Author: stevep@silcom.com (Steven L. Pearson)
Date: 24 Feb 95 07:28:28 GMT
In article <9504916.19178@mulga.cs.mu.OZ.AU> fjh@munta.cs.mu.OZ.AU (Fergus Henderson) writes:
>From: fjh@munta.cs.mu.OZ.AU (Fergus Henderson)
>Subject: Re: String class proposed by ANSI committee
>Date: Sat, 18 Feb 1995 05:54:40 GMT

>matt@physics2.berkeley.edu (Matt Austern) writes:

>It is still correct.  The member function is called `c_str()'.

Anyone know why they call an explicit member function instead of overloading
const char*? It's a lot more convenient to pass a string directly to functions
expecting a const char* than having to call a member function. TIA.

Steve

P.S. Yes, I'm a total newbie. If it's a dumb question, ignore it. If I'm wasting
bandwidth, let me know by private email.

>--
>Fergus Henderson - fjh@munta.cs.mu.oz.au
>all [L] (programming_language(L), L \= "Mercury") => better("Mercury", L) ;-)




Author: ark@research.att.com (Andrew Koenig)
Date: Fri, 24 Feb 1995 13:23:34 GMT
In article <stevep.5.004336C2@silcom.com> stevep@silcom.com (Steven L. Pearson) writes:

> Anyone know why they call an explicit member function instead of overloading
> const char*? It's a lot more convenient to pass a string directly to functions
> expecting a const char* than having to call a member function. TIA.

Converting a string to a character pointer is intrinsically an
unsafe operation because the memory to which the pointer points
will be freed at some point as a side effect of doing something
to the string.  As a general rule, it is wise not to allow
unsafe operations as implicit conversions.
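A minimal sketch of the hazard, using std::string as eventually standardized: a pointer from `c_str()` must not be used after a non-const member function (here `append`) is called on the same string, because the operation may reallocate and free the buffer. The helper `snapshotThenAppend` is a hypothetical name; it copies the characters while the pointer is still valid.

```cpp
#include <cassert>
#include <string>

// Returns a copy of the string's characters taken before it is modified.
inline std::string snapshotThenAppend(std::string& s, const std::string& tail) {
    const char* p = s.c_str();
    std::string saved(p);   // copy the characters while p is still valid
    s.append(tail);         // may reallocate: p must not be used after this
    // If a pointer is needed again, call s.c_str() afresh rather than reuse p.
    return saved;
}
```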
--
    --Andrew Koenig
      ark@research.att.com




Author: lubell@nist.gov (Josh Lubell)
Date: 16 Feb 1995 14:16:29 GMT
Thanks to Jason Merrill and Stan Friesen for responding to my inquiry
regarding the status of the ANSI-proposed String class.  My
understanding is that

1) The String class proposal is likely to go through some more changes
before it becomes standardized.

2) String will be a template class, and it will be part of the
standard library.

3) It is currently proposed that conversion to char* not be included
in the String class's functionality.

4) The String class in GNU's libg++ class library is a superset of
the earlier proposed String class (before it was decided to make
String a template class).
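Point 2 is how things turned out in the published standard: the string class is the class template `std::basic_string`, and `std::string` and `std::wstring` name its `char` and `wchar_t` specializations (with the default traits and allocator). A small check of that, with `wideLength` as a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "std::string is basic_string<char>");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value,
              "std::wstring is basic_string<wchar_t>");

// Because it is one template, the same members (size(), c_str(), ...)
// exist for every character type.
inline std::size_t wideLength(const std::wstring& w) {
    return w.size();
}
```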

Given my understanding, I plan to use the String class from libg++ in
my implementation of the STEP (ISO 10303) Data Access Interface.  I
hope, however, that the language committee reconsiders its plans to
drop char* conversion from the standard.  I think that most
applications using a String class (mine included) would want to be
able to get at the characters inside a string.

--Josh
--
____________________________________________________________________
                                   Josh Lubell | lubell@cme.nist.gov
National Institute of Standards and Technology |
    Manufacturing Systems Integration Division |
                       A127 Metrology Building | Voice:(301)975-3563
                    Gaithersburg, MD 20899 USA | Fax:  (301)869-0917




Author: mav@gaia.cc.gatech.edu (Maurizio Vitale)
Date: 16 Feb 1995 18:41:55 GMT
In article <LUBELL.95Feb16091629@villars.nist.gov> lubell@nist.gov (Josh Lubell) writes:


   Given my understanding, I plan to use the String class from libg++ in
   my implementation of the STEP (ISO 10303) Data Access Interface.  I
   hope, however, that the language committee reconsiders its plans to
   drop char* conversion from the standard.  I think that most
   applications using a String class (mine included) would want to be
   able to get at the characters inside a string.

You can get at the "characters" that make up the string; only no
operator char* is supplied. basic_string::c_str returns a pointer to
the initial element of an array of length len+1. You can also
read/write at a specified position using basic_string::get_at and
basic_string::put_at.
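A sketch of the same access in terms of std::string as eventually standardized. Note that `get_at` and `put_at` belong to the 1995 draft and are not members of the published std::basic_string; the sketch uses `at()` (bounds-checked, throws std::out_of_range) and `operator[]` (unchecked) instead. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// c_str() points at size() + 1 characters, the last being '\0', so for a
// string without embedded '\0' characters strlen agrees with size().
inline bool cStrIsTerminated(const std::string& s) {
    return s.c_str()[s.size()] == '\0' && std::strlen(s.c_str()) == s.size();
}

// Read and write the character at a given position.
inline char readWrite(std::string& s, std::size_t pos, char c) {
    char old = s.at(pos);   // checked read: throws if pos >= size()
    s[pos] = c;             // unchecked write
    return old;
}

inline bool atThrowsPastEnd(const std::string& s) {
    try {
        (void)s.at(s.size());
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```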
--
    Maurizio Vitale
 _______________
|        _      |\   e-mail: mav@cc.gatech.edu     | How many times can
|  /|/| '_) | ) | |  voice:  (404) 881-6083 (home) | a man turn his head,
| | | |_(_|_|/  | |          (404) 853-9382 (work) | and pretend that he
|_______________| |                                | just doesn't see ?
 \_______________\|  fax:    (404) 853-9378        |  - Bob Dylan




Author: jason@cygnus.com (Jason Merrill)
Date: Thu, 16 Feb 1995 20:10:59 GMT
>>>>> Josh Lubell <lubell@nist.gov> writes:

> 1) The String class proposal is likely to go through some more changes
> before it becomes standardized.

Yes.

> 2) String will be a template class, and it will be part of the
> standard library.

Yes.

> 3) It is currently proposed that conversion to char* not be included
> in the String class's functionality.

Yes.

> 4) The String class in GNU's libg++ class library is a superset of
> the earlier proposed String class (before it was decided to make
> String a template class).

No.  The String class in libg++ was written in 1988, long before the
standard string class was proposed.

The string class in libstdc++ (part of the libg++-2.6.2 distribution) is an
implementation of the first templatized string proposal.

Jason




Author: matt@physics2.berkeley.edu (Matt Austern)
Date: 17 Feb 1995 00:05:30 GMT
In article <JASON.95Feb16121059@phydeaux.cygnus.com> jason@cygnus.com (Jason Merrill) writes:

> > 3) It is currently proposed that conversion to char* not be included
> > in the String class's functionality.
>
> Yes.

This is slightly ambiguous.  My understanding was that there would not
be a conversion operator to char*, but that there would be a member
function returning a const char* that you could explicitly call if you
need to get access to the underlying characters.

Is this still correct, or does the latest working paper propose to
have no such member function?
--

                               --matt




Author: jason@cygnus.com (Jason Merrill)
Date: Fri, 17 Feb 1995 10:02:12 GMT
>>>>> Matt Austern <matt@physics2.berkeley.edu> writes:

> This is slightly ambiguous.  My understanding was that there would not
> be a conversion operator to char*, but that there would be a member
> function returning a const char* that you could explicitly call if you
> need to get access to the underlying characters.

This is still the case.  In particular:

  21.1.1.4.11  basic_string::c_str                   [lib.string::c.str]

  const charT* c_str() const;

  Returns:
    A pointer to the initial element of an array of length size() + 1
    whose first size() elements equal the corresponding elements of the
    string controlled by *this and whose last element is a null
    character specified by traits::eos().
  Requires:
    The program shall not alter any of the values stored in the array.
    Nor shall the program treat the returned value as a valid pointer
    value after any subsequent call to a non-const member function of
    the class basic_string that designates the same object as this.
  Notes:
    Uses traits::eos().

  21.1.1.4.12  basic_string::data                     [lib.string::data]

  const charT* data() const;

  Returns:
    c_str() if size() is nonzero, otherwise a null pointer.
  Requires:
    The program shall not alter any of the values stored in the
    character array.  Nor shall the program treat the returned value as
    a valid pointer value after any subsequent call to a non-const member
    function of basic_string that designates the same object as this.
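The draft wording above allows data() to return a null pointer for an empty string. In C++11 and later both accessors are specified to return a pointer p with p + i == &s[i] for every i in [0, size()], so data() and c_str() return the same pointer, which is never null. A small sketch checking this (helper names are hypothetical):

```cpp
#include <cassert>
#include <string>

inline bool dataEqualsCStr(const std::string& s) {
    return s.data() == s.c_str();
}

inline bool emptyDataIsUsable() {
    const std::string empty;
    // Not a null pointer: it points at a single terminating '\0'.
    return empty.data() != nullptr && empty.data()[0] == '\0';
}
```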

Jason




Author: fjh@munta.cs.mu.OZ.AU (Fergus Henderson)
Date: Sat, 18 Feb 1995 05:54:40 GMT
matt@physics2.berkeley.edu (Matt Austern) writes:

>jason@cygnus.com (Jason Merrill) writes:
>
>> > 3) It is currently proposed that conversion to char* not be included
>> > in the String class's functionality.
>>
>> Yes.
>
>This is slightly ambiguous.  My understanding was that there would not
>be a conversion operator to char*, but that there would be a member
>function returning a const char* that you could explicitly call if you
>need to get access to the underlying characters.
>
>Is this still correct, or does the latest working paper propose to
>have no such member function?

It is still correct.  The member function is called `c_str()'.

--
Fergus Henderson - fjh@munta.cs.mu.oz.au
all [L] (programming_language(L), L \= "Mercury") => better("Mercury", L) ;-)