Topic: Non-const overload of std::string::data()


Author: AlbertoBarbati@libero.it (Alberto Ganesh Barbati)
Date: Thu, 31 May 2007 19:30:03 GMT
Hi Everybody,

(this is a follow-up of thread "std::string::data()" in
comp.lang.c++.moderated)

The latest draft (N2284) includes wording from issue #530, which adds a
requirement to std::basic_string: the string elements must be stored
contiguously in memory. It has been noticed in the cited thread that
such a requirement effectively makes it possible to use the expression
&s[0] to portably obtain a non-const pointer to the string internal buffer.
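The idiom in question can be sketched as follows. This is only an illustration, assuming the contiguity guarantee from issue #530; fill_greeting and make_greeting are names made up for the example, with fill_greeting standing in for any C-style API that writes into a caller-supplied buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-style function: writes at most n characters into buf
// and returns how many it wrote (no terminating null is written).
std::size_t fill_greeting(char* buf, std::size_t n)
{
    const char text[] = "hello";
    std::size_t len = std::strlen(text);
    if (len > n) len = n;
    std::memcpy(buf, text, len);
    return len;
}

std::string make_greeting()
{
    std::string s(16, '\0');   // the string must already be large enough
    // &s[0] yields a non-const pointer to the first element; contiguity
    // is what makes writing s.size() characters through it meaningful.
    std::size_t written = fill_greeting(&s[0], s.size());
    assert(written <= s.size());
    s.resize(written);         // trim to what was actually written
    return s;
}
```

Note that the string is sized before the call: writing past size() through the pointer would be an error regardless of capacity().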

basic_string has a const data() member, but not a non-const overload.
However, obtaining a non-const pointer to the internal buffer is perhaps
the #1 FAQ about basic_string. Whatever rationale there was for not
providing the non-const overload has now been superseded by #530, IMHO.
I say we should just provide it.

Notice that issue #464 (which has also been included in the latest
draft) adds to std::vector both const and non-const overloads of the
member data(). One of the reasons for this addition is precisely to
avoid the use of the &v[0] idiom. This makes the lack of a non-const
data() in basic_string even more embarrassing.
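A small sketch of why data() is preferable to &v[0] for std::vector. The function names sum and sum_all are invented for the example; the point it relies on is that v[0] requires at least one element, while data() may be called on an empty vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums n ints read through a raw pointer, as a C-style API might.
int sum(const int* p, std::size_t n)
{
    assert(p != nullptr || n == 0);
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += p[i];
    return total;
}

int sum_all(const std::vector<int>& v)
{
    // v.data() is well-defined even when v is empty; &v[0] is not,
    // because v[0] has undefined behavior on an empty vector.
    return sum(v.data(), v.size());
}
```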

Any comments?

Ganesh

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: greghe@pacbell.net (Greg Herlihy)
Date: Sat, 2 Jun 2007 03:32:00 GMT
On 5/31/07 12:30 PM, in article e%e7i.23624$%k.105776@twister2.libero.it,
"Alberto Ganesh Barbati" <AlbertoBarbati@libero.it> wrote:

> (this is a follow-up of thread "std::string::data()" in
> comp.lang.c++.moderated)
>
> The latest draft (N2284) includes wording from issue #530, which adds a
> requirement to std::basic_string: the string elements must be stored
> contiguously in memory. It has been noticed in the cited thread that
> such a requirement effectively makes it possible to use the expression
> &s[0] to portably obtain a non-const pointer to the string internal buffer.

Not quite: according to N2284 &s[0] points to a character buffer that is
neither modifiable (see §21.3.4) nor internal to the std::string object (see
Table 39). Furthermore, there is no requirement that a std::string even has
to maintain an internal character buffer as such in the first place.

> basic_string has a const data() member, but not a non-const overload.
> However, obtaining a non-const pointer to the internal buffer is perhaps
> the #1 FAQ about basic_string. Whatever rationale there was for not
> providing the non-const overload has now been superseded by #530, IMHO.
> I say we should just provide it.

What for? A non-const data() overload would create the false impression that
the non-const data() overload could be used to modify the string itself - a
sheer impossibility in light of a std::string's design guarantees.
Specifically: in order to support std::string implementations that use
reference-counting - every modification to a std::string object must be
mediated by its class interface. So, at the very least, a non-const
std::string data() method would break every existing reference-counted
std::string implementation, by bypassing std::string's interface.

> Notice that issue #464 (which has also been included in the latest
> draft) adds to std::vector both const and non-const overloads of the
> member data(). One of the reasons for this addition is precisely to
> avoid the use of the &v[0] idiom. This makes the lack of a non-const
> data() in basic_string even more embarrassing.

On the contrary, imposing a specific internal data structure on a
std::string implementation - and then requiring that the interface provide
direct public access to this internal representation - would be a colossal
design embarrassment for C++ - and one that the language could probably
never be able to live down completely. By tossing the principles of data
encapsulation, public interfaces and object-oriented design out of the
window, a std::string object would be little more than - and no better than
- an ordinary C struct after all.

A C++ program should use the class that best fits its needs. So if a
program needs a sequence of characters, it should use a std::vector<char>;
likewise, for a stream or buffer of characters, a program should use a
std::stringstream. But if a program needs a string class that transcends
those two narrow concepts of a "string", then it should use a class object
that is just as transcendent: std::string.

Greg







Author: Markus Schoder <a3vr6dsg-usenet@yahoo.de>
Date: Sat, 2 Jun 2007 09:38:39 CST
On Sat, 02 Jun 2007 03:32:00 +0000, Greg Herlihy wrote:
> On 5/31/07 12:30 PM, in article
> e%e7i.23624$%k.105776@twister2.libero.it, "Alberto Ganesh Barbati"
> <AlbertoBarbati@libero.it> wrote:
>
>> (this is a follow-up of thread "std::string::data()" in
>> comp.lang.c++.moderated)
>>
>> The latest draft (N2284) includes wording from issue #530, which adds a
>> requirement to std::basic_string: the string elements must be stored
>> contiguously in memory. It has been noticed in the cited thread that
>> such a requirement effectively makes it possible to use the expression
>> &s[0] to portably obtain a non-const pointer to the string internal
>> buffer.
>
> Not quite: according to N2284 &s[0] points to a character buffer that is
> neither modifiable (see §21.3.4) nor internal to the std::string object
> (see Table 39). Furthermore, there is no requirement that a std::string
> even has to maintain an internal character buffer as such in the first
> place.

Since s[0] = c does modify the string I cannot see how &s[0] could point
to anything but the internal buffer of the string.

>> basic_string has a const data() member, but not a non-const overload.
>> However, obtaining a non-const pointer to the internal buffer is
>> perhaps the #1 FAQ about basic_string. Whatever rationale there was for
>> not providing the non-const overload has now been superseded by #530,
>> IMHO. I say we should just provide it.
>
> What for? A non-const data() overload would create the false impression
> that the non-const data() overload could be used to modify the string
> itself - a sheer impossibility in light of a std::string's design
> guarantees. Specifically: in order to support std::string
> implementations that use reference-counting - every modification to a
> std::string object must be mediated by its class interface. So, at the
> very least, a non-const std::string data() method would break every
> existing reference-counted std::string implementation, by bypassing
> std::string's interface.

This would be handled the same way the s[0] = c case is today.
Reference-counted implementations typically make a unique copy when
non-const operator[] is invoked -- non-const data() can do the same.

--
Markus Schoder






Author: "sebor@roguewave.com" <sebor@roguewave.com>
Date: Sun, 3 Jun 2007 10:12:32 CST
On Jun 2, 9:38 am, Markus Schoder <a3vr6dsg-use...@yahoo.de> wrote:
> On Sat, 02 Jun 2007 03:32:00 +0000, Greg Herlihy wrote:
> > On 5/31/07 12:30 PM, in article
> > e%e7i.23624$%k.105...@twister2.libero.it, "Alberto Ganesh Barbati"
> > <AlbertoBarb...@libero.it> wrote:
>
> >> (this is a follow-up of thread "std::string::data()" in
> >> comp.lang.c++.moderated)
>
> >> The latest draft (N2284) includes wording from issue #530, which adds a
> >> requirement to std::basic_string: the string elements must be stored
> >> contiguously in memory. It has been noticed in the cited thread that
> >> such a requirement effectively makes it possible to use the expression
> >> &s[0] to portably obtain a non-const pointer to the string internal
> >> buffer.
>
> > Not quite: according to N2284 &s[0] points to a character buffer that is
> > neither modifiable (see §21.3.4) nor internal to the std::string object
> > (see Table 39). Furthermore, there is no requirement that a std::string
> > even has to maintain an internal character buffer as such in the first
> > place.
>
> Since s[0] = c does modify the string I cannot see how &s[0] could point
> to anything but the internal buffer of the string.
>
> >> basic_string has a const data() member, but not a non-const overload.
> >> However, obtaining a non-const pointer to the internal buffer is
> >> perhaps the #1 FAQ about basic_string. Whatever rationale there was for
> >> not providing the non-const overload has now been superseded by #530,
> >> IMHO. I say we should just provide it.
>
> > What for? A non-const data() overload would create the false impression
> > that the non-const data() overload could be used to modify the string
> > itself - a sheer impossibility in light of a std::string's design
> > guarantees. Specifically: in order to support std::string
> > implementations that use reference-counting - every modification to a
> > std::string object must be mediated by its class interface. So, at the
> > very least, a non-const std::string data() method would break every
> > existing reference-counted std::string implementation, by bypassing
> > std::string's interface.
>
> This would be handled the same way the s[0] = c case is today.
> Reference-counted implementations typically make a unique copy when
> non-const operator[] is invoked -- non-const data() can do the same.

Correct. Non-const string accessors disable reference counting so that
code like the snippet below behaves as expected:

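    #include <cassert>
    #include <string>

    int main ()
    {
        std::string a = "abc";
        char &c = a [0];                           // non-const access to a
        std::string::iterator i = a.begin () + 1;
        std::string b = a;                         // b must not share a's buffer
        c = 'X';
        *i = 'Y';
        assert ("XYc" == a);
        assert ("abc" == b);
    }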







Author: AlbertoBarbati@libero.it (Alberto Ganesh Barbati)
Date: Sun, 3 Jun 2007 18:53:49 GMT
Greg Herlihy wrote:
> On 5/31/07 12:30 PM, in article e%e7i.23624$%k.105776@twister2.libero.it,
> "Alberto Ganesh Barbati" <AlbertoBarbati@libero.it> wrote:
>
>> (this is a follow-up of thread "std::string::data()" in
>> comp.lang.c++.moderated)
>>
>> The latest draft (N2284) includes wording from issue #530, which adds a
>> requirement to std::basic_string: the string elements must be stored
>> contiguously in memory. It has been noticed in the cited thread that
>> such a requirement effectively makes it possible to use the expression
>> &s[0] to portably obtain a non-const pointer to the string internal buffer.
>
> Not quite: according to N2284 &s[0] points to a character buffer that is
> neither modifiable (see §21.3.4) nor internal to the std::string object (see
> Table 39). Furthermore, there is no requirement that a std::string even has
> to maintain an internal character buffer as such in the first place.

I'm sorry, but §21.3.4 of N2284 is about capacity(), resize() etc. and I
don't see how it is relevant, while Table 39 is about allocators, not
strings. Are you sure you are referring to N2284? Would you care to check
your references, please?

I base my claim on these statements:

§21.3.5/1: if s is non-const and non-empty, s[0] returns a *modifiable*
reference to *begin(), and p = &s[0] is therefore a pointer that can be
dereferenced to modify the first element of the sequence.

§21.3.1/3 guarantees that p + n is the same as &*(begin() + n), so p[n]
can be used to modify the nth element of the sequence.
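The two statements can be checked with a few lines of code. This is a sketch rather than a proof: poke is a helper invented for the example, and the assertions only confirm the behavior on the implementation used to compile it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns a copy of s with its nth element replaced by c, writing
// through a raw pointer obtained from &s[0].
std::string poke(std::string s, std::size_t n, char c)
{
    assert(n < s.size());                // s is non-empty and n is in range
    char* p = &s[0];                     // modifiable pointer to the first element
    assert(p + n == &*(s.begin() + n));  // contiguity: same address either way
    p[n] = c;                            // modifies the nth element
    return s;
}
```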

>> basic_string has a const data() member, but not a non-const overload.
>> However, obtaining a non-const pointer to the internal buffer is perhaps
>> the #1 FAQ about basic_string. Whatever rationale there was for not
>> providing the non-const overload has now been superseded by #530, IMHO.
>> I say we should just provide it.
>
> What for? A non-const data() overload would create the false impression that
> the non-const data() overload could be used to modify the string itself - a
> sheer impossibility in light of a std::string's design guarantees.
> Specifically: in order to support std::string implementations that use
> reference-counting - every modification to a std::string object must be
> mediated by its class interface. So, at the very least, a non-const
> std::string data() method would break every existing reference-counted
> std::string implementation, by bypassing std::string's interface.

Adding non-const data() wouldn't break anything more than §21.3.5/1
already does. The standard already says that references, pointers and
iterators may be invalidated by a call to data(). Calling data() can
simply mark the string as unsharable. This must *already* occur for
operator[], because of this:

#include <string>

void foo()
{
  std::string s1 = "hello, world";
  char& c1 = s1[0];
  std::string s2(s1); // doesn't invalidate c1
  c1 = 'j'; // shall modify s1 but not s2
}

The example above shows that s1 and s2 can't share the same buffer. This
can be achieved even with a ref-counted implementation, by having
operator[] mark s1 as unsharable.
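The "mark as unsharable" approach can be sketched with a toy class. This is only an illustration under stated assumptions: RcString, Rep, unshare(), shares_with() and replay_foo() are names invented here, std::shared_ptr stands in for a hand-written reference count, and no real library is claimed to work this way.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Toy reference-counted string (hypothetical, for illustration only).
class RcString {
    struct Rep {
        explicit Rep(const std::string& s) : chars(s), unsharable(false) {}
        std::string chars;
        bool unsharable;   // set once a mutable reference or pointer escapes
    };
    std::shared_ptr<Rep> rep_;

    // Called by every non-const accessor.
    Rep& unshare()
    {
        if (rep_.use_count() > 1)   // currently shared: take a private copy
            rep_ = std::make_shared<Rep>(rep_->chars);
        rep_->unsharable = true;    // later copies must not share this buffer
        return *rep_;
    }

public:
    RcString(const char* text) : rep_(std::make_shared<Rep>(text)) {}

    // Copying shares the buffer unless the source is marked unsharable,
    // in which case the characters are copied.
    RcString(const RcString& other)
        : rep_(other.rep_->unsharable
                   ? std::make_shared<Rep>(other.rep_->chars)
                   : other.rep_) {}

    RcString& operator=(const RcString&) = delete;  // omitted for brevity

    char& operator[](std::size_t i) { return unshare().chars[i]; }
    char* data() { return &unshare().chars[0]; }    // same rule as operator[]
    const char* data() const { return rep_->chars.data(); }

    std::string str() const { return rep_->chars; }
    bool shares_with(const RcString& o) const { return rep_ == o.rep_; }
};

// Replays foo() from the post with the toy class.
bool replay_foo()
{
    RcString s1("hello, world");
    char& c1 = s1[0];    // marks s1 unsharable
    RcString s2(s1);     // copies the characters, so c1 stays valid
    c1 = 'j';            // modifies s1 but not s2
    return s1.str() == "jello, world" && s2.str() == "hello, world";
}
```

In this sketch the non-const data() costs nothing beyond what operator[] already pays: both go through the same unshare() step.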

So why not data()?

Ganesh






Author: pete@versatilecoding.com (Pete Becker)
Date: Sun, 3 Jun 2007 20:21:16 GMT
Alberto Ganesh Barbati wrote:
>
> I'm sorry, but §21.3.4 of N2284 is about capacity(), resize() etc. and I
> don't see how it is relevant, while Table 39 is about allocators, not
> strings. Are you sure you are referring to N2284? Would you care to check
> your references, please?
>

To avoid confusion, cite the tags, not the section numbers. Section
numbers can change with each draft. The subsection entitled
"basic_string capacity" is numbered 21.3.4 in the current draft, but
that can change. Its tag is [string.capacity], and that won't change.

--

 -- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
