Topic: basic_string contiguity question


Author: "Paul Archard" <paul@spammenot.com>
Date: 1999/06/28
Hi all,

I've searched all the docs I can find but I can't seem to get a _definitive_
answer on this...

Can one assume that the data in a basic_string is stored contiguously?  So,
for instance, is the following guaranteed to work for any basic_string s ...

    for (int n = 0; n < s.size(); n++)
    {
        assert( s.data() + n ==  (&(s.at(n))));
    }

In addition to this, is it acceptable to pass a "non-const
basic_string::iterator" to a function that wants a "non-const char *"?

In an ideal world every API would use strings, but we have a lot of cases
where we need to call functions (such as "read()") that want to write into a
buffer pointed to by a "char *".  Such cases either involve substituting
"basic_string::begin() + n", which intuitively feels "naughty", or creating
a temporary buffer and calling "copy()" all over the place, which is far
from efficient.

If the above approach is not portable, or not officially supported, what is
the "correct" way of doing it?

Thanks in advance for any input on this,

Paul Archard
< p_a_r_c_h @ w_o_r_k_f_i_r_e . c_o_m >
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]





Author: sbnaran@localhost.localdomain (Siemel Naran)
Date: 1999/06/29
On 28 Jun 99 22:58:04 GMT, Paul Archard <paul@spammenot.com> wrote:

>Can one assume that the data in a basic_string is stored contiguously?  So,
>for instance, is the following guaranteed to work for any basic_string s ...

No.  An implementation may store a std::string as an array of substrings.


>    for (int n = 0; n < s.size(); n++)
>    {
>        assert( s.data() + n ==  (&(s.at(n))));
>    }

This code may always work.  I'm not sure.  The member functions data and
c_str perhaps force the class to create C style representations.  And I
don't know the difference between these two functions.


>In addition to this, is it acceptable to pass a "non-const
>basic_string::iterator" to a function that wants a "non-const char *"?

This is not acceptable.


>In an ideal world every API would use strings, but we have a lot of cases
>where we need to call functions (such as "read()") that want to write into a
>buffer pointed to by a "char *".  Such cases either involve substituting
>"basic_string::begin() + n", which intuitively feels "naughty", or creating
>a temporary buffer and calling "copy()" all over the place, which is far
>from efficient.

Maybe one ought not even use std::string at all.


>If the above approach is not portable, or not officially supported, what is
>the "correct" way of doing it?

--
----------------------------------
Siemel B. Naran (sbnaran@uiuc.edu)
----------------------------------





Author: Hyman Rosen <hymie@prolifics.com>
Date: 1999/06/29
"Paul Archard" <paul@spammenot.com> writes:
> Can one assume that the data in a basic_string is stored contiguously?  So,
> for instance, is the following guaranteed to work for any basic_string s ...
>     for (int n = 0; n < s.size(); n++)
>     {
>         assert( s.data() + n ==  (&(s.at(n))));
>     }
> In addition to this, is it acceptable to pass a "non-const
> basic_string::iterator" to a function that wants a "non-const char *"?

No and no, according to the standard.

> In an ideal world every API would use strings, but we have a lot of cases
> where we need to call functions (such as "read()") that want to write into a
> buffer pointed to by a "char *".  Such cases either involve substituting
> "basic_string::begin() + n", which intuitively feels "naughty", or creating
> a temporary buffer and calling "copy()" all over the place, which is far
> from efficient.

You might try using vector<char> instead of string. Although the standard
does not say that vector elements are contiguous, committee members seem
to have agreed that this was an oversight, and such a requirement will be
issued as a correction at some point. I believe all known implementations
do, in fact, use contiguous storage.





Author: James Kuyper <kuyper@wizard.net>
Date: 1999/06/29
Paul Archard wrote:
>
> Hi all,
>
> I've searched all the docs I can find but I can't seem to get a _definitive_
> answer on this...
>
> Can one assume that the data in a basic_string is stored contiguously?

No. There's nothing you can do to find out, either. What data() returns
may be a pointer to a copy of that data, and not the actual data store
itself.

However, so long as you don't invalidate the pointer returned by data(),
what you're asking about will work. Section 21.3.6 p3 says that data()
"returns a pointer to the initial element of an array whose first size()
elements equal the corresponding elements of the string controlled by
*this." Furthermore, the const versions of at(pos) or operator[](pos)
are guaranteed to return data()[pos] (for valid values of pos).

The non-const versions, on the other hand, invalidate the pointer
returned by the previous call to data(), which means the implementation
is free, among other things, to deallocate the memory it points to.
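Under the conditions described above, the check from the original post can be sketched so that only const members are used, and data() is called once, in its own statement, before any at() call rather than inside the same expression. The function name same_storage is hypothetical, and this follows the reading of the standard given above rather than any guarantee about internal layout. On implementations conforming to C++11 or later, where string storage is required to be contiguous, it returns true.

```cpp
#include <string>

// Hypothetical helper: check that const element access refers into the
// array returned by data(). Taking the string by const reference means
// only the const overloads of data() and at() are called here.
bool same_storage(const std::string& s)
{
    // Call data() once, before any element pointer is formed, so it
    // cannot invalidate a pointer obtained earlier in the same expression.
    const char* p = s.data();

    for (std::string::size_type n = 0; n < s.size(); ++n)
    {
        // The const at() is not among the calls that invalidate p, and
        // (per the reading above) it refers to data()[n].
        if (p + n != &s.at(n))
            return false;
    }
    return true;
}
```

Calling it on a non-const string is fine: the reference parameter still selects the const overloads, so the invalidating non-const at() is never used.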

> In addition to this, is it acceptable to pass a "non-const
> basic_string::iterator" to a function that wants a "non-const char *"?

No. basic_string::iterator is neither guaranteed to be "char *", nor is
it required to be convertible to "char *". If you need writable
memory that's guaranteed contiguous, you should use an array, or a
std::valarray<>.
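A sketch of the valarray option: a std::valarray<char> owns the writable buffer and &buf[0] is passed where a char * is expected (the standard requires &a[i+j] == &a[i] + j for a non-const valarray). The C-style function fill_name is a hypothetical stand-in for an API such as read(); it is not a real library call.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <valarray>

// Hypothetical C-style API: writes at most len-1 characters of a name
// into buf and NUL-terminates it. Assumes len > 0.
void fill_name(char* buf, std::size_t len)
{
    const char name[] = "localhost";
    std::strncpy(buf, name, len - 1);
    buf[len - 1] = '\0';
}

// Let a std::valarray<char> own the writable, contiguous buffer, then
// copy the result into a std::string once.
std::string get_name()
{
    std::valarray<char> buf('\0', 64);   // 64 chars, all '\0' (value first, count second)
    fill_name(&buf[0], buf.size());
    return std::string(&buf[0]);         // copies up to the terminating NUL
}
```

A plain char array works the same way; the valarray just manages the allocation.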





Author: "Larry Brasfield" <larrybr@seanet.com>
Date: 1999/06/29
Paul Archard <paul@spammenot.com> wrote in message news:9eSd3.170$eq2.906691@bunson.tor.sfl.net...
> Hi all,
>
> I've searched all the docs I can find but I can't seem to get a _definitive_
> answer on this...

I can give you one.  Nowhere in the C++
standard is any guarantee made that
basic_string will store its data in a
contiguous sequence.

> Can one assume that the data in a basic_string is stored contiguously?

No.  One cannot assume that with any
basis except hope or guesswork.

> So, for instance, is the following guaranteed to work for any basic_string s ...
>
>     for (int n = 0; n < s.size(); n++)
>     {
>         assert( s.data() + n ==  (&(s.at(n))));
>     }

That is guaranteed to work because
the data() method is guaranteed to
return a (const) pointer to size()
consecutive elements which are
a copy of the string content.  This
fact, however, does not guarantee
that the data in a basic_string is
stored contiguously.  In fact, when
data() is called, any outstanding
content iterators and pointers are
potentially invalidated according
to the standard.

> In addition to this, is it acceptable to pass a "non-const
> basic_string::iterator" to a function that wants a "non-const char *"?

That depends on how the pointer
is to be used.  If no arithmetic will
be done on it (such as ++ or --),
then it is OK to take the address
of the character referenced by the
iterator and use it until certain
operations invalidate the iterator.
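A sketch of that limited case: a function that takes a char * but only reads or writes the one character it points to can be given &*it. Both function names are hypothetical. Passing the same pointer to a function that increments or indexes it would not be covered.

```cpp
#include <cctype>
#include <string>

// Hypothetical callee: touches exactly one char through the pointer,
// with no ++, -- or [] applied to it.
void to_upper_in_place(char* c)
{
    *c = static_cast<char>(std::toupper(static_cast<unsigned char>(*c)));
}

// Capitalize the first character of every space-separated word.
void capitalize_words(std::string& s)
{
    bool at_word_start = true;
    for (std::string::iterator it = s.begin(); it != s.end(); ++it)
    {
        if (*it == ' ')
        {
            at_word_start = true;
        }
        else if (at_word_start)
        {
            // &*it is a pointer to this single character only.
            to_upper_in_place(&*it);
            at_word_start = false;
        }
    }
}
```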

> In an ideal world every API would use strings, but we have a lot of cases
> where we need to call functions (such as "read()") that want to write into a
> buffer pointed to by a "char *".  Such cases either involve substituting
> "basic_string::begin() + n", which intuitively feels "naughty",

That's not naughty because its
iterator supports random access
and is required to do so by the
C++ standard.

> or creating
> a temporary buffer and calling "copy()" all over the place, which is far
> from efficient.

Presumably, the content has to be
copied out of a filesystem buffer
anyway.  Library routines such as
string extract and getline are free
to be efficiently cognizant of tricks
that depend on the implementation.
If I were concerned about speed, I
would be sure to use the iterator
or sequence based creation and
insertion/append methods.  Other
than that, I don't think there is any
safe solution of the kind you seek.

> If the above approach is not portable, or not officially supported, what is
> the "correct" way of doing it?

Hard to say.  I limit myself to the
class methods and avoid doing
anything with pointers that might
be extracted except pass them
to const char * expecters making
short term use of them.

> Thanks in advance for any input on this,
You're welcome in retrospect.

--
Larry Brasfield
Above opinions may be mine alone.
X-Replace-Address
(Humans may reply at unundered larry_br@sea_net.com )





Author: James.Kanze@dresdner-bank.com
Date: 1999/06/30
In article <slrn7ng478.efb.sbnaran@localhost.localdomain>,
  sbnaran@uiuc.edu wrote:
> On 28 Jun 99 22:58:04 GMT, Paul Archard <paul@spammenot.com> wrote:

    [...]
> >    for (int n = 0; n < s.size(); n++)
> >    {
> >        assert( s.data() + n ==  (&(s.at(n))));
> >    }

> This code may always work.  I'm not sure.  The member functions data and
> c_str perhaps force the class to create C style representations.  And I
> don't know the difference between these two functions.

The functions data and c_str do force the class to allocate a contiguous
buffer with a copy of the characters.  They do not force it to maintain
this as its only internal representation.  The implementation may
allocate the buffer when data is called, but continue to use its "native"
representation for all other functions, and free the buffer on the first
call to a function that invalidates the result of data.
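To make this concrete, here is a deliberately simplified, hypothetical class (ChunkedString, not any real library's implementation) that stores its characters as a list of pieces, as Siemel suggested an implementation might, and builds a contiguous copy only when data() is called. Appending, or non-const element access, marks that copy stale, which is why a pointer from data() must not be kept across such calls.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified non-contiguous string: characters live in
// separate pieces; data() builds a contiguous copy on demand.
class ChunkedString
{
public:
    ChunkedString() : flat_valid_(false) {}

    void append(const std::string& piece)
    {
        pieces_.push_back(piece);
        flat_valid_ = false;                 // earlier data() results are stale
    }

    std::size_t size() const
    {
        std::size_t n = 0;
        for (std::size_t i = 0; i < pieces_.size(); ++i)
            n += pieces_[i].size();
        return n;
    }

    // Non-const access uses the native (piecewise) representation,
    // not the contiguous copy.
    char& at(std::size_t pos)
    {
        flat_valid_ = false;                 // caller may write through the reference
        for (std::size_t i = 0; i < pieces_.size(); ++i)
        {
            if (pos < pieces_[i].size())
                return pieces_[i][pos];
            pos -= pieces_[i].size();
        }
        throw std::out_of_range("ChunkedString::at");
    }

    // Build a contiguous copy; valid only until the next append() or at().
    const char* data() const
    {
        if (!flat_valid_)
        {
            flat_.clear();
            for (std::size_t i = 0; i < pieces_.size(); ++i)
                flat_ += pieces_[i];
            flat_valid_ = true;
        }
        return flat_.c_str();
    }

private:
    std::vector<std::string> pieces_;
    mutable std::string flat_;               // the on-demand contiguous buffer
    mutable bool flat_valid_;
};
```

With a class like this, the assertion from the original post fails even for n == 0: &cs.at(0) points into the first piece, while cs.data() points into the separate copy.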

--
James Kanze                         mailto:James.Kanze@dresdner-bank.com
Conseils en informatique orientée objet/
                        Beratung in objekt orientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany  Tel. +49 (069) 63 19 86 27


Sent via Deja.com http://www.deja.com/
Share what you know. Learn what you don't.





Author: James.Kanze@dresdner-bank.com
Date: 1999/06/30
In article <OBTd3.23688$%U.5409@news.rdc1.wa.home.com>,
  "Larry Brasfield" <larrybr@seanet.com> wrote:
> Paul Archard <paul@spammenot.com> wrote in message news:9eSd3.170$eq2.906691@bunson.tor.sfl.net...
> > Hi all,
> >
> > I've searched all the docs I can find but I can't seem to get a _definitive_
> > answer on this...
>
> I can give you one.  Nowhere in the C++
> standard is any guarantee made that
> basic_string will store its data in a
> contiguous sequence.
>
> > Can one assume that the data in a basic_string is stored contiguously?
>
> No.  One cannot assume that with any
> basis except hope or guesswork.
>
> > So, for instance, is the following guaranteed to work for any basic_string s ...
> >
> >     for (int n = 0; n < s.size(); n++)
> >     {
> >         assert( s.data() + n ==  (&(s.at(n))));
> >     }
>
> That is guaranteed to work because
> the data() method is guaranteed to
> return a (const) pointer to size()
> consecutive elements which are
> a copy of the string content.

Note the word "copy".  The expression &s.at(n) will probably still
return a pointer to the original.  So there is no guarantee that this
will work.

> This
> fact, however, does not guarantee
> that the data in a basic_string is
> stored contiguously.  In fact, when
> data() is called, any outstanding
> content iterators and pointers are
> potentially invalidated according
> to the standard.

Which in practice probably means that the above expression contains
undefined behavior.  The compiler is free to evaluate the operands of
the == in whichever order it pleases.  If it evaluates &s.at(n) first
and s.data() then invalidates the resulting pointer, the comparison
results in undefined behavior.

> > In addition to this, is it acceptable to pass a "non-const
> > basic_string::iterator" to a function that wants a "non-const char *"?
>
> That depends on how the pointer
> is to be used.  If no arithmetic will
> be done on it (such as ++ or --),
> then it is OK to take the address
> of the character referenced by the
> iterator and use it until certain
> operations invalidate the iterator.

Let's not forget that [] is also pointer arithmetic.  In fact, in every
case I've seen where a function wanted a char*, rather than a simple
char, it was because it really wanted a pointer to the first element of
an array.  Whereas the result of &*iter is a pointer to a single char.

> > In an ideal world every API would use strings, but we have a lot of cases
> > where we need to call functions (such as "read()") that want to write into a
> > buffer pointed to by a "char *".  Such cases either involve substituting
> > "basic_string::begin() + n", which intuitively feels "naughty",
>
> That's not naughty because its
> iterator supports random access
> and is required to do so by the
> C++ standard.

Again: if the function just wants a single char, you're OK.  If it
expects a pointer to the first element of a char[], you're not.

> > or creating
> > a temporary buffer and calling "copy()" all over the place, which is far
> > from efficient.
>
> Presumably, the content has to be
> copied out of a filesystem buffer
> anyway.

What makes you suppose that?  All sorts of system library functions
require names (user id, passwords, hostname...).

> Library routines such as
> string extract and getline are free
> to be efficiently cognizant of tricks
> that depend on the implementation.
> If I were concerned about speed, I
> would be sure to use the iterator
> or sequence based creation and
> insertion/append methods.  Other
> than that, I don't think there is any
> safe solution of the kind you seek.
>
> > If the above approach is not portable, or not officially supported, what is
> > the "correct" way of doing it?
>
> Hard to say.  I limit myself to the
> class methods and avoid doing
> anything with pointers that might
> be extracted except pass them
> to const char * expecters making
> short term use of them.

Good policy, but the const char* expecters should only get a pointer
returned from c_str() or data() -- most, if not all, of them will expect
a contiguous representation of a string behind the pointer, and not just
a single character.
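A sketch of that policy: the const char * handed to a C-style function comes from c_str(), is used only for the duration of the call, and the string is not modified in between. count_vowels is a hypothetical stand-in for such an API; std::strlen and std::strchr are the standard C functions.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-style API that expects a NUL-terminated array,
// not just a pointer to a single char.
std::size_t count_vowels(const char* text)
{
    std::size_t count = 0;
    for (std::size_t i = 0, n = std::strlen(text); i < n; ++i)
    {
        if (std::strchr("aeiouAEIOU", text[i]) != 0)
            ++count;
    }
    return count;
}

std::size_t vowels_in(const std::string& s)
{
    // c_str() supplies a contiguous, NUL-terminated array; it is used only
    // within this call, and s is not modified while it is in use.
    return count_vowels(s.c_str());
}
```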
