Topic: std::string vs std::rope


Author: allan_w@my-dejanews.com (Allan W)
Date: Thu, 14 Nov 2002 22:43:28 +0000 (UTC)
> allan_w@my-dejanews.com (Allan W) wrote
> > It isn't too difficult to get this right, but my
> > understanding is that a lot of otherwise good compilers come with
> > libraries that get this wrong.

kanze@gabi-soft.de (James Kanze) wrote
> Are you sure?

No, I'm not.

> I last heard about it being a problem something like
> seven years ago.

That's about the time I heard about it. I don't remember it being a problem
with our compiler at the time -- just something we heard about "some
compilers." In my mind, this marked it as a dangerous practice -- and it's
something I've been avoiding all these years.

In the meantime, compilers may have improved. Since I've been avoiding
the practice, I can't tell you if it's still dangerous.

> > As a result, many shops have a rule not to allow code such as the
> > above. Even if it works on the current compiler, it isn't portable --
> > even if the standard says it should be.
>
> If the error exists, it isn't trivial to avoid.  Especially if it also
> exists with iterators.

"Trivial?" Maybe not -- but it's not rocket science either. Simply don't
retain a pointer or reference to an element in a string (or any other
collection). Instead, retain a reference to the string (or collection)
itself and the index (or key or whatever). You can use this to get the
same information.

So instead of the OP's version:


f() {
   string s("abc");
   string t;
   char & c(s[1]);

   t = s; // Data typically shared between s and t.
   c = 'z';     // How many strings does this modify?
   if (t[1] == 'z') {
        printf("wrong\n");
   } else {
        printf("right\n");
   }
}

I would use

    void f() {
        string s("abc");
        string t;

        t = s;       // Data MIGHT be shared between s and t
        s[1] = 'z';  // But this should "unlink" them.
        if (t[1] == 'z') std::cout << "wrong!\n";
        else             std::cout << "right.\n";
    }

Implementations that fail to track a reference to a char inside the
string still generally recognize the use of the index operator, and
handle it appropriately.
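
A minimal sketch of the "keep the string and the index" approach described
above. The CharHandle class and the copy_is_untouched function are
hypothetical names invented for this illustration; they do not come from
the thread or from any library. The idea is only that every access goes
back through the string itself rather than through a stored char&.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: remembers which string and which position,
// instead of holding a char& into the string's buffer.
class CharHandle {
public:
    CharHandle(std::string& owner, std::size_t index)
        : owner_(&owner), index_(index) {
        assert(index < owner.size());  // precondition for this sketch
    }

    // Each access goes back through the string, so at() range-checks
    // and a sharing implementation gets the chance to unshare first.
    char get() const { return owner_->at(index_); }
    void set(char ch) { owner_->at(index_) = ch; }

private:
    std::string* owner_;
    std::size_t index_;
};

// Same scenario as f() above, written with the handle.
// Returns true when s is changed and the copy t is left untouched.
bool copy_is_untouched() {
    std::string s("abc");
    CharHandle c(s, 1);
    std::string t;

    t = s;       // data might be shared between s and t
    c.set('z');  // modifies s through the string itself

    return s == "azc" && t == "abc";
}
```

The handle is only as long-lived as the string it points at, so it trades
one lifetime concern (the element) for a simpler one (the container).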

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: allan_w@my-dejanews.com (Allan W)
Date: Fri, 8 Nov 2002 23:00:26 +0000 (UTC)
andys@evo6.com.NoSpam (Andy Sawyer) wrote
>  koji@open-tech.jp ("Koji Eguchi") wrote:
> > f() {
> >    string s("abc");
> >    string t;
> >    char & c(s[1]);
> >
> >    t = s; // Data typically shared between s and t.
> >    c = 'z';     // How many strings does this modify?
> >    if (t[1] == 'z') {
> >         printf("wrong\n");
> >    } else {
> >         printf("right\n");
> >    }
> > }
> > but I'm interested in how other programmers deal with this before
> > changing my habit. It could possibly be the case where this problem
> > with std::basic_string has been rectified already... Please share
> > your experience/knowledge on this.
>
> This isn't a flaw in the specification of basic_string, but it may be
> a flaw in your library's implementation. Whilst basic_string is
> designed to make a reference-counting implementation possible, it
> certainly isn't mandated. It's trivial for a non-reference counting
> implementation to behave correctly in the above case. It's also
> possible (but less trivial) for a reference-counting implementation to
> behave correctly.

You're right, it isn't too difficult to get this right, but my
understanding is that a lot of otherwise good compilers come with
libraries that get this wrong. As a result, many shops have a rule not
to allow code such as the above. Even if it works on the current
compiler, it isn't portable -- even if the standard says it should be.

So far, that rule has been sufficient. We haven't had to use rope or
any other third-party package.






Author: kanze@gabi-soft.de (James Kanze)
Date: Wed, 13 Nov 2002 19:36:48 +0000 (UTC)
koji@open-tech.jp ("Koji Eguchi") wrote in message
news:<aqdqku$93s4i$1@ID-168794.news.dfncis.de>...

> In years of C++ practice, I've been generally using std::string object
> for handling string values rather than null-terminated C pointer, and
> as far as my knowledge there hasn't been really a problem with
> it. Recently though I found an article on std::basic_string for its
> serious design problem with lifetimes of reference to internal
> characters and for that, it's not guaranteed to work 100% expectedly
> especially in multi threaded applications. (some code like this)

Nothing is guaranteed by the standard to work in a multi-threaded
environment.  If the compiler supports multi-threading as an extension
(and many do), then I would expect the standard library to work correctly
in that environment, where "correctly" means "like int".  (Typically,
this means that if the object can be modified by any thread, then all
accesses must be protected.)
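
A rough sketch of what "all accesses must be protected" can look like in
practice. It assumes the std::thread and std::mutex facilities of C++11,
which postdate this discussion; the GuardedString class and the
append_from_threads function are hypothetical names used only for
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical wrapper: every access to the string, read or write,
// takes the same mutex.
class GuardedString {
public:
    void append(char ch) {
        std::lock_guard<std::mutex> lock(mutex_);
        value_.push_back(ch);
    }

    std::string snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;  // the copy is made while the lock is held
    }

private:
    mutable std::mutex mutex_;
    std::string value_;
};

// Starts `threads` writers that each append `per_thread` characters,
// waits for all of them, and returns the final length.
std::size_t append_from_threads(unsigned threads, unsigned per_thread) {
    GuardedString shared;
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < threads; ++i) {
        workers.emplace_back([&shared, per_thread] {
            for (unsigned n = 0; n < per_thread; ++n) shared.append('x');
        });
    }
    for (std::thread& w : workers) w.join();

    const std::size_t length = shared.snapshot().size();
    assert(length == static_cast<std::size_t>(threads) * per_thread);
    return length;
}
```

Note that snapshot() returns a copy rather than a reference; handing out a
reference to the guarded string would let callers bypass the lock.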

> f() {
>    string s("abc");
>    string t;
>    char & c(s[1]);

>    t = s; // Data typically shared between s and t.

Not in a conforming implementation.

>    c = 'z';     // How many strings does this modify?

Only s.

>    if (t[1] == 'z') {
>         printf("wrong\n");
>    } else {
>         printf("right\n");
>    }
> }

Have you tried this with some recent implementations?  If I recall
correctly, there was one (or a few) very early implementations which had
this problem, but both the problem and the solution have been known for
a long time now, and I doubt that the problem persists in any
reasonably modern implementation.

At any rate, the standard specifies the semantics, and says that this
must work.  How the implementation achieves it is its business, not
yours.
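
To try this against a particular library, the snippet from the original
post can be reshaped into a self-checking function. This is only a
sketch: the function name only_s_modified is made up here, and it returns
its result instead of printing.

```cpp
#include <cassert>
#include <string>

// The original f(), reshaped to return its result instead of printing.
// Returns true when the write through the reference changed only s.
bool only_s_modified() {
    std::string s("abc");
    std::string t;
    char& c = s[1];  // reference into s, taken before the copy

    t = s;           // t must behave as an independent copy
    c = 'z';         // writes through the reference

    return s[1] == 'z' && t[1] == 'b';
}
```

A library with the defect discussed in this thread would return false
here, because the write through c would also show up in t.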

> The same article suggests to use std::rope instead as one of
> workaround, but I'm interested in how other programmers deal with this
> before changing my habit.

I suspect most deal with it by ignoring it.  It's not a problem, except
with some very early, very broken libraries.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: kanze@gabi-soft.de (James Kanze)
Date: Wed, 13 Nov 2002 19:37:09 +0000 (UTC)
allan_w@my-dejanews.com (Allan W) wrote in message
news:<7f2735a5.0211081428.4079bc19@posting.google.com>...
> andys@evo6.com.NoSpam (Andy Sawyer) wrote
> >  koji@open-tech.jp ("Koji Eguchi") wrote:
> > > f() {
> > >    string s("abc");
> > >    string t;
> > >    char & c(s[1]);

> > >    t = s; // Data typically shared between s and t.
> > >    c = 'z';     // How many strings does this modify?
> > >    if (t[1] == 'z') {
> > >         printf("wrong\n");
> > >    } else {
> > >         printf("right\n");
> > >    }
> > > }
> > > but I'm interested in how other programmers deal with this before
> > > changing my habit. It could possibly be the case where this
> > > problem with std::basic_string has been rectified
> > > already... Please share your experience/knowledge on this.

> > This isn't a flaw in the specification of basic_string, but it may
> > be a flaw in your library's implementation. Whilst basic_string is
> > designed to make a reference-counting implementation possible, it
> > certainly isn't mandated. It's trivial for a non-reference counting
> > implementation to behave correctly in the above case. It's also
> > possible (but less trivial) for a reference-counting implementation
> > to behave correctly.

> You're right, it isn't too difficult to get this right, but my
> understanding is that a lot of otherwise good compilers come with
> libraries that get this wrong.

Are you sure?  I last heard about it being a problem something like
seven years ago.  I've never noticed it being a problem with any library
I've used.  I've never tested explicitly for it, of course, and it's not
the sort of code I write, but I would imagine that a library which got
this wrong would also have problems with non-const iterators.

> As a result, many shops have a rule not to allow code such as the
> above. Even if it works on the current compiler, it isn't portable --
> even if the standard says it should be.

If the error exists, it isn't trivial to avoid.  Especially if it also
exists with iterators.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: koji@open-tech.jp ("Koji Eguchi")
Date: Thu, 7 Nov 2002 21:12:49 +0000 (UTC)
Hi,

In years of C++ practice, I've been generally using std::string object for
handling string values rather than null-terminated C pointer, and as far as
my knowledge there hasn't been really a problem with it. Recently though I
found an article on std::basic_string for its serious design problem with
lifetimes of reference to internal characters and for that, it's not
guaranteed to work 100% expectedly especially in multi threaded
applications. (some code like this)

f() {
   string s("abc");
   string t;
   char & c(s[1]);

   t = s; // Data typically shared between s and t.
   c = 'z';     // How many strings does this modify?
   if (t[1] == 'z') {
        printf("wrong\n");
   } else {
        printf("right\n");
   }
}

The same article suggests to use std::rope instead as one of workaround, but
I'm interested in how other programmers deal with this before changing my
habit. It could possibly be the case where this problem with
std::basic_string has been rectified already... Please share your
experience/knowledge on this.

Koji








Author: andys@evo6.com.NoSpam (Andy Sawyer)
Date: Fri, 8 Nov 2002 00:39:33 +0000 (UTC)
In article <aqdqku$93s4i$1@ID-168794.news.dfncis.de>,
 on Thu, 7 Nov 2002 21:12:49 +0000 (UTC),
 koji@open-tech.jp ("Koji Eguchi") wrote:

> Hi,
>
> In years of C++ practice, I've been generally using std::string object for
> handling string values rather than null-terminated C pointer, and as far as
> my knowledge there hasn't been really a problem with it. Recently though I
> found an article on std::basic_string for its serious design problem with
> lifetimes of reference to internal characters and for that, it's not
> guaranteed to work 100% expectedly especially in multi threaded
> applications.

Nothing in standard C++ is guaranteed to work in multi-threaded
applications.

> (some code like this)
>
> f() {
>    string s("abc");
>    string t;
>    char & c(s[1]);
>
>    t = s; // Data typically shared between s and t.
>    c = 'z';     // How many strings does this modify?
>    if (t[1] == 'z') {
>         printf("wrong\n");
>    } else {
>         printf("right\n");
>    }
> }

To be honest, I find this code somewhat pathological. Whilst I might
expect to find it in a library test suite, I'd be shocked to find it
in production code.

> The same article suggests to use std::rope instead as one of
> workaround,

That's fine if you don't care about portability - std::rope isn't
standard and doesn't really belong in namespace std.

> but I'm interested in how other programmers deal with this before
> changing my habit. It could possibly be the case where this problem
> with std::basic_string has been rectified already... Please share
> your experience/knowledge on this.

This isn't a flaw in the specification of basic_string, but it may be
a flaw in your library's implementation. Whilst basic_string is
designed to make a reference-counting implementation possible, it
certainly isn't mandated. It's trivial for a non-reference counting
implementation to behave correctly in the above case. It's also
possible (but less trivial) for a reference-counting implementation to
behave correctly.
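
One way a reference-counting implementation can get this right is to mark
a buffer as unshareable once a writable reference into it has escaped.
The toy class below is a sketch of that idea only; CowString, Rep, and the
leaked flag are hypothetical names, and nothing here is taken from a real
library. It uses std::shared_ptr (C++11) for the reference count.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Toy copy-on-write string, for illustration only.
class CowString {
public:
    explicit CowString(const char* text = "")
        : rep_(std::make_shared<Rep>(text)) {}

    CowString(const CowString& other) : rep_(acquire(other)) {}

    CowString& operator=(const CowString& other) {
        rep_ = acquire(other);
        return *this;
    }

    // Read-only access never hands out a writable reference.
    char operator[](std::size_t i) const { return rep_->chars[i]; }

    // Writable access: unshare first, then mark the buffer as leaked so
    // that later copies cannot share it behind the caller's reference.
    // (A fuller implementation would clear the flag again on the next
    // operation that invalidates references; this toy never does.)
    char& operator[](std::size_t i) {
        if (rep_.use_count() > 1) rep_ = std::make_shared<Rep>(rep_->chars);
        rep_->leaked = true;
        return rep_->chars[i];
    }

    bool shares_buffer_with(const CowString& other) const {
        return rep_ == other.rep_;
    }

private:
    struct Rep {
        explicit Rep(const std::string& text) : chars(text), leaked(false) {}
        std::string chars;
        bool leaked;  // true once a char& into chars has escaped
    };

    // Share the buffer only if no writable reference has escaped;
    // otherwise make a deep copy.
    static std::shared_ptr<Rep> acquire(const CowString& other) {
        if (other.rep_->leaked) return std::make_shared<Rep>(other.rep_->chars);
        return other.rep_;
    }

    std::shared_ptr<Rep> rep_;
};

// The scenario from the original post, run against the toy class.
bool cow_copy_is_untouched() {
    CowString s("abc");
    CowString t;
    char& c = s[1];  // marks s's buffer as leaked
    t = s;           // deep copy, because the buffer is leaked
    c = 'z';
    const CowString& cs = s;
    const CowString& ct = t;
    return cs[1] == 'z' && ct[1] == 'b' && !s.shares_buffer_with(t);
}

// Without an escaped reference, copies do share until one is written.
bool plain_copy_shares_then_unshares() {
    CowString a("abc");
    CowString b(a);
    const bool shared_before = a.shares_buffer_with(b);
    b[0] = 'x';  // b unshares before handing out the reference
    const CowString& ca = a;
    return shared_before && !a.shares_buffer_with(b) && ca[0] == 'a';
}
```

The cost of this scheme is that any use of the non-const operator[]
disables sharing for that buffer, which is part of why the correct
version is "less trivial" than a plain deep-copying string.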

Regards,
 Andy S.
--
"Light thinks it travels faster than anything but it is wrong. No matter
 how fast light travels it finds the darkness has always got there first,
 and is waiting for it."                  -- Terry Pratchett, Reaper Man






Author: francis.glassborow@ntlworld.com (Francis Glassborow)
Date: Fri, 8 Nov 2002 17:01:02 +0000 (UTC)
In article <aqdqku$93s4i$1@ID-168794.news.dfncis.de>, Koji Eguchi
<koji@open-tech.jp> writes
>Hi,
>
>In years of C++ practice, I've been generally using std::string object for
>handling string values rather than null-terminated C pointer, and as far as
>my knowledge there hasn't been really a problem with it. Recently though I
>found an article on std::basic_string for its serious design problem with
>lifetimes of reference to internal characters and for that, it's not
>guaranteed to work 100% expectedly especially in multi threaded
>applications. (some code like this)
>
>f() {
>   string s("abc");
>   string t;
>   char & c(s[1]);
>
>   t = s;      // Data typically shared between s and t.
>   c = 'z';     // How many strings does this modify?
>   if (t[1] == 'z') {
>        printf("wrong\n");
>   } else {
>        printf("right\n");
>   }
>}
>
>The same article suggests to use std::rope instead as one of workaround, but
>I'm interested in how other programmers deal with this before changing my
>habit. It could possibly be the case where this problem with
>std::basic_string has been rectified already... Please share your
>experience/knowledge on this.
>
>Koji

I do not think the problem has been formally corrected and I believe it
is a general problem wherever a programmer elects to use a
reference/pointer to internal data of any object. However most of the
recent implementations of string avoid reference counting and hence
sharing of data representations.

Nonetheless such references/pointers will always make code more fragile
and as such need more justification than simple convenience. I think
that applies to rope as well as to string. (rope may be stable against
appending, but what about operations that shorten it, such as the
assignment of an empty rope?)
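
As a small illustration of that fragility for std::string specifically
(not rope), the sketch below grows a string past its capacity and reports
whether the character storage changed address. The GrowthReport struct and
the grow_and_report function are hypothetical names for this example. The
old address is kept only as a number and is never dereferenced; a char&
or char* taken before the growth would be invalidated by it, while an
index remains usable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

struct GrowthReport {
    bool buffer_moved;    // did the character storage change address?
    char char_via_index;  // s[index], read after the growth
};

// Hypothetical helper: grows a copy of the string past its capacity and
// reports what happened, without ever touching the old address.
GrowthReport grow_and_report(std::string s, std::size_t index) {
    assert(index < s.size());

    // Keep the old address only as a number; it is never dereferenced.
    const std::uintptr_t before =
        reinterpret_cast<std::uintptr_t>(s.data());

    s.append(s.capacity() + 1, '!');  // new size exceeds the old capacity

    const std::uintptr_t after =
        reinterpret_cast<std::uintptr_t>(s.data());

    return GrowthReport{before != after, s[index]};
}
```

For example, grow_and_report("abc", 1).char_via_index is 'b' regardless of
whether the buffer moved, which is the property the index-based style
relies on.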



--
Francis Glassborow      ACCU
64 Southfield Rd
Oxford OX4 1PA          +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
