Thread

Topic: Standard implementation of basic_string

Author: Alberto Ganesh Barbati <AlbertoBarbati@libero.it>
Date: Thu, 19 Jan 2006 19:16:09 CST Raw View

Earl Purple wrote:
> std::transform( s.begin(), s.end(), s.begin(), ::tolower );
>
> will my output iterators become invalidated? And if not because there
> are two sets of iterators, will my changes be lost? (I'm writing to a
> copy, not the string I think I'm writing to).

The order of evaluation of function parameters is unspecified, but as
it's largely inessential to the discussion, let's suppose, for
simplicity, that it is left-to-right. Then the first call to s.begin()
will ensure, possibly by performing a copy, that the string is unshared
and then return an iterator to such string. The following calls to
s.end() and s.begin() will just return iterators to the same string as
it is now unshared. So far so good, no problem with single-threading.

However, the problem arises in a multi-threading environment, if another
thread holds a reference to s and that thread happens to perform a lazy
copy of s between the first call to s.begin() and the call to s.end().
(Notice that the "other" thread did not try to modify s!) Now s.end()
must again unshare the string, but it must do so without invalidating
the iterator already returned by s.begin()!

An even worse case is if the "other" thread tries to make a lazy copy
during the execution of std::transform (I would doubt the design of the
application in that case, but it might happen). In order to avoid a
disaster, each time the iterator is dereferenced I must check if the
string is still unshared and possibly copy it if it doesn't.

Yes, all of this can be done, actually, but for sure it doesn't come for
free. In particular, the iterator can't be a (simple wrapper over) a
char* as it could be with non-ref-counted implementations, so each
iteration will probably bear some additional overhead.

Now the question: what happens most often? A string copy or an iteration
over the string characters? The answer of course is "it depends". That's
why statements like "ref-counted strings are fast" do not have absolute
value because it really depends on the precise use-case. Sometimes they
are faster than other implementations, sometimes they are not.
Personally, I prefer non-ref-counted implementations, but that's just my
opinion.

Ganesh

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: howard.hinnant@gmail.com (Howard Hinnant)
Date: Fri, 20 Jan 2006 04:31:33 GMT Raw View

In article <1137596025.336161.21920@o13g2000cwo.googlegroups.com>,
 atgraham@gmail.com wrote:

> So I'll
> just ask the question: Is returning std::string from a function
> considered bad practice, as in the case of std::vector or std::map?  A
> reference-counted implementation has a big advantage over the
> alternative in this case, since returning std::string from a function
> is relatively guilt-free.

I'm going to assume when you say return by value, you mean from a local
variable with automatic storage, as opposed to from a member variable,
static variable or global variable:

std::string
foo()
{
    std::string s;
    ...
    return s;
}

Many compilers today will optimize this return away, constructing s in
the caller's preferred location (and this is true for any type, not just
std::string).  So if you're concerned about returning a heavy object by
value, you might want to see if the cost is actually free (this is
called RVO - Return Value Optimization).

That being said, if the client is assigning to an existing object
instead of constructing a new one, then a copy (assignment) will still
be performed:

std::string s;
s = foo();

Also, and unfortunately RVO is only allowed, but not mandated.  So if
you need a guaranteed cheap return across any compiler, you can not
depend on it.

It is my hope that C++0X will fix this with "move semantics".  With this
added feature, even if the compiler does not implement RVO, it will be
compelled to return the std::string by moving it instead of by copying
it.  And if the client is assigning from the function call, move
assignment (instead of copy assignment) will kick in.  Moving a
std::string essentially amounts to a shallow copy (or pointer
twiddling).  So the return will be free if RVO kicks in, and otherwise
be cheap - for truly enjoyable guilt-free return by value. :-)

-Howard

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: jth01@arcor.de (Jens Theisen)
Date: Fri, 20 Jan 2006 09:46:04 CST Raw View

Alberto wrote:
> However, the problem arises in a multi-threading environment, if another
> thread holds a reference to s and that thread happens to perform a lazy
> copy of s between the first call to s.begin() and the call to s.end().
> (Notice that the "other" thread did not try to modify s!) Now s.end()
> must again unshare the string, but it must do so without invalidating
> the iterator already returned by s.begin()!

But presumably you have user-side locked s, so no other thread can access
s anyway. This kind of locking is necessary for any STL container, so it's
not really about the weird iterator invalidation semantics: Even if you're
having a string implementation without COW, it's not likely to be thread
safe without user-side locks.

Jens

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: jth01@arcor.de (Jens Theisen)
Date: Fri, 20 Jan 2006 15:51:01 GMT Raw View

Bo wrote:

> At the user code level, you don't know whether the internal
> representation is shared or not. So you need an internal lock.

Even if it's shared: The use count can be incremented atomically I
presume, and we modify the the representation only if it's not shared, in
which case we locked it already at the use-level. All other accesses are
read, which don't need any locking. So what's the problem here?

> So, when the user-level string is actually shared (or just might be)
> you end up with two locks.  :-)

That's also what Sutter thinks, but I don't get it. :)

Jens

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: jth01@arcor.de (Jens Theisen)
Date: Fri, 20 Jan 2006 18:14:29 GMT Raw View

Anders schrieb:
> But I must warn you, in multi-thread safe (but artificial) tests the
> reference counted boost::shared_ptr<std::string> had almost always
> worse performance than the non reference counted std::string, even for
> long strings. Incrementing and decrementing the use-count in a thread
> safe way was slower than making a copy of the string, for strings
> shorter than ~4000 characters.

Though I think in the case of shared_ptr, this is because you have to
counts rather than one (a weak and a strong one). One count alone could be
incremented atomically and the boost people found a way around that
problem, so they can get rid of the locking in a future version I think.

Jens

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Bo Persson" <bop@gmb.dk>
Date: Sat, 21 Jan 2006 22:07:02 CST Raw View

"Jens Theisen" <jth01@arcor.de> skrev i meddelandet
news:9mC8ulEVciB@jens-theisen.de...
> Bo wrote:
>
>> At the user code level, you don't know whether the internal
>> representation is shared or not. So you need an internal lock.
>
> Even if it's shared: The use count can be incremented atomically I
> presume,

An atomic increment isn't free either, especially not on multi CPU
hardware. Enforcing cache coherency over 128 Itaniums, for example,
might take quite a long time.

> and we modify the the representation only if it's not shared, in
> which case we locked it already at the use-level. All other accesses
> are
> read, which don't need any locking. So what's the problem here?

The user code might know that the string is not shared, but the
internal code does not.

There might also be two strings, one in each thread, that share their
internal representation. The user code cannot know that, and will
belive that locks are not needed.

>
>> So, when the user-level string is actually shared (or just might
>> be)
>> you end up with two locks.  :-)
>
> That's also what Sutter thinks, but I don't get it. :)

It might be that the thought isn't originally mine.  :-)

Ok, so you might "get away" with one user level lock and one expensive
atomic increment. On some hardware, the atomic increment might require
a lock anyway, or some memory write fence operation. None of which is
free.

So we end up with the question: How much work are we really prepared
to do to avoid copying some strings? Is it really worth it?

Often, it seems that it is not!

Bo Persson

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: bop@gmb.dk ("Bo Persson")
Date: Mon, 23 Jan 2006 23:40:00 GMT Raw View

"Alberto Ganesh Barbati" <AlbertoBarbati@libero.it> skrev i
meddelandet news:r6Vzf.136561$65.3843520@twister1.libero.it...
> Earl Purple wrote:
>> std::transform( s.begin(), s.end(), s.begin(), ::tolower );
>>
>> will my output iterators become invalidated? And if not because
>> there
>> are two sets of iterators, will my changes be lost? (I'm writing to
>> a
>> copy, not the string I think I'm writing to).
>
> The order of evaluation of function parameters is unspecified, but
> as
> it's largely inessential to the discussion, let's suppose, for
> simplicity, that it is left-to-right. Then the first call to
> s.begin()
> will ensure, possibly by performing a copy, that the string is
> unshared
> and then return an iterator to such string. The following calls to
> s.end() and s.begin() will just return iterators to the same string
> as
> it is now unshared. So far so good, no problem with
> single-threading.
>
> However, the problem arises in a multi-threading environment, if
> another
> thread holds a reference to s and that thread happens to perform a
> lazy
> copy of s between the first call to s.begin() and the call to
> s.end().

The other thread is not allowed to do that, so the first call to
begin()/end() will have to mark the internal representation
unshareable. That will force the other thread to make a full copy of
the string.

Note that this will lose any benefits our first thread has from not
doing a copy. The adminstrative overhead will be there always, whether
there really is a second thread, or not.

Also, even if all strings are used in the same thread, some of them
will be marked unshareable anyway, just in case.

Reference counting as such might be a good idea, but std::string
having begin(), end() and operator[] returning iterators or references
to the internal representation, reduces the opportunites for actually
sharing the internal string while, at the same time, it adds to the
administrative overhead to keep track of the string's state.

Bo Persson

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: jth01@arcor.de (Jens Theisen)
Date: Mon, 23 Jan 2006 23:42:15 GMT Raw View

Bo wrote:

> An atomic increment isn't free either, especially not on multi CPU
> hardware. Enforcing cache coherency over 128 Itaniums, for example,
> might take quite a long time.

Sounds plausible, though I still think that under a more normal
environment it should be much quicker than locking. However, that's not
what Sutter was talking about. He really meant a proper lock.

>> That's also what Sutter thinks, but I don't get it. :)

> It might be that the thought isn't originally mine.  :-)

> Ok, so you might "get away" with one user level lock and one expensive
> atomic increment. On some hardware, the atomic increment might require
> a lock anyway, or some memory write fence operation. None of which is
> free.

I'm actually not so much concerned about the speed. I would really like to
know if I missed an important point, since we're using a string
implementation without internal locking in multi-threaded code - and I
really hope it's safe. :)

> So we end up with the question: How much work are we really prepared
> to do to avoid copying some strings? Is it really worth it?

Yes, I can well believe that it's not worth it depending on the code
that's using the string library.

Jens

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Michiel.Salters@tomtom.com
Date: Mon, 23 Jan 2006 17:45:34 CST Raw View

Jens Theisen wrote:
> Alberto wrote:
> > However, the problem arises in a multi-threading environment, if another
> > thread holds a reference to s and that thread happens to perform a lazy
> > copy of s between the first call to s.begin() and the call to s.end().
> > (Notice that the "other" thread did not try to modify s!) Now s.end()
> > must again unshare the string, but it must do so without invalidating
> > the iterator already returned by s.begin()!
>
> But presumably you have user-side locked s, so no other thread can access
> s anyway. This kind of locking is necessary for any STL container, so it's
> not really about the weird iterator invalidation semantics: Even if you're
> having a string implementation without COW, it's not likely to be thread
> safe without user-side locks.

The problem is not locking s. The problem is locking s and any other
string
object which happens to share its representation. Just replace "a lazy
copy
of s" with "a lazy copy of a copy of s" in Alberto's example. Clearly
you
wouldn't lock s just because you're copying old_s?

HTH,
Michiel Salters

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: jth01@arcor.de (Jens Theisen)
Date: Wed, 25 Jan 2006 04:42:47 GMT Raw View

Michiel.Salters wrote:

> The problem is not locking s. The problem is locking s and any other
> string
> object which happens to share its representation. Just replace "a lazy
> copy
> of s" with "a lazy copy of a copy of s" in Alberto's example. Clearly
> you
> wouldn't lock s just because you're copying old_s?

That's a different problem, yes. There was already a discussion about it
in a different branch of this thread. It's documented in Sutter's More
Exceptional C++.

But as I already said, I have a problem with Sutter's argument.

Presuming the use count's increment is atomical, how should there be a
concurrent write access to a read access (or even another write access)?

On a use count of greater than 1, a write operation will create a new
copy, and that's locked on user side. On a use count of 1, user side
locking is sufficient.

Can you say something to this?

Jens

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: atgraham@gmail.com
Date: Tue, 17 Jan 2006 09:22:34 CST Raw View

Is it a requirement of the standard that std::basic_string is a
reference-counted container, or is this just the way some (most?)
implementers have decided to do it?

For example, when I'm coding for gcc, I almost always pass std::strings
around by value, because they're reference counted and I can expect it
to be very fast, since it won't copy the string unless I modify it.
Can I expect this behavior in every standards-compliant compiler, or
should I start passing by reference in order to be more portable (in
terms of efficiency)?

If the standard does not require it, how common are
non-reference-counted implementations of std::basic_string?  Are there
any good reasons for vendors not to provide a reference-counted
implementation?

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Ferdi.Smit@cwi.nl (Ferdi Smit)
Date: Tue, 17 Jan 2006 17:53:58 GMT Raw View

atgraham@gmail.com wrote:
> Is it a requirement of the standard that std::basic_string is a
> reference-counted container, or is this just the way some (most?)
> implementers have decided to do it?
>

It's not a requirement.

> For example, when I'm coding for gcc, I almost always pass std::strings
> around by value, because they're reference counted and I can expect it
> to be very fast, since it won't copy the string unless I modify it.
> Can I expect this behavior in every standards-compliant compiler, or
> should I start passing by reference in order to be more portable (in
> terms of efficiency)?

Why not pass by const-reference, after all, that's what you usually
really want, don't you? Even copying/destroying a reference counted
string takes time; passing a const-ref hardly takes any.

> If the standard does not require it, how common are
> non-reference-counted implementations of std::basic_string?  Are there
> any good reasons for vendors not to provide a reference-counted
> implementation?

Two cons that I can think of are thread-safety issues, and in some cases
efficiency. Very small strings might not work well with reference
counting. In fact, I vaguely remember a discussion about the whole
reference counted string in some book... I think it was one of the
"(More) Efficient/Effective C++" books. You might want to look that up.

--
Regards,

Ferdi Smit (M.Sc.)
Email: Ferdi.Smit@cwi.nl
Room: C0.07  Phone: 4229
INS3 Visualization and 3D Interfaces
CWI Amsterdam, The Netherlands

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: no.spam@no.spam.com (Maciej Sobczak)
Date: Tue, 17 Jan 2006 18:47:00 GMT Raw View

atgraham@gmail.com wrote:

> Is it a requirement of the standard that std::basic_string is a
> reference-counted container

No. It's just allowed.

> For example, when I'm coding for gcc, I almost always pass std::strings
> around by value

Why? Why not by const ref?

> Can I expect this behavior in every standards-compliant compiler

No, although it's not the "behavior" issue (the correctness of your
program should not depend on this) - it's the performance issue you
might have to worry about.

> Are there
> any good reasons for vendors not to provide a reference-counted
> implementation?

Yes, the ref-counting becomes fragile in the presence of multiple threads.


--
Maciej Sobczak : http://www.msobczak.com/
Programming    : http://www.msobczak.com/prog/

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: jth01@arcor.de (Jens Theisen)
Date: Wed, 18 Jan 2006 03:53:53 GMT Raw View

Ferdi schrieb:
> Two cons that I can think of are thread-safety issues, and in some cases
> efficiency. Very small strings might not work well with reference
> counting. In fact, I vaguely remember a discussion about the whole
> reference counted string in some book... I think it was one of the
> "(More) Efficient/Effective C++" books. You might want to look that up.

You're either thinking of the small string optimisation where the string
is stored in the string object itself (rather than a dynamicly allocated
impl) which would be even more efficient and is used in addition to
reference counting for bigger strings; or of the discussion about the odd
iterator invalidation semantics of std::string, which are due to allowing
reference counting. But that's one shouldn't rely on other semantics there
just because you're implementation might not do reference counting and can
guarantee more for obvious reasons.

I can't immediately see what the threading argument is, does somebody
know?

Jens


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: AlbertoBarbati@libero.it (Alberto Ganesh Barbati)
Date: Wed, 18 Jan 2006 03:55:15 GMT Raw View

atgraham@gmail.com wrote:
> Is it a requirement of the standard that std::basic_string is a
> reference-counted container, or is this just the way some (most?)
> implementers have decided to do it?

It is not a requirement and in fact there are non-ref-counted
implementations around, for example the one shipped with VC++ (aka
Dinkumware). In fact, I have the impression that non-ref-counted
implementations are indeed more popular than ref-counted ones, but I
don't know enough implementations to sustain such argument.

> For example, when I'm coding for gcc, I almost always pass std::strings
> around by value, because they're reference counted and I can expect it
> to be very fast, since it won't copy the string unless I modify it.
> Can I expect this behavior in every standards-compliant compiler, or
> should I start passing by reference in order to be more portable (in
> terms of efficiency)?

Ref-counted implementations might be cheaper to copy (and I stress
"might") but there are other costs that other implementation don't have
to pay. So you should not rely on this kind of assumption to measure a
program performances. You'd better off using const references where
possible, thus by-passing the problem.

> If the standard does not require it, how common are
> non-reference-counted implementations of std::basic_string?  Are there
> any good reasons for vendors not to provide a reference-counted
> implementation?

In addition to threading issues, already mentioned by other posters,
ref-counted implementations have problem handling this case:

  std::string a("bar");
  std::string b(a); // ref-counted copy?
  char& c = a[2];
  c = 'z'; // should change only a, not b
  assert(a == "baz");
  assert(b == "bar");

in order to have the last line keep b intact for a ref-counted
implementation, you need operator[] to check if it's time to perform a
copy. The check must occur even if the reference is not used to write,
because the implementation has no way to detect what the reference will
be used for. Thus operator[] has some inevitable overhead that a
non-ref-counted implementation don't have. Moreover, when the copy
occurs, operator[] doesn't perform in constant time. I know that the
standard doesn't require it, yet it's something a programmer might
expect (at least std::vector provides such guarantee...)

A similar argument applies to iterators. A non-ref-counted
implementation can implement begin() and end() in constant time while
providing iterators that are (as fast as) bare pointers. Ref-counted
implementations cannot: either you allow begin() and end() to copy the
string (thus failing the constant time guarantee) or you store a pointer
to the string object into the iterator and check for a copy on every
modify attempt (thus putting some overhead on each dereference operation).

HTH,

Ganesh

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: roberts.noah@gmail.com
Date: Tue, 17 Jan 2006 21:53:44 CST Raw View

atgraham@gmail.com wrote:
> Is it a requirement of the standard that std::basic_string is a
> reference-counted container, or is this just the way some (most?)
> implementers have decided to do it?

Implementations are free to choose.
>
> For example, when I'm coding for gcc, I almost always pass std::strings
> around by value, because they're reference counted and I can expect it
> to be very fast, since it won't copy the string unless I modify it.
> Can I expect this behavior in every standards-compliant compiler, or
> should I start passing by reference in order to be more portable (in
> terms of efficiency)?

No, you cannot expect reference counted strings.  Those strings could
be getting copied every time.
>
> If the standard does not require it, how common are
> non-reference-counted implementations of std::basic_string?

MSVC++ for one does not use reference counting...they use a short
string optimization technique and strings are copied every time they
are assigned (from what I understand anyway).

  Are there
> any good reasons for vendors not to provide a reference-counted
> implementation?

Because they want to use a different set of algorithms.  There are
pluses and minuses for every representation.  You have to weigh how
things are going to be used most often.  GCC chose one way, Dinkumware
chose another.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Ferdi Smit" <Ferdi.Smit@cwi.nl>
Date: Wed, 18 Jan 2006 09:29:38 CST Raw View

"Jens Theisen" <jth01@arcor.de> wrote in message
news:9m07TBpkciB@jens-theisen.de...
> You're either thinking of the small string optimisation where the string
> is stored in the string object itself (rather than a dynamicly allocated
> impl) which would be even more efficient and is used in addition to
> reference counting for bigger strings; or of the discussion about the odd
> iterator invalidation semantics of std::string, which are due to allowing
> reference counting. But that's one shouldn't rely on other semantics there
> just because you're implementation might not do reference counting and can
> guarantee more for obvious reasons.
>
> I can't immediately see what the threading argument is, does somebody
> know?

It's the obvious issue when the same string is copied/deleted in multiple
threads at the same time. The reference count is subject to race conditions,
ie. the count appears zero to one thread, so it prepares to delete it, while
at the same time another thread starts running and makes a copy just in
time. The first thread now deletes the string (with ref count != 0 at this
moment), and the second is left with a dangling pointer. This might not be
the exact order of events, but you get the idea.

A solution would be to use locking, but it's hard to write a standard
library that makes assumptions about the operating system. In any case, it
would slow things down for other applications that do not need locking.

I looked it up, and the discussion about a reference counted string and
reference counting in general can be found in 'More effective C++' item 29,
pages 183-213. Needless to say I'm not going to repeat 30 pages of
discussion here. Also see "More exceptional C++" Appendix A: "Optimizations
that aren't (in a multi-threaded world)". p247-261.

--
Regards,

Ferdi Smit


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: atgraham@gmail.com
Date: Wed, 18 Jan 2006 09:45:04 CST Raw View

> in order to have the last line keep b intact for a ref-counted
> implementation, you need operator[] to check if it's time to perform a
> copy. The check must occur even if the reference is not used to write,
> because the implementation has no way to detect what the reference will
> be used for. Thus operator[] has some inevitable overhead that a
> non-ref-counted implementation don't have. Moreover, when the copy
> occurs, operator[] doesn't perform in constant time. I know that the
> standard doesn't require it, yet it's something a programmer might
> expect (at least std::vector provides such guarantee...)

I understand that every non-const method of std::string must incur
overhead to determine if a dup() is required.  However,
reference-counting is a lazy-copy implementation where the actual
string copy doesn't happen until it's absolutely required by the
context.  Yes, operator[] can be a linear-time operation, but operator=
is constant time, and operator= will happen at least as many times as
operator[] will incur linear time overhead, for a total net savings.
This shifting around of workloads does make the performance less
predictable and thus is less desirable in some cases.

But I guess this argument can be made for any object or data structure.
 Sometimes object assignment doesn't require that a physical copy be
made.  Conventional use of a data structure can vary drastically
depending on whether or not it's reference-counted, and one place where
this is most relevant is in objects returned from functions.  So I'll
just ask the question: Is returning std::string from a function
considered bad practice, as in the case of std::vector or std::map?  A
reference-counted implementation has a big advantage over the
alternative in this case, since returning std::string from a function
is relatively guilt-free.

Thanks for your input.  My coding style will change (for the better)
because of this conversation.  That was my aim.

Aaron

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Bo Persson" <bop@gmb.dk>
Date: Wed, 18 Jan 2006 11:36:05 CST Raw View

<atgraham@gmail.com> skrev i meddelandet
news:1137596025.336161.21920@o13g2000cwo.googlegroups.com...
>
> I understand that every non-const method of std::string must incur
> overhead to determine if a dup() is required.  However,
> reference-counting is a lazy-copy implementation where the actual
> string copy doesn't happen until it's absolutely required by the
> context.

No, i might also incur extra copies, when it cannot determine if a
copy is needed. Returning a reference or iterator to string element,
makes that string unsharable until that reference or iterator is
invalidated.

> Yes, operator[] can be a linear-time operation, but operator=
> is constant time, and operator= will happen at least as many times
> as
> operator[] will incur linear time overhead, for a total net savings.

No, after using an operator[], operator= might become linear. You
never know!

Try this

string a = "abc";
char& c = a[1];

string b = a;
char& d = b[1];

d = 'x';

Now, does c and d refer to same character? Of couse not. When does the
unsharing take place?

Are you prepared to handle a std::bad_alloc when the copy-on-write is
attempted? Where does it happen?

> This shifting around of workloads does make the performance less
> predictable and thus is less desirable in some cases.

Exactly.  :-)


Reference counting might be an optimization for applications that
often pass very long strings by value. How common is that really?

>
> But I guess this argument can be made for any object or data
> structure.
> Sometimes object assignment doesn't require that a physical copy be
> made.  Conventional use of a data structure can vary drastically
> depending on whether or not it's reference-counted, and one place
> where
> this is most relevant is in objects returned from functions.  So
> I'll
> just ask the question: Is returning std::string from a function
> considered bad practice, as in the case of std::vector or std::map?
> A
> reference-counted implementation has a big advantage over the
> alternative in this case, since returning std::string from a
> function
> is relatively guilt-free.

Do you need to do this frequently, in performance criticial sections
of the program?


Bo Persson



---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Hyman Rosen <hyrosen@mail.com>
Date: Wed, 18 Jan 2006 15:39:20 CST Raw View

atgraham@gmail.com wrote:
> I understand that every non-const method of std::string

Nothing in the standard prohibits std::string from having
mutable members, so in principle every method call on a
string requires locking when the same string can be accessed
simultaneously from multiple threads.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: hyrosen@mail.com (Hyman Rosen)
Date: Thu, 19 Jan 2006 05:48:36 GMT Raw View

Hyman Rosen wrote:
> Nothing in the standard prohibits std::string from having
> mutable members

And in fact, even without mutable members, nothing stops
const member functions from manipulating other parts of
the global program state, again requiring locking for
simultaneous access.

Note that this locking is required in *your* program. If
you want your code to be portable (if that means anything
when threads are involved), you cannot assume implementations
won't do that.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: jth01@arcor.de (Jens Theisen)
Date: Thu, 19 Jan 2006 16:02:37 GMT Raw View

Ferdi wrote:
> A solution would be to use locking, but it's hard to write a standard
> library that makes assumptions about the operating system. In any case, it
> would slow things down for other applications that do not need locking.

Surely you'll do user-side locking anyway on a thread-unsafe type.
std::vector does not use reference counting but does that make it thread
safe?

However, I found some discussion in you're second reference which tries to
show another aspect: There it's claimed that you need internal locking in
the string implementation in order to have synchronisation that couldn't
be provided by user-side locking alone (though you need that in addition
to it).

I can't quite understand the argument there though. Since if the
implementations use count greater than 1, you're copying it anyway and if
not, user-side locking will lock out any other thread.

I'd really like to know wether I missed something (likely).

Jens

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: jth01@arcor.de (Jens Theisen)
Date: Thu, 19 Jan 2006 16:02:36 GMT Raw View

atgraham wrote:
> I understand that every non-const method of std::string must incur
> overhead to determine if a dup() is required.

And of course you can always use the const ones:

eg. const_cast< std::string const& >(my_string)[i]

Jens


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Bo Persson" <bop@gmb.dk>
Date: Thu, 19 Jan 2006 12:17:20 CST Raw View

"Jens Theisen" <jth01@arcor.de> skrev i meddelandet
news:9m87wof+ciB@jens-theisen.de...
> Ferdi wrote:
>> A solution would be to use locking, but it's hard to write a
>> standard
>> library that makes assumptions about the operating system. In any
>> case, it
>> would slow things down for other applications that do not need
>> locking.
>
> Surely you'll do user-side locking anyway on a thread-unsafe type.
> std::vector does not use reference counting but does that make it
> thread
> safe?
>
> However, I found some discussion in you're second reference which
> tries to
> show another aspect: There it's claimed that you need internal
> locking in
> the string implementation in order to have synchronisation that
> couldn't
> be provided by user-side locking alone (though you need that in
> addition
> to it).
>
> I can't quite understand the argument there though. Since if the
> implementations use count greater than 1, you're copying it anyway
> and if
> not, user-side locking will lock out any other thread.

At the user code level, you don't know whether the internal
representation is shared or not. So you need an internal lock.

At the internal level, you don't know whether there is a use-level
lock, so you have to lock always, just in case. Even for thread local
strings!


So, when the user-level string is actually shared (or just might be)
you end up with two locks.  :-)


Bo Persson


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Anders Dalvander" <google@dalvander.com>
Date: Thu, 19 Jan 2006 12:37:21 CST Raw View

atgraham@gmail.com wrote:
> Is it a requirement of the standard that std::basic_string is a
> reference-counted container, or is this just the way some (most?)
> implementers have decided to do it?

It is not a requirement that std::basic_string is reference counted,
but you can always wrap your std::basic_string in a
std::tr1::shared_ptr (or boost::shared_ptr if your compiler doesn't
implement std::tr1). Agreed: It is not the same thing, but similar.

But I must warn you, in multi-thread safe (but artificial) tests the
reference counted boost::shared_ptr<std::string> had almost always
worse performance than the non reference counted std::string, even for
long strings. Incrementing and decrementing the use-count in a thread
safe way was slower than making a copy of the string, for strings
shorter than ~4000 characters.

Cheers,
Anders

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Earl Purple" <earlpurple@gmail.com>
Date: Thu, 19 Jan 2006 14:02:34 CST Raw View

Alberto Ganesh Barbati wrote:
>
>
> In addition to threading issues, already mentioned by other posters,
> ref-counted implementations have problem handling this case:
>
>   std::string a("bar");
>   std::string b(a); // ref-counted copy?
>   char& c = a[2];
>   c = 'z'; // should change only a, not b
>   assert(a == "baz");
>   assert(b == "bar");
>
<snip>
> A similar argument applies to iterators. A non-ref-counted
> implementation can implement begin() and end() in constant time while
> providing iterators that are (as fast as) bare pointers. Ref-counted
> implementations cannot: either you allow begin() and end() to copy the
> string (thus failing the constant time guarantee) or you store a pointer
> to the string object into the iterator and check for a copy on every
> modify attempt (thus putting some overhead on each dereference operation).
>
> HTH,
>
> Ganesh
>

I also wonder whether this could be dangerous using ref-counted
strings:

I have a string s, I want to convert it to lower-case so I innocently
do:

std::transform( s.begin(), s.end(), s.begin(), ::tolower );

will my output iterators become invalidated? And if not because there
are two sets of iterators, will my changes be lost? (I'm writing to a
copy, not the string I think I'm writing to).

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: bop@gmb.dk ("Bo Persson")
Date: Fri, 20 Jan 2006 00:38:01 GMT Raw View

"Earl Purple" <earlpurple@gmail.com> skrev i meddelandet
news:1137694618.664584.204790@g43g2000cwa.googlegroups.com...

> I also wonder whether this could be dangerous using ref-counted
> strings:
>
> I have a string s, I want to convert it to lower-case so I
> innocently
> do:
>
> std::transform( s.begin(), s.end(), s.begin(), ::tolower );
>
> will my output iterators become invalidated? And if not because
> there
> are two sets of iterators, will my changes be lost? (I'm writing to
> a
> copy, not the string I think I'm writing to).

You might be writing to a copy, but you don't know that.

Any unsharing of the internal representation must be performed before
returning the first non-const iterator. After that you are guaranteed
to have your own copy of the string representation.


Bo Persson


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]