Topic: Can basic_string::operator= throw an exception?


Author: jkanze@otelo.ibmmail.com
Date: 1998/11/10
Raw View
In article <71vr6r$441$1@shell7.ba.best.com>,
  ncm@nospam.cantrip.org (Nathan Myers) wrote:

> Any carefully-implemented string, whether reference-counted or
> not, needs only a single allocation.

I'm not sure that I get what you are trying to say.  It is, of course,
possible to implement a string, reference counted or deep copy, with
only one allocation for a new string.  Whether this is a good implementation
or not probably depends on the use; I don't think that one allocation, as
such, is an interesting goal.

It is a simple optimization over the
trivial implementation, or course, and I've used the trick in my own
String class for at least the last 7 or 8 years.  But I've never really
tried to tune my String class; I rather suspect that the optimal solution
would involve using a fixed length allocator (which can be very fast) for
the StringImpl class, maintaining the active StringImpl's in a linked
list, and using copying garbage collection for the actual text data --
the linked list of StringImpl's allows the garbage collector to find all
of the pointers very quickly, and the copying will reduce fragmentation to
zero, and permit a trivial allocation strategy 99% of the time.

> However, either uses
> rather more storage than 12 bytes if there are any characters
> in the string.  Consider the pseudocode:
>
>   Copy-on-write:
>
>     struct String_representation {
>       int reference_count;
>       size_t capacity;
>       size_t size;
>       // characters stored at this+1
>     };

It depends on your String class.  My String class was const; the only
function which modified it was operator= (which of course changed the
StringImpl).  So there was no need for a capacity member.  (Like Java,
I used a StringBuffer class to build strings.  Except that mine was
much simpler than Java's, and had a built-in 128 byte buffer, which meant
that 99% of the time, in my applications, it never needed dynamic memory.)
On the other hand, if you are implementing the semantics of std::string,
you will need some sort of lock to prevent sharing once an iterator
or a reference into the string has been created.  (This could conceivably
be stuffed into the same word as the reference count, but such tricks would
result in a noticeable slowing down of the class.)

--
James Kanze                         mailto: kanze@gabi-soft.fr

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]





Author: Martin von Loewis <loewis@informatik.hu-berlin.de>
Date: 1998/11/06
Raw View
Richard Browne <richb@pobox.com.au> writes:

> Just curious then. Does the standard even mention reference counting, or is
> it purely an implementation thing? I imagine all implementations would use
> reference counting, it's such an obvious optimisation. I know this doesn't
> guarantee anything, but it's an assumption I think I'd be comfortable with.

[lib.basic.string]/6 has a (non-normative) Note:

>> [Note: These rules are formulated to allow, but not require, a
>> reference counted implemenation. A reference counted implementation
>> must have the same semantics as a non=ADreference counted
>> implementation. ... ]

It then has an example that if you modify one string, other strings
should not change (copy-on-write).

Why would you require reference counting? In the standard, you'd first
have to define what you consider reference counting. For example,
garbage collection might also qualify as a valid implementation.

If you are concerned with the perfomance of assignment operations:
This would usually be specified in terms of computational complexity
statements, instead of data structure requirements. For strings, there
are no guarantees about complexity.

Please note that there is a problem with reference-counted
implementations: thread-safety. If an application is multithreaded,
and accesses strings often, but assigns to them rarely, a copying
implementation is more efficient than a reference-counting one.

Hope this helps,
Martin


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: hsutter@peerdirect.com (Herb Sutter)
Date: 1998/11/06
Raw View
On 06 Nov 98 06:25:47 GMT, Richard Browne <richb@pobox.com.au> wrote:
>Just curious then. Does the standard even mention reference counting, or is
>it purely an implementation thing?

The standard basic_string is deliberately designed to allow (but not
require) implementations that use reference-counting.

>I imagine all implementations would use
>reference counting, it's such an obvious optimisation.

Whether reference-counted (copy-on-write, or COW) strings are actually
an optimization is far from obvious. Whether COW wins or loses depends
entirely on how strings are used in a given program, and in many
real-world programs COW is a pessimization, more often so when thread
safety is required.

For more information, see the Guru of the Week #45 Solution thread on
comp.lang.c++.moderated, in case you've missed all the fun and ire of
this recent (and still continuing) debate about COW. :)

Herb


---
Herb Sutter (mailto:hsutter@peerdirect.com)

PeerDirect Inc.     2695 North Sheridan Way, Suite 150
www.peerdirect.com  Mississauga Ontario Canada L5K 2N6


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: william.kempf@firstdatacorp.com
Date: 1998/11/06
Raw View
In article <364237EF.5DC1E8CE@pobox.com.au>,
  Richard Browne <richb@pobox.com.au> wrote:
> Martin von Loewis wrote:
> > Coming to your question: Neither the copy constructor nor the
> > assignment operator have an exception specification. This means that
> > the assignment may throw any exception.
>
> Just curious then. Does the standard even mention reference counting, or is
> it purely an implementation thing? I imagine all implementations would use
> reference counting, it's such an obvious optimisation. I know this doesn't
> guarantee anything, but it's an assumption I think I'd be comfortable with.

No, and they shouldn't mention it.  The standard should never discuss how
something should be implemented, just what the public interface and expected
behaviors would be.

Further, I would _not_ assume that an implementation uses reference counting.
Reference counting is not always as easy to implement as you think.
Especially not with classes such as basic_string that give you direct access
to the data, even through iterators.  Many implementations don't use
reference counting for basic_string, and many that do are broken.  You can
find a lot of discussion on this topic by searching DejaNews.

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: AllanW@my-dejanews.com
Date: 1998/11/06
Raw View
In article <364237EF.5DC1E8CE@pobox.com.au>,
  Richard Browne <richb@pobox.com.au> wrote:
> Just curious then. Does the standard even mention reference counting, or is
> it purely an implementation thing?

It is purely an implementation thing. In fact, due to the way that the
standard is written, it is a very difficult pure-implementation thing.
(There have been proposals to make it a bit easier, but most of them
were too late to be considered for the standard.)

> I imagine all implementations would use
> reference counting, it's such an obvious optimisation.

It is obvious, and for most programs it's either an optimization or
else it's total effect is neutral. That's why lots of vendors have
been struggling with this, trying to do copy-on-write and still be
conforming.

However, it's not a win for every single program. For programs that
have a large number of short unique strings -- for instance, a
symbol parser for C++ -- the reference counts just become extra
overhead that slow things down a bit, sometimes in unpredictable
places.

> I know this doesn't
> guarantee anything, but it's an assumption I think I'd be comfortable with.

The safe thing to do is to assume that strings *MIGHT* be reference-
counted, without actually relying on it. For instance, let's say that
you have a system which heavily uses a three-state variable: true,
false, or unknown. If you assume that strings are reference-counted,
you might be tempted to code this:
    #define ASIZE(x) (sizeof(x)/sizeof(x[0]))
    typedef string three;
    three three_true("true");
    three three_false("false");
    three three_unknown("unknown");
    extern three status[5000];
    // ...
        for (int i=0; i<ASIZE(status); ++i) status[i]=three_unknown;
    // ...
Your thinking goes something like this: Since strings are reference-
counted, each duplicate string is nothing more than a pointer to
the reference.

But that would be a bad idea, because on some systems strings are
not reference-counted. So each string holds a pointer (let's assume
that's 4 bytes) to it's own 8 bytes of unique memory. Allocating a
new three object on the heap requires 2 allocations totalling 12
bytes.

If you assume that strings MIGHT be reference-counted, you're more
likely to bite the bullet -- go ahead and implement class three.
There were lots of operations already defined for string that you
must now define for operator three -- operator==, operator<<,
and so on -- but the result is not only smaller and faster, but
also easier to understand. In the end, this saves you money.

--
AllanW@my-dejanews.com is a "Spam Magnet" -- never read.
Please reply in USENET only, sorry.

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1998/11/07
Raw View
<AllanW@my-dejanews.com> wrote:
>The safe thing to do is to assume that strings *MIGHT* be reference-
>counted, without actually relying on it.
>    // ...
>Your thinking goes something like this: Since strings are reference-
>counted, each duplicate string is nothing more than a pointer to
>the reference.
>
>But that would be a bad idea, because on some systems strings are
>not reference-counted. So each string holds a pointer (let's assume
>that's 4 bytes) to it's own 8 bytes of unique memory. Allocating a
>new three object on the heap requires 2 allocations totalling 12
>bytes.

Any carefully-implemented string, whether reference-counted or
not, needs only a single allocation.  However, either uses
rather more storage than 12 bytes if there are any characters
in the string.  Consider the pseudocode:

  Copy-on-write:

    struct String_representation {
      int reference_count;
      size_t capacity;
      size_t size;
      // characters stored at this+1
    };
    struct String {
      char* string;
      String_representation& rep()
        { return ((String_representation*)string)[-1]; }
    };

  Plain:

    struct String_representation {
      char* string;
      size_t size;
      size_t capacity;
    };

--
Nathan Myers
ncm@nospam.cantrip.org  http://www.cantrip.org/
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]





Author: James Kuyper <kuyper@wizard.net>
Date: 1998/11/07
Raw View
Richard Browne wrote:
>
> Martin von Loewis wrote:
>
> > This is not the assignment operator, this is the copy constructor.
>
> Ah yes, thanks.
>
> > Coming to your question: Neither the copy constructor nor the
> > assignment operator have an exception specification. This means that
> > the assignment may throw any exception.
>
> Just curious then. Does the standard even mention reference counting, or is
> it purely an implementation thing? I imagine all implementations would use
> reference counting, it's such an obvious optimisation. I know this doesn't
> guarantee anything, but it's an assumption I think I'd be comfortable with.

Section 21.3, paragraph 6 says:
| These rules are formulated to allow, but not require, a reference
| counted implementation. A reference counted implementation must have
| the same semantics as a non-reference counted implementation. Example:
|
| string s1("abc");
|
| string::iterator i = s1.begin();
| string s2 = s1;
|
| *i = 'a'; // Must modify only s1
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]





Author: Valentin Bonnard <bonnardv@pratique.fr>
Date: 1998/11/07
Raw View
<AllanW@my-dejanews.com> writes:

> The safe thing to do is to assume that strings *MIGHT* be reference-
> counted, without actually relying on it.

Or to write code which don't depends on such implementation details
in any ways...

> For instance, let's say that
> you have a system which heavily uses a three-state variable: true,
> false, or unknown. If you assume that strings are reference-counted,
> you might be tempted to code this:
>     #define ASIZE(x) (sizeof(x)/sizeof(x[0]))
>     typedef string three;
>     three three_true("true");
>     three three_false("false");
>     three three_unknown("unknown");
>     extern three status[5000];
>     // ...
>         for (int i=0; i<ASIZE(status); ++i) status[i]=three_unknown;
>     // ...
[...]
> If you assume that strings MIGHT be reference-counted, you're more
> likely to bite the bullet -- go ahead and implement class three.
> There were lots of operations already defined for string that you
> must now define for operator three -- operator==, operator<<,
> and so on -- but the result is not only smaller and faster, but
> also easier to understand. In the end, this saves you money.

Or write:

enum three { three_true, three_false, three_unknown };

and you get operator=, operator< for free.

--

Valentin Bonnard                mailto:bonnardv@pratique.fr
info about C++/a propos du C++: http://pages.pratique.fr/~bonnardv/
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]





Author: Valentin Bonnard <bonnardv@pratique.fr>
Date: 1998/11/07
Raw View
Richard Browne <richb@pobox.com.au> writes:

> Martin von Loewis wrote:

> > Coming to your question: Neither the copy constructor nor the
> > assignment operator have an exception specification. This means that
> > the assignment may throw any exception.
>
> Just curious then. Does the standard even mention reference counting, or is
> it purely an implementation thing?

Well, we made efforts to make reference counting possible. So,
while it's theorically purelly an implementation issue, if you
read between the lines, you will see the words 'reference
counting' written in the standard. (Given strong garanties
about the interface of an abstract type, like the ones you
find in the STL, there is often just one set of isomorphic
implementations. With std::string there are two: not reference
counted, and reference counted, with many small variants.)

Independance between the interface and the implementation is
a joke in the STL. But I am not and I have never criticised
that.

For example:

std::string a = "a", b = a;
const int& r = static_cast<const std::string&> (b)[0];
b[0] = 'b';
cout << (r == b);

This code would print true with any usual containner, but
is undefined for string and will print false on a traditionnal
ref counted implementation. This is because r refers to a[0],
and &a[0] != &b[0] after the assignement.

> I imagine all implementations would use
> reference counting, it's such an obvious optimisation.

No, many won't, many don't, and it isn't obvious at all
and may slow down the program.

> I know this doesn't guarantee anything, but it's an assumption
> I think I'd be comfortable with.

Yes, like assuming there are far and near pointers, and int is
16 bits...

--

Valentin Bonnard                mailto:bonnardv@pratique.fr
info about C++/a propos du C++: http://pages.pratique.fr/~bonnardv/
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]





Author: Richard Browne <richb@pobox.com.au>
Date: 1998/11/05
Raw View
What does the standard have to say about this:

void thing(const std::string& s1)
{
    std::string s2 = s1;
}

Can this function throw an exception, or does the standard guarantee
that the assignment will work (ie. that basic_string uses reference
counting)....?

Thanks standard-gurus.



[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: Martin von Loewis <loewis@informatik.hu-berlin.de>
Date: 1998/11/05
Raw View
Richard Browne <richb@pobox.com.au> writes:

> What does the standard have to say about this:
>
> void thing(const std::string& s1)
> {
>     std::string s2 = s1;
> }
>
> Can this function throw an exception, or does the standard guarantee
> that the assignment will work (ie. that basic_string uses reference
> counting)....?

This is not the assignment operator, this is the copy constructor.

Coming to your question: Neither the copy constructor nor the
assignment operator have an exception specification. This means that
the assignment may throw any exception.

As usual, implementations may make stronger guarantees.

Regards,
Martin


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: Richard Browne <richb@pobox.com.au>
Date: 1998/11/06
Raw View
Martin von Loewis wrote:

> This is not the assignment operator, this is the copy constructor.

Ah yes, thanks.

> Coming to your question: Neither the copy constructor nor the
> assignment operator have an exception specification. This means that
> the assignment may throw any exception.

Just curious then. Does the standard even mention reference counting, or is
it purely an implementation thing? I imagine all implementations would use
reference counting, it's such an obvious optimisation. I know this doesn't
guarantee anything, but it's an assumption I think I'd be comfortable with.
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]