Topic: string/cow/algorithm


Author: herbs@cntc.com (Herb Sutter)
Date: 1998/08/30
Raw View
ncm@nospam.cantrip.org (Nathan Myers) wrote:
>Assuming atomic increment, decrement-and-test, and compare
>instructions for the reference count, no locking need ever be
>done in a COW implementation (except that done in the memory
>allocator, used during a deep-copy).  To see this, imagine a
>string representation object which has one reference.  Clearly
>there are no thread issues here, and you can operate freely on
>the string.

Actually there are thread issues, and what you describe will manifest as
intermittent core dumps -- and, because they're intermittent, ones that
will be difficult to reproduce and difficult to debug.

I originally wrote a long reply here, but instead I'm going to save it
and make this question GotW #43. I should have it posted in a day or
two.

Herb

---
Herb Sutter (mailto:herbs@cntc.com)

Current Network Technologies Corp  2695 North Sheridan Way, Suite 150
www.cntc.com www.peerdirect.com    Mississauga Ontario Canada L5K 2N6


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: AllanW@my-dejanews.com
Date: 1998/08/31
Raw View
In article <35e84662.881936868@nr1.toronto.istar.net>,
  herbs@cntc.com (Herb Sutter) wrote:
>
> ncm@nospam.cantrip.org (Nathan Myers) wrote:
> >Assuming atomic increment, decrement-and-test, and compare
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >instructions for the reference count, no locking need ever be
   ^^^^^^^^^^^^
> >done in a COW implementation (except that done in the memory
> >allocator, used during a deep-copy).  To see this, imagine a
> >string representation object which has one reference.  Clearly
> >there are no thread issues here, and you can operate freely on
> >the string.
>
> Actually there are thread issues, and what you describe will manifest as
> intermittent core dumps -- and, because they're intermittent, ones that
> will be difficult to reproduce and difficult to debug.
>
> I originally wrote a long reply here, but instead I'm going to save it
> and make this question GotW #43. I should have it posted in a day or
> two.

Nathan Myers is correct. Perhaps you missed the assumptions?

These are not part of the C++ standard (but then again, neither are
locks). But if they do exist, and are used correctly, they can be
used to *implement* a lock. The fact that the integer is called
"reference_count" or something similar is irrelevant.

It's strange wording to say "assuming [these low-level primitives
exist] [then] no locking need ever be done", but if we interpret
this to mean that "no O/S-supported or other locking need ever be
done" then it certainly is defensible.

You owe Mr. Myers an honorary Guru designation. Oops- wrong group!

--
AllanW@my-dejanews.com is a "Spam Magnet" -- never read.
Please reply in USENET only, sorry.

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/rg_mkgrp.xp   Create Your Own Free Member Forum


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1998/08/31
Raw View
[posted and mailed]

Herb Sutter<herbs@cntc.com> wrote:
>
>ncm@nospam.cantrip.org (Nathan Myers) wrote:
>>Assuming atomic increment, decrement-and-test, and compare
>>instructions for the reference count, no locking need ever be
>>done in a COW implementation (except that done in the memory
>>allocator, used during a deep-copy).  To see this, first imagine
>>a string representation object which has one reference.  Clearly
>>there are no thread issues here, and you can operate freely on
>>the string.
>
>Actually there are thread issues, and what you describe will manifest as
>intermittent core dumps -- and, because they're intermittent, ones that
>will be difficult to reproduce and difficult to debug.

What I described will *not* manifest as "intermittent core dumps".

Remember we were comparing CoW against always-deep-copy.  Clearly
if two threads are operating on the same visible string object,
you have a conflict, but that is the same for either implementation
strategy.  The differences arise when we have a single representation
object shared among more than one visible string object, so that
one does not know whether one is operating on a shared object.

A representation object with one reference is exactly the same as string
which is not shared, so any "intermittent core dumps" you imagine for
this case would occur for either representation.

It is the responsibility of users to manage access to visible objects.
It is the responsibility of the string implementation to make sure
that is sufficient.  (This is  the same for either strategy.)

The difference between the two strategies occurs when there *are*
more than one reference.  In that case it is clear that no string
object dares modify the shared representation, so any modification
must be preceded by a copy.  Further (as has already been mentioned
in this thread) the copy with a single reference must not have its
reference count incremented as long as there are valid references
to its elements still existing.

But none of this requires a lock, as long as checking the reference
count for <=0 is an atomic operation.

--
Nathan Myers
ncm@nospam.cantrip.org  http://www.cantrip.org/



[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: saroj@bear.com
Date: 1998/09/01
Raw View
In article <6seoh1$rgk$1@nnrp1.dejanews.com>,
  AllanW@my-dejanews.com wrote:

> In article <35e84662.881936868@nr1.toronto.istar.net>,
>   herbs@cntc.com (Herb Sutter) wrote:
> >
> > ncm@nospam.cantrip.org (Nathan Myers) wrote:
> > >Assuming atomic increment, decrement-and-test, and compare
>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >instructions for the reference count, no locking need ever be
>    ^^^^^^^^^^^^
> > >done in a COW implementation (except that done in the memory
> > >allocator, used during a deep-copy).  To see this, imagine a
> > >string representation object which has one reference.  Clearly
> > >there are no thread issues here, and you can operate freely on
> > >the string.
> >
> > Actually there are thread issues, and what you describe will manifest as
> > intermittent core dumps -- and, because they're intermittent, ones that
> > will be difficult to reproduce and difficult to debug.
> >
> > I originally wrote a long reply here, but instead I'm going to save it
> > and make this question GotW #43. I should have it posted in a day or
> > two.

> Nathan Myers is correct. Perhaps you missed the assumptions?

> These are not part of the C++ standard (but then again, neither are
> locks). But if they do exist, and are used correctly, they can be
> used to *implement* a lock. The fact that the integer is called
> "reference_count" or something similar is irrelevant.

> It's strange wording to say "assuming [these low-level primitives
> exist] [then] no locking need ever be done", but if we interpret
> this to mean that "no O/S-supported or other locking need ever be
> done" then it certainly is defensible.


The fact that on some O/S platforms, a single instruction can be used
to compare and swap the ref-count, can mislead one to think that there
is no other impact. Has anybody considered how it interacts with
memory cache? In aggressive SMP systems, each processor has its own
cache and for performance reasons, they do not flush the cache to the
main memory until needed. Lock/unlock force the cache to be flushed, so
that memory is 'visible' to other threads. Now in a COW implementation,
every mutating operation of a string will do lock/unlock. A non-COW
string needs locking only for making a copy (that can also be avoided
if you use thread-specific storage).

  * What is the impact on processor cache?
  * Is compare-and-swap sufficient for testing
    tri-state reference count (>1, 1, < 0 for non-shared) ?
  * Is locking the string ref count at the correct granularity level?
    (I rarely lock a single string. Instead, I lock a bigger data
     structure containing many strings.)

Any comments?

Thank you,
Saroj Mahapatra

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/rg_mkgrp.xp   Create Your Own Free Member Forum


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: herbs@cntc.com (Herb Sutter)
Date: 1998/09/02
Raw View
Mea culpa. To borrow a phrase, Sutter hates it when he's wrong.

The following two replies clearly point out the flaw in my reasoning:

phalpern@truffle.ma.ultranet.com (Pablo Halpern) wrote:
>If the reference count indicates that the buffer is not shared,
>the only way it can be incremented is by copying the only string that
>references it. If one thread is trying to read (or copy) the string
>while another is modifying it, you have a BUG IN THE PROGRAM, not a bug
>in the string class. If two threads access a shared string, they should
>use a mutex lock EXTERNAL TO THE STRING ITSELF.

ncm@nospam.cantrip.org (Nathan Myers) wrote:
>Remember we were comparing CoW against always-deep-copy.  Clearly
>if two threads are operating on the same visible string object,
>you have a conflict, but that is the same for either implementation
>strategy.
[...]
>It is the responsibility of users to manage access to visible objects.
>It is the responsibility of the string implementation to make sure
>that is sufficient.  (This is  the same for either strategy.)

Thank you. I had one misapprehension, and this disabuses me of it.

Specifically: 'If the reference count is one, there are no thread issues
and you can go on to mutate the string at will.' The imaginary problem
that I thought I saw was that, after the reference count was checked but
before the mutation was complete, a copy ctor or assignment could run on
another thread and take a copy, which would share the representation and
cause nasty problems. Although this is true, the key observation is
simply that the other thread could not do so without referring to the
same visible string object, in which case it's the calling code's own
lookout "to manage access to visible objects" whether COW is used or
not. (If thread A is changing string s1, and at the same time thread B
tries to execute something like "s2 = s1;", then clearly the calling
code must serialize those operations as usual.)

Having been thus enlightened, I am now of the opinion that for COW
objects in a MT environment where getting and setting the reference
count is atomic:

1. Read-only operations NEVER need to consider the reference count.
(This relies completely on #3.)

2. If the reference count is 1, possibly-mutating operations NEVER need
a lock.

3. If the reference count is greater than 1, possibly-mutating
operations need to take their own copy, and then drop into case #2. I
see two possible implementations of "if refs > 1, take your own copy":

   a) Always lock the copying operation. (This always incurs the
      expense of a lock, but never performs an unnecessary copy.)

      if( refs > 1 ) {
          lock.acquire();
          if( refs > 1 ) {
              // do the work of taking a copy
              --refs;
          }
          lock.release();
      }

   b) Never lock the "take a copy" operation. (This allows race
      conditions whereby two strings sharing a buffer may both take
      a copy at the same time, causing one unnecessary copy and
      requiring cleanup of the now-unreferenced original buffer.)

      if( refs > 1 ) {
          // do the work of taking a copy
          if( --refs < 1 ) {
              // delete original buffer
          }
      }

   Whether (a) or (b) would be more efficient would depend entirely
   on the likelihood of the race condition succeeding.

This requires no additional locks beyond the availability of "atomic
get" and "atomic set" operations on the reference count integer. I
believe that only 3(b) would additionally require an "atomic
set-and-get".

If getting and setting the reference count cannot be done atomically,
then locks will still be needed for every possibly mutating operation,
as I originally pointed out.

Better?

Followup question (I will do my own research, but comments are welcome):
How efficient is a built-in "integer atomic get/set" operation compared
to a Win32 critical section (NOT a mutex, those are more expensive)? My
observation is that you still need to read the reference count in EVERY
possibly-mutating operation, hence there would still be a MT performance
penalty for every possibly-mutating operation after all if the "atomic
read" isn't highly efficient.

Herb


---
Herb Sutter (mailto:herbs@cntc.com)

Current Network Technologies Corp  2695 North Sheridan Way, Suite 150
www.cntc.com www.peerdirect.com    Mississauga Ontario Canada L5K 2N6
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]





Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1998/09/02
Raw View
<saroj@bear.com> wrote:
>  AllanW@my-dejanews.com wrote:
>>   herbs@cntc.com (Herb Sutter) wrote:
>> > ncm@nospam.cantrip.org (Nathan Myers) wrote:
>> > >Assuming atomic increment, decrement-and-test, and compare
>>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> > >instructions for the reference count, no locking need ever be
>>    ^^^^^^^^^^^^
>> > >done in a COW implementation (except that done in the memory
>> > >allocator, used during a deep-copy).  To see this, imagine a
>> > >string representation object which has one reference.  Clearly
>> > >there are no thread issues here, and you can operate freely on
>> > >the string.
>> >
>> > Actually there are thread issues, and what you describe will manifest as
>> > intermittent core dumps -- and, because they're intermittent, ones that
>> > will be difficult to reproduce and difficult to debug.

>> Nathan Myers is correct. Perhaps you missed the assumptions?
>
>> These are not part of the C++ standard (but then again, neither are
>> locks). But if they do exist, and are used correctly, they can be
>> used to *implement* a lock. The fact that the integer is called
>> "reference_count" or something similar is irrelevant.

>The fact that on some O/S platforms, a single instruction can be used
>to compare and swap the ref-count, can mislead one to think that there
>is no other impact. Has anybody considered how it interacts with
>memory cache? In aggressive SMP systems, each processor has its own
>cache and for performance reasons, they do not flush the cache to the
>main memory until needed. Lock/unlock force the cache to be flushed, so
>that memory is 'visible' to other threads. Now in a COW implementation,
>every mutating operation of a string will do lock/unlock. A non-COW
>string needs locking only for making a copy (that can also be avoided
>if you use thread-specific storage).

Sorry, not correct.  You only change the reference count when you
change the number of owners, not on every mutating operation.  This
means that copying changes the reference count, and invalidates the
cache line(s) which contain it, but most mutating operations happen
on strings with only one owner.  (The first mutating operation makes
a copy, and then that and later operations work on the copy.)

>  * What is the impact on processor cache?

This may not be usefully subject to speculation.  However, measurements
on good implementations should be available soon.  Note, however, that
only the affected cache lines on *other* CPUs are invalidated by the
operation, not the one on the CPU changing the refcount.  Therefore,
if yours is the only CPU operating on the string, there is no effect.

This is most likely to affect the empty string, because its reference
count will bounce around vigorously.

>  * Is compare-and-swap sufficient for testing
>    tri-state reference count (>1, 1, < 0 for non-shared) ?

Interestingly, this three-way comparison is not needed in
real implementations, so it doesn't matter.  Compare-and-swap
is needed only when decrementing the pointer, and then you
only care if it is time to delete it.  When checking to see
whether to deep-copy, all you need to know is if the count
is >1, which translates to the value being >0.  A simple atomic
test instruction yields that, and doesn't invalidate the cache.

>  * Is locking the string ref count at the correct granularity level?
>    (I rarely lock a single string. Instead, I lock a bigger data
>     structure containing many strings.)

You don't lock the ref count unless your target machine lacks atomic
incr & decr/test instructions.  In that case, then yes that is the
only possible granularity level, because it is the only level
under control of the string class.  Remember this is not controlling
access to the string itself, but to its internal representation,
which may be shared with other strings you don't know about, looked
at by other threads you don't know about.

--
Nathan Myers
ncm@nospam.cantrip.org  http://www.cantrip.org/


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: jcoffin@taeus.com (Jerry Coffin)
Date: 1998/09/03
Raw View
In article <35ee5ff3.1150564574@nr1.toronto.istar.net>, herbs@cntc.com
says...

[ ... ]

> Followup question (I will do my own research, but comments are welcome):
> How efficient is a built-in "integer atomic get/set" operation compared
> to a Win32 critical section (NOT a mutex, those are more expensive)?

This is getting marginal for topicality, but to some extent it's
probably more or less universal: critical sections in Win32 are
considerably cheaper to enter IF you're the only one entering them.
However, if the critical section has already been entered, they're
actually somewhat more expensive than a mutex.  Worse, tests have
shown that critical sections have a significant problem when a large
number of threads are present, even when they're not constantly
contending over the critical section.  Finally, the relative speed can
get quite a bit worse on a multiprocessor machine.

Therefore, the relative speed is hard to quantify.  However, at least
in my experience, the integer atomic get/set (InterlockedIncrement and
InterlockedDecrement under Win32) are considerably more _dependably_
fast, even if they're not always significantly faster.

I'd expect that at least part of the situation with a critical section
applies to almost any environment using kernel threads: the problem is
that when a thread tries to enter a critical section that's already in
use, it normally wants to got to sleep and allow other threads to run.
The majority of the speed gain for using a critical section in the
first place (compared to using a mutex) is simply because it doesn't
involve a change to kernel mode.  When the thread sleeps, that causes
a change to kernel mode anyway.  If you're using some manner of non-
kernel threads, the thread can sleep and allow other threads in the
same process to run without switching to kernel mode.

--
    Later,
    Jerry.

The Universe is a figment of its own imagination.
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]





Author: Marco Dalla Gasperina <marcodg@pacifier.com>
Date: 1998/09/03
Raw View
I've been reading this thread with great facination... sort of
a 'Clash of Titans'.  Yet, it seems to me that the discussion
is mostly a storm in a teacup.

My experience has been that sharing such a low level type
as a string between threads is not done.  The reason being
that each mutating operation will need to acquire a lock
if the string is shared but that the mutating operations
are rarely done singly.

This means that the string must be locked *across* the
mutating calls in order to maintain some invariant.  In order
to maintain that invariant, the string is encapsulated in
a class which will have the correct semantics and ensure that
clients that share the encapsulating object have a correct
view of the object at any time.

However...

Since reading this thread, I've been toying with an idea
of separating mutating/non-mutating methods into separate
classes.  You can get a read/write-handle to an object by
querying it for read/write access.

The read/write handle acts as a proxy to the object and maintains
a lock on the object until the handle goes out of scope.

You can play some games in the locking strategy like
granting unlimited read handles if there are no write
handles out there.

This means that we probably need to get rid of things like
operator[]() where you can't tell if it's a read or write (I
never liked operator[]() on a string anyway... seemed too
array-ish).

Clients would use it something like this:

void foo( String& s )
{
 String::Write_Access w = s.getWriteAccess();
 // this works because we're in the same thread
 String::Read_Access  r = s.getReadAccess();
 for ( int i=0; i<r.length(); ++i )
 {
  // change all \ to /
  if ( r.getAt(i) == '\\' ) w.setAt(i,'/');
 }
}

marco



[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: "J. Van Damme" <ecogreen.joris.vandamme@ping.be>
Date: 1998/09/04
Raw View
Marco Dalla Gasperina <marcodg@pacifier.com> schreef in artikel
<35eeb7e0.0@news.pacifier.com>...
> Since reading this thread, I've been toying with an idea
> of separating mutating/non-mutating methods into separate
> classes.  You can get a read/write-handle to an object by
> querying it for read/write access.
>
> The read/write handle acts as a proxy to the object and maintains
> a lock on the object until the handle goes out of scope.
>
> You can play some games in the locking strategy like
> granting unlimited read handles if there are no write
> handles out there.

Seems to me that, beside narrowing it down to a single operation in the
data, rather than sequences of operation on the data, you are talking about
excactly the same thing I was talking about few days ago. However, since
than I read about this same thing a couple of times in other entries in
this group, and one seems to call these things <read-write-locks>. I don't
think a Win32 system has these things, I think the people using them or
perhaps Unix men.

The read-write-lock Win32 misses is implementable, though. And I should
think there must be numerous implementations. The one I did was in this
news few days ago, a Delphi implementation (Re: Use of critical section). I
did not know the thing seems to exist in other environments, at that time
(I was foolish enough to think I had an idea. Oh well).


Joris Van Damme
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]





Author: herbs@cntc.com (Herb Sutter)
Date: 1998/08/26
Raw View
ncm@nospam.cantrip.org (Nathan Myers) wrote:
>When making a deep copy, a lock is required in either case, because
>the memory allocator is MT-safe.However, a COW string makes fewer
>deep copies.

Of course. Agreed, in the common case.

>>2. The string must additionally acquire a lock for at least each possibly
>>mutating operation -- meaning not just copy operations, but also accessors
>>and mutators. Hence this locking is far more frequent than copies in the
>>common case.
>
>Wrong.  It must acquire a lock to make a deep copy before doing
>the first mutating operation on a shared string.  Once the copy
>is made, no further locking is needed.A  Strings that have not been
>shallow-copied (and thus are not shared) need neither be locked
>nor copied.
[...]
>I'm not very interested in seeing those results on bad implementations,
>but will enjoy seeing comparisons of my implementation to others'. :-)

I'm unconvinced, but willing to be proved ignorant (hey, it won't be the
first or last time). I'd be interested to see MT-safe COW code with
locking requirements as light as you describe. Would you post it, or
send it privately? I'll also email you some material I have written on
the subject.

Herb


---
Herb Sutter (mailto:herbs@cntc.com)

Current Network Technologies Corp  2695 North Sheridan Way, Suite 150
www.cntc.com www.peerdirect.com    Mississauga Ontario Canada L5K 2N6


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: saroj@bear.com
Date: 1998/08/22
Raw View
In article <35dfe616.47271311@news.ma.ultranet.com>,
  phalpern@truffle.ma.ultranet.com (Pablo Halpern) wrote:
>
> AllanW@my-dejanews.com wrote:
>
[snipped]

> What we can do *now* is to use COW implementations that punt (do a
> deep-copy and mark unsharable) when the caller uses (non-const)
> operator[], begin(), or end().  My programs rarely use these features of
> stings.  Most common operations on strings is copying, creating
> substrings, replacing substrings, concatonating and displaying, all of
> which can be done without marking a string buffer as non-sharable.

substring, replacing substring, concatenation - COW will not help you.
If at all, it will be worse than non-COW, because most string
representations are contiguous. For a SGI STL rope, these are precisely
the things for which ref-count is worth the price, provided you have
a large rope.

displaying - no difference.

copying - already pointed out.

> Replacing single characters is much rarer and can either be done using
> replace(), or else suffer the performance penalty for using operator[]
> or non-const iterators.
>
> Thus a COW implementation can be just as efficient as you want, if use
> use it *as if* the operations mentioned were forbidden.
>

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/rg_mkgrp.xp   Create Your Own Free Member Forum


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1998/08/24
Raw View
Herb Sutter<herbs@cntc.com> wrote:
>Nathan Myers <ncm@nospam.cantrip.org> writes:
>> the reference-counted version is
>> *always* more efficient than the always-copying version.
>
>This is true for code built for single-threaded use only. The
>reference-counted version is almost always less efficient (usually by an
>order of magnitude or two, literally) in code built for possible
>multi-threaded use, even if a particular program that calls it happens to
>be single-threaded.

Modern architectures can operate on a reference count in one
instruction.  Allocating memory for another copy takes a long
time, and involves a lock itself.  I think Herb's remark above
is unsupportable for the common case.  I don't doubt there are
environments where it is true, but is certainly is not universal,
nor common.

--
Nathan Myers
ncm@nospam.cantrip.org  http://www.cantrip.org/



[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: herbs@cntc.com (Herb Sutter)
Date: 1998/08/25
Raw View
ncm@nospam.cantrip.org (Nathan Myers) wrote:
>Herb Sutter<herbs@cntc.com> wrote:
>>The reference-counted version is almost always less efficient (usually
>>by an order of magnitude or two, literally) in code built for possible
>>multi-threaded use, even if a particular program that calls it happens
>>to be single-threaded.
>
>Modern architectures can operate on a reference count in one
>instruction.  Allocating memory for another copy takes a long
>time, and involves a lock itself.  I think Herb's remark above
>is unsupportable for the common case.  I don't doubt there are
>environments where it is true, but is certainly is not universal,
>nor common.

My remark is not only supportable and common, but correct. :-)

Operating on the reference count is immaterial; I assume that's fast. The
number of locks required is material, because locks are expensive:

1. The string still has to acquire the memory manager's lock when actually
making a copy. Any additional COW-induced overhead is in addition to that
lock, not instead of it.

2. The string must additionally acquire a lock for at least each possibly
mutating operation -- meaning not just copy operations, but also accessors
and mutators. Hence this locking is far more frequent than copies in the
common case.

#1 is minor; #2 dominates. In the common case, copy-on-write is an
order-of-magnitude-or-two (usually closer to two in my common experience)
pessimization in code built for possible multi-threaded use, EVEN IF the
program that calls it is single-threaded.

I'll happily post performance results if you'll agree to buy me dinner at
the Santa Cruz meeting a few weeks from now if I'm right. :-)

Herb


P.S.: See also Murray's "C++ Strategies and Tactics" (pages 70-72) for
statistics on the real utility of COW in even single-threaded programs, MT
overhead aside.


---
Herb Sutter (mailto:herbs@cntc.com)

Current Network Technologies Corp  2695 North Sheridan Way, Suite 150
www.cntc.com www.peerdirect.com    Mississauga Ontario Canada L5K 2N6


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: phalpern@truffle.ma.ultranet.com (Pablo Halpern)
Date: 1998/08/25
Raw View
herbs@cntc.com (Herb Sutter) wrote:

>Operating on the reference count is immaterial; I assume that's fast. The
>number of locks required is material, because locks are expensive:
>
>1. The string still has to acquire the memory manager's lock when actually
>making a copy. Any additional COW-induced overhead is in addition to that
>lock, not instead of it.

Of course. But since COW requires fewer copies, this should be a net win
over performing deep copies every time.

>2. The string must additionally acquire a lock for at least each possibly
>mutating operation -- meaning not just copy operations, but also accessors
>and mutators. Hence this locking is far more frequent than copies in the
>common case.

I don't think this is true. Before modifying a buffer, the buffer must
not be shared (definition of COW).  An unshared buffer need not be mutex
locked.  If the reference count indicates that the buffer is not shared,
the only way it can be incremented is by copying the only string that
references it. If one thread is trying to read (or copy) the string
while another is modifying it, you have a BUG IN THE PROGRAM, not a bug
in the string class. If two threads access a shared string, they should
use a mutex lock EXTERNAL TO THE STRING ITSELF.

I don't think it is a good specification for strings that every
operation be considered atomic.  If you design every method of every
class to be atomic, your MT program is likely to get bogged down in
mutex locking, even for unshared resources.  It may allow sloppy
programs to work correctly, but it is not the way to go if you want
efficiency.  Strings in which every opration is logically atomic should
be a different class (e.g. inter_thread_string).

-------------------------------------------------------------
Pablo Halpern                   phalpern@truffle.ultranet.com

I am self-employed. Therefore, my opinions *do* represent
those of my employer.


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: phalpern@truffle.ma.ultranet.com (Pablo Halpern)
Date: 1998/08/25
Raw View
saroj@bear.com wrote:

>
>substring, replacing substring, concatenation - COW will not help you.
>If at all, it will be worse than non-COW, because most string
>representations are contiguous.

I didn't mean to imply that any non-const operation is accelerated by
COW.  Rather, the above-mentioned operations do not expose references
into the buffer and thus do not require the modified string buffer to be
marked unshareable (unline non-const operator[], begin() and end()).
This means you can modify a string using these operations and then copy
the result without doing a deep-copy.

-------------------------------------------------------------
Pablo Halpern                   phalpern@truffle.ultranet.com

I am self-employed. Therefore, my opinions *do* represent
those of my employer.


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1998/08/25
Raw View
 Herb Sutter<herbs@cntc.com> wrote:
>
>ncm@nospam.cantrip.org (Nathan Myers) wrote:
>>Herb Sutter<herbs@cntc.com> wrote:
>>>The reference-counted version is almost always less efficient (usually
>>>by an order of magnitude or two, literally) in code built for possible
>>>multi-threaded use, even if a particular program that calls it happens
>>>to be single-threaded.
>>
>>Modern architectures can operate on a reference count in one
>>instruction.  Allocating memory for another copy takes a long
>>time, and involves a lock itself.  I think Herb's remark above
>>is unsupportable for the common case.  I don't doubt there are
>>environments where it is true, but is certainly is not universal,
>>nor common.
>
>My remark is not only supportable and common, but correct. :-)

Sorry, not right.

>Operating on the reference count is immaterial; I assume that's fast. The
>number of locks required is material, because locks are expensive:
>
>1. The string still has to acquire the memory manager's lock when actually
>making a copy. Any additional COW-induced overhead is in addition to that
>lock, not instead of it.

When making a deep copy, a lock is required in either case, because
the memory allocator is MT-safe.However, a COW string makes fewer
deep copies.


>2. The string must additionally acquire a lock for at least each possibly
>mutating operation -- meaning not just copy operations, but also accessors
>and mutators. Hence this locking is far more frequent than copies in the
>common case.

Wrong.  It must acquire a lock to make a deep copy before doing
the first mutating operation on a shared string.  Once the copy
is made, no further locking is needed.A  Strings that have not been
shallow-copied (and thus are not shared) need neither be locked
nor copied.

>#1 is minor; #2 dominates. In the common case, copy-on-write is an
>order-of-magnitude-or-two (usually closer to two in my common experience)
>pessimization in code built for possible multi-threaded use, EVEN IF the
>program that calls it is single-threaded.

I don't doubt that bad implementations of MT-safe strings are possible.
I don't doubt that you have used them.   Measurements of those that
don't do a single-instruction atomic reference-count operation, and
instead share a single lock for all strings will reveal little.

>I'll happily post performance results if you'll agree to buy me dinner at
>the Santa Cruz meeting a few weeks from now if I'm right. :-)

I'm not very interested in seeing those results on bad implementations,
but will enjoy seeing comparisons of my implementation to others'. :-)

Of course a single-instruction reference-count operation isn't (as)
portable, so comparison of such implementations to portable ones
isn't fair, strictly speaking -- but performance demands are never
fair.

--
Nathan Myers
ncm@nospam.cantrip.org  http://www.cantrip.org/



[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: Valentin Bonnard <bonnardv@pratique.fr>
Date: 1998/08/18
Raw View
Nathan Myers <ncm@nospam.cantrip.org> writes:

> the reference-counted version is
> *always* more efficient than the always-copying version.

A text manipulation program written in pre-STL style
using only operator[] would be significantly slower
except if the optimiser is clever enough to understand
that the string is always locked.

Both cow and non cow strings make sens.

--

Valentin Bonnard                mailto:bonnardv@pratique.fr
info about C++/a propos du C++: http://pages.pratique.fr/~bonnardv/


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: jkanze@otelo.ibmmail.com
Date: 1998/08/20
Raw View
In article <35D9AE1C.7553@pratique.fr>,
  bonnardv@pratique.fr wrote:
>
> Nathan Myers <ncm@nospam.cantrip.org> writes:
>
> > the reference-counted version is
> > *always* more efficient than the always-copying version.
>
> A text manipulation program written in pre-STL style
> using only operator[] would be significantly slower
> except if the optimiser is clever enough to understand
> that the string is always locked.

Independantly of this issue, Nathan's statement is remarkably
naive (as are almost all statements with both "always" and
"efficient" in them).  Rob Murray has some discussion about
this in his book (something with Strategies and Tactics in the
title, highly recommended); he actually ran some tests with
his implementation, rather than just speculating.

Reference counting has a definite cost, if only in setting up
the count in the constructor.  If you never copy any strings, you
pay for reference counting without using it.  In practice, of course,
it's almost impossible to write anything useful without copying a
string now and then; Rob Murray's benchmarks measured how much copying
was necessary (for his implementation) to break even.

> Both cow and non cow strings make sens.

Agreed.

--
James Kanze    +33 (0)1 39 23 84 71    mailto: kanze@gabi-soft.fr
        +49 (0)69 66 45 33 10    mailto: jkanze@otelo.ibmmail.com
GABI Software, 22 rue Jacques-Lemercier, 78000 Versailles, France
Conseils en informatique orient   e objet --
              -- Beratung in objektorientierter Datenverarbeitung

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/rg_mkgrp.xp   Create Your Own Free Member Forum


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: herbs@cntc.com (Herb Sutter)
Date: 1998/08/20
Raw View
Nathan Myers <ncm@nospam.cantrip.org> writes:
> the reference-counted version is
> *always* more efficient than the always-copying version.

This is true for code built for single-threaded use only. The
reference-counted version is almost always less efficient (usually by an
order of magnitude or two, literally) in code built for possible
multi-threaded use, even if a particular program that calls it happens to
be single-threaded.

If a string object can be shared across processes, then you could also
replace "-threaded" with "-process" everywhere in the above paragraph.

[I'm assuming you're talking about strings; I didn't see the original
post, only a quoted version.]

Herb

---
Herb Sutter (mailto:herbs@cntc.com)

Current Network Technologies Corp  2695 North Sheridan Way, Suite 150
www.cntc.com www.peerdirect.com    Mississauga Ontario Canada L5K 2N6


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: saroj@bear.com
Date: 1998/08/20
Raw View
In article <6rhf1q$f35$1@nnrp1.dejanews.com>,
  jkanze@otelo.ibmmail.com wrote:
>
> In article <35D9AE1C.7553@pratique.fr>,
>   bonnardv@pratique.fr wrote:
> >
> > Nathan Myers <ncm@nospam.cantrip.org> writes:
> >
> > > the reference-counted version is
> > > *always* more efficient than the always-copying version.
> >
> > A text manipulation program written in pre-STL style
> > using only operator[] would be significantly slower
> > except if the optimiser is clever enough to understand
> > that the string is always locked.
>
> Independantly of this issue, Nathan's statement is remarkably
> naive (as are almost all statements with both "always" and
> "efficient" in them).  Rob Murray has some discussion about
> this in his book (something with Strategies and Tactics in the
> title, highly recommended); he actually ran some tests with
> his implementation, rather than just speculating.
>

Well, I also disagree with Nathan. I did not argue, because I
generally agree with his opinions. Thread gurus like Dave
Butenhof (of Digital and now Compaq) (His Posix Threads book
greatly recommended) or David Holmes can probably
explain more eloquently than me how mutex locking and unlocking
are not cheap; they can cause (unnecessary) flushing of cache
and a program that uses less locking than another is generally better.
compare and swap(CAS) and other primitives can be cheaper though.

Another point that I raised earlier, but probably failed to explain
well is that I rarely want to lock an individual string. Typically
a data item shared between multiple threads has many shared
items including string(s) and we need to lock the shared item or
container rather than individual strings.


> Reference counting has a definite cost, if only in setting up
> the count in the constructor.  If you never copy any strings, you
> pay for reference counting without using it.  In practice, of course,
> it's almost impossible to write anything useful without copying a
> string now and then; Rob Murray's benchmarks measured how much copying
> was necessary (for his implementation) to break even.
>


> > Both cow and non cow strings make sens.
>
RWLock (ReaderWriterLock) also use reference counting, but you can use
a RWLock (when applicable) to lock the entire shared data structure
rather than individual strings.

> Agreed.
>

hash maps with string keys (in typical implementation)
make two copies of a string key. This is one place, where I wish I could
avoid the copy. In performance critical code, I can use a char* key, but
does anybody know a better way to avoid copying of strings in hash_maps.

   -- Saroj Mahapatra

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/rg_mkgrp.xp   Create Your Own Free Member Forum


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: phalpern@truffle.ma.ultranet.com (Pablo Halpern)
Date: 1998/08/21
Raw View
AllanW@my-dejanews.com wrote:

>Of course, whenever the user gets a reference or address of a
>character in the string -- no matter how (operator[], or dereferencing
>an iterator) -- taking and holding the address is an absolute no-no.
>This isn't just for ref-counted strings, but also for any other
>container that moves things without an explicit request to do so
>(such as vector).

Not quite right.  There are specific operations that invalidate
references to elements.  Copying the container is not one of them. That
is why operator[] (and begin() and end()) must mark the buffer as
unshareable. Otherwise, a copy of the container could be modified
through a (still valid) element reference that was taken before the copy
was made:

 string A("hello world");
 char& r = A[3];   // get and hold a reference to 4th char
 string B(A);      // B had better not share a buffer with A!
                   // r is still a valid reference.
 r = 'n';   // Should modify A[3] but not B[3]


-------------------------------------------------------------
Pablo Halpern                   phalpern@truffle.ultranet.com

I am self-employed. Therefore, my opinions *do* represent
those of my employer.


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: phalpern@truffle.ma.ultranet.com (Pablo Halpern)
Date: 1998/08/21
Raw View
AllanW@my-dejanews.com wrote:

>Maybe what we really should have done is require string to disallow
>looking at individual characters with char*-type notation; leaving
>only iterators. Then any library vendor could implement ref-counting
>simply by having the iterator return a proxy which triggers cow when
>appropriate. Programs that need char*-type notation wouldn't be
>affected, since they could still use array-of-char!
>
>But that doesn't help *now*, you point out? Granted. But what can we
>do now, short of avoiding ref-counting like the plague?

What we can do *now* is to use COW implementations that punt (do a
deep-copy and mark unsharable) when the caller uses (non-const)
operator[], begin(), or end().  My programs rarely use these features of
stings.  Most common operations on strings is copying, creating
substrings, replacing substrings, concatonating and displaying, all of
which can be done without marking a string buffer as non-sharable.
Replacing single characters is much rarer and can either be done using
replace(), or else suffer the performance penalty for using operator[]
or non-const iterators.

Thus a COW implementation can be just as efficient as you want, if use
use it *as if* the operations mentioned were forbidden.

-------------------------------------------------------------
Pablo Halpern                   phalpern@truffle.ultranet.com

I am self-employed. Therefore, my opinions *do* represent
those of my employer.


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: herbs@cntc.com (Herb Sutter)
Date: 1998/08/21
Raw View
I wrote:
>Nathan Myers <ncm@nospam.cantrip.org> writes:
>> the reference-counted version is
>> *always* more efficient than the always-copying version.
>
>This is true for code built for single-threaded use only.

I should probably have said: "This can be true..." or qualified it with
"The overhead aside, this is true..." or "For the number of copies, this
is true..."  Reference counting will never perform more copies, but it
does incur overhead.

Anyway, my main point was that COW is an abomination in multithreaded
code, and THAT'S true virtually without qualification. :-)

Herb


---
Herb Sutter (mailto:herbs@cntc.com)

Current Network Technologies Corp  2695 North Sheridan Way, Suite 150
www.cntc.com www.peerdirect.com    Mississauga Ontario Canada L5K 2N6


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: AllanW@my-dejanews.com
Date: 1998/08/13
Raw View
In article <35D1DA1C.A2B@bean.jpl.nasa.gov>,
  mckelvey@bean.jpl.nasa.gov wrote:
>
> Nathan Myers wrote:
> >
> > John Potter<jpotter@falcon.lhup.edu> wrote:
> > >I received some email responses to the original.  Not seeing any
> > >posts, I will pass on the information to you and others.  Here is
> > >another nasty problem:
> > >
> > > string s("abc");
> > > string::iterator i(s.begin());
> > > string c(s);
> > > *i = 'z';
> > > // What is the value of c[0]?
> > >
> > >A solution is for strings to have one of three states (shared, unique,
> > >unsharable).  You have the first two now.  Any function which returns
> > >a non-const reference to internals must take care of making the copy
> > >and marking it unsharable.  Any copy function which gets an unsharable
> > >to copy must do a deep copy.  In the above, s.begin() would do that
> > >for s and the copy constructor would be forced to do a deep copy in
> > >creating c.
> >
>
> Maybe I'm missing something, but an object should never return a pointer
> or reference to its internals.
>
> If s is a string, an iterator on s would bump the reference counter on
> the value of s. This means that if s changes, the iterator still refers
> to the previous value of s.
>
> The iterator should not be able to change s. It just represents a
> position within s.
>
> So the code becomes something like:
>
> string s("abc");
> string::iterator i(s);
> string c(s);
> s(i) = 'z';
> // What is the value of c[0]?
>
> c[0] is 'a'.
> The only thing changed is s, which is "zbc".

This is true, although the syntax above doesn't show it. If
the implementation of string::iterator contains the address
of the object and an offset then this particular problem is
solved instantly.

But this still doesn't apply to the general case.
    string a("abcdefghijklmnopqrstuvwxyz");
    string b("zyxwvutsrqponmlkjihgfedcba");
    printf("%s", &a[0]);
    char *x = &b[0];
    string c(a);
    string d(b);
    *x = '-';
If taking the address of a character in the string marks the
string "unsharable", then we have four separate strings here.
In this case, ref-counting has been mostly disabled, except
for the extra run-time overhead which is still present.

If we don't mark the string that way just because the user
peeked inside the string, then a/c will (correctly) share
a buffer, and we will get the memory (and possibly performance)
gains that ref-counting promises. Unfortunately, b/d will also
(incorrectly) share a buffer, and changing string b will also
affect string d.

> The call to s(i) can verify that i is indeed an iterator on s. If
> desired, it can then update the iterator to refer to the new value of s.
> In general though, an iterator should be considered a snapshot.

A reasonable idea, but it doesn't use the correct syntax.

> It seems to me that the problem is really that the string class is
> poorly designed.

That does seem to be one of the hot issues of the moment.

--
AllanW@my-dejanews.com is a "Spam Magnet" -- never read.
Please reply in USENET only, sorry.

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/rg_mkgrp.xp   Create Your Own Free Member Forum


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1998/08/12
Raw View
 <AllanW@my-dejanews.com> wrote:
>
>The frustrating thing is that string is *so close* to making
>ref-counting easy, except for some seemingly-minor decisions which
>nevertheless are absolute killers for ref-counted implementations.

It is not at all clear that they are "absolute killers".  In fact,
it's hard for me to find an interpretation that satisfies this claim.
Given reasonable atomic increment and decrement-and-test instructions
(common to all modern machines) the reference-counted version is
*always* more efficient than the always-copying version.

--
Nathan Myers
ncm@nospam.cantrip.org  http://www.cantrip.org/



[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: saroj@bear.com
Date: 1998/08/12
Raw View
In article <35D0B96F.3A42@pratique.fr>,
  bonnardv@pratique.fr wrote:
>
> Matt Austern <austern@sgi.com> writes:
>
> > ncm@nospam.cantrip.org (Nathan Myers) writes:
> >
> > > <saroj@bear.com> wrote:
> > > >
> > > >I still feel that creating ref-counted string is not worth it. With
little
> > > >care, actually you can do better with non-ref-counted string.
> > > > ...
> > > >* I pass input strings as const std::string& and output strings
> > > >  as std::string* (like I'd do for a vector).  ...
> > >
> > > This is a very unnatural way to operate on strings.
> >
> > Naturalness is in the eye of the beholder.
>
> Not really.
>
> > Strings are containers,
> > and they potentially manage large chunks of memory.  Passing large
> > objects by reference, rather than by value, is common advice.
>
> You missed (or ignored) the relevant parts. const T& arguments
> are natural, T* return values aren't (where T is a concrete type
> with copy construction defined).
>

How about vector, list, 'Customer' return values? They are (or, can be)
concrete types with copy constructors too.

> SB = <saroj@bear.com>; SB wrote:
> SB>   name_list.push_back(std::string());
> SB>   name_list.back().swap(name);
>
> swaping instead of direct assignment isn't.
>

I agree, direct assignment is more natural. What I tried to illustrate is
that most often, a COW implementation increments the ref-count, just to
immediately decrement it (like in a loop I showed). swap() makes the
action lock-free and explicit here. Speed and predictability are
important to me. Construct and swap (I coined after compare-and-swap) idiom
should be in the toolbox of every MT programmer.


> I use const std::string& as function arguments and std::string
> as return values.
>

That is a bad advice, because you can not be sure whether your program
will run on a non-COW implementation.

         -- Saroj Mahapatra

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/rg_mkgrp.xp   Create Your Own Free Member Forum


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: Jim McKelvey <mckelvey@bean.jpl.nasa.gov>
Date: 1998/08/12
Raw View
Nathan Myers wrote:
>
> John Potter<jpotter@falcon.lhup.edu> wrote:
> >I received some email responses to the original.  Not seeing any
> >posts, I will pass on the information to you and others.  Here is
> >another nasty problem:
> >
> > string s("abc");
> > string::iterator i(s.begin());
> > string c(s);
> > *i = 'z';
> > // What is the value of c[0]?
> >
> >A solution is for strings to have one of three states (shared, unique,
> >unsharable).  You have the first two now.  Any function which returns
> >a non-const reference to internals must take care of making the copy
> >and marking it unsharable.  Any copy function which gets an unsharable
> >to copy must do a deep copy.  In the above, s.begin() would do that
> >for s and the copy constructor would be forced to do a deep copy in
> >creating c.
>


Maybe I'm missing something, but an object should never return a pointer
or reference to its internals.

If s is a string, an iterator on s would bump the reference counter on
the value of s. This means that if s changes, the iterator still refers
to the previous value of s.

The iterator should not be able to change s. It just represents a
position within s.

So the code becomes something like:

string s("abc");
string::iterator i(s);
string c(s);
s(i) = 'z';
// What is the value of c[0]?

c[0] is 'a'.
The only thing changed is s, which is "zbc".

The call to s(i) can verify that i is indeed an iterator on s. If
desired, it can then update the iterator to refer to the new value of s.
In general though, an iterator should be considered a snapshot.


It seems to me that the problem is really that the string class is
poorly designed.


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: jkanze@otelo.ibmmail.com
Date: 1998/08/13
Raw View
In article <6qo3fq$dum$1@pigpen.csrlink.net>,
  jpotter@falcon.lhup.edu (John Potter) wrote:
>
> The original question was the conformance of a string class which
> sorted the original string when a copy was sorted.  The answer was
> that the FDIS is different from the CD2 and the implementation is not
> conformant.  A following question asked about making a cow string
> class conformant.
>
> On writing a standard conformant string class.
>
> AllanW@my-dejanews.com wrote:
>
> : I think that using a proxy class should handle this fairly well
>
> Sorry.  Unless this has also changed from the CD2, it is required that
> string::operator[] return a T& and that string::iterator::operator*
> return a T&.  Proxies are expressly forbidden by these requirements.
>
> Perhaps one of the reasons that SGI decided against reference counting
> for their string as you indicated.

I thing that the problem was more linked with thread safety.  However, th=
e
requirement you just mentioned does make it more difficult (and less of
a win).

> But Mark wants to write a standard conformant reference counted cow
> string class.  Let's not tell him not to do it but offer some help in
> doing it.
>
> My interest was in what I can expect from a conformant string and that
> has been satisfied.  However, I am interested in any standard
> conformant techniques that may help Mark do a better job than the
> little relayed information which I posted.  Or maybe there are no
> better techniques.

I didn't see what has been posted, but basically, you have to isolate
(unshare) and lock (prevent further sharing) anytime you take non-const
reference or iterator.  The lock must be maintained until the next action
which invalidates the iterator.

--
James Kanze    +33 (0)1 39 23 84 71    mailto: kanze@gabi-soft.fr
        +49 (0)69 66 45 33 10    mailto: jkanze@otelo.ibmmail.com
GABI Software, 22 rue Jacques-Lemercier, 78000 Versailles, France
Conseils en informatique orient=E9e objet --
              -- Beratung in objektorientierter Datenverarbeitung

-----=3D=3D Posted via Deja News, The Leader in Internet Discussion =3D=3D=
-----
http://www.dejanews.com/rg_mkgrp.xp   Create Your Own Free Member Forum


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: AllanW@my-dejanews.com
Date: 1998/08/10
Raw View
In article <6qn19u$sb0$1@pigpen.csrlink.net>,
  jpotter@falcon.lhup.edu (John Potter) wrote:
>
> snowball3@usa.net (Mark E.) wrote:
>
> : The program segment under discussion and whether the output is conforming:
> : >#include <iostream>
> : >#include <string>
> : >#include <algorithm>
> : >using namespace std;
> : >int main () {
> : >   string s("Hello world");
> : >   string c(s);
> : >   sort(c.begin(), c.end()); // undefined behavior?
> : >   cout << s << endl;
> : >   }

> : I have written my own string class using cow and my class produces the same
> : answer. As far as I'm concerned, s shouldn't also be sorted so my class
> : would be considered "non-conforming".

And even if that were understood, the class would also be non-intuitive
and error-prone. When you copy a string, logically you have two strings
with the same value; when you modify one it shouldn't affect the other.
The reference count/copy-on-write mechanism is a local optimization that
the programmer shouldn't have to be involved with.

> : In my class, I set a flag in the
> : non-const versions of begin() and end() so when you write a character, a
> : cow is performed.

The problem is how to detect a write. The phrase "set a flag" implies
that you aren't triggering the deep copy right away -- but if you
return a char*, then you won't get another chance, because modifying
the char isn't a class operation. On the other hand, if non-const
begin() and end() do trigger a deep write right away, then casual
(read: not careful) users, who may use begin() and end() on a
non-const string that they don't intend to modify, will take a big
performance hit.

> : begin() and end() return pointers to the internal
> : character array, so if you write:
>
> : string s("abc");
> : string c(s);
>
> : *(c.begin()) = 'd'; // Where do I fix this?

Exactly. The assignment has to trigger a copy somehow, but begin()
doesn't seem like the right place.

> I received some email responses to the original.  Not seeing any
> posts, I will pass on the information to you and others.  Here is
> another nasty problem:
>
>  string s("abc");
>  string::iterator i(s.begin());
>  string c(s);
>  *i = 'z';
>  // What is the value of c[0]?

A more vexing version of the same. The iterator is created before the
logical copy, so simply having the new iterator set a flag couldn't
possibly solve the problem. We have to detect the write itself somehow.

> A solution is for strings to have one of three states (shared, unique,
> unsharable).  You have the first two now.  Any function which returns
> a non-const reference to internals must take care of making the copy
> and marking it unsharable.  Any copy function which gets an unsharable
> to copy must do a deep copy.  In the above, s.begin() would do that
> for s and the copy constructor would be forced to do a deep copy in
> creating c.

But once a string is marked unsharable, can we ever change it back to
unique? When does an interator and/or address go out of scope?

What we really should do is detect changes to the strings that are
shared, triggering a deep copy but somehow not invalidating any
iterators. (Changing the string length would continue to invalidate
iterators, but that's not part of this problem.)  I think that using
a proxy class should handle this fairly well; below is my (UNTESTED)
example, although I'm sure many of the usual contributors to this
group could do better.

    class string {
        // Class Data holds the actual array, and a ref count
        struct Data {
            char *data;
            int len;
            mutable int ref;

            Data(int i) : data(new char[1+i]), len(i), ref(1) {}
            Data(const char*c) : len(c?strlen(c):0), ref(1) {
                data = new char[1+len];
                if (c) strcpy(data, c);
            }
            Data(const Data&d) : data(new char[1+d.len]), len(d.len), ref(1)
                { memcpy(data, d.data, len+1); --d.ref; } // Deep copy
            void addRef() { ++ref; }
            void removeRef() { if (!--ref) delete this; }
            bool needCow() { return ref>1; }
        } *data;

        // Class Proxy reads or writes the string,
        // returned by Iterator::operator*
        struct Proxy {
            string *object;
            int     pos;
            Proxy(string*o, int p) : object(o), pos(p) {}
            operator char() const { return object->data->data[pos]; }
            char operator=(char c) {
                if (object->data->needCow())
                    object->data = new Data(object->data); // Deep copy
                return object->data->data[pos] = c;
            }
            char* operator&() {
                if (object->data->needCow())
                    object->data = new Data(object->data); // Deep copy
                return object->data->data + pos;
            }
            // ... more ...
        };

        // class Iterator is the iterator returned by
        // non-const begin() and end()
        struct Iterator {
            string *object;
            int     pos;
            Iterator(string*o, int p) : object(o), pos(p) {}
            Iterator operator ++() { ++pos; return *this; }
            Iterator operator ++(int)
                { Iterator i(object,pos); ++pos; return i; }
            Iterator operator --() { --pos; return *this; }
            Iterator operator --(int)
                { Iterator i(object,pos); --pos; return i; }
            Proxy operator*() const { return Proxy(object, pos); }
            Proxy operator[](int i) { return Proxy(object, pos+i); }
            // ... more ...
        };
        class ReverseIterator { /* Extremely similar */ };
        class ConstIterator { /* Similar to Iterator but simpler */ };
        class ConstReverseIterator { /* Extremely similar */ };

    public:
        string(const char*c) : data(new Data(c)) {}
        string(const string&d) : data(d.data) { data->addRef(); }
        ~string() { data->removeRef(); }
        string operator=(const char*c) {
            data->removeRef();
            data = new Data(c);
            return *this;
        }
        string operator=(const string&d) {
            d.data->addRef(); // Add ref before removing old ref
            data->removeRef(); // (in case d.data == data)
            data = d.data;
            return *this;
        }
        Proxy operator[](int i) { return Proxy(this,i); }
        char operator[](int i) char { return data->data[i]; }
        Iterator begin() { return Iterator(this, 0); }
        ConstIterator begin() const { return ConstIterator(this,0); }
        Iterator end() { return Iterator(this,data->len); }
        ConstIterator end() const { return ConstIterator(this,data->len); }
        ReverseIterator rbegin() { return ReverseIterator(this,data->len); }
        ConstReverseIterator rbegin() {
            return ConstReverseIterator(this,data->len); }
        ReverseIterator rend() { return ReverseIterator(this,0); }
        ConstReverseIterator rend() const {
            return ConstReverseIterator(this,0); }
        // ... much more ...
    };

As you can see, class string::Data implements the reference counting
and enables copy-on-write. Copying a string adds a reference to the
new Data and removes one from the old Data (possibly triggering a
delete).

We also have the usual [Const][Reverse]Iterator classes with the usual
++/--/[] operators. But when the user dereferecnes an iterator, instead
of returning a char reference we return a Proxy. If the user uses this
like a character, operator char() automatically returns the correct
character from the correct string (valid as long as the string length
hasn't changed -- even if it's been relocated in memory). If the user
tries to modify the character, operator=() triggers Copy-On-Write, if
neccesary, and then does the assignment.

This solves code of this form:
    string a("Hello string!");
    string::Iterator p = a.begin(); // Fine.
    string b(a);
    *p = 'x'; // Modifies a, but not b
but it still doesn't fix code of this form:
    string a("Hello string!");
    char *c = &*a.begin(); // No!
    string b(a);
    *c = 'x'; // Modifies both strings
But really, this last case is insolvable. Rather than using an
iterator to point to a contained character, the user literally takes
the address. In general, copy-on-write means that at some times the
implementation must move the string to a new address. Retaining an
actual address of a particular character is incompatible with the
entire mechanism, but the user can do something conceptually
identical by using an Iterator.

SGI considered all of the above issues, but also considered
multi-threaded issues. You can see their excellent analysis and
some interesting conclusions (they decided not to do reference
counting at all!) by looking at
  http://www.sgi.com/Technology/STL/string_discussion.html
(and no, they don't pay me a commission for sending you there...)

--
AllanW@my-dejanews.com is a "Spam Magnet" -- never read.
Please reply in USENET only, sorry.

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/rg_mkgrp.xp   Create Your Own Free Member Forum


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: jpotter@falcon.lhup.edu (John Potter)
Date: 1998/08/11
Raw View
The original question was the conformance of a string class which
sorted the original string when a copy was sorted.  The answer was
that the FDIS is different from the CD2 and the implementation is not
conformant.  A following question asked about making a cow string
class conformant.

On writing a standard conformant string class.

AllanW@my-dejanews.com wrote:

: I think that using a proxy class should handle this fairly well

Sorry.  Unless this has also changed from the CD2, it is required that
string::operator[] return a T& and that string::iterator::operator*
return a T&.  Proxies are expressly forbidden by these requirements.

Perhaps one of the reasons that SGI decided against reference counting
for their string as you indicated.

But Mark wants to write a standard conformant reference counted cow
string class.  Let's not tell him not to do it but offer some help in
doing it.

My interest was in what I can expect from a conformant string and that
has been satisfied.  However, I am interested in any standard
conformant techniques that may help Mark do a better job than the
little relayed information which I posted.  Or maybe there are no
better techniques.

John



[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1998/08/11
Raw View
John Potter<jpotter@falcon.lhup.edu> wrote:
>I received some email responses to the original.  Not seeing any
>posts, I will pass on the information to you and others.  Here is
>another nasty problem:
>
> string s("abc");
> string::iterator i(s.begin());
> string c(s);
> *i = 'z';
> // What is the value of c[0]?
>
>A solution is for strings to have one of three states (shared, unique,
>unsharable).  You have the first two now.  Any function which returns
>a non-const reference to internals must take care of making the copy
>and marking it unsharable.  Any copy function which gets an unsharable
>to copy must do a deep copy.  In the above, s.begin() would do that
>for s and the copy constructor would be forced to do a deep copy in
>creating c.

This is not a nasty problem in a conforming implementation, for the
reason stated above.

In fact, these three states (shared, unique, unsharable) neatly
correspond to values n>0, 0, and -1 for a reference count n, so
no extra storage is needed to represent the extra state.

--
Nathan Myers
ncm@nospam.cantrip.org  http://www.cantrip.org/



[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1998/08/11
Raw View
<saroj@bear.com> wrote:
>
>I still feel that creating ref-counted string is not worth it. With little
>care, actually you can do better with non-ref-counted string.
> ...
>* I pass input strings as const std::string& and output strings
>  as std::string* (like I'd do for a vector).  ...

This is a very unnatural way to operate on strings.

>* Most programs use small strings. Either you lock (in MT env) for ref-count,
> or, you lock for new storage, but once you get separate storage, a string
>can  approach the speed of a C string (char*) (no check for refcount > 1).
>Copying is fairly fast on most machines. Another thing, I have observed  that
>in MT programs, a data item shared between multiple threads typically
>contains more than one string, so you need to lock the data item (or  the
>container contianing it); so locking individual strings is unnecessary and
>expensive.
>
>  For large strings, you can use SGI STL rope.

The expense of copying strings like vectors is not only in looping
over the characters, it is also in constructing a new string object and
allocating storage for it.  On modern architectures incrementing a
reference count can be done with a single instruction (no lock), and
decrement-and-test with two instructions.   The overwhelming
majority of strings in most programs are simply passed around and
written out without ever being examined character-by-character
except within the string members.

Still, it makes sense for a conscientious library vendor to provide
both reference-counted and non-reference-counted versions, and let
the user choose, but the reference-counted variety should be fine for
normal use even in MT environments.

The SGI rope may be preferable for some programs, but std::string is
still the standard way to communicate strings between user code and
library code.

--
Nathan Myers
ncm@nospam.cantrip.org  http://www.cantrip.org/



[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1998/08/11
Raw View
<AllanW@my-dejanews.com> wrote:
>... if non-const
>begin() and end() do trigger a deep write right away, then casual
>(read: not careful) users, who may use begin() and end() on a
>non-const string that they don't intend to modify, will take a big
>performance hit.

This is correct, as far as it goes.  Of course if one doesn't
mean to modify the string, it's better to use a const reference
to it.  Getting good performance from some std::string operations
will require a bit more lore than is ideal.

>> : begin() and end() return pointers to the internal
>> : character array, so if you write:
>>
>> : string s("abc");
>> : string c(s);
>>
>> : *(c.begin()) = 'd'; // Where do I fix this?
>
>Exactly. The assignment has to trigger a copy somehow, but begin()
>doesn't seem like the right place.

It's a place that works, which has a unique virtue over other places.

>>  string s("abc");
>>  string::iterator i(s.begin());
>>  string c(s);
>>  *i = 'z';
>>  // What is the value of c[0]?
>
>A more vexing version of the same. The iterator is created before the
>logical copy, so simply having the new iterator set a flag couldn't
>possibly solve the problem. We have to detect the write itself somehow.

Or, as suggested above, create the copy when the iterator is constructed.

>> A solution is for strings to have one of three states (shared, unique,
>> unsharable).  You have the first two now.  Any function which returns
>> a non-const reference to internals must take care of making the copy
>> and marking it unsharable.  Any copy function which gets an unsharable
>> to copy must do a deep copy.  In the above, s.begin() would do that
>> for s and the copy constructor would be forced to do a deep copy in
>> creating c.
>
>But once a string is marked unsharable, can we ever change it back to
>unique? When does an interator and/or address go out of scope?

You can mark it sharable again when the iterator becomes invalid.
Many operations render iterators invalid, and they are listed in
the standard:

     * As an argument to non-member functions swap()
       (lib.string.special), operator>>() (lib.string.io), and getline()
       (lib.string.io).

     * As an argument to basic_string::swap().

     * Calling data() and c_str() member functions.

     * Calling non-const member functions, except operator[](), at(),
       begin(), rbegin(), end(), and rend().

     * Subsequent to any of the above uses except the forms of insert()
       and erase() which return iterators, the first call to non-const
       member functions operator[](), at(), begin(), rbegin(), end(), or
       rend().


>SGI considered all of the above issues, but also considered
>multi-threaded issues. You can see their excellent analysis and
>some interesting conclusions (they decided not to do reference
>counting at all!) by looking at
>  http://www.sgi.com/Technology/STL/string_discussion.html
>(and no, they don't pay me a commission for sending you there...)

The conclusion of the referenced analysis is not so sound as
the analysis itself.  In fact, reference-counted strings are
quite reasonable to use in an MT environment.

--
Nathan Myers
ncm@nospam.cantrip.org  http://www.cantrip.org/



[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1998/08/11
Raw View
John Potter<jpotter@falcon.lhup.edu> wrote:
>On writing a standard conformant string class...
>But Mark wants to write a standard conformant reference counted cow
>string class.  Let's not tell him not to do it but offer some help in
>doing it.

If you want to participate in development of free standard-conforming
library components, the Egcs libstdc++-v3 project would welcome your
efforts.  As it happens, a conforming reference-counted string template
is already done, though not heavily tested.  It would benefit most by
better safety, as it currently uses raw pointers as iterators.  Class
objects instead would give better type-safety, and even permit a
compile-time debug option.

--
Nathan Myers
ncm@nospam.cantrip.org  http://www.cantrip.org/



[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: snowball3@usa.net (Mark E.)
Date: 1998/08/11
Raw View
jpotter@falcon.lhup.edu (John Potter) wrote:

>
>: I think that using a proxy class should handle this fairly well
>
>Sorry.  Unless this has also changed from the CD2, it is required that
>string::operator[] return a T& and that string::iterator::operator*
>return a T&.  Proxies are expressly forbidden by these requirements.

My copy of CD2 lists iterator and const_iterator as being implementation
defined so use of Allan's iterator class idea is still possible. SGI's rope
iterators use a similar method but of course that class has no standard to
conform to.

>But Mark wants to write a standard conformant reference counted cow
>string class.  Let's not tell him not to do it but offer some help in
>doing it.

Thanks, I appreciate the discussion about the string class since I haven't
been able to find that much discussion about it. I have modified the class
and it does now give correct answers for those tricky snippets when string
sharing is enabled. I don't like the way it pre-emptively guards against a
problem when it frequently doesn't need to, but it's hard to argue with a
correct answer.

Mark

--
Mark E.: snowball3@usa.net
http://snowball3.home.ml.org/


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: Email_To::.Hans.Olsson@dna.lth.se (Hans Olsson)
Date: 1998/08/11
Raw View
In article <35d04855.2117468@news3.ibm.net>, Mark E. <snowball3@usa.net> wrote:
>
>jpotter@falcon.lhup.edu (John Potter) wrote:
>
>>
>>: I think that using a proxy class should handle this fairly well
>>
>>Sorry.  Unless this has also changed from the CD2, it is required that
>>string::operator[] return a T& and that string::iterator::operator*
>>return a T&.  Proxies are expressly forbidden by these requirements.
>
>My copy of CD2 lists iterator and const_iterator as being implementation
>defined so use of Allan's iterator class idea is still possible.

But operator[] returns reference and not iterator which is explicitly
required to be Allocator::reference (at least in CD2), and
Allocator::reference is in CD2 declared as having return type T&.

The meaing of return type for a type is a bit unclear to me, but I assume
the intention is that Allocator::reference is the type T&.

The idea might still be possible for the iterators, but it becomes
a bit more complex.


--
// Home page  http://www.dna.lth.se/home/Hans_Olsson/
// Email To..Hans.Olsson@dna.lth.se [Please no junk e-mail]




[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: Matt Austern <austern@sgi.com>
Date: 1998/08/11
Raw View
ncm@nospam.cantrip.org (Nathan Myers) writes:

> <saroj@bear.com> wrote:
> >
> >I still feel that creating ref-counted string is not worth it. With little
> >care, actually you can do better with non-ref-counted string.
> > ...
> >* I pass input strings as const std::string& and output strings
> >  as std::string* (like I'd do for a vector).  ...
>
> This is a very unnatural way to operate on strings.

Naturalness is in the eye of the beholder.  Strings are containers,
and they potentially manage large chunks of memory.  Passing large
objects by reference, rather than by value, is common advice.

That's especially true for strings.  The standard does not guarantee
that passing strings by value is an inexpensive operation, and this
isn't an oversight.  The standardization committee deliberately
rejected mandating the complexity of string operations.

Some implementations use tricks to speed up string copying in some
cases, but it's easy to prove that these tricks can't correctly be
used in all cases.  Roughly speaking, you can't use shared-
representation tricks for any string where anyone has looked at an
individual character.

You can work out the rules for when an implementation may legally
perform speedup tricks, and pass strings by value in some cases and by
reference in others, or you can just pass strings by reference all the
time.  I prefer the latter; it's less work.

The tradeoffs for SGI ropes are different, of course.


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: AllanW@my-dejanews.com
Date: 1998/08/11
Raw View
In article <6qo3fq$dum$1@pigpen.csrlink.net>,
  jpotter@falcon.lhup.edu (John Potter) wrote:
>
> The original question was the conformance of a string class which
> sorted the original string when a copy was sorted.  The answer was
> that the FDIS is different from the CD2 and the implementation is not
> conformant.  A following question asked about making a cow string
> class conformant.
>
> On writing a standard conformant string class.
>
> AllanW@my-dejanews.com wrote:
>
> : I think that using a proxy class should handle this fairly well
>
> Sorry.  Unless this has also changed from the CD2, it is required that
> string::operator[] return a T& and that string::iterator::operator*
> return a T&.  Proxies are expressly forbidden by these requirements.

That's a pity. It would be invisible to most code.

> Perhaps one of the reasons that SGI decided against reference counting
> for their string as you indicated.

That seems to be all we're left with.

Were I a library vendor, I'd write two versions of string: one that's
standards-compliant, and another called __rcString which was
reference-counted but also used the proxy where needed.

> But Mark wants to write a standard conformant reference counted cow
> string class.  Let's not tell him not to do it but offer some help in
> doing it.

I'd be interested in this as well. The standard doesn't require
reference counting, but it's such an obvious gain that many
people *want* to do this. But is it possible to satisfy all three
goals at once:
    * Standard-conformant
    * Reference counted
    * Intiutive and correct?

--
AllanW@my-dejanews.com is a "Spam Magnet" -- never read.
Please reply in USENET only, sorry.

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/rg_mkgrp.xp   Create Your Own Free Member Forum


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: AllanW@my-dejanews.com
Date: 1998/08/11
Raw View
In article <6qp000$78e$1@shell7.ba.best.com>,
  ncm@nospam.cantrip.org (Nathan Myers) wrote:
>
> <AllanW@my-dejanews.com> wrote:
> >>  string s("abc");
> >>  string::iterator i(s.begin());
> >>  string c(s);
> >>  *i = 'z';
> >>  // What is the value of c[0]?
> >
> >A more vexing version of the same. The iterator is created before the
> >logical copy, so simply having the new iterator set a flag couldn't
> >possibly solve the problem. We have to detect the write itself somehow.
>
> Or, as suggested above, create the copy when the iterator is constructed.

So you're suggesting marking the string as "unsharable" in this
situation. This basically turns off reference counting/cow whenever
an iterator exists.

> >> A solution is for strings to have one of three states (shared, unique,
> >> unsharable).  You have the first two now.  Any function which returns
> >> a non-const reference to internals must take care of making the copy
> >> and marking it unsharable.  Any copy function which gets an unsharable
> >> to copy must do a deep copy.  In the above, s.begin() would do that
> >> for s and the copy constructor would be forced to do a deep copy in
> >> creating c.
> >
> >But once a string is marked unsharable, can we ever change it back to
> >unique? When does an interator and/or address go out of scope?
>
> You can mark it sharable again when the iterator becomes invalid.
> Many operations render iterators invalid, and they are listed in
> the standard:
>
>      * As an argument to non-member functions swap()
>        (lib.string.special), operator>>() (lib.string.io), and getline()
>        (lib.string.io).
>
>      * As an argument to basic_string::swap().

So far, every one of these conceptually replaces the value in the
string. (Implementation details of swap() ignored.)

>      * Calling data() and c_str() member functions.

Surely this is not a good time to change the string from unsharable to
unique -- the iterators may be invalid, but now we have a pointer to
the buffer!

>      * Calling non-const member functions, except operator[](), at(),
>        begin(), rbegin(), end(), and rend().

Again, conceptually replacing the value in the string.

>      * Subsequent to any of the above uses except the forms of insert()
>        and erase() which return iterators, the first call to non-const
>        member functions operator[](), at(), begin(), rbegin(), end(), or
>        rend().
>
> >SGI considered all of the above issues, but also considered
> >multi-threaded issues. You can see their excellent analysis and
> >some interesting conclusions (they decided not to do reference
> >counting at all!) by looking at
> >  http://www.sgi.com/Technology/STL/string_discussion.html
> >(and no, they don't pay me a commission for sending you there...)
>
> The conclusion of the referenced analysis is not so sound as
> the analysis itself.  In fact, reference-counted strings are
> quite reasonable to use in an MT environment.

You don't reach the same conclusion? I do. But even if it were not
so, the fact is that at least some library vendors (i.e. SGI) have
decided not to use reference counting for the standard strings.
This means that programmers writing portable code cannot expect the
string class to be reference-counted. Applications that demand the
performance and memory savings supplied by ref-counted strings will
have to roll their own version, even though equivalent functionality
may already exist in the library.

--
AllanW@my-dejanews.com is a "Spam Magnet" -- never read.
Please reply in USENET only, sorry.

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/rg_mkgrp.xp   Create Your Own Free Member Forum


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: AllanW@my-dejanews.com
Date: 1998/08/11
Raw View
In article <6qptos$a0u$1@news.lth.se>,
  Email_To::.Hans.Olsson@dna.lth.se (Hans Olsson) wrote:
>
> In article <35d04855.2117468@news3.ibm.net>, Mark E. <snowball3@usa.net>
wrote:
> >
> >jpotter@falcon.lhup.edu (John Potter) wrote:
> >
> >>
> >>: I think that using a proxy class should handle this fairly well
> >>
> >>Sorry.  Unless this has also changed from the CD2, it is required that
> >>string::operator[] return a T& and that string::iterator::operator*
> >>return a T&.  Proxies are expressly forbidden by these requirements.
> >
> >My copy of CD2 lists iterator and const_iterator as being implementation
> >defined so use of Allan's iterator class idea is still possible.
>
> But operator[] returns reference and not iterator which is explicitly
> required to be Allocator::reference (at least in CD2), and
> Allocator::reference is in CD2 declared as having return type T&.

If this is true, then continue to use a proxy when you dereference
an iterator, but handle operator[] as a special case which triggers
copy-on-write instantly.

I'm hoping that using operator[] on strings is fairly rare; if that's
what the user really wants, then an array of char is more appropriate
than a string.

Of course, whenever the user gets a reference or address of a
character in the string -- no matter how (operator[], or dereferencing
an iterator) -- taking and holding the address is an absolute no-no.
This isn't just for ref-counted strings, but also for any other
container that moves things without an explicit request to do so
(such as vector).

> The meaing of return type for a type is a bit unclear to me, but I assume
> the intention is that Allocator::reference is the type T&.

I'm hoping that this is one of the places that calls for something
"equivalent to T" (because it's convertible to T).

> The idea might still be possible for the iterators, but it becomes
> a bit more complex.

Complex, yes. Because the simple things don't merit much discussion.

--
AllanW@my-dejanews.com is a "Spam Magnet" -- never read.
Please reply in USENET only, sorry.

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/rg_mkgrp.xp   Create Your Own Free Member Forum


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: Matt Austern <austern@sgi.com>
Date: 1998/08/11
Raw View
AllanW@my-dejanews.com writes:

> > Sorry.  Unless this has also changed from the CD2, it is required that
> > string::operator[] return a T& and that string::iterator::operator*
> > return a T&.  Proxies are expressly forbidden by these requirements.
>
> That's a pity. It would be invisible to most code.

It's more visible than you might think: It's very hard to get proxy
references to work correctly with template functions whose arguments
are passed by reference.  You wind up with a reference to a proxy, and
that sometimes has unexpected consequences.


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: AllanW@my-dejanews.com
Date: 1998/08/11
Raw View
In article <fxtu33jl9zt.fsf@isolde.engr.sgi.com>,
  Matt Austern <austern@sgi.com> wrote:
>
> ncm@nospam.cantrip.org (Nathan Myers) writes:
>
> > <saroj@bear.com> wrote:
> > >
> > >I still feel that creating ref-counted string is not worth it. With little
> > >care, actually you can do better with non-ref-counted string.
> > > ...
> > >* I pass input strings as const std::string& and output strings
> > >  as std::string* (like I'd do for a vector).  ...

I tend to like this idea, although I'm not thrilled about it.
After all, not every programmer (not even every top-notch application
programmer) reads comp.std.c++ -- so they aren't all aware of these
"trade-offs" you refer to.

> > This is a very unnatural way to operate on strings.
>
> Naturalness is in the eye of the beholder.  Strings are containers,
> and they potentially manage large chunks of memory.  Passing large
> objects by reference, rather than by value, is common advice.

You're right; those same top-notch programmers would never dream
of passing a Customer object by value, because they know it's large.
Unfortunately, the concept of char*-as-string is so ingrained in C
that it's easy to get caught up by passing whole strings where mere
references are more appropriate.

> That's especially true for strings.  The standard does not guarantee
> that passing strings by value is an inexpensive operation, and this
> isn't an oversight.  The standardization committee deliberately
> rejected mandating the complexity of string operations.

It was probably the only reasonable choice left. But it's still a
problem for C programmers turned towards C++.

The frustrating thing is that string is *so close* to making
ref-counting easy, except for some seemingly-minor decisions which
nevertheless are absolute killers for ref-counted implementations.

> Some implementations use tricks to speed up string copying in some
> cases, but it's easy to prove that these tricks can't correctly be
> used in all cases.  Roughly speaking, you can't use shared-
> representation tricks for any string where anyone has looked at an
> individual character.

Maybe what we really should have done is require string to disallow
looking at individual characters with char*-type notation; leaving
only iterators. Then any library vendor could implement ref-counting
simply by having the iterator return a proxy which triggers cow when
appropriate. Programs that need char*-type notation wouldn't be
affected, since they could still use array-of-char!

But that doesn't help *now*, you point out? Granted. But what can we
do now, short of avoiding ref-counting like the plague?

--
AllanW@my-dejanews.com is a "Spam Magnet" -- never read.
Please reply in USENET only, sorry.

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/rg_mkgrp.xp   Create Your Own Free Member Forum


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: Valentin Bonnard <bonnardv@pratique.fr>
Date: 1998/08/11
Raw View
Matt Austern <austern@sgi.com> writes:

> ncm@nospam.cantrip.org (Nathan Myers) writes:
>
> > <saroj@bear.com> wrote:
> > >
> > >I still feel that creating ref-counted string is not worth it. With little
> > >care, actually you can do better with non-ref-counted string.
> > > ...
> > >* I pass input strings as const std::string& and output strings
> > >  as std::string* (like I'd do for a vector).  ...
> >
> > This is a very unnatural way to operate on strings.
>
> Naturalness is in the eye of the beholder.

Not really.

> Strings are containers,
> and they potentially manage large chunks of memory.  Passing large
> objects by reference, rather than by value, is common advice.

You missed (or ignored) the relevant parts. const T& arguments
are natural, T* return values aren't (where T is a concrete type
with copy construction defined).

SB = <saroj@bear.com>; SB wrote:
SB>   name_list.push_back(std::string());
SB>   name_list.back().swap(name);

swaping instead of direct assignment isn't.

I use const std::string& as function arguments and std::string
as return values.

--

Valentin Bonnard                mailto:bonnardv@pratique.fr
info about C++/a propos du C++: http://pages.pratique.fr/~bonnardv/


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: AllanW@my-dejanews.com
Date: 1998/08/11
Raw View
In article <6qp0cj$849$1@shell7.ba.best.com>,
  ncm@nospam.cantrip.org (Nathan Myers) wrote:
>
> John Potter<jpotter@falcon.lhup.edu> wrote:
> >On writing a standard conformant string class...
> >But Mark wants to write a standard conformant reference counted cow
> >string class.  Let's not tell him not to do it but offer some help in
> >doing it.
>
> If you want to participate in development of free standard-conforming
> library components, the Egcs libstdc++-v3 project would welcome your
> efforts.  As it happens, a conforming reference-counted string template
> is already done, though not heavily tested.  It would benefit most by
> better safety, as it currently uses raw pointers as iterators.  Class
> objects instead would give better type-safety, and even permit a
> compile-time debug option.

Can you tell us how it gets around some of the problems brought up
in this thread? For instance:
    std::string s("Hello string");
    std::string::iterator i = s.begin();
    std::string c(s);
    *i = 'x';
    std::cout << c << std::endl;
Does this write the intuitively-corect result? If so, how does it
manage it using only raw pointers as iterators?

--
AllanW@my-dejanews.com is a "Spam Magnet" -- never read.
Please reply in USENET only, sorry.

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/rg_mkgrp.xp   Create Your Own Free Member Forum


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: Valentin Bonnard <bonnardv@pratique.fr>
Date: 1998/08/11
Raw View
John Potter wrote:
>
> The original question was the conformance of a string class which
> sorted the original string when a copy was sorted.  The answer was
> that the FDIS is different from the CD2 and the implementation is not
> conformant.  A following question asked about making a cow string
> class conformant.
>
> On writing a standard conformant string class.
>
> AllanW@my-dejanews.com wrote:
>
> : I think that using a proxy class should handle this fairly well

Not really

> Sorry.  Unless this has also changed from the CD2, it is required that
> string::operator[] return a T& and that string::iterator::operator*
> return a T&.  Proxies are expressly forbidden by these requirements.

You are perfectly right.

> However, I am interested in any standard
> conformant techniques that may help Mark do a better job than the
> little relayed information which I posted.  Or maybe there are no
> better techniques.

The user can get a reference or pointer to the one
charT in a basic_string. You have *no way* to remplace
these with UDT, change their behaviour or be told when
they go out of scope. So once you have released them,
you are unsharable as long as they are valid.

If iterator is charT*, then begin() and end() put
you in the unsharable state. If iterator is a UDT,
then you have to mark the string as unsharable when
the iterator is dereferenced (or before).

Few programmers call begin() or end() just for fun;
most will dereference the iterator at some point,
generally just after. So what's the point ?

The point is that there will be an unsharable state
in a cow basic_string. The question is how many strings
are in this state, and is there still a significant
gain, or a gain at all (it's application dependant, of
course).

Using const std::string& as arguments is still a good
idea. The compiler can optimise std::string return values.
There are still cases where you copy a string without
touch-ing it.

--

Valentin Bonnard                mailto:bonnardv@pratique.fr
info about C++/a propos du C++: http://pages.pratique.fr/~bonnardv/


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: saroj@bear.com
Date: 1998/08/12
Raw View
In article <6qqbvs$adv$1@nnrp1.dejanews.com>,
  AllanW@my-dejanews.com wrote:
>
> In article <6qptos$a0u$1@news.lth.se>,
>   Email_To::.Hans.Olsson@dna.lth.se (Hans Olsson) wrote:
> >
> > In article <35d04855.2117468@news3.ibm.net>, Mark E. <snowball3@usa.net>
> wrote:
> > >
> > >jpotter@falcon.lhup.edu (John Potter) wrote:
> > >
> > >>

> I'm hoping that using operator[] on strings is fairly rare; if that's
> what the user really wants, then an array of char is more appropriate
> than a string.
>

No, it is more common than most people think. Array of char does not help
in managing memory. vector<char> can help in memory management, but
string has other goodies like c_str(), find family, hash(non-standard)
etc. In fact, Java has StringBuffer class for character operations. Because
C++ does not have two separate classes (string and stringbuffer), string
must support efficient character operations.

         -- Saroj Mahapatra

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/rg_mkgrp.xp   Create Your Own Free Member Forum


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: jpotter@falcon.lhup.edu (John Potter)
Date: 1998/08/07
Raw View
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
int main () {
   string s("Hello world");
   string c(s);
   sort(c.begin(), c.end()); // undefined behavior?
   cout << s << endl;
   }

The output on one implementation is " Hdellloorw".

Since both begin and end are non-const members, the second invalidates
the first and sort uses an invalid iterator producing undefined
behavior?  Is the result conforming or is this an invalid
implementation?

John



[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1998/08/07
Raw View
21.3 - Template class basic_string [lib.basic.string]

   -5- References, pointers, and iterators referring to the elements of a
   basic_string sequence may be invalidated by the following uses of that
   basic_string object:
     ...
     * Calling non-const member functions, except operator[](), at(),
       begin(), rbegin(), end(), and rend().

Anyhow, if you call a function with an invalid iterator, it is allowed
to do *anything* -- even something reasonable.

--
Nathan Myers
ncm@nospam.cantrip.org  http://www.cantrip.org/



[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: jpotter@falcon.lhup.edu (John Potter)
Date: 1998/08/08
Raw View
: 21.3 - Template class basic_string [lib.basic.string]

:    -5- References, pointers, and iterators referring to the elements of a
:    basic_string sequence may be invalidated by the following uses of that
:    basic_string object:
:      ...
:      * Calling non-const member functions, except operator[](), at(),
:        begin(), rbegin(), end(), and rend().

OK, thanks.  The program didn't do anything nasty then.

: Anyhow, if you call a function with an invalid iterator, it is allowed
: to do *anything* -- even something reasonable.

Yep, but that program made a copy of a string and sorted it.  The
result was that the original was also sorted.  There was no "cow"
done!  I didn't find that reasonable.

I have interpreted your answer as saying the implementation which did
that nasty thing to me is not conforming.  I am sure glad since it
would mean that no generic algorithms could be used with strings.

John



[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: snowball3@usa.net (Mark E.)
Date: 1998/08/09
Raw View
The program segment under discussion and whether the output is conforming:
>#include <iostream>
>#include <string>
>#include <algorithm>
>using namespace std;
>int main () {
>   string s("Hello world");
>   string c(s);
>   sort(c.begin(), c.end()); // undefined behavior?
>   cout << s << endl;
>   }

jpotter@falcon.lhup.edu (John Potter) wrote:

>Yep, but that program made a copy of a string and sorted it.  The
>result was that the original was also sorted.  There was no "cow"
>done!  I didn't find that reasonable.
>
>I have interpreted your answer as saying the implementation which did
>that nasty thing to me is not conforming.  I am sure glad since it
>would mean that no generic algorithms could be used with strings.
>
>John
>

I have written my own string class using cow and my class produces the same
answer. As far as I'm concerned, s shouldn't also be sorted so my class
would be considered "non-conforming". In my class, I set a flag in the
non-const versions of begin() and end() so when you write a character, a
cow is performed. begin() and end() return pointers to the internal
character array, so if you write:

string s("abc");
string c(s);

*(c.begin()) = 'd'; // Where do I fix this?

The solutions I see to bring my class into conformance is to perform a
clone() type function in the c.begin() statement or write a iterator class
that performs a clone() in operator*.

For reference, my class can be found at
http://members.xoom.com/snowball3/cpp/ )

Mark

--
Mark E.: snowball3@usa.net
http://snowball3.home.ml.org/


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: jpotter@falcon.lhup.edu (John Potter)
Date: 1998/08/10
Raw View
snowball3@usa.net (Mark E.) wrote:


: The program segment under discussion and whether the output is conforming:
: >#include <iostream>
: >#include <string>
: >#include <algorithm>
: >using namespace std;
: >int main () {
: >   string s("Hello world");
: >   string c(s);
: >   sort(c.begin(), c.end()); // undefined behavior?
: >   cout << s << endl;
: >   }

: I have written my own string class using cow and my class produces the same
: answer. As far as I'm concerned, s shouldn't also be sorted so my class
: would be considered "non-conforming". In my class, I set a flag in the
: non-const versions of begin() and end() so when you write a character, a
: cow is performed. begin() and end() return pointers to the internal
: character array, so if you write:

: string s("abc");
: string c(s);

: *(c.begin()) = 'd'; // Where do I fix this?

I received some email responses to the original.  Not seeing any
posts, I will pass on the information to you and others.  Here is
another nasty problem:

 string s("abc");
 string::iterator i(s.begin());
 string c(s);
 *i = 'z';
 // What is the value of c[0]?

A solution is for strings to have one of three states (shared, unique,
unsharable).  You have the first two now.  Any function which returns
a non-const reference to internals must take care of making the copy
and marking it unsharable.  Any copy function which gets an unsharable
to copy must do a deep copy.  In the above, s.begin() would do that
for s and the copy constructor would be forced to do a deep copy in
creating c.

Good luck,
John



[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: saroj@bear.com
Date: 1998/08/10
Raw View
In article <6qn19u$sb0$1@pigpen.csrlink.net>,
  jpotter@falcon.lhup.edu (John Potter) wrote:
>
> snowball3@usa.net (Mark E.) wrote:
>
> : The program segment under discussion and whether the output is conforming:
> : >#include <iostream>
> : >#include <string>
> : >#include <algorithm>
> : >using namespace std;
> : >int main () {
> : >   string s("Hello world");
> : >   string c(s);
> : >   sort(c.begin(), c.end()); // undefined behavior?
> : >   cout << s << endl;
> : >   }
>
> : I have written my own string class using cow and my class produces the same
> : answer. As far as I'm concerned, s shouldn't also be sorted so my class
> : would be considered "non-conforming". In my class, I set a flag in the
> : non-const versions of begin() and end() so when you write a character, a
> : cow is performed. begin() and end() return pointers to the internal
> : character array, so if you write:
>
> : string s("abc");
> : string c(s);
>
> : *(c.begin()) = 'd'; // Where do I fix this?
>
> I received some email responses to the original.  Not seeing any
> posts, I will pass on the information to you and others.  Here is
> another nasty problem:
>
>  string s("abc");
>  string::iterator i(s.begin());
>  string c(s);
>  *i = 'z';
>  // What is the value of c[0]?
>
> A solution is for strings to have one of three states (shared, unique,
> unsharable).  You have the first two now.  Any function which returns
> a non-const reference to internals must take care of making the copy
> and marking it unsharable.  Any copy function which gets an unsharable
> to copy must do a deep copy.  In the above, s.begin() would do that
> for s and the copy constructor would be forced to do a deep copy in
> creating c.
>

I still feel that creating ref-counted string is not worth it. With little
care, actually you can do better with non-ref-counted string.

* I mostly work in MT environment.

* I pass input strings as const std::string& and output strings
  as std::string* (like I'd do for a vector).

* In a loop (and other times when possible), I use string::swap to assign.

   std::string name;
   std::list<std::string> name_list;
   while ((c = ist.rdbuf()->sbumpc()) != EOF) {
      //... other code
      if (isspace(c)) {
        name_list.push_back(std::string());
        name_list.back().swap(name);
      } else {
        name.push_back(c);
      }
   }
  (a user of COW string can also do this,
   but typically he will think ref-couting will take care of it).

* Most programs use small strings. Either you lock (in MT env) for ref-count,
 or, you lock for new storage, but once you get separate storage, a string
can  approach the speed of a C string (char*) (no check for refcount > 1).
Copying is fairly fast on most machines. Another thing, I have observed  that
in MT programs, a data item shared between multiple threads typically
contains more than one string, so you need to lock the data item (or  the
container contianing it); so locking individual strings is unnecessary and
expensive.

  For large strings, you can use SGI STL rope.

Any comments are welcome!

         -- Saroj Mahapatra

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/rg_mkgrp.xp   Create Your Own Free Member Forum


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]