Topic: basic_string: s[0] = s[1] legal?


Author: kanze@gabi-soft.fr (J. Kanze)
Date: 1999/02/08
"Cipolli" <stephen.cipolli@fcc.net> writes:

|>  James.Kanze@dresdner-bank.com wrote in message
|>  <78ndj6$53k$1@nnrp1.dejanews.com>...
|>  >
|>  >In article <36cd9a11.54642421@netnews.worldnet.att.net>,
|>  >  dHarrison@worldnet.att.net (Doug Harrison) wrote:
|>  >
|>
|>  >If the user writes illegal code, the implementation is not required to
|>  >make it work, threads or not.  And I repeat, the problem has nothing to
|>  >do with threads:
|>
|>
|>  I am beginning to understand.  In my original post I had made the problem
|>  multithreaded to allow the second reference to the string buffer to go out
|>  of scope forcing the reference to point to a freed buffer.  This is more of
|>  an implementation detail because typical COW implementations don't copy the
|>  buffer before marking it unshareable if the buffer has no other references.
|>  This, as I say, is purely an implementation detail.
|>
|>  However, I am still having trouble with the standard's language on the
|>  subject. Given the following:
|>
|>  string s="abc";
|>  const string& cs = s;
|>  const char& c1 = cs[1];    // line 1
|>  char& c0 = s[0];                // line 2
|>  c0 = c1;
|>
|>  At line 1, const operator[] is called and no copying or marking
|>  non-shareable occurs.  At line 2, c1 should be invalid and of course c0
|>  should be fine.  Also, s's string buffer should have been copied and marked
|>  non-shareable.
|>
|>  My problem is this, the standard says that references and pointers are
|>  invalidated by
|>
|>  -- Calling non-const member functions except operator[](), at() , ...
|>
|>  Doesn't this mean that the call to line 2 is not supposed to invalidate c1?

This specific line doesn't say anything at all about line 2.  It gives a
number of reasons which might invalidate the iterators/references, etc.,
but is part of a list of reasons.  The list itself is all inclusive, but
the individual points aren't.

|>  I am obviously not reading this right.  It does appear to me that the next
|>  sentence
|>
|>  -- Subsequent to any of the above uses ... the first call to non-const
|>  member functions operator[](), at(), ...
|>
|>  means that if a second call to non-const operator[] was encountered that c1
|>  would be then invalidated.  What am I missing here?

I'm trying to figure out what the "subsequent" part of this sentence is
supposed to mean myself.  The intent, at least as I understand it, is
that the *first* call to a non-const operator[] may invalidate iterators
and references.  The "subsequent" part is presumably trying to define
what "first" is relative to, but I can't quite figure it out.  I am
fairly sure that the intent is that after construction and later
copying, the "first" call to non-const operator[] will invalidate, but
like you, I am unable to find this in the actual words of the standard.

I am fairly certain that the intent is for line 2 to invalidate the
reference obtained in line 1.
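
Whichever reading of the "first call" wording turns out to be right, user
code can stay out of trouble by not keeping a reference alive across the
non-const call.  A minimal sketch, assuming only the value of s[1] is
needed; the helper name copy_second_to_first is made up for illustration:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: performs s[0] = s[1] without holding any
// reference across a call that is allowed to invalidate it.
inline void copy_second_to_first(std::string& s)
{
    assert(s.size() >= 2);
    const std::string& cs = s;
    const char c1 = cs[1];   // copy the value, not a reference
    char& c0 = s[0];         // may unshare; no older reference is outstanding
    c0 = c1;
}
```

Calling it on a string holding "abc" should leave "bbc", with or without
reference counting underneath.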

--
James Kanze    +33 (0)1 39 23 84 71    mailto: kanze@gabi-soft.fr
GABI Software, 22 rue Jacques-Lemercier, 78000 Versailles, France
Conseils en informatique orientée objet --
              -- Beratung in objektorientierter Datenverarbeitung
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]





Author: dHarrison@worldnet.att.net (Doug Harrison)
Date: 1999/02/04
James.Kanze@dresdner-bank.com wrote:

>You can almost
>always simulate the effects of multithreading with some clever use of
>the comma operator, for example.

More than anything, I think that indicates that copy-just-in-case results in
rather fragile classes.

>I disagree.  Code like the above is inherently not thread safe,
>regardless of the library.  You have no right to suppose that any
>library operation is atomic, and in your example, you are accessing the
>same object in two different threads.  It is not hard to imagine
>implementations without sharing where it might fail.  (In practice,
>without sharing, it only works because the typical implementation
>consists of a single variable which can be read and written atomically.)

Can you elaborate on how it could fail in a non-sharing implementation, so
that I'll know why I should expect the worst? The simplified example would
then be:

// Perhaps this wasn't clear, but the assumption was that the initialization
// of s was not concurrent to its access by the functions running in threads
// 1 and 2 below.
string s = "abc";

bool thread1()
{
   return s[0] == s[1];
}

bool thread2()
{
   return s[0] == s[1];
}

--
Doug Harrison
dHarrison@worldnet.att.net








Author: James.Kanze@dresdner-bank.com
Date: 1999/02/04
In article <36c6cf11.10286828@netnews.worldnet.att.net>,
  dHarrison@worldnet.att.net (Doug Harrison) wrote:

> Can you elaborate on how it could fail in a non-sharing implementation, so
> that I'll know why I should expect the worst? The simplified example would
> then be:
>
> // Perhaps this wasn't clear, but the assumption was that the initialization
> // of s was not concurrent to its access by the functions running in threads
> // 1 and 2 below.
> string s = "abc";
>
> bool thread1()
> {
>    return s[0] == s[1];
> }
>
> bool thread2()
> {
>    return s[0] == s[1];
> }

I think that normally, the burden of proof is on the other side: what
makes you think that the above will work in a threaded environment?  I
can think of any number of legal implementations where it won't.  I
can't offhand think of a reasonable one, though.  My point is simply
that the standard makes no guarantees about this, and that more
generally, it is an error to suppose that any operations on a complex
type are atomic with regards to threading.

--
James Kanze                                           GABI Software, Sàrl
Conseils en informatique orientée objet  --
                          --  Beratung in industrieller Datenverarbeitung
mailto: kanze@gabi-soft.fr          mailto: James.Kanze@dresdner-bank.com

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own





Author: dHarrison@worldnet.att.net (Doug Harrison)
Date: 1999/01/27
maxepolk@my-dejanews.com wrote:

>> In article <789chn$chf$1@east44.supernews.com>,
>>   "Cipolli" <stephen.cipolli@fcc.net> wrote:
>> > Is the following code considered legal (I could not find
>> > anything in the C++ Standard which makes it illegal):
>> > string s;
>> > void thread0() { s[0] = s[1]; }
>> > void thread1() { string t = s; }

Assuming that the sharing mechanism used by std::string is thread-safe, the
above should work in copy-on-write (CoW) std::string implementations.
Calling the non-const operator[] will typically unshare the representation
and mark it unshareable.

>Note that as soon as you get a reference to a character in a
>string, the string must immediately make a separate copy of
>itself if it is shared between two references because that
>reference to a character can be used to change the string at
>some later point.  Also, the string must immediately mark
>itself as ineligible for being shared for the same reason.

Right, and it can later be remarked shareable when a function that is
defined to invalidate iterators and references is called.

>Once the string itself changes, that reference cannot be valid
>any longer because the string can reposition itself in memory,
>leaving the reference to a character pointing to no-man's land.

Let's make the above a little more perverse, assuming a typical CoW
implementation:

string s = "abc";
const string& q = s;

void thread0()
{
   q[1]; // 1
}

void thread1()
{
   string t = s; // 2
   s[0]; // 3
} // 4

// 1: Const operator[] does not mark s unshareable. Suppose thread0 is
preempted after computing the address of s[1] but before dereferencing the
pointer. Call this pointer p.
// 2: Shares s's representation (Rep1) with t.
// 3. Non-const operator[] forces unsharing of Rep1, which means s gets a
new copy of Rep1, call it Rep2, but Rep1 remains with t, leaving p pointing
into t's representation.
// 4: t goes out of scope, so thread0 invokes undefined behavior when it
gets around to executing *p, because p points into the now defunct Rep1.

So, I don't see how one can write a CoW std::basic_string that is safe in a
multi-threaded environment even for read-only access without performing an
excessive amount of locking or marking strings unshareable to an excessive
extent. Neither would be necessary if it was permissible to use a reference
proxy class instead of a real reference, but that would introduce other
inefficiencies, which to be fair, are mitigated by the infrequency of users
manipulating std::basic_string as an array. The usual CoW approach really
isn't copy-on-write so much as it is copy-just-in-case, and the CoW
implementations I've seen aren't thread-safe at all, because read-only
access by one thread can dump the representation employed by read-only
access taking place in another thread.
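
To make steps 1 through 3 concrete without any threads, here is a toy
sketch of the copy-just-in-case behavior.  It is not any vendor's
implementation: the class CowString and the function pointer_left_behind
are invented names, and std::shared_ptr (C++11) stands in for a hand-rolled
reference count.  The pointer p is only compared, never dereferenced.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Toy single-threaded CoW string; every name here is hypothetical.
class CowString {
public:
    explicit CowString(const char* text)
        : rep_(std::make_shared<std::string>(text)), shareable_(true) {}

    // Copying shares the representation unless a reference has escaped.
    CowString(const CowString& other)
        : rep_(other.shareable_ ? other.rep_
                                : std::make_shared<std::string>(*other.rep_)),
          shareable_(true) {}

    CowString& operator=(const CowString&) = delete;  // omitted for brevity

    // Const access: no unsharing, no marking.
    const char& operator[](std::size_t i) const { return (*rep_)[i]; }

    // Non-const access: the caller might write, so copy just in case
    // and mark the representation unshareable.
    char& operator[](std::size_t i) {
        if (rep_.use_count() > 1)
            rep_ = std::make_shared<std::string>(*rep_);
        shareable_ = false;
        return (*rep_)[i];
    }

private:
    std::shared_ptr<std::string> rep_;
    bool shareable_;
};

// Replays steps 1-3 in one thread.  Returns true if p ended up pointing
// into t's representation rather than s's.
inline bool pointer_left_behind()
{
    CowString s("abc");
    const CowString& q = s;
    const char* p = &q[1];             // 1: points into Rep1
    CowString t = s;                   // 2: t shares Rep1
    (void)s[0];                        // 3: s moves to Rep2
    const CowString& ct = t;
    return p != &q[1] && p == &ct[1];  // after step 4, p would dangle
}
```

pointer_left_behind() should return true with this toy class: after step 3
the pointer belongs to t alone, so destroying t (step 4) is what leaves it
dangling.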

--
Doug Harrison
dHarrison@worldnet.att.net





Author: hinnant@_anti-spam_lightlink.com (Howard Hinnant)
Date: 1999/01/27
In article <36cd9a11.54642421@netnews.worldnet.att.net>,
dHarrison@worldnet.att.net (Doug Harrison) wrote:

> Let's make the above a little more perverse, assuming a typical CoW
> implementation:
>
> string s = "abc";
> const string& q = s;
>
> void thread0()
> {
>    q[1]; // 1
> }
>
> void thread1()
> {
>    string t = s; // 2
>    s[0]; // 3
> } // 4
>
> // 1: Const operator[] does not mark s unshareable. Suppose thread0 is
> preempted after computing the address of s[1] but before dereferencing the
> pointer. Call this pointer p.
> // 2: Shares s's representation (Rep1) with t.
> // 3. Non-const operator[] forces unsharing of Rep1, which means s gets a
> new copy of Rep1, call it Rep2, but Rep1 remains with t, leaving p pointing
> into t's representation.
> // 4: t goes out of scope, so thread0 invokes undefined behavior when it
> gets around to executing *p, because p points into the now defunct Rep1.
>
> So, I don't see how one can write a CoW std::basic_string that is safe in a
> multi-threaded environment even for read-only access without performing an
> excessive amount of locking or marking strings unshareable to an excessive
> extent.

Good example.  But doesn't the standard address this?

<quote>

-5- References, pointers, and iterators referring to the elements of a
basic_string sequence may be invalidated by the following
uses of that basic_string object:

....

Calling non-const member functions, except operator[](), at(), begin(),
rbegin(), end(), and rend().

Subsequent to any of the above uses except the forms of insert() and
erase() which return iterators, the first call to non-const member
functions operator[](), at(), begin(), rbegin(), end(), or rend().

<end quote>

So if you let thread1 execute non-const s::op[], even if it doesn't write
to it, then thread0 must assume that its reference into s is invalid.

In other words, you're right, but I think you've violated the C++ standard
combined with a non-C++-standard concept of level-1 thread safety.

Perhaps a fair thread-safety clause might look like:

1.It is safe to simultaneously call const and non-const methods from different
threads to distinct containers.

2.It is safe to simultaneously call const methods from different threads
to the same container.  Non-const methods may also be called as long as
they do not actually change the state of the container, including
invalidation of outstanding references or iterators.

3.It is not safe for different threads to simultaneously access methods of
a container when at least one thread changes the state of the container or
invalidates references or iterators of the container. The programmer is
responsible for using thread synchronization primitives (mutexes,
mutex-like objects, and so on) to avoid such situations.

Disclaimer:  This text has been lifted from:

Ignatchenko, Sergey: "STL Implementations and Thread Safety", C++ Report,
July-August 1998.

and mauled to try and make my point.
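
Clauses 2 and 3 map fairly directly onto a reader/writer lock.  A rough
sketch, assuming a C++17 toolchain with std::shared_mutex (nothing the
current standard provides); the GuardedString wrapper and its member names
are invented for illustration:

```cpp
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical wrapper: reads take a shared lock (clause 2), anything
// that changes state takes an exclusive lock (clause 3).
class GuardedString {
public:
    explicit GuardedString(std::string text) : text_(std::move(text)) {}

    // Returns by value so that no reference outlives the lock.
    char get(std::size_t i) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return text_.at(i);
    }

    void set(std::size_t i, char c) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        text_.at(i) = c;
    }

    std::string snapshot() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return text_;
    }

private:
    mutable std::shared_mutex mutex_;
    std::string text_;
};
```

Note that g.set(0, g.get(1)) is two separate critical sections, not one
atomic operation; a caller needing atomicity would have to hold the
exclusive lock across both.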

-Howard





Author: James.Kanze@dresdner-bank.com
Date: 1999/01/27
In article <36cd9a11.54642421@netnews.worldnet.att.net>,
  dHarrison@worldnet.att.net (Doug Harrison) wrote:

> Let's make the above a little more perverse, assuming a typical CoW
> implementation:
>
> string s = "abc";
> const string& q = s;
>
> void thread0()
> {
>    q[1]; // 1
> }
>
> void thread1()
> {
>    string t = s; // 2
>    s[0]; // 3
> } // 4

Note that, once again, there is nothing specific to multithreading in
the example.

> // 1: Const operator[] does not mark s unshareable. Suppose thread0 is
> preempted after computing the address of s[1] but before dereferencing the
> pointer. Call this pointer p.
> // 2: Shares s's representation (Rep1) with t.
> // 3. Non-const operator[] forces unsharing of Rep1, which means s gets a
> new copy of Rep1, call it Rep2, but Rep1 remains with t, leaving p pointing
> into t's representation.

This operation invalidates all iterators, references or pointers into
the string.  Thus, the results of 1, and anything derived from those
results.  Whether those results are in the same thread or not.

> // 4: t goes out of scope, so thread0 invokes undefined behavior when it
> gets around to executing *p, because p points into the now defunct Rep1.

Correct.  Those are the rules.  The reason for these rules is precisely
to allow reference counting, while still requiring operator[] to return
a real reference.

> So, I don't see how one can write a CoW std::basic_string that is safe in a
> multi-threaded environment even for read-only access without performing an
> excessive amount of locking or marking strings unshareable to an excessive
> extent.

Once again, it has nothing to do with thread safety (except, perhaps,
that it is more difficult for the user to recognize when he has violated
the rules in a multithreaded environment).  Line 3 invalidates the
results of line 1, according to the standard, so an implementation is
not required to do any extra locking, etc. here.  The effort is on the
user, to not let this case occur.

> Neither would be necessary if it was permissible to use a reference
> proxy class instead of a real reference, but that would introduce other
> inefficiencies, which to be fair, are mitigated by the infrequency of users
> manipulating std::basic_string as an array.

This was the preferred recommendation of the French national body, and
this is what my earlier String classes always did, when they supported
modification of a single character at all.  (The French national body
pointed out a number of problems with the original string class
definition, and proposed several solutions.  The final text adopted none
of them, but solved the problem in yet another way.)
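
For readers who have not seen the proxy technique, a minimal sketch
follows.  It is neither the text of the French proposal nor my actual
String class: the names ProxyString and CharProxy are made up, and the
representation is a std::shared_ptr (C++11) purely to keep it short.

```cpp
#include <cstddef>
#include <memory>
#include <string>

class ProxyString {
public:
    // Stand-in for char&: remembers (string, index) instead of an address.
    class CharProxy {
    public:
        CharProxy(ProxyString& owner, std::size_t index)
            : owner_(owner), index_(index) {}
        CharProxy(const CharProxy&) = default;

        // Reading never unshares.
        operator char() const { return (*owner_.rep_)[index_]; }

        // Writing is the only operation that copies a shared representation.
        CharProxy& operator=(char c) {
            if (owner_.rep_.use_count() > 1)
                owner_.rep_ = std::make_shared<std::string>(*owner_.rep_);
            (*owner_.rep_)[index_] = c;
            return *this;
        }

        // s[0] = s[1] goes through here: read first, then write.
        CharProxy& operator=(const CharProxy& other) {
            return *this = static_cast<char>(other);
        }

    private:
        ProxyString& owner_;
        std::size_t index_;
    };

    explicit ProxyString(const char* text)
        : rep_(std::make_shared<std::string>(text)) {}

    CharProxy operator[](std::size_t i) { return CharProxy(*this, i); }
    char operator[](std::size_t i) const { return (*rep_)[i]; }

    bool shares_with(const ProxyString& other) const { return rep_ == other.rep_; }
    std::string str() const { return *rep_; }

private:
    std::shared_ptr<std::string> rep_;
};
```

With this design a read such as char c = s[1] leaves s sharing with its
copies, and only an assignment through the proxy copies the buffer.  The
cost is that s[i] is no longer a real char&, which is exactly what the
standard ended up forbidding.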

> The usual CoW approach really
> isn't copy-on-write so much as it is copy-just-in-case, and the CoW
> implementations I've seen aren't thread-safe at all, because read-only
> access by one thread can dump the representation employed by read-only
> access taking place in another thread.

If the user writes illegal code, the implementation is not required to
make it work, threads or not.  And I repeat, the problem has nothing to
do with threads:

    string          s = "abcde" ;
    string const&   t = s ;
    char const*     p = &t[ 1 ] ;
    string          u = s ;
    s[ 0 ] = '0' ;
    *p ;  //  Undefined behavior, according to the standard.

--
James Kanze                                           GABI Software, Sàrl
Conseils en informatique orientée objet  --
                          --  Beratung in industrieller Datenverarbeitung
mailto: kanze@gabi-soft.fr          mailto: James.Kanze@dresdner-bank.com







Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1999/01/28
[posted and mailed]

Doug Harrison<dHarrison@worldnet.att.net> wrote:
>maxepolk@my-dejanews.com wrote:
>>>   "Cipolli" <stephen.cipolli@fcc.net> wrote:
>>> > Is the following code considered legal  ... ?
>>> > string s;
>>> > void thread0() { s[0] = s[1]; }
>>> > void thread1() { string t = s; }

Note that even in a thread-safe implementation, using the same object
in two different threads without locking (as above) is a bug waiting to
happen.  std::basic_string<> makes no distinction between "read-sharing"
and "sharing" in general, and any visible sharing (as above) requires a
lock.

>Let's make the above a little more perverse, assuming a typical CoW
>implementation:
>
>string s = "abc";
>const string& q = s;
>void thread0() { q[1]; } // 1
>void thread1()
>{  string t = s; // 2
>   s[0]; // 3
>} // 4
>
>// 1: Const operator[] does not mark s unshareable. Suppose thread0 is
>preempted after computing the address of s[1] but before dereferencing the
>pointer. Call this pointer p.
>// 2: Shares s's representation (Rep1) with t.
>// 3. Non-const operator[] forces unsharing of Rep1, which means s gets a
>new copy of Rep1, call it Rep2, but Rep1 remains with t, leaving p pointing
>into t's representation.
>// 4: t goes out of scope, so thread0 invokes undefined behavior when it
>gets around to executing *p, because p points into the now defunct Rep1.
>
>So, I don't see how one can write a CoW std::basic_string that is safe in a
>multi-threaded environment even for read-only access without performing an
>excessive amount of locking or marking strings unshareable to an excessive
>extent. ...  The usual CoW approach really
>isn't copy-on-write so much as it is copy-just-in-case, and the CoW
>implementations I've seen aren't thread-safe at all, because read-only
>access by one thread can dump the representation employed by read-only
>access taking place in another thread.

Typical string implementations are thread-safe under the definition
that any visible sharing, as in the examples above, requires a lock.
If you invent another form of thread-safety, in which read access
doesn't require a lock, you should also invent another name for it.

Whether locking for read access is "excessive" depends on the
application.  If it is, then it would pay to build a different
string representation.
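
To make "requires a lock" concrete, here is roughly what I would expect
the original two-thread example to look like.  The mutex type is an
assumption (std::mutex from C++11 stands in for whatever your threads
package supplies), s is given a value so that s[1] exists, and s_mutex is
an invented name.

```cpp
#include <mutex>
#include <string>

std::string s = "abc";
std::mutex s_mutex;   // hypothetical: guards every access to s

// Intended to run in one thread: the write is done under the lock.
void thread0()
{
    std::lock_guard<std::mutex> lock(s_mutex);
    s[0] = s[1];
}

// Intended to run in another thread: even the "read-only" copy locks.
std::string thread1()
{
    std::lock_guard<std::mutex> lock(s_mutex);
    std::string t = s;
    return t;
}
```

The copy in thread1() takes the same lock as the write in thread0(),
because the string itself cannot tell a reader from a writer.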

--
Nathan Myers
ncm@nospam.cantrip.org  http://www.cantrip.org/





Author: "Cipolli" <stephen.cipolli@fcc.net>
Date: 1999/01/28

James.Kanze@dresdner-bank.com wrote in message
<78ndj6$53k$1@nnrp1.dejanews.com>...
>
>In article <36cd9a11.54642421@netnews.worldnet.att.net>,
>  dHarrison@worldnet.att.net (Doug Harrison) wrote:
>

>If the user writes illegal code, the implementation is not required to
>make it work, threads or not.  And I repeat, the problem has nothing to
>do with threads:


I am beginning to understand.  In my original post I had made the problem
multithreaded to allow the second reference to the string buffer to go out
of scope forcing the reference to point to a freed buffer.  This is more of
an implementation detail because typical COW implementations don't copy the
buffer before marking it unshareable if the buffer has no other references.
This, as I say, is purely an implementation detail.

However, I am still having trouble with the standard's language on the
subject. Given the following:

string s="abc";
const string& cs = s;
const char& c1 = cs[1];    // line 1
char& c0 = s[0];                // line 2
c0 = c1;

At line 1, const operator[] is called and no copying or marking
non-shareable occurs.  At line 2, c1 should be invalid and of course c0
should be fine.  Also, s's string buffer should have been copied and marked
non-shareable.

My problem is this, the standard says that references and pointers are
invalidated by

-- Calling non-const member functions except operator[](), at() , ...

Doesn't this mean that the call to line 2 is not supposed to invalidate c1?
I am obviously not reading this right.  It does appear to me that the next
sentence

-- Subsequent to any of the above uses ... the first call to non-const
member functions operator[](), at(), ...

means that if a second call to non-const operator[] was encountered that c1
would be then invalidated.  What am I missing here?

--Stephen
stephen.cipolli@fcc.net









Author: dHarrison@worldnet.att.net (Doug Harrison)
Date: 1999/01/29
Howard Hinnant wrote:

>Good example.  But doesn't the standard address this?
>
><quote>
>
>-5- References, pointers, and iterators referring to the elements of a
>basic_string sequence may be invalidated by the following
>uses of that basic_string object:
>
>....
>
>Calling non-const member functions, except operator[](), at(), begin(),
>rbegin(), end(), and rend().

Does this include ctors? Strange as it is to consider the constness of
ctors, I think it must, in order for the following clause to work.

>Subsequent to any of the above uses except the forms of insert() and
>erase() which return iterators, the first call to non-const member
>functions operator[](), at(), begin(), rbegin(), end(), or rend().
>
><end quote>
>
>So if you let thread1 execute non-const s::op[], even if it doesn't write
>to it, then thread0 must assume that its reference into s is invalid.
>
>In other words, you're right, but I think you've violated the C++ standard
>combined with a non-C++-standard concept of level-1 thread safety.

I'll agree, but users are definitely going to expect that simultaneous
read-only access is thread-safe, and if it isn't, even experienced users are
going to make mistakes, because it's possible to foul things up in very
subtle ways. I really think that the C++ Standard isn't useful in
determining behavior in this context; it doesn't address multithreading at
all, and following it blindly, for purposes beyond its scope, can lead you
into a wall, and you hurt your nose. Implementations that don't change their
single-threaded CoW (really copy-just-in-case) strategies in MT programs
subject their users to some very subtle problems.

--
Doug Harrison
dHarrison@worldnet.att.net








Author: AllanW@my-dejanews.com
Date: 1999/01/29

> >  "Cipolli" <stephen.cipolli@fcc.net> wrote:
> >> What is the guiding philosophy here?  Use copy-on-write implementations
> >> only in non-multithreaded environments?

> <AllanW@my-dejanews.com> wrote:
> >    6 [Note: These rules are formulated to allow, but not require, a
> >      reference-counted implementation.
> >
> >I don't think that was the intent, but I do believe that is the result.

In article <78k9rt$3m7$1@shell7.ba.best.com>,
  ncm@nospam.cantrip.org (Nathan Myers) wrote:
> Allen may in fact believe that, but it is not true.
AllanW, please.

> >If you haven't already done so, be sure to check out
> >    http://www.sgi.com/Technology/STL/string_discussion.html
> >for a discussion about why they didn't implement reference counting
> >on strings.
>
> That document is far from the last word on the subject.

No, of course not. I even pointed out that it was rather old and
based on a C++ draft. However, it is insightful and poses some
rather serious problems.

> In fact,
> it is quite possible (and practical) to implement a good MT-safe
> reference-counting std::basic_string<>.

Would you care to explain this? How do you know? Have you written
one? If so, can we please see it, or at least a design document
specifying how you did get around the inherent problems?

SGI didn't say it was impossible; they said (paraphrasing) that it
was impractical, because the string would be marked unsharable so
often that the result would rarely share strings. Do you agree with
this assessment? If not, would you please explain where their thinking
is faulty, or how you did (or would) overcome the problems they cite?

Also, please consider Doug Harrison's example from another post:

    string s = "abc";
    const string& q = s;
    void thread0() {
        q[1]; // 1
    }
    void thread1() {
        string t = s; // 2
        s[0]; // 3
    }
    // 1: Const operator[] does not mark s unshareable. Suppose
       thread0 is preempted after computing the address of s[1]
       but before dereferencing the pointer. Call this pointer p.
    // 2: Shares s's representation (Rep1) with t.
    // 3. Non-const operator[] forces unsharing of Rep1, which
       means s gets a new copy of Rep1, call it Rep2, but Rep1
       remains with t, leaving p pointing into t's representation.
    // 4: t goes out of scope, so thread0 invokes undefined
       behavior when it gets around to executing *p, because p
       points into the now defunct Rep1.

Can your implementation stand up to this type of situation?

Theoretically, theory is the same as practice. In practice, they
are usually different! It's easy to say that MT COW strings should
be practical, but how do you know?

----
AllanW@my-dejanews.com is a "Spam Magnet" -- never read.
Please reply in USENET only, sorry.







Author: hinnant@_anti-spam_lightlink.com (Howard Hinnant)
Date: 1999/01/29
In article <78stct$tkt$1@nnrp1.dejanews.com>, AllanW@my-dejanews.com wrote:

> Also, please consider Doug Harrison's example from another post:
>
>     string s = "abc";
>     const string& q = s;
>     void thread0() {
>         q[1]; // 1
>     }
>     void thread1() {
>         string t = s; // 2
>         s[0]; // 3
>     }
>     // 1: Const operator[] does not mark s unshareable. Suppose
>        thread0 is preempted after computing the address of s[1]
>        but before dereferencing the pointer. Call this pointer p.
>     // 2: Shares s's representation (Rep1) with t.
>     // 3. Non-const operator[] forces unsharing of Rep1, which
>        means s gets a new copy of Rep1, call it Rep2, but Rep1
>        remains with t, leaving p pointing into t's representation.
>     // 4: t goes out of scope, so thread0 invokes undefined
>        behavior when it gets around to executing *p, because p
>        points into the now defunct Rep1.
>
> Can your implementation stand up to this type of situation?

Others have already stated that this problem can be recast to a single
thread problem:

std::string s = "abc";
const std::string& q = s;

void thread0() {
   const char& p = q[1]; // 1
   thread1();
   std::cout << p << '\n';
}

void thread1() {
   std::string t = s; // 2
   s[0]; // 3
   t = "123";
}

The standard states that s[0] in thread1() may invalidate any outstanding
references and iterators.  Thus, a standard conforming string may
invalidate "p" in thread0(), when s[0] is executed.  This is standard
behavior whether or not the string is ref-counted.

The above snippet of code has undefined behavior because it references the
invalidated reference p when it tries to print it out.

Since the original code may be executed in the order of the transformed
single-threaded version, it also has undefined behavior.

std::string may be made both level-1 thread safe, and refcounted.  Nathan
has successfully argued that fact in this group before (search for "cow").
Any notion of thread safety must take into account outstanding iterators
and references, and what methods may invalidate them.
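
One way to repair the single-threaded version, sketched on the assumption
that the reader can simply wait until the mutating call has finished:
obtain the reference afterwards instead of holding it across the call.
The names reader and mutate are stand-ins for thread0 and thread1 above,
and s[0] is given an actual write to make the mutation visible.

```cpp
#include <string>

std::string s = "abc";
const std::string& q = s;

// Stand-in for thread1(): may invalidate references into s.
void mutate()
{
    std::string t = s;
    s[0] = 'x';
    t = "123";
}

// Stand-in for thread0(): takes its reference only after mutate().
char reader()
{
    mutate();               // everything that may invalidate runs first
    const char& p = q[1];   // fresh reference, valid from here on
    return p;
}
```

No reference is outstanding when s[0] executes, so nothing is left to
invalidate, whether or not the string is ref-counted.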

-Howard








Author: dHarrison@worldnet.att.net (Doug Harrison)
Date: 1999/01/30
James.Kanze@dresdner-bank.com wrote:

>If the user writes illegal code, the implementation is not required to
>make it work, threads or not.  And I repeat, the problem has nothing to
>do with threads:
>
>    string          s = "abcde" ;
>    string const&   t = s ;
>    char const*     p = &t[ 1 ] ;
>    string          u = s ;
>    s[ 0 ] = '0' ;
>    *p ;  //  Undefined behavior, according to the standard.

At least the problem with this example is a bit more obvious. For one,
you're retaining a pointer, and for another, you're modifying the string. My
example did neither. From the user's perspective, it was pure read-only
access. Here's another example that's a little more fleshed out (if not less
contrived):

string s = "abc";
const string& q = s;

bool thread1()
{
   // Could also be s.find() in typical implementations
   return q[0] == q[1];
}

bool thread2()
{
   string t = s;
   return s[0] == s[1];
}

If I was a library vendor, I wouldn't look forward to telling users that the
code above is undefined, even with the C++ Standard to back me up. I think
implementations really need to make this sort of thing work.

--
Doug Harrison
dHarrison@worldnet.att.net





Author: AllanW@my-dejanews.com
Date: 1999/01/30
Raw View
In article <78ndj6$53k$1@nnrp1.dejanews.com>,
  James.Kanze@dresdner-bank.com wrote:
>
> In article <36cd9a11.54642421@netnews.worldnet.att.net>,
>   dHarrison@worldnet.att.net (Doug Harrison) wrote:
>
> > Let's make the above a little more perverse, assuming a typical CoW
> > implementation:
> >
> > string s = "abc";
> > const string& q = s;
> >
> > void thread0()
> > {
> >    q[1]; // 1
> > }
> >
> > void thread1()
> > {
> >    string t = s; // 2
> >    s[0]; // 3
> > } // 4
>
> Note that, once again, there is nothing specific to multithreading in
> the example.

Sure there is, as listed in the next paragraph you quoted:

> > // 1: Const operator[] does not mark s unshareable. Suppose thread0 is
> > preempted after computing the address of s[1] but before dereferencing the
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^     ^^^^^^^^^^^^^^^^^^^^
> > pointer. Call this pointer p.

This can only happen in an MT environment.

> > // 2: Shares s's representation (Rep1) with t.
> > // 3. Non-const operator[] forces unsharing of Rep1, which means s gets a
> > new copy of Rep1, call it Rep2, but Rep1 remains with t, leaving p pointing
> > into t's representation.
>
> This operation invalidates all iterators, references or pointers into
> the string.  Thus, the results of 1, and anything derived from those
> results.  Whether those results are in the same thread or not.

Exactly.

> > // 4: t goes out of scope, so thread0 invokes undefined behavior when it
> > gets around to executing *p, because p points into the now defunct Rep1.
>
> Correct.  Those are the rules.  The reason for these rules is precisely
> to allow reference counting, while still requiring operator[] to return
> a real reference.

Which conflicts with multi-threading.

> > So, I don't see how one can write a CoW std::basic_string that is safe in a
> > multi-threaded environment even for read-only access without performing an
> > excessive amount of locking or marking strings unshareable to an excessive
> > extent.
>
> Once again, it has nothing to do with thread safety (except, perhaps,
> that is more difficult for the user to recognize when he has violated
> the rules in a multithreaded environment).  Line 3 invalidates the
> results of line 1, according to the standard, so an implementation is
> not required to do any extra locking, etc. here.  The effort is on the
> user, to not let this case occur.

Line 1 doesn't do anything to hang on to the reference; it
simply makes it. A more useful example would be

    char q1 = q[1];

In a non-MT environment, this cannot be erroneous. You make a
reference to the second character, then you dereference it, and
store a copy of the char (not a reference to the char).

In MT, it's possible that thread1 interrupts thread0. Here the
reference could be invalidated after it's computed but before
it's dereferenced.

You seem to be saying that line3 causes line1 to be invalid all
the time, MT or not. Are you suggesting that any mutating
operation on any string, makes any use of operator[] invalid
anywhere in the program, even when single-threaded?

> > Neither would be necessary if it was permissible to use a reference
> > proxy class instead of a real reference, but that would introduce other
> > inefficiencies, which to be fair, are mitigated by the infrequency of users
> > manipulating std::basic_string as an array.
>
> This was the preferred recommendation of the French national body, and
> this is what my earlier String classes always did, when they supported
> modification of a single character at all.  (The French national body
> pointed out a number of problems with the original string class
> definition, and proposed several solutions.  The final text adopted none
> of them, but solved the problem in yet another way.)

How did the final text solve the problem?

> > The usual CoW approach really
> > isn't copy-on-write so much as it is copy-just-in-case, and the CoW
> > implementations I've seen aren't thread-safe at all, because read-only
> > access by one thread can dump the representation employed by read-only
> > access taking place in another thread.
>
> If the user writes illegal code, the implementation is not required to
> make it work, threads or not.  And I repeat, the problem has nothing to
> do with threads:
>
>     string          s = "abcde" ;
>     string const&   t = s ;
>     char const*     p = &t[ 1 ] ;
>     string          u = s ;
>     s[ 0 ] = '0' ;
>     *p  //  Undefined behavior, according to the standard.

But this is different than the example above, when single-threaded.

----
AllanW@my-dejanews.com is a "Spam Magnet" -- never read.
Please reply in USENET only, sorry.

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own
---





Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1999/01/30
Raw View
<AllanW@my-dejanews.com> wrote:
>> >  "Cipolli" <stephen.cipolli@fcc.net> wrote:
>> >> What is the guiding philosophy here?  Use copy-on-write implementations
>> >> only in non-multithreaded environments?
>
>> <AllanW@my-dejanews.com> wrote:
>> >    6 [Note: These rules are formulated to allow, but not require, a
>> >      reference-counted implementation.
>> >
>> >I don't think that was the intent, but I do believe that is the result.
>
> ncm@nospam.cantrip.org (Nathan Myers) wrote:
>> Allen may in fact believe that, but it is not true.
>AllanW, please.

AllanW.

>> >If you haven't already done so, be sure to check out
>> >    http://www.sgi.com/Technology/STL/string_discussion.html
>> >for a discussion about why they didn't implement reference counting
>> >on strings.
>>
>> That document is far from the last word on the subject. In fact,
>> it is quite possible (and practical) to implement a good MT-safe
>> reference-counting std::basic_string<>.
>
>Would you care to explain this? How do you know? Have you written
>one? If so, can we please see it, or at least a design document
>specifying how you did get around the inherent problems?

It is publicly available (in the Egcs libstdc++ rewrite snapshots),
though the MT operations (atomic increment, atomic decrement-and-test)
have not been installed yet.  The 'inherent' problems are
not so inherent.

>SGI didn't say it was impossible; they said (paraphrasing) that it
>was impractical, because the string would be marked unsharable so
>often that the result would rarely share strings. Do you agree with
>this assessment? If not, would you please explain where their thinking
>is faulty, or how you did (or would) overcome the problems they cite?

Many operations mark the string unshareable, and many others mark
it shareable again.  Many other operations don't "mark" it at all.
If, as in most programs, you just copy and append strings without
using operator[], then strings will never be "marked" unshareable.
If you are careful to arrange that strings which are "marked" are
"unmarked" again before you copy them, then there will be no
unnecessary copying.

I am among the first to point out that the design of basic_string<>
has many unfortunate aspects -- it probably suffered more from the
"committee effect" than anything else in the standard.  Still, its
consequences are better discussed precisely in terms of actual
implementations and actual programs.  Hypothetical analyses shed
much less light.

>Also, please consider Doug Harrison's example from another post: ...
>Can your implementation stand up to this type of situation?

As noted elsewhere, visibly sharing a standard-library object (_any_
standard-library object) between threads requires a user-level lock.

The thread-safety built into a basic_string implementation is there
to protect you when you use two different string objects in different
threads where the string objects happen to (invisibly) share storage.
User-visible sharing is your problem.
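
A minimal sketch of such a user-level lock, applied to Doug Harrison's
two functions.  It assumes a modern C++ library (std::mutex did not exist
when this thread was written); the names shared_s, shared_s_mutex, reader
and copier are hypothetical.  Each function would run on its own thread;
every access to the visibly shared string happens under the same mutex.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// The visibly shared string and the lock that guards every access to it.
// Both names are made up for this sketch.
std::string shared_s = "abc";
std::mutex shared_s_mutex;

// Counterpart of thread1() in the quoted example: read-only access,
// still taken under the lock.
bool reader() {
    std::lock_guard<std::mutex> hold(shared_s_mutex);
    const std::string& q = shared_s;
    return q[0] == q[1];
}

// Counterpart of thread2(): copies the string, then uses the non-const
// operator[], which a CoW implementation may treat as a write.
bool copier() {
    std::lock_guard<std::mutex> hold(shared_s_mutex);
    std::string t = shared_s;
    return shared_s[0] == shared_s[1];
}
```

With the lock in place it no longer matters whether the library shares
representations behind the scenes; the cost is that even pure readers
serialize on the mutex.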

--
Nathan Myers
ncm@nospam.cantrip.org  http://www.cantrip.org/
---





Author: James.Kanze@dresdner-bank.com
Date: 1999/02/01
Raw View
In article <36b5c190.4553890@netnews.worldnet.att.net>,
  dHarrison@worldnet.att.net (Doug Harrison) wrote:
> James.Kanze@dresdner-bank.com wrote:
>
> >If the user writes illegal code, the implementation is not required to
> >make it work, threads or not.  And I repeat, the problem has nothing to
> >do with threads:
> >
> >    string          s = "abcde" ;
> >    string const&   t = s ;
> >    char const*     p = &t[ 1 ] ;
> >    string          u = s ;
> >    s[ 0 ] = '0' ;
> >    *p  //  Undefined behavior, according to the standard.
>
> At least the problem with this example is a bit more obvious. For one,
> you're retaining a pointer, and for another, you're modifying the string. My
> example did neither. From the user's perspective, it was pure read-only
> access.

I don't disagree that it is often less obvious what is happening in a
multi-threaded environment. My point is simply that if you are talking
about the standard, it's neither relevant nor necessary.  You can almost
always simulate the effects of multithreading with some clever use of
the comma operator, for example.

> Here's another example that's a little more fleshed out (if not less
> contrived):
>
> string s = "abc";
> const string& q = s;
>
> bool thread1()
> {
>    // Could also be s.find() in typical implementations
>    return q[0] == q[1];
> }
>
> bool thread2()
> {
>    string t = s;
>    return s[0] == s[1];
> }
>
> If I was a library vendor, I wouldn't look forward to telling users that the
> code above is undefined, even with the C++ Standard to back me up. I think
> implementations really need to make this sort of thing work.

I disagree.  Code like the above is inherently not thread safe,
regardless of the library.  You have no right to suppose that any
library operation is atomic, and in your example, you are accessing the
same object in two different threads.  It is not hard to imagine
implementations without sharing where it might fail.  (In practice,
without sharing, it only works because the typical implementation
consists of a single variable which can be read and written atomically.)

--
James Kanze                                           GABI Software, Sàrl
Conseils en informatique orienté objet  --
                          --  Beratung in industrieller Datenverarbeitung
mailto: kanze@gabi-soft.fr          mailto: James.Kanze@dresdner-bank.com

---





Author: James.Kanze@dresdner-bank.com
Date: 1999/02/01
Raw View
In article <78t651$622$1@nnrp1.dejanews.com>,
  AllanW@my-dejanews.com wrote:
> In article <78ndj6$53k$1@nnrp1.dejanews.com>,
>   James.Kanze@dresdner-bank.com wrote:
> >
> > In article <36cd9a11.54642421@netnews.worldnet.att.net>,
> >   dHarrison@worldnet.att.net (Doug Harrison) wrote:
> >
> > > Let's make the above a little more perverse, assuming a typical CoW
> > > implementation:
> > >
> > > string s = "abc";
> > > const string& q = s;
> > >
> > > void thread0()
> > > {
> > >    q[1]; // 1
> > > }
> > >
> > > void thread1()
> > > {
> > >    string t = s; // 2
> > >    s[0]; // 3
> > > } // 4
> >
> > Note that, once again, there is nothing specific to multithreading in
> > the example.
>
> Sure there is, as listed in the next paragraph you quoted:
>
> > > // 1: Const operator[] does not mark s unshareable. Suppose thread0 is
> > > preempted after computing the address of s[1] but before dereferencing the
>               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^     ^^^^^^^^^^^^^^^^^^^^
> > > pointer. Call this pointer p.
>
> This can only happen in an MT environment.

In this exact example.  It isn't hard to create the exact same sequence
of events with a bit of judicious use of the comma operator.

> > > // 2: Shares s's representation (Rep1) with t.
> > > // 3. Non-const operator[] forces unsharing of Rep1, which means s gets a
> > > new copy of Rep1, call it Rep2, but Rep1 remains with t, leaving p pointing
> > > into t's representation.
> >
> > This operation invalidates all iterators, references or pointers into
> > the string.  Thus, the results of 1, and anything derived from those
> > results.  Whether those results are in the same thread or not.
>
> Exactly.
>
> > > // 4: t goes out of scope, so thread0 invokes undefined behavior when it
> > > gets around to executing *p, because p points into the now defunct Rep1.
> >
> > Correct.  Those are the rules.  The reason for these rules is precisely
> > to allow reference counting, while still requiring operator[] to return
> > a real reference.
>
> Which conflicts with multi-threading.

Again, I don't quite see what multi-threading has to do with it.  If
your argument is that the rules aren't particularly intuitive, and
occasionally lead to undefined behavior if the programmer isn't wary, I
agree with you.  If your argument is that it is easier to make such
mistakes in a multi-threaded environment, this is also true.  But
neither of these arguments has anything to do with copy on write.  The
code above will fail unless all operations on the strings are atomic,
and I see nothing in the standard which would even suggest this.

The example code is wrong, period, and independent of the definition of
the class string.  You are accessing the same object in different
threads, without any precautions.  I don't see why anyone would expect
it to work.  (I do understand that such errors can easily occur, and
that they are hard to spot.  But that doesn't stop them from being
errors.)

> > > So, I don't see how one can write a CoW std::basic_string that is safe in a
> > > multi-threaded environment even for read-only access without performing an
> > > excessive amount of locking or marking strings unshareable to an excessive
> > > extent.
> >
> > Once again, it has nothing to do with thread safety (except, perhaps,
> > that is more difficult for the user to recognize when he has violated
> > the rules in a multithreaded environment).  Line 3 invalidates the
> > results of line 1, according to the standard, so an implementation is
> > not required to do any extra locking, etc. here.  The effort is on the
> > user, to not let this case occur.
>
> Line 1 doesn't do anything to hang on to the reference; it
> simply makes it. A more useful example would be
>
>     char q1 = q[1];
>
> In a non-MT environment, this cannot be erroneous. You make a
> reference to the second character, then you dereference it, and
> store a copy of the char (not a reference to the char).

There are a lot of things that cannot be erroneous in a single threaded
environment, but won't work in a multi-threaded one.  Accessing any
variable (even a built-in type, sometimes) in one thread, and modifying
it in another, without locking both accesses, is a programming error.
It's as simple as that.  (I'm talking here about more or less portable
application programming.  If you're implementing a string class for a
specific machine, and that machine guarantees the atomicity of certain
operations, then by all means, use them.)

> In MT, it's possible that thread1 interrupts thread0. Here the
> reference could be invalidated after it's computed but before
> it's dereferenced.
>
> You seem to be saying that line3 causes line1 to be invalid all
> the time, MT or not. Are you suggesting that any mutating
> operation on any string, makes any use of operator[] invalid
> anywhere in the program, even when single-threaded?

No.  I'm saying that it isn't too difficult to invent variants of the
program which display the same problem in a single threaded environment.

> > > Neither would be necessary if it was permissible to use a reference
> > > proxy class instead of a real reference, but that would introduce other
> > > inefficiencies, which to be fair, are mitigated by the infrequency of users
> > > manipulating std::basic_string as an array.
> >
> > This was the preferred recommendation of the French national body, and
> > this is what my earlier String classes always did, when they supported
> > modification of a single character at all.  (The French national body
> > pointed out a number of problems with the original string class
> > definition, and proposed several solutions.  The final text adopted none
> > of them, but solved the problem in yet another way.)
>
> How did the final text solve the problem?

It defines clearly when iterators are or are not valid.  Which solves
the problem as far as the language standard is concerned.  (I would have
preferred a solution which placed less of a burden on the users, but
that's life.)

> > > The usual CoW approach really
> > > isn't copy-on-write so much as it is copy-just-in-case, and the CoW
> > > implementations I've seen aren't thread-safe at all, because read-only
> > > access by one thread can dump the representation employed by read-only
> > > access taking place in another thread.
> >
> > If the user writes illegal code, the implementation is not required to
> > make it work, threads or not.  And I repeat, the problem has nothing to
> > do with threads:
> >
> >     string          s = "abcde" ;
> >     string const&   t = s ;
> >     char const*     p = &t[ 1 ] ;
> >     string          u = s ;
> >     s[ 0 ] = '0' ;
> >     *p  //  Undefined behavior, according to the standard.
>
> But this is different than the example above, when single-threaded.

But it displays exactly the same symptom.  The only difference is that
the order of the operations is more visible.
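
A minimal single-threaded sketch of that symptom.  Modern std::string
implementations generally do not share representations, so the sketch
uses a deliberately simplified, hypothetical copy-on-write class
(MiniCow) instead.  It omits the "unshareable" marking a real
implementation needs; it only shows where the pointer ends up.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical, deliberately incomplete CoW string: just enough to show
// a pointer obtained through const operator[] going stale.
class MiniCow {
    std::shared_ptr<std::string> rep_;

public:
    explicit MiniCow(const char* s) : rep_(std::make_shared<std::string>(s)) {}

    // Const access never unshares.
    const char& operator[](std::size_t i) const { return (*rep_)[i]; }

    // Non-const access takes a private copy if the representation is shared.
    char& operator[](std::size_t i) {
        if (rep_.use_count() > 1)
            rep_ = std::make_shared<std::string>(*rep_);
        return (*rep_)[i];
    }
};

bool stale_pointer_demo() {
    MiniCow s("abcde");
    const MiniCow& t = s;
    const char* p = &t[1];   // points into the representation s owns right now
    MiniCow u = s;           // u shares that representation
    s[0] = '0';              // s moves to a fresh copy; the old one stays with u

    assert(p != &t[1]);      // p no longer refers to s's characters
    assert(p == &static_cast<const MiniCow&>(u)[1]);  // it refers to u's
    assert(*p == 'b');       // readable only because u is still alive
    return true;
}
```

Once u is destroyed, p dangles, which is the undefined behavior the
quoted example ends with; no second thread is involved.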

--
James Kanze                                           GABI Software, Sàrl
Conseils en informatique orienté objet  --
                          --  Beratung in industrieller Datenverarbeitung
mailto: kanze@gabi-soft.fr          mailto: James.Kanze@dresdner-bank.com

---





Author: maxepolk@my-dejanews.com
Date: 1999/01/26
Raw View
In article <78ijij$5vg$1@nnrp1.dejanews.com>,
AllanW@my-dejanews.com says...
>
> In article <789chn$chf$1@east44.supernews.com>,
>   "Cipolli" <stephen.cipolli@fcc.net> wrote:
> > Is the following code considered legal (I could not find
> > anything in the C++ Standard which makes it illegal):
> > string s;
> > void thread0() { s[0] = s[1]; }
> > void thread1() { string t = s; }

Note that as soon as you get a reference to a character in a
string, the string must immediately make a separate copy of
itself if it is shared between two references because that
reference to a character can be used to change the string at
some later point.  Also, the string must immediately mark
itself as ineligible for being shared for the same reason.

Once the string itself changes, that reference cannot be valid
any longer because the string can reposition itself in memory,
leaving the reference to a character pointing to no-man's land.







Author: "Cipolli" <stephen.cipolli@fcc.net>
Date: 1999/01/26
Raw View
AllanW@my-dejanews.com wrote in message <78ijij$5vg$1@nnrp1.dejanews.com>...
>
>In article <789chn$chf$1@east44.supernews.com>,
>  "Cipolli" <stephen.cipolli@fcc.net> wrote:

<snip>

>From 21.3: (Emphasis mine)
>
>   5 References, pointers, and iterators referring to the elements of
>     a basic_string sequence may be invalidated by the following uses
>     of that basic_string object:
>
>     ...
>
>     -- Calling non-const member functions, EXCEPT operator [](),
>        at(), begin(), rbegin(), end(), and rend().
>
>Because of the word EXCEPT here, this actually says that [], at, and
>so on do NOT invalidate iterators.
I believe in a typical COW implementation, these functions copy their buffers
and make them non-shareable.  Subsequent calls to these functions will then
not have to copy, and the string buffers will stay valid.
>
>     -- Subsequent to any of the above uses except the forms of
>        insert() and erase() which return iterators, the first call
>        to non-const member functions operator[](), at(), begin(),
>        rbegin(), end(), rend().
>
>I interpret "Subsequent to any of the above uses" to mean "After one
>of these uses which we already said may invalidate pointers/iterators."
>So what this seems to mean is that after we invalidate iterators any
>other way (except for insert() and erase()), the next time we use
>[], at, and so on we invalidate them AGAIN. Which seems pointless.

What I think they are getting at here, is that if say we did the following:
string x("hello");   // line 1
string y = x;        // line 2
char& a = x[0];      // line 3
char& b = x[1];      // line 4
y.erase(0, 3);       // line 5
char& c = x[2];      // line 6
At line 2 x and y would share the same buffer.  At line 3 x would allocate a
new buffer, copy into it, and mark it non-shareable, returning the reference to
the character in the new buffer.  At line 4 no copying would be
necessary and a reference to the character in the buffer created in line 3
would be returned.  At line 5 the references a and b would be invalidated
and x's string buffer would be made shareable.  At line 6 the allocate, copy
and mark non-shareable logic would apply.
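
The share / unshare / re-share cycle described above can be sketched
with a toy class.  Everything here is hypothetical (CowString, Rep,
shares_with, walk_through are made-up names, not any library's code),
and the sketch mutates x itself with append() rather than calling
erase() on y, on the assumption that only uses of x affect references
into x.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical toy class for illustration; not a real basic_string.
class CowString {
    struct Rep {
        std::string chars;
        bool shareable;
    };
    std::shared_ptr<Rep> rep_;

    // Give this string its own Rep if the current one is shared.
    void unshare() {
        if (rep_.use_count() > 1)
            rep_ = std::make_shared<Rep>(*rep_);
    }

public:
    explicit CowString(const char* s)
        : rep_(std::make_shared<Rep>(Rep{s, true})) {}

    // Copying shares the Rep only while it is marked shareable.
    CowString(const CowString& other) {
        if (other.rep_->shareable)
            rep_ = other.rep_;
        else
            rep_ = std::make_shared<Rep>(Rep{other.rep_->chars, true});
    }
    CowString& operator=(const CowString&) = delete;  // left out of the sketch

    // A real reference escapes here, so the Rep must be unique and must
    // stay unshared for as long as that reference may be live.
    char& operator[](std::size_t i) {
        unshare();
        rep_->shareable = false;
        return rep_->chars[i];
    }
    const char& operator[](std::size_t i) const { return rep_->chars[i]; }

    // Any other mutation invalidates references, so sharing is allowed again.
    void append(const char* s) {
        unshare();
        rep_->chars += s;
        rep_->shareable = true;
    }

    bool shares_with(const CowString& o) const { return rep_ == o.rep_; }
};

bool walk_through() {
    CowString x("hello");
    CowString y = x;            // line 2: x and y share one buffer
    assert(x.shares_with(y));
    char& a = x[0];             // line 3: x copies and is marked non-shareable
    assert(!x.shares_with(y));
    char& b = x[1];             // line 4: no copy, same buffer as a
    assert(&b == &a + 1);
    CowString z = x;            // copying a non-shareable string deep-copies
    assert(!z.shares_with(x));
    x.append("!");              // mutation of x: a and b are dead, x shareable
    CowString w = x;
    assert(w.shares_with(x));
    char& c = x[2];             // line 6: allocate, copy, mark non-shareable
    (void)c;
    assert(!w.shares_with(x));
    return true;
}
```

The flag is what makes "copy-just-in-case" visible: after line 3 every
copy of x is a deep copy until some other mutation clears the mark.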
>
<snip>

>> What is the guiding philosophy here?  Use copy-on-write implementations only
>> in non-multithreaded environments?
>
>I don't think that was the intent, but I do believe that is the
>result.
>
>If you haven't already done so, be sure to check out
>    http://www.sgi.com/Technology/STL/string_discussion.html
>for a discussion about why they didn't implement reference counting
>on strings. The discussion was written long ago, before the standard
>was approved, but it still seems valid today. Note that their
>discussion assumes multiple threads; there is nothing presented which
>would prevent reference-counted strings in single-thread environments.
>
I have looked at the article.  I agree, it is somewhat outdated and does
not seem to get into enough detail, as you point out.  Also, it mentions the
'97 draft had some words explicitly banning my example, but the standard
seems to be devoid of such wording.

>What if basic_string was not considered a sequence?

<snip>

>Such a class would solve most (not all) of the problems that are
>currently being solved by basic_string. But it wouldn't conform
>to what C/C++ programmers think of as a string, so I guess it is
>doomed.


After thinking about the problem a bit more, I can see a way out of the
problem by having the const versions of operator[], at, begin, etc. perform
the same actions as their non-const counterparts.  This does not violate
the standard, but has obvious performance consequences (more copying).

What I'm looking for is someone on the standards committee to verify that
the above solution was the "way out" of this problem or was there some other
thinking.  A brief history of what transpired and what the thinking was
might clear this problem right up.

Thanks,
Stephen

Stephen Cipolli
stephen.cipolli@fcc.net
---





Author: James.Kanze@dresdner-bank.com
Date: 1999/01/26
Raw View
In article <789chn$chf$1@east44.supernews.com>,
  "Cipolli" <stephen.cipolli@fcc.net> wrote:
> Is the following code considered legal (I could not find anything in the C++
> Standard which makes it illegal):
> string s;
> void thread0() { s[0] = s[1]; }
> void thread1() { string t = s; }
>
> Given a copy-on-write implementation and the fact that the compiler is free
> to evaluate s[0] and s[1] in either order (s[0] then s[1], or s[1] then
> s[0]) s[1] could evaluate first and get a reference to the current string
> representation, then s[0] will cause a duplicate representation to be
> produced and a reference to it returned.  If after both s[1] and s[0] are
> evaluated, but before the assignment occurs, t goes out of scope (in another
> thread) the memory reference returned by s[1] would be invalid.
>
> I am aware that the Standard does not speak to multithreading, however this
> is the only aspect of the Standard Library I have encountered where the
> Standard seems to prevent a useful implementation in a multithreaded
> environment (at least one with copy-on-write semantics).

You don't need multi-threading to get the problem: just rewrite the
program as:

    void thread1() { string t = s ; }
    void thread0() { s[ thread1() , 0 ] = s[ 1 ] ; }

The program is fully legal, and required to work. In general, I think
that the solution is to lock the string anytime a reference to a
character in it is returned.  The action of locking ensures that 1) the
current copy is unique, and 2) copy-on-write is suspended for this
string until it is unlocked (practically, until a function is called that
invalidates the references).

IMHO, a better solution would involve operator[] returning a helper
class, but the standard forbids it.
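
A minimal sketch of what such a helper class could look like, on a
hypothetical non-standard string (ProxyString and CharProxy are made-up
names).  operator[] hands back a small object that remembers the string
and the index; reading converts it to a char by value, and only an
actual assignment unshares the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical toy class; std::basic_string is required to return char&.
class ProxyString {
    std::shared_ptr<std::string> rep_;

    void unshare() {
        if (rep_.use_count() > 1)
            rep_ = std::make_shared<std::string>(*rep_);
    }

public:
    // Stand-in for char&: holds (string, index) instead of an address.
    class CharProxy {
        ProxyString& owner_;
        std::size_t index_;

    public:
        CharProxy(ProxyString& owner, std::size_t index)
            : owner_(owner), index_(index) {}
        CharProxy(const CharProxy&) = default;

        // Reading yields the character by value and never copies the buffer.
        operator char() const { return (*owner_.rep_)[index_]; }

        // Writing is the only point where the buffer is unshared.
        CharProxy& operator=(char c) {
            owner_.unshare();
            (*owner_.rep_)[index_] = c;
            return *this;
        }
        // s[0] = s[1]: read the right-hand side by value first, then write.
        CharProxy& operator=(const CharProxy& rhs) {
            return *this = static_cast<char>(rhs);
        }
    };

    explicit ProxyString(const char* s)
        : rep_(std::make_shared<std::string>(s)) {}

    CharProxy operator[](std::size_t i) { return CharProxy(*this, i); }
    char operator[](std::size_t i) const { return (*rep_)[i]; }

    bool shares_with(const ProxyString& o) const { return rep_ == o.rep_; }
    std::string str() const { return *rep_; }
};

bool proxy_demo() {
    ProxyString s("abc");
    ProxyString t = s;          // t shares s's buffer
    char c = s[1];              // non-const operator[], but only a read
    assert(c == 'b');
    assert(s.shares_with(t));   // so nothing was copied
    s[0] = s[1];                // the statement from the subject line
    assert(!s.shares_with(t));  // the write unshared s
    assert(s.str() == "bbc");
    assert(t.str() == "abc");
    return true;
}
```

Because no raw reference into the buffer ever escapes, the order in
which s[0] and s[1] are evaluated stops mattering, and no string needs
to be marked unshareable.  The price is that code expecting a real
char& (taking its address, binding it to a reference) no longer works.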

--
James Kanze                                           GABI Software, Sàrl
Conseils en informatique orienté objet  --
                          --  Beratung in industrieller Datenverarbeitung
mailto: kanze@gabi-soft.fr          mailto: James.Kanze@dresdner-bank.com







Author: James.Kanze@dresdner-bank.com
Date: 1999/01/27
Raw View
In article <78ijij$5vg$1@nnrp1.dejanews.com>,
  AllanW@my-dejanews.com wrote:

> What if basic_string was not considered a sequence? We could have
> had a basic_string without operator[], begin(), end(), et. al.
> at()const would still exist, returning a character by value. There
> would be no way to get a pointer or iterator or reference to the
> internal data. This needn't be a loss of functionality; we could
> add member functions like left(n), right(n), mid(a,b), and so on
> to create new strings from portions of the old strings. We could
> add member functions like replace(n,char) and delete(n,char,count=1)
> and insert(n,char) and insert(n,basic_string&) to manipulate the
> existing string.

Why not a non-modifiable string, as in Java, with a separate class for
manipulating.  (This is what I did, many years back before there was a
standard string class.)

Note that modifying a string through operator[] is rarely really useful,
at least in an international environment.  Functions like replace are
useful, but can be expressed in terms of assignment, i.e. the semantics
of replace are to create a new string, and then assign it to the old
one.  (This is how I implemented it when I added the functionality to my
original string class.)
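
A minimal sketch of that idea, assuming a hypothetical ImmutableString
class (not the original class mentioned above, whose code is not shown
here).  The characters are never modified after construction, so copies
can always share them; replace() builds a new value and assigns it over
the old one.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical value class in the spirit of Java's String.
class ImmutableString {
    std::shared_ptr<const std::string> rep_;  // never modified once created

public:
    explicit ImmutableString(const std::string& s)
        : rep_(std::make_shared<std::string>(s)) {}

    // Characters are returned by value; no reference to the data escapes.
    char at(std::size_t i) const { return rep_->at(i); }
    std::string str() const { return *rep_; }

    // replace == "create a new string, then assign it to the old one".
    void replace(std::size_t pos, std::size_t len, const std::string& with) {
        std::string changed(*rep_);
        changed.replace(pos, len, with);
        *this = ImmutableString(changed);
    }
};

bool immutable_demo() {
    ImmutableString a("hello");
    ImmutableString b = a;      // shares the characters with a
    a.replace(0, 1, "J");       // a now refers to a brand-new string
    assert(a.str() == "Jello");
    assert(a.at(0) == 'J');
    assert(b.str() == "hello"); // b is untouched
    return true;
}
```

Every modification costs a full copy, which is the trade this design
makes in exchange for never having to track outstanding references.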

For that matter, it would suffice if the standard allowed returning a
helper class from operator[].

--
James Kanze                                           GABI Software, Sàrl
Conseils en informatique orienté objet  --
                          --  Beratung in industrieller Datenverarbeitung
mailto: kanze@gabi-soft.fr          mailto: James.Kanze@dresdner-bank.com

---





Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1999/01/27
Raw View
<AllanW@my-dejanews.com> wrote:
>  "Cipolli" <stephen.cipolli@fcc.net> wrote:
>> Is the following code considered legal (I could not find anything in the C++
>> Standard which makes it illegal):
>> string s;
>> void thread0() { s[0] = s[1]; }
>> void thread1() { string t = s; }
>
>    6 [Note: These rules are formulated to allow, but not require, a
>      reference-counted implementation.
>
>> What is the guiding philosophy here?  Use copy-on-write implementations
>> only in non-multithreaded environments?
>
>I don't think that was the intent, but I do believe that is the result.

Allen may in fact believe that, but it is not true.

>If you haven't already done so, be sure to check out
>    http://www.sgi.com/Technology/STL/string_discussion.html
>for a discussion about why they didn't implement reference counting
>on strings.

That document is far from the last word on the subject.  In fact,
it is quite possible (and practical) to implement a good MT-safe
reference-counting std::basic_string<>.

--
Nathan Myers
ncm@nospam.cantrip.org  http://www.cantrip.org/
---





Author: Francis Glassborow <francis@robinton.demon.co.uk>
Date: 1999/01/25
Raw View
In article <789chn$chf$1@east44.supernews.com>, Cipolli
<stephen.cipolli@fcc.net> writes
>What is the guiding philosophy here?  Use copy-on-write implementations only
>in non-multithreaded environments?

The C++ Standard has exactly nothing to say about multi-threaded code.

Francis Glassborow      Chair of Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA          +44(0)1865 246490
All opinions are mine and do not represent those of any organisation








Author: sbnaran@localhost.localdomain (Siemel Naran)
Date: 1999/01/25
Raw View
On 25 Jan 1999 16:48:49 GMT, Francis Glassborow
>In article <789chn$chf$1@east44.supernews.com>, Cipolli

>>What is the guiding philosophy here?  Use copy-on-write implementations only
>>in non-multithreaded environments?

>The C++ Standard has exactly nothing to say about multi-threaded code.

But doesn't the standard provide the tools to let you do
multi-threading, in the same way that it provides you the
tools to let you do multi-dispatching?  Anyway, I find
this topic of multi-threading very confusing, so I'll
just listen for the most part.

Incidentally, there was a GotW about this about two months
ago.  See "www.cntc.com/resources", and look for the
GotW called "reference counting III" or something like
that.

--
----------------------------------
Siemel B. Naran (sbnaran@uiuc.edu)
----------------------------------





Author: AllanW@my-dejanews.com
Date: 1999/01/25
Raw View
In article <789chn$chf$1@east44.supernews.com>,
  "Cipolli" <stephen.cipolli@fcc.net> wrote:
> Is the following code considered legal (I could not find anything in the C++
> Standard which makes it illegal):
> string s;
> void thread0() { s[0] = s[1]; }
> void thread1() { string t = s; }
>
> Given a copy-on-write implementation and the fact that the compiler is free
> to evaluate s[0] and s[1] in either order (s[0] then s[1], or s[1] then
> s[0]), s[1] could evaluate first and get a reference to the current string
> representation, then s[0] will cause a duplicate representation to be
> produced and a reference to it returned.  If after both s[1] and s[0] are
> evaluated, but before the assignment occurs, t goes out of scope (in another
> thread) the memory reference returned by s[1] would be invalid.
>
> I am aware that the Standard does not speak to multithreading, however this
> is the only aspect of the Standard Library I have encountered where the
> Standard seems to prevent a useful implementation in a multithreaded
> environment (at least one with copy-on-write semantics).



Author: "Cipolli" <stephen.cipolli@fcc.net>
Date: 1999/01/24
Raw View
Is the following code considered legal (I could not find anything in the C++
Standard which makes it illegal):
string s;
void thread0() { s[0] = s[1]; }
void thread1() { string t = s; }

Given a copy-on-write implementation and the fact that the compiler is free
to evaluate s[0] and s[1] in either order (s[0] then s[1], or s[1] then
s[0]), s[1] could evaluate first and get a reference to the current string
representation, then s[0] will cause a duplicate representation to be
produced and a reference to it returned.  If after both s[1] and s[0] are
evaluated, but before the assignment occurs, t goes out of scope (in another
thread) the memory reference returned by s[1] would be invalid.
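
To make the interleaving above concrete, here is a minimal single-threaded
sketch that replays the steps in order. Everything in it is an assumption for
illustration: CowString is a hypothetical toy class, not any real library's
basic_string, and it deliberately omits the "unshareable" marking that real
copy-on-write implementations use, so it shows only the sequence of events
described above. The sketch stops before t is destroyed so that it stays
well-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy copy-on-write string (illustration only).
class CowString {
    struct Rep {
        std::size_t refs;
        std::size_t len;
        char* data;
    };
    Rep* rep_;

    static Rep* make(const char* src, std::size_t len) {
        Rep* r = new Rep;
        r->refs = 1;
        r->len = len;
        r->data = new char[len + 1];
        std::memcpy(r->data, src, len);
        r->data[len] = '\0';
        return r;
    }
    void release() {
        if (--rep_->refs == 0) { delete[] rep_->data; delete rep_; }
    }

public:
    explicit CowString(const char* s) : rep_(make(s, std::strlen(s))) {}
    CowString(const CowString& o) : rep_(o.rep_) { ++rep_->refs; }  // share
    CowString& operator=(const CowString& o) {
        ++o.rep_->refs;   // increment first so self-assignment is safe
        release();
        rep_ = o.rep_;
        return *this;
    }
    ~CowString() { release(); }

    // Const access never copies.
    const char& operator[](std::size_t i) const {
        assert(i < rep_->len);
        return rep_->data[i];
    }
    // Non-const access takes a private copy if the buffer is shared.
    char& operator[](std::size_t i) {
        assert(i < rep_->len);
        if (rep_->refs > 1) {
            Rep* mine = make(rep_->data, rep_->len);
            --rep_->refs;
            rep_ = mine;
        }
        return rep_->data[i];
    }
};

// Replays the interleaving: s[1] evaluated, then the copy, then s[0].
bool interleaving_leaves_ref_on_shared_buffer() {
    CowString s("abc");
    const char* p1 = &s[1];   // thread0 evaluates s[1]; s is unshared, no copy
    CowString t = s;          // thread1 runs "string t = s"; buffer now shared
    char* p0 = &s[0];         // thread0 evaluates s[0]; s copies its buffer
    const CowString& ct = t;
    bool p1_in_t = (p1 == &ct[1]);     // p1 refers to the buffer only t owns
    bool p0_private = (p0 != &ct[0]);  // p0 refers to s's new private buffer
    // If t were destroyed at this point, p1 would dangle.
    return p1_in_t && p0_private;
}
```

Calling interleaving_leaves_ref_on_shared_buffer() reports whether the
reference from s[1] ended up pointing into the buffer that only t still owns,
which is the situation the paragraph above is worried about.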

I am aware that the Standard does not speak to multithreading, however this
is the only aspect of the Standard Library I have encountered where the
Standard seems to prevent a useful implementation in a multithreaded
environment (at least one with copy-on-write semantics).

What is the guiding philosophy here?  Use copy-on-write implementations only
in non-multithreaded environments?

Thanks

Stephen Cipolli
stephen.cipolli@fcc.net





Author: Jeff Greif <jmg@trivida.com>
Date: 1999/01/25
Raw View
Third-hand info suggests that this issue has been discussed by the standards
committee.  See
http://www.sgi.com/Technology/STL/string_discussion.html
for one set of comments.

Jeff

Cipolli wrote:
....

> Given a copy-on-write implementation and the fact that the compiler is free
> to evaluate s[0] and s[1] in either order (s[0] then s[1], or s[1] then
> s[0]), s[1] could evaluate first and get a reference to the current string
> representation, then s[0] will cause a duplicate representation to be
> produced and a reference to it returned.  If after both s[1] and s[0] are
> evaluated, but before the assignment occurs, t goes out of scope (in another
> thread) the memory reference returned by s[1] would be invalid.
> ...
> I am aware that the Standard does not speak to multithreading, however this
> is the only aspect of the Standard Library I have encountered where the
> Standard seems to prevent a useful implementation in a multithreaded
> environment (at least one with copy-on-write semantics).

....