Thread

Topic: reference counted objects verses reference counted handles

Author: hattons@globalsymmetry.com ("Steven T. Hatton")
Date: Sun, 3 Jul 2005 16:14:05 GMT Raw View

Howard Hinnant wrote:

> In article <1119522633.973038.263650@g47g2000cwa.googlegroups.com>,
>  martinfrompi@yahoo.co.uk wrote:
>
>
> Yes, thank you both for your comments.
>
> It is probably more accurate for me to say that COW is an optimization
> that favors a situation where the copy/write ratio is high (many copies,
> few writes).  As that ratio drops, so does the value of COW.  And move
> semantics will only help in the area where the copy/write ratio is close
> to 1 (e.g. returning from a function, manipulation in containers and
> algorithms, etc.).
>
> -Howard

Sometimes multiple copies is exactly what the conceptual design calls for.
That is the area that I wanted to address.  I suspect there are a lot of
subtelties involved designing a truly comprehensive, and yet simple and
low-overhead reference counting base class, with, or without COW.  Such
issues as threading and cloning as well as MI would have to be understood,
and addressed.  One comment I made earlier is that it would be desirable to
force it to derive virtually.  I now believe that would place an
unnecessary cost on single inheritance hierarchies.

I know that I do not have a sufficient background to design such a feature
within the next few month.  If anybody believes this is a worthwhile idea,
and would like to get it into C++0X, please pick it up and run with it.
--
STH
http://www.kdevelop.org
http://www.suse.com
http://www.mozilla.org

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Me" <anti_spam_email2003@yahoo.com>
Date: 3 Jul 2005 22:50:01 GMT Raw View

> In summary, COW is an optimization which makes shared state look
> distinct when modified.  Move semantics greatly reduces the need to
> share state in the first place, and thus distinct objects maintain
> distinct state.

It's not just an optimization, it swaps the modification guarantee:

COW classes have guaranteed copy construction and assignment operators
but you're not guaranteed updating of members.

non-COW (lets call this Modify On Write) classes don't have guaranteed
copy construction or assignment operators but members can be guaranteed
to be modified.

Some people do not code COW classes with a mechanism for forcing the
copy to take place, they leave it as a hidden implementation detail.
Unfortunately, if you do that, there are certain situations where
operations (modifying a value in-place or removing an value from a data
structure) that would normally be guaranteed to succeed, could possibly
fail due to out of resource conditions.

Refcounts on top of COW can be viewed as an optimization that lets you
have MOW semantics because you can track the number of live copies.
Garbage collected languages could also use refcounts but it would be a
conservative approximation to how many live copies there are.

C++ is in a good position because all objects are values but it also
has a distinct syntax for reference semantics. There are languages that
pretend everything is a value without any separate reference semantics
so COW is usually done for efficient passing of variables around in
functions. C++'s reference semantics eliminates the big need for COW
and move semantics eliminates the remaining case that regular reference
semantics left off.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: hattons@globalsymmetry.com ("Steven T. Hatton")
Date: Mon, 4 Jul 2005 02:33:27 GMT Raw View

Me wrote:

>> In summary, COW is an optimization which makes shared state look
>> distinct when modified.  Move semantics greatly reduces the need to
>> share state in the first place, and thus distinct objects maintain
>> distinct state.
>
> It's not just an optimization, it swaps the modification guarantee:

I'm not sure what the "modification guarantee" is.  If it is stated in D&E,
I have it on order, but not on hand.  Would you please proved a statement
of that guarantee, or a reference to such?

> COW classes have guaranteed copy construction and assignment operators
> but you're not guaranteed updating of members.

The period in that sentence seems to come prematurely. I'm thinking there
should be, perhaps, a comma followed by 'when...'.

Let's take a concrete example.  I'm thinking it might be valuable to have
implicitly shared containers such as std::vector.[*][**] Implicit sharing
means that when we construct a new std::shared_vector from another, assign
one to another, or pass one as a parameter, the data is not copied, only
pointers to the data are copied.  In the case of assignment, there would be
a need to manipulate the reference count for the data previously held by
the assignee, as well as that held by the assignor.

As long as no attempt is made to modify the data, the data remains shared
among the various instances.  The data will have associated metadata (the
reference count), which is modified during construction, copying,
assignment, data modification, and destruction.  As such, it falls into the
category of mutable data (even though it may not be necessary in all design
scenerios to use the /mutable/ specifier).  The reference count is
characterized as mutable because it is subject to modification even when
the std::shared_vector instance is declared const.

If an attempt is made to modify the data in (through) a non-const instance,
the data will be modified, for that instance.  To the observer, there is no
other change in any of the previously sharing objects (assuming the
reference count is not observable.)

>From what I've read about std::string, it is sometimes the case that a
conforming implementation uses some kind of implicit sharing, so I'm led to
believe that the mechanism itself doesn't compromise the current guarantees
under restricted conditions.

[*]It may be possible to implement this, in part, by adding a template
parameter to the current container implementations in the Standard.

[**] What I'm describing is a design in which the hypothetical
std::shared_vector /is/ the handle.

> non-COW (lets call this Modify On Write) classes don't have guaranteed
> copy construction or assignment operators but members can be guaranteed
> to be modified.

I don't follow.  If I try to add data to a member container, that
modification might fail.  There is no guarantee that it will succeed.  At
best, the guarantee is that it will succeed, or do nothing.

> Some people do not code COW classes with a mechanism for forcing the
> copy to take place, they leave it as a hidden implementation detail.
> Unfortunately, if you do that, there are certain situations where
> operations (modifying a value in-place or removing an value from a data
> structure) that would normally be guaranteed to succeed, could possibly
> fail due to out of resource conditions.

Define "copy".  There are two different meanings in this context.  What
happens when we do the things we call copying when discussing non-sharing
data types, and there is what is meant by the "copy" in copy-on-write.
That is often called cloning in order to distinguish it from traditional
copy operations.  /Deep copy/ as opposed to /shallow copy/ is another way
these concepts are distinguished.

As for guarantees, these are defined for specific contexts, and are stated
as either provided by a type, or required of a type.  Can you be more
specific as to what guarantees are violated, and in what circumstances?

> Refcounts on top of COW can be viewed as an optimization that lets you
> have MOW semantics because you can track the number of live copies.

I've never encountered a description of COW that didn't involve a reference
count.

> Garbage collected languages could also use refcounts but it would be a
> conservative approximation to how many live copies there are.

As I understand things, Java's GC is reliant on reference counting.  And we
could view a reference counted sharing mechanism as a form of localized
garbage collection.

> C++ is in a good position because all objects are values but it also
> has a distinct syntax for reference semantics. There are languages that
> pretend everything is a value without any separate reference semantics
> so COW is usually done for efficient passing of variables around in
> functions. C++'s reference semantics eliminates the big need for COW

I don't follow.  I'm really not sure what you mean by reference semantics,
because that can mean one of two different things to me.  It can mean
specifically the behavior of references, as opposed to pointers, or it can
mean the collective behavior of references and pointers.

> and move semantics eliminates the remaining case that regular reference
> semantics left off.

Suppose I have hundreds of instances of an object holding a dozen or so
sequence type objects (arrays) each of which holds hundreds, or even
thousands of elements which contain their own elements.  For example, an
object in a 3D scene graph.  There are arrays of vertices, normals, normal
index maps, colors, color index maps, textures, texture index maps, etc.
It is often the case that these arrays are shared between objects.  The
objects themselves can also be shared. Does the introduction of move
semantics eliminate the need for sharing?  If the objects are shared, then
is there a potential usefulness of COW?

Move semantics (as I understand it) is simply a generalization of the
behavior of std::auto_ptr<>.  It assumes there is no need to preserve the
source data when it is assigned.  How is that useful in the case of
composite data structures which share data between nodes?
--
STH
http://www.kdevelop.org
http://www.suse.com
http://www.mozilla.org

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Me" <anti_spam_email2003@yahoo.com>
Date: Tue, 5 Jul 2005 12:07:04 CST Raw View

> >> In summary, COW is an optimization which makes shared state look
> >> distinct when modified.  Move semantics greatly reduces the need to
> >> share state in the first place, and thus distinct objects maintain
> >> distinct state.
> >
> > It's not just an optimization, it swaps the modification guarantee:
>
> I'm not sure what the "modification guarantee" is.  If it is stated in D&E,
> I have it on order, but not on hand.  Would you please proved a statement
> of that guarantee, or a reference to such?

Lets take a concrete example for the COW and MOW. In most cases of MOW
you have a "modification guarantee". For example lets take a vector of
characters:

- COW style: Copy construction and assignment are guaranteed to
succeed. Removing a member or modifying an element may or may not
succeed because a fresh copy needs to be made.
- MOW style: Copy construction and assignment may or may not succeed.
Removing an element or modifying an element is guaranteed to succeed.

To avoid confusion, I am thinking of the vector as a whole instead of
the individual element that gets modified. The individual element it
refers to may or may not be COW, but that's irrelevant.

What are the possibilities that can occur when you modify a vector
element:
- COW: require a fresh new copy of the vector to be made
- MOW: modify the vector anyway and let the user worry about it
- COW with tracking as an optimization: track the number of copies of
the vector and either do MOW if there is only 1 copy, COW otherwise.

In languages that pass by reference, they can either do MOW and let the
user deal with this by forcing the user to explicitly make a copy
somehow or they can do COW and pretend they're pass by value. In
languages that pass by value, they can do MOW and have it work
correctly in all cases. (This is sort of muddy area because lots of
languages let you do both for efficiency (like primitive types in some
pass by reference COW languages), ease of use, and when you want to
capture different semantics).

> From what I've read about std::string, it is sometimes the case that a
> conforming implementation uses some kind of implicit sharing, so I'm led to
> believe that the mechanism itself doesn't compromise the current guarantees
> under restricted conditions.

std::string can be done using refcounts but iterators are more
complicated because it has to follow the constraints given in the
standard (see paragraph 21.3/6). The STL container classes cannot be
refcounted because erase/pops are required not to throw an exception.

> > non-COW (lets call this Modify On Write) classes don't have guaranteed
> > copy construction or assignment operators but members can be guaranteed
> > to be modified.
>
> I don't follow.  If I try to add data to a member container, that
> modification might fail.  There is no guarantee that it will succeed.  At
> best, the guarantee is that it will succeed, or do nothing.

Agreed. I shouldn't have said it as an absolute saying in all cases MOW
will succeed. Though there are cases where adding a member would
succeed like if you're adding an object with intrusive node pointers on
node based data structures.

> > Refcounts on top of COW can be viewed as an optimization that lets you
> > have MOW semantics because you can track the number of live copies.
>
> I've never encountered a description of COW that didn't involve a reference
> count.

COW and MOW are general concepts. COW can only be implemented in
non-GCed languages using refcounts (unless you don't care about leaking
memory). In a garbage collected language (I take garbage collected to
mean non-refcount based for the sake of this discussion), you do not
know how many objects refer to that object, so all modifications
require a copy to be done. Data structures in functional languages are
all COW based, any modification requires a fresh copy of it to be made.
In most functional languages, some sharing of the elements are possible
depending on the data structure.

> > C++ is in a good position because all objects are values but it also
> > has a distinct syntax for reference semantics. There are languages that
> > pretend everything is a value without any separate reference semantics
> > so COW is usually done for efficient passing of variables around in
> > functions. C++'s reference semantics eliminates the big need for COW
>
> I don't follow.  I'm really not sure what you mean by reference semantics,
> because that can mean one of two different things to me.  It can mean
> specifically the behavior of references, as opposed to pointers, or it can
> mean the collective behavior of references and pointers.

Here I mean:
reference semantics - if you pass a variable by reference to a function
and the variable gets modified inside the function, the variable from
outside the function gets modified as well
value semantics - acts as if each function gets a local copy of a
variable

> > and move semantics eliminates the remaining case that regular reference
> > semantics left off.
>
> Suppose I have hundreds of instances of an object holding a dozen or so
> sequence type objects (arrays) each of which holds hundreds, or even
> thousands of elements which contain their own elements.  For example, an
> object in a 3D scene graph.  There are arrays of vertices, normals, normal
> index maps, colors, color index maps, textures, texture index maps, etc.
> It is often the case that these arrays are shared between objects.  The
> objects themselves can also be shared. Does the introduction of move
> semantics eliminate the need for sharing?  If the objects are shared, then
> is there a potential usefulness of COW?

I view the above more like general sharing/grouping of data. For
example, if we represent a sphere as a collection of triangles
representing the surface and give each triangle a color. If we change
the color, we want the object as a whole to be modified so it would be
easier to have 1 copy of the color and have each triangle point to that
color and break that sharing when we want to explicitly modify each
triangle's color. This also lets you explicitly group data you want to
share like if we want the left half of the sphere to point to one color
and the right side to point to another one and have these colors
animate over time. You just have to modify 2 colors as opposed to how
many numbers of triangles there are and explicitly track how these
colors are grouped somehow.

> Move semantics (as I understand it) is simply a generalization of the
> behavior of std::auto_ptr<>.  It assumes there is no need to preserve the
> source data when it is assigned.  How is that useful in the case of
> composite data structures which share data between nodes?

I think we're talking about 2 different ideas that gets merged together
often because they have similar implementation details. r-value
references and move semantics in C++ specifically deal with forcing a
MOW to take place because we're telling the compiler that either there
is only 1 copy lying around or this copy should really act like a move
because the object that would be copied if move semantics didn't exist
would be destroyed immediately after the copy would have taken place.
For example:

A av = A();

This would create a temporary A, copy construct the temporary A into
av, and then destroy the temporary. Return Value Optimization makes
this act as if you can just create a temporary A directly into av. Move
semantics generalizes RVO:

A::insert(A &a);

A av;
a.insert(A());

Without move semantics, this would result in a temporary copy, a copy
construction, and a destruction of the temporary. With move semantics
(change A & to A && above), it "acts" as if the temporary A gets
directly inserted into the correct place without a temporary copy (Not
quite though because the proposal doesn't deal with destructive moves
but most classes either have trivial destructors or lame duck states so
this doesn't really matter).

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: bop@gmb.dk ("Bo Persson")
Date: Tue, 5 Jul 2005 17:11:25 GMT Raw View

""Steven T. Hatton"" <hattons@globalsymmetry.com> skrev i meddelandet
news:TLCdnegqYIkUAFXfRVn-uA@speakeasy.net...
> Me wrote:
>
>>> In summary, COW is an optimization which makes shared state look
>>> distinct when modified.  Move semantics greatly reduces the need to
>>> share state in the first place, and thus distinct objects maintain
>>> distinct state.
>>
>> It's not just an optimization, it swaps the modification guarantee:
>
> I'm not sure what the "modification guarantee" is.  If it is stated in
> D&E,
> I have it on order, but not on hand.  Would you please proved a
> statement
> of that guarantee, or a reference to such?
>
>> COW classes have guaranteed copy construction and assignment
>> operators
>> but you're not guaranteed updating of members.
>
> The period in that sentence seems to come prematurely. I'm thinking
> there
> should be, perhaps, a comma followed by 'when...'.
>

'when there is sharing'.

Here's an example:

std::string s1 = "Hello";
std::string s2 = s1;

s2[0] = 'X';

A non-shared implementation might fail on s2 = s1 if it runs out of
memory. It never fails on modifying s2[0].

A COW-implementation may "work" on s2 = s1, but might be unable to
unshare s1 and s2 while updating s2[0]. This moves the point of failure
to an unexpected place.



Bo Persson






---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: hattons@globalsymmetry.com ("Steven T. Hatton")
Date: Wed, 6 Jul 2005 03:49:23 GMT Raw View

Me wrote:

>> >> In summary, COW is an optimization which makes shared state look
>> >> distinct when modified.  Move semantics greatly reduces the need to
>> >> share state in the first place, and thus distinct objects maintain
>> >> distinct state.
>> >
>> > It's not just an optimization, it swaps the modification guarantee:
>>
>> I'm not sure what the "modification guarantee" is.  If it is stated in
>> D&E,
>> I have it on order, but not on hand.  Would you please proved a statement
>> of that guarantee, or a reference to such?
>
> Lets take a concrete example for the COW and MOW. In most cases of MOW
> you have a "modification guarantee". For example lets take a vector of
> characters:
>
> - COW style: Copy construction and assignment are guaranteed to
> succeed. Removing a member or modifying an element may or may not
> succeed because a fresh copy needs to be made.
> - MOW style: Copy construction and assignment may or may not succeed.
> Removing an element or modifying an element is guaranteed to succeed.

There are guarantees implicit in the "obvious" design of certain data types,
and there are guarantees explicitly stated as required of, or provided by
certain data types.  It seems you are talking about implicit guarantees,
not ones that are explicitly stated.

> To avoid confusion, I am thinking of the vector as a whole instead of
> the individual element that gets modified. The individual element it
> refers to may or may not be COW, but that's irrelevant.

Modifying an element in a std::vector<vector<int> > can fail.

> What are the possibilities that can occur when you modify a vector
> element:
> - COW: require a fresh new copy of the vector to be made
> - MOW: modify the vector anyway and let the user worry about it
> - COW with tracking as an optimization: track the number of copies of
> the vector and either do MOW if there is only 1 copy, COW otherwise.
>
> In languages that pass by reference, they can either do MOW and let the
> user deal with this by forcing the user to explicitly make a copy
> somehow or they can do COW and pretend they're pass by value.

I'm not sure why you are talking about "languages".  This is not about what
other languages do.  It's about how these features are implemented in C++.

>> From what I've read about std::string, it is sometimes the case that a
>> conforming implementation uses some kind of implicit sharing, so I'm led
>> to believe that the mechanism itself doesn't compromise the current
>> guarantees under restricted conditions.
>
> std::string can be done using refcounts but iterators are more
> complicated because it has to follow the constraints given in the
> standard (see paragraph 21.3/6). The STL container classes cannot be
> refcounted because erase/pops are required not to throw an exception.

Now that sounds like an explicit guarantee.  Indeed, I can see no way of
designing a COW std::vector.  That is different than saying I can see no
way of designing one with reference counts.

>> > non-COW (lets call this Modify On Write) classes don't have guaranteed
>> > copy construction or assignment operators but members can be guaranteed
>> > to be modified.
>>
>> I don't follow.  If I try to add data to a member container, that
>> modification might fail.  There is no guarantee that it will succeed.  At
>> best, the guarantee is that it will succeed, or do nothing.
>
> Agreed. I shouldn't have said it as an absolute saying in all cases MOW
> will succeed. Though there are cases where adding a member would
> succeed like if you're adding an object with intrusive node pointers on
> node based data structures.
>
>> > Refcounts on top of COW can be viewed as an optimization that lets you
>> > have MOW semantics because you can track the number of live copies.
>>
>> I've never encountered a description of COW that didn't involve a
>> reference count.
>
> COW and MOW are general concepts. COW can only be implemented in
> non-GCed languages using refcounts (unless you don't care about leaking
> memory). In a garbage collected language (I take garbage collected to
> mean non-refcount based for the sake of this discussion), you do not
> know how many objects refer to that object, so all modifications
> require a copy to be done. Data structures in functional languages are
> all COW based, any modification requires a fresh copy of it to be made.
> In most functional languages, some sharing of the elements are possible
> depending on the data structure.

Not in Mathematica.  But I really don't see this as applicable to the
current discussion.

>> > C++ is in a good position because all objects are values but it also
>> > has a distinct syntax for reference semantics. There are languages that
>> > pretend everything is a value without any separate reference semantics
>> > so COW is usually done for efficient passing of variables around in
>> > functions. C++'s reference semantics eliminates the big need for COW
>>
>> I don't follow.  I'm really not sure what you mean by reference
>> semantics,
>> because that can mean one of two different things to me.  It can mean
>> specifically the behavior of references, as opposed to pointers, or it
>> can mean the collective behavior of references and pointers.
>
> Here I mean:
> reference semantics - if you pass a variable by reference to a function
> and the variable gets modified inside the function, the variable from
> outside the function gets modified as well

If we were discussin any other language I know of other than C++ I would
take that to mean pass a pointer.  I will assume that is what you mean,
though I find that usage to be ambiguous when discussin C++.

> I view the above more like general sharing/grouping of data. For
> example, if we represent a sphere as a collection of triangles
> representing the surface and give each triangle a color. If we change
> the color, we want the object as a whole to be modified so it would be
> easier to have 1 copy of the color and have each triangle point to that
> color and break that sharing when we want to explicitly modify each
> triangle's color. This also lets you explicitly group data you want to
> share like if we want the left half of the sphere to point to one color
> and the right side to point to another one and have these colors
> animate over time. You just have to modify 2 colors as opposed to how
> many numbers of triangles there are and explicitly track how these
> colors are grouped somehow.

Yes, that is what you do whit a color array and a color index array.  But
what happens when you have 250 spheres with the same colors, and each
triangle has its own color?

>> Move semantics (as I understand it) is simply a generalization of the
>> behavior of std::auto_ptr<>.  It assumes there is no need to preserve the
>> source data when it is assigned.  How is that useful in the case of
>> composite data structures which share data between nodes?
>
> I think we're talking about 2 different ideas that gets merged together
> often because they have similar implementation details. r-value
> references and move semantics in C++ specifically deal with forcing a
> MOW to take place because we're telling the compiler that either there
> is only 1 copy lying around or this copy should really act like a move
> because the object that would be copied if move semantics didn't exist
> would be destroyed immediately after the copy would have taken place.
> For example:
>
> A av = A();
>
> This would create a temporary A, copy construct the temporary A into
> av, and then destroy the temporary. Return Value Optimization makes
> this act as if you can just create a temporary A directly into av. Move
> semantics generalizes RVO:
>
> A::insert(A &a);
>
> A av;
> a.insert(A());
>
> Without move semantics, this would result in a temporary copy, a copy
> construction, and a destruction of the temporary. With move semantics
> (change A & to A && above), it "acts" as if the temporary A gets
> directly inserted into the correct place without a temporary copy (Not
> quite though because the proposal doesn't deal with destructive moves
> but most classes either have trivial destructors or lame duck states so
> this doesn't really matter).

That doesn't address my question about sharing data in a scenegraph.
--
STH
http://www.kdevelop.org
http://www.suse.com
http://www.mozilla.org

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: hattons@globalsymmetry.com ("Steven T. Hatton")
Date: Mon, 27 Jun 2005 03:02:24 GMT Raw View

Dave Harris wrote:

> hattons@globalsymmetry.com ("Steven T. Hatton") wrote (abridged):
>> My question is: what if we do own the code for the class of objects we
>> want to keep track of?  Can we then derive that class from some
>> reference counted baseclass, and proceed to use the class without a
>> lot of further consideration of how to manage the reference count?
>> boot::intrusive_ptr<> seems to offer some hope for this kind
>> of approach.  The documentation seems to suggest shared_ptr<> is
>> a better general solution to the problem of shared memory management.
>
> In my view shared_ptr<> is more general but not better, if you see what I
> mean. If an intrusive pointer is possible, I prefer it, and modify my
> classes accordingly.
>
> I actually use my own pointer template instead of boost's. I do use
> boost's shared_ptr for types which I can't modify.

I'm wondering if a boot::shared_ref<> to complement boost::shared_ptr<>
would be valuable.  There are times when it's not possible, or not
desirable to use an intrusive reference counting approach.  Even in those
cases there are reasons to prefer reference semantics over pointer
semantics. In particular, function objects are far more elegant when
accessed as references than when accessed as pointers.  Using pointers to
function objects makes the whole concept of operator() almost pointless.

What I'm thinking is the smart pointer and smart references could share
their reference counts. A similar arrangement could be established for
classes with intrusive reference counts, but there would be less coupling
between the intrusive handles because the coupling is built into the
reference counted object.  I believe that the reference handles I'm
envisioning would require something along the lines of the "Overloading
operator.() & operator.*()" proposal:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1671.pdf

If I understand correctly, that would enable the creation of smart handles
with reference semantics, and the same interface as the handled object.  It
might be thought of as something of an auto-proxy.

I still don't know if that can coexist with a COW functionality.  It seems a
bit silly to spend too much time trying to build a design on a proposal
which may not make it into the Standard.
--
STH
http://www.kdevelop.org
http://www.suse.com
http://www.mozilla.org

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: hinnant@metrowerks.com (Howard Hinnant)
Date: Wed, 22 Jun 2005 16:14:12 GMT Raw View

In article <frydncCk89GFhiTfRVn-vg@speakeasy.net>,
 hattons@globalsymmetry.com ("Steven T. Hatton") wrote:

> What do others think about the value of using an intrusive reference
> counting mechanism, and even supporting it in the Standard Library?

I would be interested in seeing such a proposal.

> I'm not sure what to do about the copy-on-write problem.

I would also be interested in an enumerated list of the specific
problems you are referring to with COW.  Not that I disagree that there
are problems, just that I would like to discuss them more concretely.

I am not positive, but I strongly suspect that the introduction of move
semantics will displace much or all of our current motivation to code
COW, especially as multithreading becomes more and more prevalent.
However I also believe that refcounting techniques will continue to be
important where COW is not required.

Reasons to code COW:

1.  Because I want to efficiently return the object by value from a
function.  (example: string+string, matrix*matrx, etc.)

But rvo can often do this today faster than refcounting.  And for those
cases where rvo isn't practical, or can not be relied upon (such as
assignment from the function:  a = f();), move semantics takes over to
effect a return as efficient as a refcounted copy (more so in a
multithread environment, even with a lock-free shared_ptr).

2.  Because I want my objects to copy efficiently while I'm putting them
into my std::container (esp. vector which tends to perform internal
copies more than other containers).

A move-aware vector holding cheaply movable types will be as fast or
faster than the same vector holding refcounted objects.

3.  I need to hold pointers to base classes in my container, so I use
vector<shared_ptr<Base>>.

vector<unique_ptr<Base>> is likely to accomplish the same tasks you
require, and be significantly faster while using less memory.

4.  Because I need sequences of my objects to be efficiently rearranged
by the sequence-modifying std::algorithms (sort, rotate, inplace_merge,
etc.).

Those std::algorithms that only need to move objects instead of copy
them (sort, rotate, inplace_merge, etc.), will do so, making them as
efficient with cheaply movable objects as they would be with refcounted
objects.

5.  Others?

In summary, COW is an optimization which makes shared state look
distinct when modified.  Move semantics greatly reduces the need to
share state in the first place, and thus distinct objects maintain
distinct state.

Generally it will be easier to give a heavy class move semantics than it
will to give it COW semantics.  The former typically requires an extra
constructor and assignment (and these typically require the same
implementation effort as the copy constructor and copy assignment).  The
latter typically requires modifications in many or all of the non-const
member functions, even those that simply return (non-const) references
or pointers to internal state.

-Howard

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: hattons@globalsymmetry.com ("Steven T. Hatton")
Date: Wed, 22 Jun 2005 14:55:53 GMT Raw View

I've attempted to get a discussion going on this topic in both c.l.c++ an=
d
c.l.c++.m.  I received no constructive feedback on the subject.  After
reading many of the posts in the rambling thread "boost::shared_ptr
revisited again", I've been led to wonder if consideration of reference
counted objects has been neglected by the standards committee.

As I see things, the biggest problem with boost::shared_ptr<> and similar
smart pointers is the possibility that multiple, independent sharing can =
be
established.  That is, two shared pointers can be assigned the same subje=
ct
directly, rather than through a mechanism that communicates the existence
of the other owner.  That is, both pointers have their own reference coun=
t.

In Accelerated C++ =A714.2 the authors tell us "To add reference counting=
 to a
class, we have to allocate a counter, and change the operations that
create, copy, and destroy the objects so that they update the counter
appropriately.  Each object to which we have a Ref_handle will have a
reference count associated with it.  The only question is where to store
the counter.  In general, we don't own the source code for the types from
which we want to make Ref_handles, so we can't just add the counter to th=
e
object class type itself."

My question is: what if we do own the code for the class of objects we wa=
nt
to keep track of?  Can we then derive that class from some reference
counted baseclass, and proceed to use the class without a lot of further
consideration of how to manage the reference count?  boot::intrusive_ptr<=
>
seems to offer some hope for this kind of approach.  The documentation
seems to suggest shared_ptr<> is a better general solution to the problem
of shared memory management. =20

I have some experience working with OpenSceneGraph's osg::Referenced, and
osg::ref_ptr<> classes.  They have their limitations.  AFAIK, the CopyOp
intended to support copy on modify is not fully implemented.  Another
limitation is that you cannot use osg::Referenced derivatives as member
variables because the destructor for osg::Referenced is protected.  The
more I understand the ideas behind reference counted memory management, t=
he
more I come to realize, that's just how it works. =20

Typically, the osg::Referenced objects are "passed around" as raw pointer=
s.=20
If there is a need to hold onto one for longer than the duration of a
function call, then it can be assigned to an osg::ref_ptr<> which increas=
es
the osg::Referenced object's reference count.  Basically that means if th=
e
osg::Referenced object needs to be pointed to by a member variable, you
should use an osg::ref_ptr<> to hold it. Once I grew accustomed to workin=
g
with this mechanism, it seemed quite natural to use.

There are a few things I wish were different.  For one, I don't like the
consequences of using pointer syntax on function objects. Typing (*ptr)()
causes my bloodpressure to go up.  It's just plain ugly to my eyes,
especially when it's part of a more complex expression.  The lack of a
clear and simple means of supporting copy-on-modify is also modestly
troublesome.

I wonder if Boost.Ref might provide a startingpoint for some kind of smar=
t
reference to augment boot::intrusive_ptr<> and enable the same reference
counted object to be shared as both a pointer and a reference.  That migh=
t
address the ugly syntax problem. =20

I'm not sure what to do about the copy-on-write problem.

What do others think about the value of using an intrusive reference
counting mechanism, and even supporting it in the Standard Library?

--=20
STH
http://www.kdevelop.org
http://www.suse.com
http://www.mozilla.org

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: hattons@globalsymmetry.com ("Steven T. Hatton")
Date: Wed, 22 Jun 2005 19:52:59 GMT Raw View

Howard Hinnant wrote:

> In article <frydncCk89GFhiTfRVn-vg@speakeasy.net>,
>  hattons@globalsymmetry.com ("Steven T. Hatton") wrote:
>
>> What do others think about the value of using an intrusive reference
>> counting mechanism, and even supporting it in the Standard Library?
>
> I would be interested in seeing such a proposal.

I'm less than two years into learning C++, and can't claim to be the most
qualified person to produce the formal proposal, but I'll see what I can
come up with.

There is one thing I believe would be necessary for implementing the support
features I'm suggesting. All classes should derive virtually from the
std::referenced baseclass.  Otherwise, there would be great potential for
multiple baseclass subobjects holding redundant, or worse, inconsistent,
reference counters.  I don't know of a straight forward means of enforcing
that requierment other than introducing another level of indirection.

>> I'm not sure what to do about the copy-on-write problem.
>
> I would also be interested in an enumerated list of the specific
> problems you are referring to with COW.  Not that I disagree that there
> are problems, just that I would like to discuss them more concretely.

1) I'm not sure how to define it for a(n abstract?) baseclass in such a way
as to be easily implemented in an arbitrarily defined derived class.

2) Should there be a (pure?) virtual clone function in the baseclass?  That
seems like it might be hard to call on the derived class in such a way as
to both match the signature, and invoke the proper functionality to
correctly clone the derived class (hierarchy).  Perhaps some kind of
pointer to member function hocus pocus might do the trick.

4) Should there be both read only and read/write smart_pointers and
smart_references?  Does constness suffice as a means of providing that?

5) Should there be enable/disable control over the COW?  One example of
where I wanted such a feature is when trying to create two views of the
same edit buffer in Qt.  QString supports COW (implicit sharing).  In order
to emulate the kind of behavior I get with Emacs, where I can view the same
edit buffer in two windows, I needed to have a way of disabling the COW
functionality, or force a continuous update of the non-focused buffer.
When I realized the latter was the choice provided by QString I opted for
choice #3: don't try to implement it.

6) Should there be a means of arbitrarily dividing the owners between newly
cloned instances?

7) can any of this be done with a traditional copy constructor?

> I am not positive, but I strongly suspect that the introduction of move
> semantics will displace much or all of our current motivation to code
> COW, especially as multithreading becomes more and more prevalent.
> However I also believe that refcounting techniques will continue to be
> important where COW is not required.

As for my reason for wanting copy on write, I don't currently have a need
for that.  But the potential usefulness seems obvious.  I'm working with a
composite directed acyclical graph of very complex nodes of heterogeneous
derived types, many of which hold (or point to) vast amounts of data.

An example node class:
http://www.openscenegraph.org/documentation/OpenSceneGraph/include/osg/Geometry

> Reasons to code COW:
>
> 1.  Because I want to efficiently return the object by value from a
> function.  (example: string+string, matrix*matrx, etc.)

Check.

> But rvo can often do this today faster than refcounting.

Optimize this return value:

 LatheGeometry* Lathe::generate(osg::Vec3 axle) {

    if(axle == osg::Vec3()) axle = axisVector();
    float scale(axle.length());
    LatheGeometry* lg_ptr(dynamic_cast<LatheGeometry*>(_products[axle]));

    if(lg_ptr) return lg_ptr;

    lg_ptr = _products[axle] = new LatheGeometry(axle, this);

    osg::Vec3Array* vertices = new osg::Vec3Array();
    osg::Vec3Array* normals = new osg::Vec3Array();
    osg::Geometry::PrimitiveSetList* psl = new
osg::Geometry::PrimitiveSetList();

    for(size_t i = 0; i < _radii.size()-1; i++) {
      osg::Vec2 unv(_radii[i+1] - _radii[i]);
      unv = osg::Vec2f(-unv[1], unv[0]);
      unv.normalize();

      for(size_t j = 0; j < _ring.size(); j++) {
        osg::Vec3 v0(_radii[i][0], _radii[i][1] * _ring[j][0], _radii[i][1]
* _ring[j][1]);
        osg::Vec3 v1(_radii[i+1][0], _radii[i+1][1] * _ring[j][0],
_radii[i+1][1] * _ring[j][1]);
        osg::Vec3 nv(unv[0], unv[1] * _ring[j][0], unv[1] * _ring[j][1]);

        v0 *= scale;
        v1 *= scale;

        if(_axis == X){
          vertices->push_back(v0);
          vertices->push_back(v1);
          normals->push_back(nv);
        } else {
          vertices->push_back(rotv3(v0, _axis));
          vertices->push_back(rotv3(v1, _axis));
          normals->push_back(rotv3(nv, _axis));
        }

        normals->push_back(normals->back());

        size_t f2 = 2 * _facets;
        size_t n = f2 * i;
        size_t j2 = 2 * j;

        osg::DrawElementsUInt* quad = new
osg::DrawElementsUInt(osg::PrimitiveSet::QUADS, 0);

        quad->push_back(n + j2);
        quad->push_back(n + j2 + 1);
        quad->push_back(n + (j2 + 3)%f2);
        quad->push_back(n + (j2 + 2)%f2);

        psl->push_back(quad);

      }
    }

    lg_ptr->setVertexArray(vertices);
    lg_ptr->setNormalArray(normals);
    lg_ptr->setPrimitiveSetList(*psl);
    lg_ptr->setNormalBinding( osg::Geometry::BIND_PER_VERTEX );

    osg::Vec4Array * ca = new osg::Vec4Array();
    ca->push_back(_color);
    lg_ptr->setColorArray(ca);
    lg_ptr->setColorBinding(osg::Geometry::BIND_OVERALL);
    lg_ptr->transform();
    return lg_ptr;
  }

> And for those
> cases where rvo isn't practical, or can not be relied upon (such as
> assignment from the function:  a = f();), move semantics takes over to
> effect a return as efficient as a refcounted copy (more so in a
> multithread environment, even with a lock-free shared_ptr).

But there doesn't need to be a reference counted copy. The only time I need
to manipulate the reference count is when I'm actually planning to keep (a
handle on) the object around as a member field. Or, of course, in the
function where it is created so I have RAII exception safety, if I don't
immediatly give it a home. That's dirt cheap in comparison to the other
operations I'm performing.

> 2.  Because I want my objects to copy efficiently while I'm putting them
> into my std::container (esp. vector which tends to perform internal
> copies more than other containers).

Certainly a worthwhile consideration.

> A move-aware vector holding cheaply movable types will be as fast or
> faster than the same vector holding refcounted objects.

Define "cheaply moveable". In the ten minutes I just spent reading about
move semantics, I gathered it may better be described as usurpation
semantics.  To get any bang for your buck, you will need to make extensive
use of the source objects resources. The main motivation for reference
counting (especially with COW) is so that I can use the multiple references
can be used to simulate multiple identical objects.

> 3.  I need to hold pointers to base classes in my container, so I use
> vector<shared_ptr<Base>>.
>
> vector<unique_ptr<Base>> is likely to accomplish the same tasks you
> require, and be significantly faster while using less memory.

That looks like an auto_ptr<> with container friendly behavior. Certainly
worth knowing about, but my objects are typically shared many (hundreds of)
times.

> 4.  Because I need sequences of my objects to be efficiently rearranged
> by the sequence-modifying std::algorithms (sort, rotate, inplace_merge,
> etc.).
>
> Those std::algorithms that only need to move objects instead of copy
> them (sort, rotate, inplace_merge, etc.), will do so, making them as
> efficient with cheaply movable objects as they would be with refcounted
> objects.
>
> 5.  Others?

an arbitrary number of objects sharing the same function object which can be
modified at one point in order to change one aspect of the behavior of all
these objects.

> In summary, COW is an optimization which makes shared state look
> distinct when modified.  Move semantics greatly reduces the need to
> share state in the first place, and thus distinct objects maintain
> distinct state.

Every arrow in this image representing an identical vector is actually one
C++ composite object:

http://baldur.globalsymmetry.com/gs-home/images/mtree.png

> Generally it will be easier to give a heavy class move semantics than it
> will to give it COW semantics.  The former typically requires an extra
> constructor and assignment (and these typically require the same
> implementation effort as the copy constructor and copy assignment).  The
> latter typically requires modifications in many or all of the non-const
> member functions, even those that simply return (non-const) references
> or pointers to internal state.
>
> -Howard

I want shared objects. Now, the real trick is to arbitrarily split the group
of owners when I do a COW.  Suppose I want to change the color of every
second arrow in the image linked above.  One way to do that would be to
split the group of owners in two, and replicate all the data once.  It
would also potentially work to keep sharing all the original data except
the colorArray between the original owners.
> ---
> [ comp.std.c++ is moderated.  To submit articles, try just posting with ]
> [ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
> [              --- Please see the FAQ before posting. ---               ]
> [ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

--
STH
http://www.kdevelop.org
http://www.suse.com
http://www.mozilla.org

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: martinfrompi@yahoo.co.uk
Date: Thu, 23 Jun 2005 10:27:30 CST Raw View

"Steven T. Hatton" wrote:
> Howard Hinnant wrote:
> > Reasons to code COW:

> > 5.  Others?

[snip]

I think Steven has come up with a real use-case for COW that cannot be
replaced with move.

COW is useful as a *space* optimization where you have many similar,
editable, objects.  By using COW you can hold one copy of the data for
physically identical but logically distinct objects.

The CAD system my (then) company wrote 25 years ago used COW.  Graphics
were arranged into "objects", which could be copied and edited.  A
well-trained draughtsman would draw one door (say), and then duplicate
it many times.  If one particular door instance needed editing, he
could do so.  The system used COW to keep one copy of data for all the
identical doors, and another copy for the special door.  (Of course, in
those days it was single threaded - and written in Fortran!)

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: hinnant@metrowerks.com (Howard Hinnant)
Date: Thu, 23 Jun 2005 18:05:42 GMT Raw View

In article <1119522633.973038.263650@g47g2000cwa.googlegroups.com>,
 martinfrompi@yahoo.co.uk wrote:

> "Steven T. Hatton" wrote:
> > Howard Hinnant wrote:
> > > Reasons to code COW:
>
> > > 5.  Others?
>
> [snip]
>
> I think Steven has come up with a real use-case for COW that cannot be
> replaced with move.
>
> COW is useful as a *space* optimization where you have many similar,
> editable, objects.  By using COW you can hold one copy of the data for
> physically identical but logically distinct objects.
>
> The CAD system my (then) company wrote 25 years ago used COW.  Graphics
> were arranged into "objects", which could be copied and edited.  A
> well-trained draughtsman would draw one door (say), and then duplicate
> it many times.  If one particular door instance needed editing, he
> could do so.  The system used COW to keep one copy of data for all the
> identical doors, and another copy for the special door.  (Of course, in
> those days it was single threaded - and written in Fortran!)

Yes, thank you both for your comments.

It is probably more accurate for me to say that COW is an optimization
that favors a situation where the copy/write ratio is high (many copies,
few writes).  As that ratio drops, so does the value of COW.  And move
semantics will only help in the area where the copy/write ratio is close
to 1 (e.g. returning from a function, manipulation in containers and
algorithms, etc.).

-Howard

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: hattons@globalsymmetry.com ("Steven T. Hatton")
Date: Fri, 24 Jun 2005 00:04:11 GMT Raw View

martinfrompi@yahoo.co.uk wrote:

>
>
> "Steven T. Hatton" wrote:
>> Howard Hinnant wrote:
>> > Reasons to code COW:
>
>> > 5.  Others?
>
> [snip]
>
> I think Steven has come up with a real use-case for COW that cannot be
> replaced with move.
>
> COW is useful as a *space* optimization where you have many similar,
> editable, objects.  By using COW you can hold one copy of the data for
> physically identical but logically distinct objects.
>
> The CAD system my (then) company wrote 25 years ago used COW.  Graphics
> were arranged into "objects", which could be copied and edited.  A
> well-trained draughtsman would draw one door (say), and then duplicate
> it many times.  If one particular door instance needed editing, he
> could do so.  The system used COW to keep one copy of data for all the
> identical doors, and another copy for the special door.  (Of course, in
> those days it was single threaded - and written in Fortran!)

Thus am I baffled.  What you are describing is what I understood to be the
essential problem for which OOP was developed.  Your description reads like
a paragraph from an early 80's Scientific American article on current
advancements in computer programming techniques.  If I'm not mistaken
SIMULA was designed for the purpose of creating computer simulations.  "C++
got the key notions of classes, derived classes, virtual functions (in
other words, the notions of encapsulation, inheritance and polymorphism)
from Simula just like Smalltalk did." [*] The problem I described is
central to writing simulation software.  How is it that 40 years later, we
are still seeking a solution to this essential problem?  Yes, it's a bit
tricky to address, but it is not insurmountable.

[*]http://www.research.att.com/~bs/bs_faq.html#from-Smalltalk
--
STH
http://www.kdevelop.org
http://www.suse.com
http://www.mozilla.org

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: brangdon@cix.co.uk (Dave Harris)
Date: Fri, 24 Jun 2005 14:00:12 GMT Raw View

hattons@globalsymmetry.com ("Steven T. Hatton") wrote (abridged):
> My question is: what if we do own the code for the class of objects we
> want to keep track of?  Can we then derive that class from some
> reference counted baseclass, and proceed to use the class without a
> lot of further consideration of how to manage the reference count?
> boot::intrusive_ptr<> seems to offer some hope for this kind
> of approach.  The documentation seems to suggest shared_ptr<> is
> a better general solution to the problem of shared memory management.

In my view shared_ptr<> is more general but not better, if you see what I
mean. If an intrusive pointer is possible, I prefer it, and modify my
classes accordingly.

I actually use my own pointer template instead of boost's. I do use
boost's shared_ptr for types which I can't modify.

-- Dave Harris, Nottingham, UK.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]