Topic: Copy-stealing operator


Author: smayo@ziplink.net (Scott Mayo)
Date: 1998/04/13
Raw View
Apologies if this has been thrashed out before.

I have a class that maintains a set of members, by managing an array that
it can grow and shrink as needed - trivial stuff. My question comes
in at the point where I write functions that want to return such sets
(other than in the case of constructors). For example, in the case
of an Intersection operation on two sets:

    Set c;
    //...stuff...
    c = a.intersection(b); //produce a new set

The code for intersection() is going to work my creating a local, automatic Set,
adding members to it, and returning the Set. As I understand it,
to return the new Set, the compiler is going to plant a call to my
copy operator, as part of the return statement; then destroy the local, automatic
copy.

I'd *really* rather it didn't do that. My copy operator is going to
malloc() a whole new array and copy all the set elements. The destructor
on the local set will call free(). This is going to get expensive when
done over and over, especially as sets are passed up through a lot of
layers of recursion and so on. It's galling, because I can accomplish the
same thing much, much faster, if I subvert the return mechanism.

If I could pass a *bit-wise* copy of the automatic, temporary value, and then
fail to have the original destructed on exit, just the right thing happens: the new
value of c picks up the pointer to my data, that my automatic temporary used
to have, and since the destructor isn't run on the local copy, I have
effectively passed ownership of the array of elements to my caller - and
done much more efficienty than I could do by duplicating the whole array for
him, and freeing mine.

I realise I can solve this trivially with something like:
    Set outputc;
    a.intersection(&outputc, b); //ick
but that's (IMHO) bad style.

So call this a copy-stealing operator, different than C++'s copy
operator. If I could explain to the C++ compiler than I was providing one,
it seems to me the compiler would  have enough information at all times,
to know when to plant it (instead of a copy and a destructor).

Does it exist? Has it been discussed? It seems to me that it could make
certain kinds of code much more efficient.



[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: Robert Campbell <rob@geonexuscorp.com>
Date: 1998/04/13
Raw View
How about implementing a "Lazy Copy" scheme.  This would involve
overriding most methods such that a reference count is kept for each
object.   Copy constuctors and assignment operators would merely
increment the reference count of the data being referenced. However, if
three objects point to the same data, and one of the objects decides it
needs to modify the data (through some modification method).  Then, a
copy of the data would be
made, the modifying object would point to this data (whose reference
count would be one), and then the reference count of the originating
data would be decremented.

The implementation of the STL we use here has this idea written in, such
that returning things like strings, valarrays, vectors, ... all work
relatively efficiently.



--
Robert Campbell
Software Developer
Geonexus Interpretive Modelling Corp
E-mail: rob@geonexuscorp.com  Phone: (403) 262-0780
WWW: http://www.geonexuscorp.com/


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: Sam Fenster <sam@no.personal.mail>
Date: 1998/04/13
Raw View
smayo@ziplink.net (Scott Mayo) writes:
> The code for intersection() is going to work my creating a local, automatic
> Set, adding members to it, and returning the Set. As I understand it, to
> return the new Set, the compiler is going to plant a call to my copy
> operator, as part of the return statement; then destroy the local, automatic
> copy.
>
> I'd *really* rather it didn't do that.

You could use copy-on-write semantics, which make copying almost free:  A new
copy of an object merely points to the old copy's representation.  Data is
only actually copied when there are two or more objects referring to a
representation and one of them is modified.

Have your Set's actual data in a separate object which maintains a reference
count.  The result's count is initially equal to one when its data is only
referenced by the local Set.  The copy constructor for the calling scope's Set
bumps it to two, the destructor for the local Set reduces it to one, and the
actual data never needs to be copied.



[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: Christopher Eltschka <celtschk@physik.tu-muenchen.de>
Date: 1998/04/14
Raw View
Scott Mayo wrote:
>
> Apologies if this has been thrashed out before.
>
> I have a class that maintains a set of members, by managing an array that
> it can grow and shrink as needed - trivial stuff. My question comes
> in at the point where I write functions that want to return such sets
> (other than in the case of constructors). For example, in the case
> of an Intersection operation on two sets:
>
>     Set c;
>     //...stuff...
>     c = a.intersection(b); //produce a new set
>
> The code for intersection() is going to work my creating a local, automatic Set,
> adding members to it, and returning the Set. As I understand it,
> to return the new Set, the compiler is going to plant a call to my
> copy operator, as part of the return statement; then destroy the local, automatic
> copy.
>
> I'd *really* rather it didn't do that. My copy operator is going to
> malloc() a whole new array and copy all the set elements. The destructor
> on the local set will call free(). This is going to get expensive when
> done over and over, especially as sets are passed up through a lot of
> layers of recursion and so on. It's galling, because I can accomplish the
> same thing much, much faster, if I subvert the return mechanism.
>
> If I could pass a *bit-wise* copy of the automatic, temporary value, and then
> fail to have the original destructed on exit, just the right thing happens: the new
> value of c picks up the pointer to my data, that my automatic temporary used
> to have, and since the destructor isn't run on the local copy, I have
> effectively passed ownership of the array of elements to my caller - and
> done much more efficienty than I could do by duplicating the whole array for
> him, and freeing mine.
>
> I realise I can solve this trivially with something like:
>     Set outputc;
>     a.intersection(&outputc, b); //ick
> but that's (IMHO) bad style.
>
> So call this a copy-stealing operator, different than C++'s copy
> operator. If I could explain to the C++ compiler than I was providing one,
> it seems to me the compiler would  have enough information at all times,
> to know when to plant it (instead of a copy and a destructor).
>
> Does it exist? Has it been discussed? It seems to me that it could make
> certain kinds of code much more efficient.

What about defining a class transfer_set to deal with this?
The class transfer_set would be constructable from your Set,
and your Set should have copy constructor and assignment operator
for transfer_set. Constructing it would "steal" the pointer from the
object (the constructor must take a non-const reference to the set,
of course), and the conversion would "donate" that data to the
new Set (alternatively, you could write a Set constructor which
takes a transfer_set as parameter).

Now you would code your intersection function f. ex. as:

transfer_set Set::intersection(const Set& other)
{
  Set retval;
  // ...
  return retval;
};

then on a call

c=a.intersection(b)

the following  would happen:

- the return statement of intersection creates a transfer_set,
  which "steals" the data from retval
- retval is destructed (but the data isn't destroyed because
  the object doesn't possess it any more)
- the transfer_set is assigned to c, "donating" the
  "stolen" data to it
- the transfer_set is destructed, not destroying the data

This solution has the advantage over copy-on-write solutions
that you don't have to deal with ref counting and special
multithreading problems (at each time, each data is owned by
exactly one Set or one data_set).
OTOH, the transfer_set can be misused. A simple example:

Set a, b;
a.intersection(b); // of course, in this case, the data of the
                   // temporary transfer_set should be destroyed
                   // (it's not given to a Set)
(transfer_set)a;   // steals + destroys the data from a!


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: "Tom McKearney" <no@spam.com>
Date: 1998/04/14
Raw View
Scott Mayo wrote in message <27ecd.smail.smayo@ziplink.net>...
>Apologies if this has been thrashed out before.
>
>I have a class that maintains a set of members, by managing an array that

[snip]
>    c = a.intersection(b); //produce a new set
>
>The code for intersection() is going to work my creating a local, automatic
Set,
>adding members to it, and returning the Set. As I understand it,
>to return the new Set, the compiler is going to plant a call to my
>copy operator, as part of the return statement; then destroy the local,
automatic
>copy.
>
>I'd *really* rather it didn't do that. My copy operator is going to

There is no easy way.  As others have pointed out, you can use various
"lazy" or "copy-on-write" methods, but otherwise, you have no choice.  You
could allocate a pointer to an array and return that (or other such
ugliness), or you can just suck up the speed.

If speed is such a big issue, I guess your only question to answer is:  Do I
have the time to write a "copy-on-write" implementation of my data?  If not,
you're going to have to do some not-so-elegant solution.
My suggestion is that, if you don't have the time, put a large comment where
you wrote the hack code just to tell everyone "If I had the time, I would
have done x, y and z...".  This will make you feel better when your
coworkers are criticizing your code ;)

Tom McKearney





[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: jbuck@best.com (Joseph Buck)
Date: 1998/04/14
Raw View
Robert Campbell  <rob@geonexuscorp.com> wrote:
>How about implementing a "Lazy Copy" scheme.  This would involve
>overriding most methods such that a reference count is kept for each
>object.

Yes, many C++ developers use this scheme (and many don't know that
they are using it, since they use string classes and the like that
use it).  But the operation of moving an object is common enough
that it would be worthwhile to have a standard STL algorithm that
performs the function and can be specialized (that is, a relocate
operation that is analogous to the STL swap algorithm).  Then operations
such as vector extension could be more efficient.

For the specific case the original poster described (returning a
complicated object from a function), the compiler is permitted to
attempt to optimize away the copy in some cases.

>The implementation of the STL we use here has this idea written in, such
>that returning things like strings, valarrays, vectors, ... all work
>relatively efficiently.

Not as efficiently as it could.  It annoys me that vector<string>
does lots of reference count incrementing and decrementing when
the vector is grown and the data are relocated (for the common
reference-counted string implementation), and vector<vector<T> > is
even worse.  In both cases, relocation can be done with a shallow
copy.


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]






Author: Valentin Bonnard <bonnardv@pratique.fr>
Date: 1998/04/14
Raw View
Scott Mayo <smayo@ziplink.net> writes:

> Apologies if this has been thrashed out before.

It has been discussed before, but it's an important C++
issue.

The compiler can avoid that. This is called the object
elision optimisation.

For example, given

class T {
...
public:
     T (const T&); // must be accessible
};

T   foo ()
{
    return T (args);
}

T x = foo ();

the compiler can do 2, 1 or 0 calls to the copy ctor.

It's the easiest way: rely on the compiler.

You can avoid that with the handle/body pattern, using
copy on write. For more information on copy on write, you
can read my web page:

http://pages.pratique.fr/~bonnardv/patterns_e.html#handle.body.rc

This is harder, but will also work with operator=. The compiler
can only avoid calls to the copy constructor.

For operations on matrix, you can have a look at Blitz++:
http://monet.uwaterloo.ca/blitz/

--

Valentin Bonnard                mailto:bonnardv@pratique.fr
info about C++/a propos du C++: http://pages.pratique.fr/~bonnardv/


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]