Topic: Temporaries, copy ("reference") counting, and garbage collection
Author: gyro@kestrel.edu (Scott Layson Burson)
Date: 24 Aug 91 06:23:31 GMT
In article <74266@microsoft.UUCP> jimad@microsoft.UUCP (Jim ADCOCK) writes:
>In article <ARI.HUTTUNEN.91Aug16233137@wonderwoman.hut.fi> Ari.Huttunen@hut.fi (Ari Juhani Huttunen) writes:
>>Because garbage collection is expensive, it should be used only when the
>>programmer decides so. I gather it is already possible to write a garbage
>>collector using member operators new/delete? But because I don't know how
>>to do it, I would like to see something like this:
>>
>>collected class String {
>
>I agree with your conclusion, but your premise is wrong.
>
>Garbage Collection need not be expensive. Yes, there are many
>many very slow, very inefficient ways of implementing garbage collection.
>The very worst, very slowest, worst-code-generating method of all --
>reference counting -- seems to be a favorite of C++ programmers,
Now wait a minute, Jim. I beg to differ with you here. I submit that
for the particular purposes being discussed, copy counting [though it
is traditional, I dislike the term "reference counting" in the context
of C++, because in fact "references" in the C++ sense do not get
counted] ... as I was saying, I submit that copy counting is at least
as good a solution to the particular problems being discussed here as
garbage collection would be.
The reason I think so -- and I admit that I don't have hard data to
back this up (but that never stopped me from having an opinion before
:-) -- is that the use of copy counting permits, in many cases, an
optimization that is not possible in a GC-based system. If I write an
expression `a + b', where `a' and `b' are strings or matrices or
whatever, then if either `a' or `b' has a copy count of 1 prior to the
operation, I know that it's going to be garbage after the operation,
and I may therefore be able to reuse its storage in the course of
computing the result.
Admittedly, this doesn't always work, because I might need a chunk of
storage for the result that is larger than the pieces allocated for
`a' and `b'. However, if that is true, then it is at least somewhat
likely that the operation I am performing is going to take sufficient
time just to do the work that the additional overhead of allocating
the new memory will not be too significant. Or, to put it another
way: the operations one expects to be efficient -- adding a single
element to a vector, for instance -- tend to be those for which this
optimization is usually possible.
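[The reuse Scott describes can be sketched in present-day C++, where an rvalue-reference overload identifies the doomed operand directly; rvalue references postdate this thread, and the `Vec` class below is hypothetical, standing in for the strings or matrices under discussion.]

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical value class.  In 1991 the reuse test would have been
// "copy count == 1"; modern C++ expresses the same optimization with an
// rvalue-reference overload, which spots the doomed temporary directly.
class Vec {
    std::vector<double> d;
public:
    explicit Vec(std::size_t n, double v = 0.0) : d(n, v) {}
    double operator[](std::size_t i) const { return d[i]; }
    std::size_t size() const { return d.size(); }

    // Both operands live on after the call: allocate a fresh result.
    friend Vec operator+(const Vec& a, const Vec& b) {
        Vec r(a.size());
        for (std::size_t i = 0; i < r.size(); ++i)
            r.d[i] = a.d[i] + b.d[i];
        return r;
    }
    // 'a' is a temporary about to become garbage: reuse its storage.
    friend Vec operator+(Vec&& a, const Vec& b) {
        for (std::size_t i = 0; i < a.size(); ++i)
            a.d[i] += b.d[i];
        return std::move(a);
    }
};
```

In an expression like `a + b + b`, the first `+` allocates once and the second reuses that temporary's storage, exactly the behavior argued for above.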
I believe I've seen a message on one of these lists to the effect that
somebody was trying to use overloaded arithmetic operators on some
fairly heavyweight data types -- might have been matrices -- but gave
up because of problems with temporary destruction. In this case I
consider their failure to use copy counting (for whatever reason, I
don't know) to be little short of tragic, because the operations they
were performing had sufficient inherent cost that the additional
overhead of dealing with the copy count would surely have been lost in
the noise.
So I would suggest that copy counting is actually underutilized by C++
programmers, rather than overutilized.
Granted, it's partly a matter of context -- generous use of copy
counting will bloat your code size a bit, making it less suitable for
those poor slobs fighting a 640K limit or whatever. In a Unix
workstation environment, with cycles and megabytes to burn
(comparatively), I don't hesitate to use it freely.
I'm not saying GC is a bad thing, not at all; and I agree that it has
the advantages you claim, although every GC I know of on stock
hardware has one significant disadvantage: it causes the program to
"go away" periodically. In contrast, the overhead of copy counting is
distributed. Ergonomically, users generally seem to prefer a system
that runs slower to one that stops entirely for several seconds at
unpredictable intervals.
-- Scott
Gyro@Reasoning.COM
Author: cok@islsun.Kodak.COM (David Cok)
Date: 24 Aug 91 22:48:39 GMT
In article <1991Aug24.062331.20778@kestrel.edu> gyro@kestrel.edu (Scott Layson Burson) writes:
....
>
>I believe I've seen a message on one of these lists to the effect that
>somebody was trying to use overloaded arithmetic operators on some
>fairly heavyweight data types -- might have been matrices -- but gave
>up because of problems with temporary destruction. In this case I
>consider their failure to use copy counting (for whatever reason, I
>don't know) to be little short of tragic, because the operations they
>were performing had sufficient inherent cost that the additional
>overhead of dealing with the copy count would surely have been lost in
>the noise.
>
Copy (reference) counting does allow optimizations like the ones you described
in the part of the post I did not replicate. However, it does not solve the
problem of the lifetime of temporaries -- and the optimization is more
complicated than you describe. First, if a temporary is created and points to
data which then has a reference count of 1, it is still up to the compiler to
call the destructor. When the destructor is called, the copy count goes to
zero and the data is deallocated -- but copy counting does not help the
compiler do it any sooner.
Second, one cannot, with copy counting alone, distinguish between temporaries
and non-temporary variables. Consider A + (B*C). When doing the + operation,
the second argument will have a copy count of 1, so the memory area it points
to can be reused, right? Wrong. Consider A + D. Here the second argument
also has a copy count of 1, but its data cannot be reused, because D is not
going to be destructed.
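[The ambiguity is easy to demonstrate with a minimal counted handle -- the `Handle` class and helper functions below are hypothetical, not from the post: a freshly built temporary and a named variable that happens to be the only copy present identical counts to the callee.]

```cpp
#include <cassert>

// Minimal counted representation shared by Handle copies.
struct Rep { int count = 1; };

class Handle {
    Rep* rep;
public:
    Handle() : rep(new Rep) {}
    Handle(const Handle& h) : rep(h.rep) { ++rep->count; }
    Handle& operator=(const Handle&) = delete;
    ~Handle() { if (--rep->count == 0) delete rep; }
    int copies() const { return rep->count; }
};

Handle make_temp() { return Handle(); }       // stands in for B*C

int count_of_second_arg(const Handle& h) {    // stands in for operator+
    return h.copies();                        // all the callee can see
}
```

Calling `count_of_second_arg(make_temp())` and `count_of_second_arg(D)` for a named `D` both report a count of 1: the callee cannot tell which operand's storage is safe to reuse.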
One possible solution (which I have not implemented) is to have a shadow
class behind the Matrix (or whatever) class. Conversions to the shadow
class (Matrix_temp) would increment the copy count; conversions back to
Matrix would decrement the copy count. All operations are done in terms of
the Matrix_temp class. Temporaries would then have a copy count of 1;
non-temporaries would have a copy count of at least 2.
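[Cok says he has not implemented this, and the sketch below is likewise only an illustration of the shadow-class idea, with a single `double` standing in for real matrix data. Arithmetic is done on `Matrix_temp`; converting a named `Matrix` bumps the count, so inside an operation a count of 1 can only mean a genuine temporary.]

```cpp
#include <cassert>

// Shared counted representation; 'value' stands in for matrix storage.
struct Rep { int count; double value; };

class Matrix_temp {
public:
    Rep* rep;  // exposed only to keep the sketch short
    explicit Matrix_temp(Rep* r) : rep(r) {}                    // adopts one count
    Matrix_temp(const Matrix_temp& t) : rep(t.rep) { ++rep->count; }
    Matrix_temp& operator=(const Matrix_temp&) = delete;
    ~Matrix_temp() { if (--rep->count == 0) delete rep; }
};

class Matrix {
    Rep* rep;
public:
    explicit Matrix(double v) : rep(new Rep{1, v}) {}
    Matrix(const Matrix& m) : rep(m.rep) { ++rep->count; }
    Matrix(const Matrix_temp& t) : rep(t.rep) { ++rep->count; } // back-conversion
    Matrix& operator=(const Matrix&) = delete;
    ~Matrix() { if (--rep->count == 0) delete rep; }
    // Conversion to the shadow class increments the copy count, so a
    // named Matrix is seen inside an operation with a count of at least 2.
    operator Matrix_temp() const { ++rep->count; return Matrix_temp(rep); }
    double value() const { return rep->value; }
};

Matrix_temp operator+(const Matrix_temp& a, const Matrix_temp& b) {
    if (a.rep->count == 1) {            // 'a' is a genuine temporary: reuse it
        a.rep->value += b.rep->value;
        ++a.rep->count;                 // the result adopts its own count
        return Matrix_temp(a.rep);
    }
    return Matrix_temp(new Rep{1, a.rep->value + b.rep->value});
}
```

In `A + B + C`, the inner `A + B` sees counts of 2 on both operands and allocates; the outer `+` sees the intermediate result with a count of 1 and reuses its storage.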
By the way, we didn't give up because of problems with temporary destruction.
It just made things harder. And we do use reference counting where it seems
to make sense.
David R. Cok
Eastman Kodak Company
Author: kearns@softrue.UUCP (Steven Kearns)
Date: 25 Aug 91 16:13:44 GMT
> The reason I think so -- and I admit that I don't have hard data to
> back this up (but that never stopped me from having an opinion before
> :-) -- is that the use of copy counting permits, in many cases, an
> optimization that is not possible in a GC-based system. If I write an
> ....
> -- Scott
> Gyro@Reasoning.COM
I have always found that one of the best benefits of reference
counting is that you can implement "copy-on-write"; for example, if
the same string is part of 10 different structures, each string
shares the same memory until it is modified, at which point it is
transparently copied. This can save lots of space! Time, too, since
you can avoid creating a new string.
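[A minimal copy-on-write sketch of the string example -- the `CowString` class is hypothetical: copies share one buffer, and a mutation on a shared buffer clones it first, so the ten structures pay for a copy only when one of them actually writes.]

```cpp
#include <cassert>
#include <cstddef>
#include <string>

class CowString {
    struct Rep { int count; std::string data; };
    Rep* rep;
public:
    explicit CowString(const char* s) : rep(new Rep{1, s}) {}
    CowString(const CowString& o) : rep(o.rep) { ++rep->count; } // share, don't copy
    CowString& operator=(const CowString&) = delete;
    ~CowString() { if (--rep->count == 0) delete rep; }

    const std::string& str() const { return rep->data; }
    int shares() const { return rep->count; }

    void set(std::size_t i, char c) {   // mutation: this is the "write"
        if (rep->count > 1) {           // shared: clone before writing
            --rep->count;
            rep = new Rep{1, rep->data};
        }
        rep->data[i] = c;               // now sole owner; safe to modify
    }
};
```

Copying a `CowString` into ten structures costs ten count increments and no string copies; only a later `set` on one of them triggers the transparent clone.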
-steve
********************************************************
* Steven Kearns ....uunet!softrue!kearns *
* Software Truth softrue!kearns@uunet.uu.net *
********************************************************
Author: rfb@CS.CMU.EDU (Rick Busdiecker)
Date: 26 Aug 91 20:30:38 GMT
In article <21.UUL1.3#8618@softrue.UUCP> kearns@softrue.UUCP (Steven Kearns) writes:
> I have always found that one of the best benefits of reference
> counting is that you can implement "copy-on-write"; for example, if
> the same string is part of 10 different structures, each string
> shares the same memory until it is modified, at which point it is
> transparently copied. This can save lots of space! Time, too, since
> you can avoid creating a new string.
You can do this without attempting to use reference counting in place
of memory management -- just avoid doing destructive operations on the
shared pieces of memory. Reference counting has enough drawbacks
(memory fragmentation, leaks from cyclic structures) that it's really not a
good idea.
--
Rick.Busdiecker@CS.CMU.EDU
``Everything should be made as simple as possible, but not more so.''
- Albert Einstein
Author: john@bnrmtl.bnr.ca (John Hickin)
Date: 27 Aug 91 18:08:46 GMT
In article <21.UUL1.3#8618@softrue.UUCP>, kearns@softrue.UUCP (Steven Kearns) writes:
|>
|> ...
|>
|> I have always found that one of the best benefits of reference
|> counting is that you can implement "copy-on-write"; for example, if
|> the same string is part of 10 different structures, each string
|> shares the same memory until it is modified, at which point it is
|> transparently copied. This can save lots of space! Time, too, since
|> you can avoid creating a new string.
Another advantage of COW is that the default constructor can be coded to
share a global default. In the string example, the costs of
String aString;
char* pChar;
are almost the same (neglecting the one-time cost of building the class global
empty string and assuming that String is just a glorified pointer to a private
representation).
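[A sketch of that default constructor -- the `String` class below is hypothetical: the default constructor just bumps the count on one class-wide empty representation, so `String aString;` allocates nothing at all.]

```cpp
#include <cassert>
#include <string>

class String {
    struct Rep { int count; std::string data; };
    static Rep* empty_rep() {             // the class-global empty string
        static Rep e{1, ""};              // base count of 1: never freed
        return &e;
    }
    Rep* rep;
public:
    String() : rep(empty_rep()) { ++rep->count; }   // no allocation
    explicit String(const char* s) : rep(new Rep{1, s}) {}
    String(const String& o) : rep(o.rep) { ++rep->count; }
    String& operator=(const String&) = delete;
    ~String() { if (--rep->count == 0) delete rep; }

    const std::string& str() const { return rep->data; }
    bool shares_empty() const { return rep == empty_rep(); }
};
```

Every default-constructed `String` shares the one static representation, so its cost is an increment plus a pointer store -- close to the cost of `char* pChar;` as claimed above, once the one-time cost of the global is paid.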
--
John Hickin Bell-Northern Research, Montreal, Quebec
(514) 765-8888 bnrmtl!john@larry.mcrcim.mcgill.edu