Thread

Topic: Garbage Collection in C++

Author: fjh@munta.cs.mu.OZ.AU (Fergus James HENDERSON)
Date: Thu, 13 May 1993 07:53:05 GMT Raw View

parag@netcom.com (Parag Patel) writes:

>kanze@us-es.sel.de (James Kanze) writes:
>
>>What I'm looking for would be words along the lines of: "Memory
>>allocated with 'new' is only guarenteed to remain accessible as long
>>as there is at least one active pointer to the allocated block."  (I
>>know, the wording has to be improved, "active pointer" defined, etc.
>>This is just to give an idea of what I would like to see.)

Yes, something like this would be a good idea.

>This also affects the implementation of arrays of objects with
>destructors in C++.  g++ allocates a bit more memory than necessary for
>an array so it can stash the length at its head.  Then it returns the
>pointer+offset for your program to use, which violates the gc requirement.
>
>cfront maintains a separate hashtable which contains the lengths of
>arrays hashed by the pointer value.  Something like this second approach
>would be essentially mandated by this requirement for gc.  Or the gc
>would have to know enough to look for ptr and ptr+offset into each
>block, which will slow it down a bit.
>
>BTW - linking the Boehm GC into a C++ program compiled with g++ which
>allocates arrays of objects with destructors will crash that program
>unless GC is built with ALL_INTERIOR_POINTERS.  This makes GC look for
>any pointer pointing within an allocated block of memory rather than any
>pointer pointing to the head of an allocated block.  This slows down GC
>quite a bit.

In C++ it is necessary that the garbage collector look for any pointer
pointing within an allocated block of memory rather than just pointing
to the head of the block. This is a fundamental requirement imposed
by the language. If this slows down the GC, then so be it.

One example which shows that this requirement is fundamental is
virtual base classes. If I have
 class Derived : public virtual Base1, public virtual Base2 {};
 Base2* foo() { return new Derived; }
then the only pointer to the allocated object does not point
to the start of the allocated memory.

There are no doubt many other examples which demonstrate the fundamental
nature of this requirement, it's not a problem particular to the use
of virtual base classes.

--
Fergus Henderson                     This .signature virus might be
fjh@munta.cs.mu.OZ.AU                getting old, but you still can't
                                     consistently believe it unless you
Linux: Choice of a GNU Generation    copy it to your own .signature file!

Author: kanze@us-es.sel.de (James Kanze)
Date: 13 May 93 19:09:44 Raw View

In article <u90jl.021039.13May1993@ecs.ox.ac.uk> u90jl@ecs.ox.ac.uk
(Jamie Lokier) writes:

|> James Kanze <kanze@us-es.sel.de> writes:
|> > But...  There are certain coding practices that make some garbage
|> > collection algorithms impractical.  (I am thinking of things along the
|> > lines of XOR'ing pointers in a double linked list, or any other type
|> > of pointer mangling.)  Would it not be possible to keep our options
|> > open by stating in the standard that such practices are undefined?

|> That practice is undefined with or without garbage collection anyway.

You are the second person who has told me this.  Could you please tell
me why.

Note that I'm not speaking here specifically of the XOR'ing of
pointers, but all types of pointer mangling in general.  And for that
matter, what should one say about simply storing pointers in an
unaligned address (by using memcpy, for example).  Surely the
following program is legal today:

 void
 f( int i )
 {
  char  buf[ sizeof( char* ) + 1 ] ;
  char*  p = new char[ i ] ;
  memcpy( buf + 1 , &p , sizeof( p ) ) ;
  p = 0 ;
  //  Do a lot of other things...
  memcpy( &p , buf + 1 , sizeof( p ) ) ;
  //  Now use p.  Should point to originally alloc'ed mem.
 }

Most garbage collectors would miss the pointer at buf+1, since it is
at an odd address.

This is the sort of thing I want to ban.

    [Explaination of a particular implementation of garbage
 collection, etc....]

Basically, I too would like to see garbage collection added to C++.  I
think that it *will* be essential if C++ is to become a mainstream
object-oriented language.

But even more important, now, is to get a standard for what we've
already got.  And trying to add garbage collection (or tagged unions,
or nested functions, or ...) will only delay it.  So I will oppose any
move to add it to the current standard.

But we have to prepare for the future.  The C standard had a "future
directions" chapter, for example.  And I would like to see
restrictions added that will mean that we *can* add garbage collection
later without breaking any conforming programs.
--
James Kanze                             email: kanze@us-es.sel.de
GABI Software, Sarl., 8 rue du Faisan, F-67000 Strasbourg, France
Conseils en informatique industrielle --
                   -- Beratung in industrieller Datenverarbeitung

Author: boehm@parc.xerox.com (Hans Boehm)
Date: 13 May 93 19:08:29 GMT Raw View

parag@netcom.com (Parag Patel) writes:

>...  Or the gc
>would have to know enough to look for ptr and ptr+offset into each
>block, which will slow it down a bit.

>BTW - linking the Boehm GC into a C++ program compiled with g++ which
>allocates arrays of objects with destructors will crash that program
>unless GC is built with ALL_INTERIOR_POINTERS.  This makes GC look for
>any pointer pointing within an allocated block of memory rather than any
>pointer pointing to the head of an allocated block.  This slows down GC
>quite a bit.

The collector actually provides two facilities for handling this.  It is
possible to register specific offsets into objects as valid addresses for
the object.  This is completely performance neutral, independent of the
number of offsets you register.  (The pointer validity check is done
largely by table lookup.  This affects only the contents of the tables.
The original reason for the tables is to avoid integer division, which
is atrociously slow on at least the machine on my desk.)  This approach
only works with well-disciplined client code.

The other approach is to recognize all interior pointers to objects.  This
is required by the Ellis-Detlefs proposal, and supported with the
ALL_INTERIOR_POINTERS switch.  This still should not sppreciably slow down
collection per se.  (It still uses table lookup, but the semantics of the
tables changes a bit.)  I think Ben Zorn has some measurements that confirm
this.  The cost is that the chance of a random integer pointing to a large
array becomes very high, unless you take appropriate precautions as to
where you allocate such arrays.  In recent versions of the collector, such
precautions are taken.  The real cost then is that if you try to allocate
very large arrays, you may not find a suitable hole for them that avoids
known false references.  Under SunOS 4.X with a static C library, large is
defined to be 100K or so.  On many other systems, or with dynamic
libraries, it's closer to 1 MB.  Details are in my upcoming SIGPLAN PLDI paper.

This can no doubt indirectly result in excessively large heaps, and thus
other performance problems.

Possible solutions:
1. More compiler support to restrict the number of possible pointer-
-containing locations.  (Refusing to allocate strings at addresses
congruent to 1 mod 4 would help on Suns, and may be a performance
win anyway.  Newer compilers may already do that.)
2. Allocate large arrays as a two level tree.  (For > 1 dimensional
arrays this may even be a performance win.)
3. Allocate such arrays using another mechanism, with a garbage
collected header.  Finalization code for the header frees the array.
4. Minimal cooperation from library writers would help a lot.  (The
SunOS 4.X C library contains several large tables of essentially random
bits.  It wouldn't be hard to exempt them from GC scanning.  But the code
would be highly nonportable.)

Hans-J. Boehm
(boehm@parc.xerox.com)

Author: chased@rbbb.Eng.Sun.COM (David Chase)
Date: 13 May 1993 21:44:06 GMT Raw View

In article <KANZE.93May6163058@slsvdnt.us-es.sel.de> kanze@us-es.sel.de (James Kanze) writes:
>What I'm looking for would be words along the lines of: "Memory
>allocated with 'new' is only guarenteed to remain accessible as long
>as there is at least one active pointer to the allocated block."  (I
>know, the wording has to be improved, "active pointer" defined, etc.
>This is just to give an idea of what I would like to see.)

>If I understand the Boehm garbage collection correctly, for example,
>such a restriction is all that would be needed to make it legal for a
>conforming implementation to offer the garbage collector.

Not quite (in hindsight, I may have misinterpreted you, but bear with
me).  The optimizer attached to your conforming implementation might
shred your pointers into a form not comprehendable by the Boehm (or
any other) garbage collector.  The usual worst-case example of this is
to take a loop along the lines of:

  char c, *p, *q; /* char to avoid address arithmetic quibbling. */
  ...
  while ((c = *p++) != 0) *q++ = c;

and turn it into
(legal assembly language for the C above, expressed as illegal C):

 q_minus_p = q - (p+1);
 while ((c = *p++) != 0) p[q_minus_p] = c;

On some machines, this version is more efficient, yet the pointer to
stored in q is potentially all gone.  Other tricks employed by the
optimizer include hoisted loop-invariant expressions that point
outside the logically addressed object, and intermediate address
calculations (RS/6000, in particular) that point outside the logically
addressed object.

Hans and I have catalogued some of these (I started looking at these
back in 1987).  People up at U Massachusetts (Moss, Diwan, Hudson, et
al?) are attempting to combine optimization with compacting garbage
collection in a GCC-based Modula-3 compiler, and have discovered
additional things that can go wrong in that sort of a system.

Of course, it is also possible to provide a conforming (and
optimizing) implementation that does NOT shred pointers, but that is
more expensive than either a non-shredding-and-non-optimizing or
shredding-and-optimizing compiler.  If nothing else, verifying that an
optimizer does not shred pointers is not likely to be much fun.

David Chase
Sun (speaking for myself)

Author: jimad@microsoft.com (Jim Adcock)
Date: 14 May 93 01:14:57 GMT Raw View

In article <9313317.23374@mulga.cs.mu.OZ.AU> fjh@munta.cs.mu.OZ.AU (Fergus James HENDERSON) writes:
>In C++ it is necessary that the garbage collector look for any pointer
>pointing within an allocated block of memory rather than just pointing
>to the head of the block.

Any legal pointer making a legal reference into the object, that is.
A smart GC could ignore non-conforming references -- since such *aren't*
references.

Author: kanze@us-es.sel.de (James Kanze)
Date: 14 May 93 17:53:36 Raw View

In article <lv5g96INNlaf@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM
(David Chase) writes:

|> In article <KANZE.93May6163058@slsvdnt.us-es.sel.de> kanze@us-es.sel.de (James Kanze) writes:
|> >What I'm looking for would be words along the lines of: "Memory
|> >allocated with 'new' is only guarenteed to remain accessible as long
|> >as there is at least one active pointer to the allocated block."  (I
|> >know, the wording has to be improved, "active pointer" defined, etc.
|> >This is just to give an idea of what I would like to see.)

|> >If I understand the Boehm garbage collection correctly, for example,
|> >such a restriction is all that would be needed to make it legal for a
|> >conforming implementation to offer the garbage collector.

|> Not quite (in hindsight, I may have misinterpreted you, but bear with
|> me).  The optimizer attached to your conforming implementation might
|> shred your pointers into a form not comprehendable by the Boehm (or
|> any other) garbage collector.

   [Examples of how compiler optimization can defeat garbage
 collection deleted...]

The optimizer is part of the implementation.  It is up the the
implementer to organize things so that one part of the implementation
(the optimizer) does not make another part (the garbage collector)
fail.

What I'm concerned with is practices that are legal now, or that are
generally considered portable enough that people might use them.  I'm
concerned about explicitly limiting what is supported, so an
implementation will have a fighting chance of being able to add
garbage collection later without breaking existing code.

I believe that I expressed myself poorly when I refered to the Boehm
implementation.  The present Boehm implementation is independant of
the compiler, and is added on.  The compiler doesn't know it is
there.  I am concerned about a complete implementation, where
something like the Boehm garbage collection is present, but the
compiler (and its optimizer) know about it, and take it into
consideration.

For example, in the case you cite, where the generated code only uses
one pointer instead of two, it would be sufficient that the compiler
not delete the variable containing the second pointer, even if this
variable is not used.  The optimization itself may still be used, but
the compiler has to *know* that it cannot delete the pointer, even
though it is no longer visibly used.
--
James Kanze                             email: kanze@us-es.sel.de
GABI Software, Sarl., 8 rue du Faisan, F-67000 Strasbourg, France
Conseils en informatique industrielle --
                   -- Beratung in industrieller Datenverarbeitung

Author: wildbill@hicomb.hi.com (Bill Torcaso)
Date: 14 May 1993 13:16:50 GMT Raw View

  If I follow the various positions in the discussion of garbage collecting,
  they are condensed as follows:

 (A)  We should add GC to the language definition, stating restrictions
      on legal code, and making some amount of existing code illegal.

 (B)  We should add GC to the language definition, carefully
      preserving (almost all) existing code, and make compiler
      suppliers do the hard work.

 (C)  Various practices involving pointers (memcpy, fwrite/fread)
      make GC impossible.

 (D)  Various compiler optimization techniques produce fast code that
      make GC impossible.  There will be a choice among slow, or
      non-gc-able, or questionably-safe, or expensive, compilers.

  Going back to basics, would someone comment on exactly what GC is and what it
  is not?  What is the precise service under discussion?
  This is my understanding:

 The collector manages a pool of memory.  Entities are allocated from it
 (but not via 'new') and the de-allocation and re-use of this memory
 is not visible to the programmer.

 Pointers into the GC pool provide access to the memory; these pointers
 can be anywhere: in objects in the pool, in global objects, in
 heap objects, in local-variable objects on the run-time stack.

 When collecting must be done, the collector can find all pointers to
 live entities in the pool, and reclaim the memory for no-longer-live
 entities.  In some collectors, objects are moved and all pointers to
 the object get changed.

 The hard part of GC is 'find all pointers to live entities'.  The two
 controversial parts are (1) a programmer can alter and then restore the
 *representation* of a pointer, but between those two operations the
 collector cannot recognize a pointer into the pool, and (2) traversing
 the run-time stack to find live pointers requires integration with the
 compiler - it cannot be done with knowledge of implementation details.

  To fill out the matrix of possibilities, is it _possible_ to write a class
  which offers safe gc-able memory allocation, and derive from that class
  in your particular program?  Or does even a basic (e.g. reference counting)
  implementation have problems, like copying an object can subvert the
  reference counts in spite of good efforts in the copy constructor (or
  whatever)?

  If the answer is that a correct GC service requires compiler support and
  language-standard support, then I guess I'm against adding it to the
  language.



Bill Torcaso
Standard disclaimer:  I speak for myself only, and I get on the net where I can
manage it.

Author: tmb@arolla.idiap.ch (Thomas M. Breuel)
Date: 15 May 93 21:55:21 Raw View

>>>>> On 14 May 1993 13:16:50 GMT, wildbill@hicomb.hi.com (Bill Torcaso) said:
>   If I follow the various positions in the discussion of garbage collecting,
>   they are condensed as follows:

I don't think that's an accurate condensation.

>  (A)  We should add GC to the language definition, stating restrictions
>       on legal code, and making some amount of existing code illegal.
>
>  (B)  We should add GC to the language definition, carefully
>       preserving (almost all) existing code, and make compiler
>       suppliers do the hard work.

There actually isn't a whole lot of difference between (A) and (B).
Adding GC to the language, one way or the other, does not require
extensive changes.  In fact, I believe Stroustrup explicitly says that
vendors may do it already.

More significant changes along the lines that Detlefs and Ellis have
proposed may be desirable to achieve higher performance, make the
language safer, and allow compaction, but they are not necessary to
support GC.

>  (C)  Various practices involving pointers (memcpy, fwrite/fread)
>       make GC impossible.

That's not true.  Yes, you could, in principle, use "memcpy" and
"fwrite" to hide pointers from a garbage collector.  Such uses of
those functions in C++ code are exceedingly rare, and it is
questionable to what degree they are permitted by the existing
standard anyway.

One of the main purposes of explicitly adding garbage collection to
the C++ standard would be to spell out exactly what you can and what
you cannot do.

>  (D)  Various compiler optimization techniques produce fast code that
>       make GC impossible.  There will be a choice among slow, or
>       non-gc-able, or questionably-safe, or expensive, compilers.

Any problems with optimization and conservative GC revolve around the
compiler destroying or encoding the last reference to some block of
memory in some pointer calculation.  The compiler can still perform
such optimizations, but it has to generate code to maintain a copy of
the pointer somewhere.  The cost of that is negligible.

     Thomas.

Author: papresco@undergrad.math.uwaterloo.ca (Paul Prescod)
Date: Sat, 15 May 1993 20:09:38 GMT Raw View

In article <1993May10.194545.18207@rd.hydro.on.ca> twriter@rd.hydro.on.ca writes:

>Is there no room for compromise?  Here's my suggestion:  In addition to
>the language standard, let's have a set of standard extensions.  GC
>could be one such extension.  The advantage is that vendors need not
>provide GC in order to have a conforming implementation.  If, however,
>they choose to provide it, they will at least have a standard to
>follow.  Furthermore, programmers who use the standard extensions are
>guaranteed that their programs will with other compilers that provide
>the standard extensions.

This sounds like an excellent suggestion to me.  And in fact, it sounds like
a good way to extend languages in general.  Then compiler writers can offer
"Ansi Standard C++ with Ansi Standard GC/Multithreading/Windowing/Component
Libaray/..."

The base standard would lose no functionality, and you could code to that,
but when you needed more, at least there would be SOME standard.

Author: twriter@rd.hydro.on.ca (Timothy Writer)
Date: Mon, 10 May 93 19:45:45 GMT Raw View

matt@physics2.berkeley.edu (Matt Austern) writes:

>In article <25419@alice.att.com> bs@alice.att.com (Bjarne Stroustrup) writes:

>> It has been suggested that GC should be mandated in the C++ standard.
>> I see no hope for that. My impression is that many people overestimate
>> what a standards committee can do in the area of language design and
>> underestimate the fundamental conservatism of the majority of users
>> and tool suppliers. The standards committee is chartered to standardize
>> existing practice and can only push the limits of the language in a
>> few widely well-understood and accepted cases.

>In principle, I agree with this---there's a lot to say for the idea
>that a standards committee ought to standardize, instead of trying to
>legislate.

>I would, though, like to note two points.

>The first is that unless garbage collection does become part of some
>standard---maybe not the 1994 draft standard, but a standard a few
>years down the road---then I really don't see how it will ever become
>part of the language.  I can't see many compiler vendors making the
>major effort to put it in their products if no standard exists: why
>should they, if, when a standard does come out and mandates entirely
>different behavior, they'll have to undo that work and implement
>something else anyway?  I also can't see very many programmers writing
>code that relies on garbage collection if there's no guarantee that it
>will be available on at least a reasonable number of different
>implementations.  After all, automatic memory management is supposed
>to make things easier; it's not supposed to destroy portability!

This is just the problem of the chicken and the egg.  If GC becomes part
of the standard, vendors claiming to provide a conforming implementation
must provide GC.  Compiler vendors, many of whom are already "playing
catchup," are likely to view this as an unwelcome burden.

On the other hand, if GC is not made part of the standard, I foresee at
least two dangers:

 i)  Vendors won't provide GC in the near future.  C++ programmers
     who need GC now will continue to suffer with manual storage
     management or choose another language.

 ii) Vendors will provide non-standard and inevitably incompatible
     implementations of GC.  C++ programmers will have to choose
     between portability and convenience.

Is there no room for compromise?  Here's my suggestion:  In addition to
the language standard, let's have a set of standard extensions.  GC
could be one such extension.  The advantage is that vendors need not
provide GC in order to have a conforming implementation.  If, however,
they choose to provide it, they will at least have a standard to
follow.  Furthermore, programmers who use the standard extensions are
guaranteed that their programs will with other compilers that provide
the standard extensions.

> [stuff deleted]

Am I out to lunch?

Tim

--
Tim Writer        phone:  (416) 231-4111 ext. 6990
Ontario Hydro Research Division      fax:    (416) 237-9285
Toronto, Ontario       e-mail: twriter@rd.hydro.on.ca
CANADA

Author: maxtal@physics.su.OZ.AU (John Max Skaller)
Date: Tue, 11 May 1993 09:38:01 GMT Raw View

In article <1993May10.194545.18207@rd.hydro.on.ca> twriter@rd.hydro.on.ca writes:
>This is just the problem of the chicken and the egg.  If GC becomes part
>of the standard, vendors claiming to provide a conforming implementation
>must provide GC.  Compiler vendors, many of whom are already "playing
>catchup," are likely to view this as an unwelcome burden.
>
>On the other hand, if GC is not made part of the standard, I foresee at
>least two dangers:
>
> i)  Vendors won't provide GC in the near future.  C++ programmers
>     who need GC now will continue to suffer with manual storage
>     management or choose another language.
>
> ii) Vendors will provide non-standard and inevitably incompatible
>     implementations of GC.  C++ programmers will have to choose
>     between portability and convenience.
>
>Is there no room for compromise?  Here's my suggestion:  In addition to
>the language standard, let's have a set of standard extensions.  GC
>could be one such extension.  The advantage is that vendors need not
>provide GC in order to have a conforming implementation.  If, however,
>they choose to provide it, they will at least have a standard to
>follow.  Furthermore, programmers who use the standard extensions are
>guaranteed that their programs will with other compilers that provide
>the standard extensions.
>
>> [stuff deleted]
>
>Am I out to lunch?
>
 COBOL is defined that way: a core and a set of optional
modules such as indexed file handling, report writer, and sort.

 It can be done: but whether the committee is willing to
define multiple languages is another issue.

 Is this approach desirable (I have lots of extensions to
add that way: closures, variants ... :-)

--
        JOHN (MAX) SKALLER,         INTERNET:maxtal@suphys.physics.su.oz.au
 Maxtal Pty Ltd,      CSERVE:10236.1703
        6 MacKay St ASHFIELD,     Mem: SA IT/9/22,SC22/WG21
        NSW 2131, AUSTRALIA

Author: jimad@microsoft.com (Jim Adcock)
Date: 11 May 93 16:55:37 GMT Raw View

In article <1993May10.194545.18207@rd.hydro.on.ca> twriter@rd.hydro.on.ca writes:
|Is there no room for compromise?  Here's my suggestion:  In addition to
|the language standard, let's have a set of standard extensions.  GC
|could be one such extension.

I think its very difficult to get agreement on a set of standard
extensions.  The Numerical C group's attempts to agree on anything
comes to mind.  Basically, the original language group is strongly
motivated to come to some kind of agreement on those items they
can agree on, no matter how compromised the result.  The extensions
groups then are left to try to agree on those items that couldn't
be agreed on.  The C++ committee formed with a wide consensus to
1) follow the ARM and 2) failing that follow the C standard.
This represented an amazing amount of initial consensus, and represents
a truly extraordinary event.

The only hope I'd see for GC would be if some vendor were to offer
a truly superior GC implementation, such that the other vendors were
forced to follow that effort.  *Then* there would be a basis of agreement
on which a GC extensions group could rally around.

Author: kanze@us-es.sel.de (James Kanze)
Date: 6 May 93 16:30:57 Raw View

In article <25419@alice.att.com> bs@alice.att.com (Bjarne Stroustrup)
writes:

|> Here are a few comments prompted by the discussion of garbage
|> collection, C compatibility, standards, etc.

    [...]

|> [As to why vendors (and others) might oppose adding GC to the
|> standard:] ... because adding GC to the standard
|> might delay the standard.

This one reason would suffice for me, even if none of the other
reasons Bjarne cited were valid.

But...  There are certain coding practices that make some garbage
collection algorithms impractical.  (I am thinking of things along the
lines of XOR'ing pointers in a double linked list, or any other type
of pointer mangling.)  Would it not be possible to keep our options
open by stating in the standard that such practices are undefined?

What I'm looking for would be words along the lines of: "Memory
allocated with 'new' is only guarenteed to remain accessible as long
as there is at least one active pointer to the allocated block."  (I
know, the wording has to be improved, "active pointer" defined, etc.
This is just to give an idea of what I would like to see.)

If I understand the Boehm garbage collection correctly, for example,
such a restriction is all that would be needed to make it legal for a
conforming implementation to offer the garbage collector.
--
James Kanze                             email: kanze@us-es.sel.de
GABI Software, Sarl., 8 rue du Faisan, F-67000 Strasbourg, France
Conseils en informatique industrielle --
                   -- Beratung in industrieller Datenverarbeitung

Author: jimad@microsoft.com (Jim Adcock)
Date: 12 May 93 16:33:42 GMT Raw View

In article <1993May11.093801.25553@ucc.su.OZ.AU> maxtal@physics.su.OZ.AU (John Max Skaller) writes:
| COBOL is defined that way: a core and a set of optional
|modules such as indexed file handling, report writer, and sort.

Actually, ANSI/ISO C is defined that way too.  The "core" part of the
language applies both to "hosted" and un-"hosted" implementations,
whereas the libraries are only required of "hosted" implementations.
The intent being to allow C cross compilers targeting embedded processors
in toaster ovens and like, where I/O routines don't make much sense...
...unless you're gunna burn those bits into the side of someone's morning
biscuits.

Author: parag@netcom.com (Parag Patel)
Date: Wed, 12 May 1993 23:07:25 GMT Raw View

kanze@us-es.sel.de (James Kanze) writes:

>What I'm looking for would be words along the lines of: "Memory
>allocated with 'new' is only guarenteed to remain accessible as long
>as there is at least one active pointer to the allocated block."  (I
>know, the wording has to be improved, "active pointer" defined, etc.
>This is just to give an idea of what I would like to see.)

This also affects the implementation of arrays of objects with
destructors in C++.  g++ allocates a bit more memory than necessary for
an array so it can stash the length at its head.  Then it returns the
pointer+offset for your program to use, which violates the gc requirement.

cfront maintains a separate hashtable which contains the lengths of
arrays hashed by the pointer value.  Something like this second approach
would be essentially mandated by this requirement for gc.  Or the gc
would have to know enough to look for ptr and ptr+offset into each
block, which will slow it down a bit.

BTW - linking the Boehm GC into a C++ program compiled with g++ which
allocates arrays of objects with destructors will crash that program
unless GC is built with ALL_INTERIOR_POINTERS.  This makes GC look for
any pointer pointing within an allocated block of memory rather than any
pointer pointing to the head of an allocated block.  This slows down GC
quite a bit.

 -- Parag Patel <parag@netcom.com>

Author: u90jl@ecs.ox.ac.uk (Jamie Lokier)
Date: Thu, 13 May 1993 01:43:12 GMT Raw View

James Kanze <kanze@us-es.sel.de> writes:
> But...  There are certain coding practices that make some garbage
> collection algorithms impractical.  (I am thinking of things along the
> lines of XOR'ing pointers in a double linked list, or any other type
> of pointer mangling.)  Would it not be possible to keep our options
> open by stating in the standard that such practices are undefined?

That practice is undefined with or without garbage collection anyway.
As for other weird things you can do with pointers -- if the compiler
does it itself as an optimization, then you can expect the compiler to
generate enough data to let garbage collection do its job properly.  If
the programmer does it, and the compiler doesn't understand, then you
should expect the programmer to have some way of conveying the required
information to the garbage collector.

> What I'm looking for would be words along the lines of: "Memory
> allocated with 'new' is only guarenteed to remain accessible as long
> as there is at least one active pointer to the allocated block."  (I
> know, the wording has to be improved, "active pointer" defined, etc.
> This is just to give an idea of what I would like to see.)

That just isn't good enough.  I would really like to see garbage
collection properly available in C++, but I only require the minimum of
compiler support to implement it correctly.

To be more accurate, I don't mind if the C++ standard doesn't require
garbage collection in the system's standard library (roughly what you
are saying).  However, I would like to have the option of implementing
some form of garbage collection myself, and that requires a small amount
of compiler support to be effective.

In a recent project where I was writing a compiler, which required a lot
of very small objects allocated very quickly (forming complicated
graphs), I decided that garbage collection was the most efficient way
for me to implement it; the overhead of reference counting was far too
great, both in space and time.  I had no problem with actually
implementing a (rather simple) collecter.  The problems were

   (a) The compiler could provide me with no information to warn me
       if I had an unprotected pointer while doing something which
       might cause a GC.

   (b) For every object, I had to write a couple of virtual functions
       so that each object knew how to collect itself.

   (c) Derived classes had to redefine those virtual functions.  If they
       didn't, they would not collect properly.  The compiler could not
       warn me when I had forgotten to redefine the virtual functions.

   (d) The overhead in those small, recursive virtual function calls was
       rather excessive (due to the heavy use of the stack), so I had to
       find a way to implement a tail-call.  Unfortunately, I couldn't
       find a way of doing it reliably.

   (e) If the garbage collector were to access type-identification data
       in the class, to decide how to collect it, that took excessive
       space simply because of the need to store a number in the object
       (these were very small objects).

Storing type-information, like size and layout information, or a pointer
to code which could be jumped to, in the object's vtable, is a very
sensible and efficient way to implement GC, IMO.  All that I require
from a C++ compiler in order to implement *my* brand of GC is the
ability to put this kind of information in a C++ program.  I don't want
a library (I'd probably find a better way than the library for my
particular application).  If GC support is done in a general way, it
might even be useful for things other than garbage collection.

Type-specific object data (I call it "virtual data", because it is
similar to virtual functions) is not GC-specific at all.  Code which can
be tail-called an arbitrary number of times without nuking the machine
stack would be useful in many situations too -- I call this "virtual
jumps".  Both virtual data and virtual jumps provide useful forms of
run-time type-identification too; see my article entitled "RTTI with
inheritance: switch (object)"

Warnings about unprotected pointers (pointers not stored somewhere that
the garbage collector knows about) are rather GC-specific.  However,
even this is not necessary, if the compiler can provide the library with
some other way of identifying things like active pointers.  (I don't
automatically protect every pointer in the program because that incurs
lots of overhead, and often prevents the pointers residing in
registers).  A particularly efficient implementation is possible,
similar to the exception handling mechanism which requires *zero
overhead* when an exception is not thrown.  The code pointers found on
unwinding a stack can be used to index a database describing which bits
of program are in which state (in the case of GC, which stack elements
or registers contain pointers to GCable objects).  Compact
representations of such databases are feasible.

-- Jamie Lokier

mail: u90jl@ecs.ox.ac.uk

Author: bs@alice.att.com (Bjarne Stroustrup)
Date: 4 May 93 16:47:18 GMT Raw View

Here are a few comments prompted by the discussion of garbage
collection, C compatibility, standards, etc.



I was asked if I believe the efficiency claims made about GC.
I believe some. However, I don't believe that all the claims made
can be met simultaneously. For example, you can have low overall
overhead and guaranteed low service disruptions, but you cannot
have both simultaneously unless you punt on what ``low'' means
or add hardware support. I believe that much more experimentation
and gathering of experience and evidence is needed. Even the claims
I do believe must be documented to work in real C++ environments
before I'd be willing to recommend their widespread use. Garbage
collection must move from being a semi-religious issue to being an
issue where one can make rational tradeoffs between reasonably
understood alternatives.

Understanding the issues well enough to really understand even the
efficiency tradeoffs is hard:

 (1) I saw a GC example running faster than a non-GC ``equivalent''
 for a smallish test example. It turned out that the GC was so
 well integrated with the compiler that enough information was
 available to do GC with a fewer function calls than the traditional
 manual memory management needed for the explicit `delete's.
 (adding a special purpose new/delete naturally regained the
 advantage for the manual memory management, but that is slightly
 beside the point).

 (2) I saw a description of a system that demonstrated GC with
 only 10% overhead (compared to the system where no garbage was
 collective) and simultaneously kept service interruptions very
 brief without using special hardware. That looked great but
 surprised me. On closer examination, the speed of even the
 non-GC version of the system was about 20 times slower than
 equivalent C++. Thus the 10% overhead really was 200% relative
 to C++ - much more plausible.

I see a real need for non-GC implementations in the forseeable future.
For example, adding GC to existing large systems, using GC in computerized
gadgets, using GC in high-performance numeric computation, using GC
in concurrent systems, using GC in mixed language environments, using
GC for hard real-time problems, etc., all pose research problems. That is,
I don't see even the best GC implementation I can imagine serving all
needs for many years yet. For many applications there are also the
possibility of using specialized new/delete to gain performance unoptainable
by any non-specialized allocation system.

I also see a real need for GC - many sustems do not have the problems and
constraints I hinted at in the previous paragraph - and I consider it
attainable (though not easy) on the time scale of a couple of years
(shorter for smaller communities with specialized needs).



It has been suggested that GC should be mandated in the C++ standard.
I see no hope for that. My impression is that many people overestimate
what a standards committee can do in the area of language design and
underestimate the fundamental conservatism of the majority of users
and tool suppliers. The standards committee is chartered to standardize
existing practice and can only push the limits of the language in a
few widely well-understood and accepted cases.

For example, providing a good GC mechanism (assuming we really knew
what that was) would be a major undertaking for a compiler vendor.
Many vendors would be willing to undertake it if they believed that
their users would appreciate it and thought that they would gain
some competitive advantage from it. I strongly suspect that vendors
eventually will do that, and do it enthusiastically.

I strongly suspect that these very same vendors would object to making
GC a standards requirement because there is uncertainty about the value
of GC in the user community, because not all users need it, because putting
CG in the draft standard would lead users to complain about non-compliance
until GC was actually available, because no one likes to be told that
they have to do more (costly) work, and because adding GC to the standard
might delay the standard.

Many users will have the same attitude because they know that work on
GC will divert attention and effort from other areas that they consider
of more immediate importance.

Thus I believe that we will get optional GC in C++, but that it will not
be in the draft standard (scheduled for September 1994) or in the real
standard arising from public comments on that draft.



C lives and will continue to live for a long time to come, so will C-style
work in C++. People who declare C dead are somewhat premature, and people
who think they can improve C++ by purging ``the evil C parts'' are in my
opinion often misguided. Many of the low-level facilities of C are useful
and even more are used. My view is that to rid ourselves of a feature with
undesirable properties we first make it redundant, then discourage its use,
and then provide tools to guarantee its absence in a particular program.
Just banning something because we think we know it is evil is both infeasible
and arrogant.

Please remember that C++ is a language used by on the order of 500,000
programmers of unimaginably diverse backgrounds and opinions. Stability
- in the sense that yesterdays programs will continue to work tomorrow -
is vitally important to large sections of this community. The diversity
of needs, applications, and styles of uses defy the imagination of any
one individual. The ability to move from C, Fortran, Pascal, etc., to C++
and to support mixed language systems is and will remain essential during
the lifetime of the upcoming standard. C++ is not a kind of Christmas
tree on which one can hang features as ornaments. Nor is it silly-putty
that one can change at whim.

C++ will grow to be a better language (through language features, supporting
tools and environments, and libraries) and it will remain a stable base
for software development. Ensuring that these two necessities are reconciled
is essential and provides the base for the further success of the language.
Please try to avoid getting so overenthusiastic about one aspect that the
other is forgotten.

 - Bjarne Stroustrup

Author: boehm@parc.xerox.com (Hans Boehm)
Date: 4 May 93 22:05:11 GMT Raw View

bs@alice.att.com (Bjarne Stroustrup) writes:

>... Garbage
>collection must move from being a semi-religious issue to being an
>issue where one can make rational tradeoffs between reasonably
>understood alternatives.

>Understanding the issues well enough to really understand even the
>efficiency tradeoffs is hard:

> (1) I saw a GC example running faster than a non-GC ``equivalent''
> for a smallish test example. It turned out that the GC was so
> well integrated with the compiler that enough information was
> available to do GC with a fewer function calls than the traditional
> manual memory management needed for the explicit `delete's.
> (adding a special purpose new/delete naturally regained the
> advantage for the manual memory management, but that is slightly
> beside the point).

I believe Will Clinger has some measurements in which the performance advantage
of the collector was not even regained with special purpose allocation.  This
generally happens when the heap size is relatively large compared to live
data size.  Hence especially a copying garbage collector does almost
no work tracing through accessible objects, and allocation can be
very cheap (incrementing a pointer).  Unfortunately, the extreme cases
seem to be observed primarily in small benchmarks, and not in real
programs (which have lots of live data and constraints on heap size.)
Generational garbage collectors can hold on to some of this advantage for
real programs, but not all of it.  And they lose for clients that allocate
primarily very long-lived data structures.

All garbage collectors probably have a real advantage if you allocate
primarily tiny (say 4-8 byte) objects.  It's much cheaper to deallocate these
en masse than one at a time, especially if delete has to acquire a lock.
Copying them is relatively cheap compared to allocation or deallocation.

> (2) I saw a description of a system that demonstrated GC with
> only 10% overhead (compared to the system where no garbage was
> collective) and simultaneously kept service interruptions very
> brief without using special hardware. That looked great but
> surprised me. On closer examination, the speed of even the
> non-GC version of the system was about 20 times slower than
> equivalent C++. Thus the 10% overhead really was 200% relative
> to C++ - much more plausible.
>

Could you be more specific?  Did the implementation do the same amount of
allocation as the C++ version?  Was it meeting hard real-time constraints?
200% overhead on the total program running time seems implausibly high to
me, and way out of line with what we get in PCR (which has order(100 ms)
pause times).

I agree that there isn't enough experience with garbage collection in C++.
On the other hand, we have something like 30 years experience with
garbage collection in other contexts, and I would argue that we do know
quite a lot about its performance.  We probably know as much as we do
about the performance tradeoffs between various malloc/free implementations.
At least some of us have many years of experience with garbage collectors
in systems programming languages like Cedar Mesa, Modula-2+, and Modula 3.

For example, as has been pointed out in various recent messages,
the performance of our collector compares unevenly with
malloc/free.  But its performance is not unpredictable or mysterious.
Its basic allocation time is comparable to some of the better mallocs.
By default, it tries to keep the heap something like 1/4 empty.
Assuming code that doesn't keep unneeded pointers around, the remainder
of its space overhead is comparable to malloc.

Assuming straightforward allocation with GC_malloc, for every n bytes
allocated, approximately 3n bytes will have to be read by the collector
and traversed.   Plus we clear the object.  Thus the garbage collection
cost associated with allocating an n-byte object is on the same order as
that of scanning or initializing a 4n-byte object.  I think this rule
gives you numbers that are in the same ballpark as those posted or
published.  For 4-byte objects, we may win over a standard malloc/free,
since we don't need the free call, and touching 16 bytes of memory is cheap.
For 512 byte objects that may contain pointers, we lose.  But we're
unlikely to lose by 200%, since that probably means the average memory
object isn't even written and read once.

The performance of simple copying collectors is similarly predictable.
Concurrent/generational collectors are a bit more complicated, but there
are also many measurements of these in the literature.  As always,
not all of these measurements were made under reasonable conditions, but
many of them are useful.

And, as can be verified by reading Ben Zorn's papers, malloc/free
performance isn't that well understood either.

Hans-J. Boehm
(boehm@parc.xerox.com)

Author: matt@physics2.berkeley.edu (Matt Austern)
Date: 5 May 93 13:18:27 Raw View

In article <25419@alice.att.com> bs@alice.att.com (Bjarne Stroustrup) writes:

> It has been suggested that GC should be mandated in the C++ standard.
> I see no hope for that. My impression is that many people overestimate
> what a standards committee can do in the area of language design and
> underestimate the fundamental conservatism of the majority of users
> and tool suppliers. The standards committee is chartered to standardize
> existing practice and can only push the limits of the language in a
> few widely well-understood and accepted cases.

In principle, I agree with this---there's a lot to say for the idea
that a standards committee ought to standardize, instead of trying to
legislate.

I would, though, like to note two points.

The first is that unless garbage collection does become part of some
standard---maybe not the 1994 draft standard, but a standard a few
years down the road---then I really don't see how it will ever become
part of the language.  I can't see many compiler vendors making the
major effort to put it in their products if no standard exists: why
should they, if, when a standard does come out and mandates entirely
different behavior, they'll have to undo that work and implement
something else anyway?  I also can't see very many programmers writing
code that relies on garbage collection if there's no guarantee that it
will be available on at least a reasonable number of different
implementations.  After all, automatic memory management is supposed
to make things easier; it's not supposed to destroy portability!

My second point is that, in practice, the standards committee has not
always adhered to this rule: I'm thinking, in particular, of
exceptions, which were described in the base document for the
standardization process at a time when no commercially available
implementation supported them.

In fact, I think that the parallel between exceptions and garbage
collection is quite close.  Both language features can lead to a major
change in programming paradigm; in fact, in the case of both features,
there is no good way to perform certain kinds of tasks without using
them.  Both language features require a major effort to implement
properly; in particular, in both cases, a naive implementation might
lead to unacceptable overheads, even for code that doesn't
deliberately use them.  Finally, in both cases, the discussion about
adding them to the language is coming at a time when few or no
existing compilers actually support them.

I think we ought to take more seriously the idea that garbage
collection should be a central part of the C++ language; being forced
to do memory management by hand just has the potential to make program
design far more complicated than it has to be, and unnecessarily makes
certain programming styles impossible.
--
Matthew Austern                       Maybe we can eventually make language a
matt@physics.berkeley.edu             complete impediment to understanding.

Author: nagle@netcom.com (John Nagle)
Date: Sat, 8 May 1993 16:14:23 GMT Raw View

bs@alice.att.com (Bjarne Stroustrup) writes:
    "Many of the low-level facilities of C are useful and even more are used."

    That's an excellent way of putting it.

      John Nagle