Topic: Garbage collection algorithm
Author: springle@thegang.uucp (Simone Pringle)
Date: Wed, 26 Feb 92 21:51:33 GMT Raw View
I have been reading James O. Coplien book, Advanced C++ Programming Styles and
Idioms. He describes a garbage collection algorithm "... in the spirit of
mark-and-sweep and Baker's algorithm." (pp. 342).
The algorithm relies on semantics established by the envelope and letter
design metaphor using exemplars (analogous to SmallTalk's prototypes).
As I understand it, the algorithm is as follows: (somewhat paraphrased from
pp. 341 and 347)
(1) The first loop goes through every extant envelope instance
and sets the "mark" bit of the letter class it references.
We can identify all the objects of a given envelope class by
looking in the list in its exemplar.
(2) The second loop goes to each exemplar for the letter classes
and performs the sweep phase, destroying all unmarked letter
instances.
Earlier in the book (pp. 339) Coplien had made the comment that "... reference-
counted reclamation algorithms cannot reclaim memory for circularly referencial
data structures unless they employ expensive recursive scanning techniques."
What I don't get is how the algorithm described above "catches" such circular
references and actually destroys them. Take for an example, a letter class
(letter1) that contains an envelope (envelope 2) to a letter class (letter2)
that contains an envelope (envelope 3) back to the original letter (letter1).
See the picture below:
envelope 1 -> letter1 [envelope 2 -> letter2 [envelope 3 -+
|
/ \ |
| |
| |
+----------------------------------------+
The list in the exemplar to which letter1 belongs contains two envelopes that
refer to it: envelope 1 and envelope 3. The list in the exemplar to which
letter2 belongs to contains one envelope that refers to it: envelope 2. Once
the external envelope (envelope 1) is destroyed, the list of exemplars to
letter1 still contains envelope 3, so letter1 is never destroyed (likewise for
letter2).
Am I missing something? Is it that letter instances can't contain envelopes?
(I thought it was OK...)
Any hints? Email is OK.
Author: lewis@sophists.com (Lewis Pringle)
Date: Fri, 28 Feb 1992 15:05:27 GMT Raw View
In article <1992Feb26.215133.27803@thegang.uucp> springle@thegang.uucp (Simone Pringle) writes:
>I have been reading James O. Coplien book, Advanced C++ Programming Styles and
>Idioms. He describes a garbage collection algorithm "... in the spirit of
>mark-and-sweep and Baker's algorithm." (pp. 342).
>
...
>Am I missing something? Is it that letter instances can't contain envelopes?
>(I thought it was OK...)
>
The key to why this fails to catch circularities is because you have pointers
within the class of things being garbage collected to members of the list of external
pointers (envelopes). Thus, otherwise unreferenced items (letters) are referenced by
envelopes that are "uninteresting". (the key to why they are "referenced" is that the
algorithm looks thru all "envelopes" assuming they indicate external interest in a letter).
In other cases where this algorithm is commonly applied (eg. Lisp garbage collection)
that problem doesn't arise - the things being garbage collected dont have have pointers
to their external containers (eg newly created c variables). With garbage collecting
lisp cells you do have circularities, but they are all lisp-cell to lisp-cell,
and not lisp-cell to envelope (eg c variable) to lisp-cell. The key is that if the
circularities are all within the domain of "Lisp-Cells" or "letters" then unreferenced
cycles will be detected. The algorithm does not work across layers however.
As I undertand it, this is analogous to how multi-layer garbage collection schemes
work where you have short-term, and long-term arenas, and stuff in the short-term
arena can point to stuff in the long term arena, but not the the other way around reversed.
I think this is to allow garbage collection in the short-term arena without running
into the problem you described.
A number related of solutions based on this concept present themselves. The most
obvious (though unweildly) is to have letters point directly to other letters, and not
through envelopes. This presevers the layering assumption refered to above, but allows
for unaccounted references to letters. This places the burdon on us to manually account
for these references - possibly thru a manaul registration/unregistration scheme.
Another possibility is a three layer letter/envelope scheme. Things in the outer
layer can own inner layer ones, or things at their own level. If this seems too undisciplined,
they perhaps try applying this concept to the previous solution. One class of envelopes
for use EXCLUSIVELY as fields of letters (call them small envelopes) and require that
letters have NO pointers (even indirectly) to the Big envelopes (the ones we started with).
In this scheme the small envelopes are just a convenience mechanism to implement the manual
accounting in the first scheme.
Or, you could give up, and just vege out reading news...
Lewis.