Topic: Real-Time GC (was Widespread C++ Competency Gap?)
Author: nettles+@cs.cmu.edu (Scott Nettles)
Date: 4 Jan 1995 17:04:00 GMT
Just a few comments on Hans's post, most of which I agree with completely,
as usual.
In article <3ectgt$49m@news.parc.xerox.com>,
Hans Boehm <boehm@parc.xerox.com> wrote:
>I believe their collector is closely related to the collectors described
>in the 1991 PLDI paper by Demers, Shenker, and myself. The major
>differences are:
I believe that this is true of our current implementation, which uses a
from-space invariant. However, the general technique of replicating
collection does not require a strict from-space or to-space invariant. The
versions which allow the client to access either from-space or to-space
objects differ considerably from the collectors Hans mentions.
All of the other comments are accurate WRT our current implementation.
>observed with dirty bits on 4K pages, though you may have to rethink
>how you track updates. Certainly even our VM-based version is a big
I agree. In general, the issue of how best to implement the write barrier is
an important one. Write barriers are useful for many things: generational
GC, concurrent GC, transactions, distributed shared memory...
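As a concrete illustration, a compiler-emitted barrier can be as small as a
card-marking store (a minimal C++ sketch; the card size, table, and helper
names here are hypothetical):

    // Hypothetical card-marking write barrier: the heap is divided into
    // 512-byte "cards", one dirty byte each; the collector later rescans
    // only the dirty cards.
    #include <cstddef>
    #include <cstdint>

    const std::size_t kCardShift = 9;        // 512-byte cards
    extern std::uint8_t card_table[];        // one byte per card
    extern std::uintptr_t heap_base;

    inline void write_barrier(void** slot, void* new_value) {
        *slot = new_value;                   // the actual store
        std::size_t card =
            (reinterpret_cast<std::uintptr_t>(slot) - heap_base) >> kCardShift;
        card_table[card] = 1;                // mark the card dirty
    }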
>I don't see much hope for making either collector hard real-time. In the
I'm more optimistic, but only time and thought and hacking will tell.
>I think this whole idea is orthogonal to generational garbage collection.
>At least in our environment, generational collection doesn't suffice
>to get interactive response. Survival rates for young objects are
This is an important point. My view is that generational collection makes
the throughput of GC acceptable and may make it easier to achieve good pause
times, but concurrent/incremental collection is required to bound pause
times.
Scott
nettles@cs.cmu.edu
Author: hbaker@netcom.com (Henry Baker)
Date: Thu, 5 Jan 1995 05:21:32 GMT
In article <19950104.170121.171265.NETNEWS@UICVM.UIC.EDU>,
dhanley@matisse.eecs.uic.edu (David Hanley) wrote:
> Henry Baker (hbaker@netcom.com) wrote:
>
> : > Do you, or does anyone, know of any malloc/free implementations which
> : > do give a firm bound? How about ones which do so and also avoid some
> : > fragmentation-related problems? I'd be very interested in cites. Email
> : > would be nice; I'm in thesis mode and I'm only reading this thread once
> : > today. Thanks!
>
> : I'm not aware of any malloc/free implementations which have firm bounds
> : of less than O(log S), where S is the size of the storage being managed.
> : This may be good enough for many purposes, since S is usually more-or-less
> : constant.
>
> Won't a BSD allocator typically do better than this?
Not for _firm_ worst-case bounds. Actually, the bound can be O(log M), where
M is the largest block size that you will ever need. The bound I mentioned
before implicitly assumed that one might want objects that were a significant
fraction of all the space available.
Author: hbaker@netcom.com (Henry Baker)
Date: Wed, 4 Jan 1995 17:37:23 GMT
In article <3eehdh$b8r@cantaloupe.srv.cs.cmu.edu>, nettles+@cs.cmu.edu
(Scott Nettles) wrote:
> In article <nagleD1tFDC.JDr@netcom.com>, John Nagle <nagle@netcom.com> wrote:
>
> > Many of the existing C/C++ storage allocators that don't use
> >garbage collection don't have hard upper bounds on response time,
> >either.
>
> Do you, or does anyone, know of any malloc/free implementations which do
> give a firm bound? How about ones which do so and also avoid some
> fragmentation-related problems? I'd be very interested in cites. Email
> would be nice; I'm in thesis mode and I'm only reading this thread once
> today. Thanks!
I'm not aware of any malloc/free implementations which have firm bounds
of less than O(log S), where S is the size of the storage being managed.
This may be good enough for many purposes, since S is usually more-or-less
constant.
Note, however, that this time cannot possibly include any initialization.
Any initialization that visits every byte will of necessity take O(n) time,
where n is the size of the object being allocated.
I have suggested schemes which require less initialization -- specifically
schemes involving 'destructive' reading, so that storage is already
cleared at the time it is freed.
See URL ftp://ftp.netcom.com/pub/hb/hbaker/Use1Var.html for more details.
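To make the O(log M) figure concrete, here is a minimal sketch of a
segregated-fit allocator with one free list per power-of-two size class
(hypothetical names, not any shipping malloc): allocation rounds the request
up to a class, then scans at most log2(M) lists for a nonempty one, a bound
independent of how much of the storage is currently in use.

    // Hypothetical segregated-fit sketch: one free list per power-of-two
    // size class; bounded_malloc is O(log M) worst case, free is O(1).
    #include <cstddef>

    const int kClasses = 32;                 // classes 2^0 .. 2^31 bytes

    struct Block { Block* next; };
    static Block* free_lists[kClasses];      // assume pre-populated

    static int size_class(std::size_t n) {   // smallest c with 2^c >= n
        int c = 0;
        while (c < kClasses && (std::size_t(1) << c) < n) c++;
        return c;
    }

    void* bounded_malloc(std::size_t n) {
        for (int c = size_class(n); c < kClasses; c++) {
            if (free_lists[c] != nullptr) {  // O(1) pop once found
                Block* b = free_lists[c];
                free_lists[c] = b->next;
                return b;                    // (block splitting elided)
            }
        }
        return nullptr;                      // out of memory
    }

Note that this returns uninitialized storage, which is exactly why the
O(log M) bound above has to exclude initialization.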
Author: kelvin@cs.iastate.edu (Kelvin Nilsen)
Date: 30 Dec 94 21:27:39 GMT
boehm@parc.xerox.com (Hans Boehm) writes:
>I'm not an expert on real-time programming. But I'm always confused
>by these discussions. It seems to me that "real-time" is usually
>intended to mean one of the following two extremes, or anything
>in between:
>1. Reasonable interactive response on a fast machine. Maybe good enough
>so that a video game isn't too jerky.
>2. Part of a safety critical system that requires millisecond or so
>GUARANTEED response, and GUARANTEED space bounds.
Based on my experiences attempting to "educate" the real-time community
as to the potential benefits of dynamic memory management and real-time
garbage collection, I would say that Hans has done a good job of
characterizing the spectrum of applications that might be considered to
be "real-time", as well as summarizing the difficulties and challenges
faced by those who attempt to utilize dynamic memory management in
real-time systems.
I would like to point out that there is a very important (increasingly so)
application domain that I'll call:
1.99: Part of a large system (that is not safety critical) that
requires GUARANTEED response and GUARANTEED space bounds.
This includes video-on-demand servers, tv-top multimedia boxes, virtual
reality systems, and voice-based human-computer interfaces. Some might
argue that no guarantees are necessary in such systems. But won't the
service providers need to provide guarantees to the consumer that:
a) Her window-oriented TV viewer won't run out of memory and crash right
in the middle of the winning Super Bowl touchdown play?
b) His on-demand viewing of the latest Arnold Schwarzenegger movie won't
degrade during the final scene?
c) Her voice processing system won't arbitrarily ignore words spoken during
inattentive periods, possibly negating the intended meanings of complete
sentences?
d) His new virtual reality system won't give him nausea because the
video doesn't synchronize perfectly with audio and mechanical
events?
e) Her new Pen computer won't misinterpret her hasty but quite legible
FAX instructions and send an important confidential document to
the wrong recipient?
Even if the supplier makes no explicit guarantees, such guarantees are
implied. Can you imagine trying to watch TV on anything as unreliable
as Microsoft Windows is today? Wouldn't we see huge class-action
lawsuits against suppliers of products that fail to deliver acceptable
quality? Are company executives willing to risk their future careers on
high-volume distribution of "unguaranteed" software products? Even if
there is no class-action lawsuit against the company, market competition
is likely to impose a stiff penalty on suppliers of unreliable and/or
otherwise dysfunctional "real-time" components.
(Note: I am speaking of "guarantee" in the colloquial sense of the word.
I often buy guaranteed products that are in fact flawed. A guarantee
provides me with an opportunity to have my money refunded or the product
replaced. The product suppliers have presumably performed a risk analysis
upon which they base the guarantee's coverage, the company's stated
liabilities, and the product's prices. A developer of reliable real-time
software may not always be able to afford the high resource costs of
providing 100% assurance that all response-time and memory-allocation
needs will always be satisfied. In these cases, the developer must
quantify the likelihood that certain bounds will be violated, and management
must make decisions as to how these risks are to be tolerated.)
>d. Since you have to statically bound worst-case space usage, it may
>often be just as easy to allocate space statically. (I would love to know
>how often this is really the case. Any informed opinions?)
As Hans' original posting already emphasized, dynamic memory management is
not currently used in existing "hard" real-time systems, so it is difficult
to give factual data to back any opinion, one way or the other... but here
are a few examples that demonstrate that it is not always desirable to
statically allocate all real-time memory needs (because of the high
implementation costs and the constraints imposed on program functionality):
1) Your personal digital assistant (PDA) specifies that it supports
up to 30 minutes of voice notes or 100 pages of handwritten notes,
or any combination of the two in appropriate proportions. Now, add
in memory for your address book, your appointments, received FAXes,
and your to-do list...
2) Your program supports several distinct phases or modes of operation,
each of which uses memory for different purposes. (consider the
control software for a fighter aircraft: takeoff, land, dogfight,
in-flight refuel, cruise, and attack-land-target are all mutually
exclusive modes of operation)
3) Your program makes use of huge amounts of memory, but only "small"
portions of this memory need to be accessed at any particular time
during program execution. (example: SGI's Ada Paintball virtual
reality system is reported to require 500 MBytes of real RAM, but
would run in "much less memory" if there were a reliable way to
manage dynamic memory without violating real-time constraints.)
4) Your program supports "guaranteed" minimal levels of service,
with optional service enhancements provided as resources permit.
a) The throughput of many multimedia applications is significantly
improved through the use of disk and network caches and/or prefetch
buffers. Dynamic memory management allows service quality to be
enhanced through the use of dynamically allocated cache buffers
whenever extra memory is available to serve these purposes.
b) In distributed and parallel real-time systems, it is often desirable
to shift data and tasks between processors in order to better balance
the load. This migration of responsibilities requires that memory
be allocated on the target nodes to represent the relocated tasks
and/or their data.
c) Some important real-time applications require the solution of
difficult problems for which no constant-time algorithms exist.
In these circumstances, it may be necessary for the real-time
software to make an initial (constant-time) guess at the proper
solution and then try to incrementally refine this guess as time
and memory permit. Being able to dynamically allocate memory
provides maximum opportunity to dynamically select between
various time-space tradeoffs in order to optimize the quality of
service for each operating domain. (Hint: many of these sorts
of applications are military related)
In summary, real-time programmers would like to be able to do just about
everything that traditional programmers do. Current technology limitations
force them to restrict system functionality. My "informed" opinion is that
it is no more appropriate to expect real-time programmers to statically
allocate all memory than it would be to remove the new operator from C++.
Author: dhanley@matisse.eecs.uic.edu (David Hanley)
Date: Fri, 30 Dec 1994 20:15:15 +0000
I had an idea that, if practical, would go a long way toward
justifying GC for me. If this idea is not new (and I'm guessing it
isn't), could someone please comment on it?
The idea is to use an extra processor to do garbage collection.
This would relieve the first processor of almost all memory-management
tasks. GC'd programs might therefore run even more quickly than manually
managed ones. This might also be a good way to utilize extra processors
on multiple-processor machines. Multiprocessor machines will be the
norm in a couple more years, IMHO.
Anyway, here's the idea: Periodically, the entire program data
space is copied into temporary space, along with a copy of the
root set. Unfortunately, the execution of the program must be halted
for this. On the bright side, this could be done while the task would
be halted anyway (I'm assuming a multitasking machine here), so we
might not lose very much.
The program can then be allowed to execute once again on its
processor. Meanwhile, on the second processor, the list of objects
that cannot be accessed from the root set will be built. The next GC
interrupt will simply add this list to the free list before copying the
data space again.
If this could be done without copying the data space, it
would, of course, be even better (or even fantastic). I tend to
think the copy turns out to be a necessary step.
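In outline, the division of labor might look like the following hedged C++
sketch, with a toy index-based heap standing in for the real data space (all
names hypothetical):

    // Sketch of the proposal: halt, snapshot the heap and roots, then
    // trace the snapshot on a second processor. Anything unreachable in
    // the snapshot stays unreachable, so freeing it at the next pause
    // is safe.
    #include <future>
    #include <vector>

    struct Obj { std::vector<int> children; bool marked = false; };
    using Heap = std::vector<Obj>;           // objects named by index

    static void mark(Heap& h, int i) {
        if (h[i].marked) return;
        h[i].marked = true;
        for (int c : h[i].children) mark(h, c);
    }

    // Runs on the second processor, entirely on the private snapshot.
    std::vector<int> collect_snapshot(Heap snap, std::vector<int> roots) {
        for (int r : roots) mark(snap, r);
        std::vector<int> dead;
        for (int i = 0; i < (int)snap.size(); i++)
            if (!snap[i].marked) dead.push_back(i);
        return dead;                         // reaped at the next pause
    }

    // Mutator side: the only pause is copying 'h' (std::async's
    // decay-copy happens in the caller); the trace runs concurrently.
    std::future<std::vector<int>> start_gc(const Heap& h,
                                           std::vector<int> roots) {
        return std::async(std::launch::async, collect_snapshot,
                          h, std::move(roots));
    }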
Comments?
--
------------------------------------------------------------------------------
| David James Hanley -- dhanley@lac.eecs.uic.edu -- C++, OOD, martial arts |
| Laboratory for advanced computing | My employer barely KNOWS me. |
------------------------------------------------------------------------------
"There are trivial truths & there are great truths. The opposite of a
trivial truth is plainly false."
"The opposite of a great truth is also true."
-Niels Bohr
Author: norman@flaubert.bellcore.com (Norman Ramsey)
Date: 31 Dec 1994 04:11:48 GMT
See a number of PLDI papers, possibly starting with this one:
@Article{appel:88,
  author  = "Andrew W. Appel and John R. Ellis and Kai Li",
  title   = "Real-Time Concurrent Collection on Stock Multiprocessors",
  pages   = "11--20",
  journal = "ACM SIGPLAN Notices",
  year    = "1988",
  month   = jul,
  volume  = "23",
  number  = "7",
  note    = "Proceedings of the ACM SIGPLAN Conference on Programming
             Language Design and Implementation (PLDI)",
}
Author: hbaker@netcom.com (Henry Baker)
Date: Sat, 31 Dec 1994 07:19:24 GMT
In article <19941230.201628.350635.NETNEWS@UICVM.UIC.EDU>,
dhanley@matisse.eecs.uic.edu (David Hanley) wrote:
> I had an idea that, if practical, would go a long way toward
> justifying GC for me. If this idea is not new (and I'm guessing it
> isn't), could someone please comment on it?
>
> The idea is to use an extra processor to do garbage collection.
> This would relieve the first processor of almost all memory-management
> tasks. GC'd programs might therefore run even more quickly than manually
> managed ones. This might also be a good way to utilize extra processors
> on multiple-processor machines. Multiprocessor machines will be the
> norm in a couple more years, IMHO.
>
> Anyway, here's the idea: Periodically, the entire program data
> space is copied into temporary space, along with a copy of the
> root set. Unfortunately, the execution of the program must be halted
> for this. On the bright side, this could be done while the task would
> be halted anyway (I'm assuming a multitasking machine here), so we
> might not lose very much.
>
> The program can then be allowed to execute once again on its
> processor. Meanwhile, on the second processor, the list of objects
> that cannot be accessed from the root set will be built. The next GC
> interrupt will simply add this list to the free list before copying the
> data space again.
>
> If this could be done without copying the data space, it
> would, of course, be even better (or even fantastic). I tend to
> think the copy turns out to be a necessary step.
>
> Comments?
This is a brilliant idea!!
Now look at these URLs:
ftp://ftp.netcom.com/pub/hb/hbaker/RealTimeGC.html (also .ps.Z)
ftp://ftp.netcom.com/pub/hb/hbaker/CacheCGC.html (also .ps.Z)
(Great minds think alike! :-)
Author: "Ronald F. Guilmette" <rfg@rahul.net>
Date: 31 Dec 1994 08:32:21 GMT
In article <19941230.201628.350635.NETNEWS@UICVM.UIC.EDU>,
David Hanley <dhanley@matisse.eecs.uic.edu> wrote:
>
> I had an idea that, if practical, would go a long way toward
>justifying GC for me. If this idea is not new (and I'm guessing it
>isn't), could someone please comment on it?
>
> The idea is to use an extra processor to do garbage collection...
Yeah, like maybe all of those mostly-good Pentiums that paranoid people
are about to toss out. (1/2 :-)
--
-- Ron Guilmette, Sunnyvale, CA ---------- RG Consulting -------------------
---- E-mail: rfg@segfault.us.com ----------- Purveyors of Compiler Test ----
-------------------------------------------- Suites and Bullet-Proof Shoes -
Author: wilson@cs.utexas.edu (Paul Wilson)
Date: 2 Jan 1995 12:14:34 -0600
In article <3e2lmf$91@lowell.bellcore.com>,
Norman Ramsey <norman@flaubert.bellcore.com> wrote:
>
>See a number of PLDI papers, possibly starting with this one:
>
> @Article{appel:88,
>   author = "Andrew W. Appel and John R. Ellis and Kai Li",
>   title  = "Real-Time Concurrent Collection on Stock Multiprocessors",
>   [ ... ]
Anybody interested in real-time GC should be aware that this and other
algorithms that rely on virtual memory hardware for coordination are
not really real time (or at best, very, very soft real time).
Even Baker's real-time copying algorithm may not be usefully real-time
without specialized hardware support like Kelvin Nilsen's. Copying
collection is hard to make real-time without a significant performance
hit.
--
| Paul R. Wilson, Comp. Sci. Dept., U of Texas @ Austin (wilson@cs.utexas.edu)
| Recent papers on garbage collection, memory hierarchies, persistence and
| Scheme interpreters and compilers available via ftp from cs.utexas.edu
| (128.83.139.9), in pub/garbage or http://www.cs.utexas.edu/oops/)
Author: ellis@parc.xerox.com (John Ellis)
Date: 2 Jan 1995 19:30:05 GMT
Paul Wilson writes:
Anybody interested in real-time GC should be aware that this and
other algorithms that rely on virtual memory hardware for
coordination are not really real time (or at best, very, very soft
real time).
Even Baker's real-time copying algorithm may not be usefully real-time
without specialized hardware support like Kelvin Nilsen's.
A clarification: The only difference in real-time behavior between
Baker's real-time copying algorithm and a VM-synchronized copying
algorithm is a constant factor related to the page size. Both
algorithms have the property that pauses between program steps are
bounded by a constant, and both algorithms can slow the program's
progress for long, essentially unpredictable periods of time as the
program references data in from-space that must be copied to to-space.
The maximum pause between individual program steps in Baker's
algorithm is smaller, but the overall decrease in program progress is
essentially the same.
Paul is correct in saying that the known copying algorithms may not be
useful for many or most commercial applications labelled "real-time".
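For readers who haven't seen it, the per-access check being compared here
looks roughly like this (a hedged C++ sketch with hypothetical names; Baker's
original was expressed for Lisp):

    // Baker-style read barrier sketch: every heap pointer the mutator
    // loads is checked; objects still in from-space are copied into
    // to-space before the mutator ever sees them.
    #include <cstddef>
    #include <cstring>

    struct Obj { Obj* forward; std::size_t size; /* payload follows */ };

    extern char *from_lo, *from_hi;          // from-space bounds
    extern char* to_alloc;                   // bump pointer in to-space

    static bool in_from_space(Obj* p) {
        return reinterpret_cast<char*>(p) >= from_lo &&
               reinterpret_cast<char*>(p) <  from_hi;
    }

    Obj* read_barrier(Obj* p) {
        if (p == nullptr || !in_from_space(p)) return p;
        if (p->forward == nullptr) {         // not copied yet: copy now
            Obj* copy = reinterpret_cast<Obj*>(to_alloc);
            to_alloc += p->size;
            std::memcpy(copy, p, p->size);
            copy->forward = nullptr;
            p->forward = copy;               // leave a forwarding pointer
        }
        return p->forward;                   // mutator sees only to-space
    }

The unpredictable slowdown both posts mention is visible here: how often the
copy branch is taken depends entirely on the program's reference pattern.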
Author: hbaker@netcom.com (Henry Baker)
Date: Mon, 2 Jan 1995 20:46:49 GMT
In article <3e9k7t$suu@news.parc.xerox.com>, ellis@parc.xerox.com (John
Ellis) wrote:
> Paul Wilson writes:
>
> Anybody interested in real-time GC should be aware that this and
> other algorithms that rely on virtual memory hardware for
> coordination are not really real time (or at best, very, very soft
> real time).
>
> Paul is correct in saying that the known copying algorithms may not be
> useful for many or most commercial applications labelled "real-time".
I believe that Kaleida's ScriptX uses a version of my 'Treadmill'
RT GC algorithm. I don't know what its latencies are, but it is
supposedly good enough for multimedia. Although it doesn't copy, it is
isomorphic to a copying algorithm. WWW reference URLs:
ftp://ftp.netcom.com/pub/hb/hbaker/NoMotionGC.html
ftp://ftp.netcom.com/pub/hb/hbaker/RealTimeGC.html
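The treadmill's central trick fits in a few lines (a minimal sketch with
hypothetical names, not Kaleida's code; see the NoMotionGC paper above for
the real thing). All objects live on one cyclic doubly-linked list, four
pointers partition it into free/new/from/to segments, and "evacuating" an
object is O(1) pointer surgery rather than a byte copy:

    // Treadmill sketch: color is position on the list, so nothing ever
    // moves in memory, yet the algorithm is isomorphic to copying GC.
    struct Node { Node *prev, *next; /* object header and payload */ };

    static void unlink(Node* n) {
        n->prev->next = n->next;
        n->next->prev = n->prev;
    }

    static void insert_before(Node* pos, Node* n) {
        n->prev = pos->prev;
        n->next = pos;
        pos->prev->next = n;
        pos->prev = n;
    }

    extern Node* scan;   // one of the four pointers delimiting segments

    void evacuate(Node* obj) {       // grey a reachable object, O(1)
        unlink(obj);
        insert_before(scan, obj);    // joins the to-be-scanned segment
    }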
Author: nagle@netcom.com (John Nagle)
Date: Tue, 3 Jan 1995 06:00:48 GMT
ellis@parc.xerox.com (John Ellis) writes:
>Paul is correct in saying that the known copying algorithms may not be
>useful for many or most commercial applications labelled "real-time".
Many of the existing C/C++ storage allocators that don't use
garbage collection don't have hard upper bounds on response time,
either.
I used to be opposed to putting garbage collection into C++.
But memory leaks and bad pointers cause so much trouble in C++ that
it's worth considering, especially for GUI-type applications.
It's possible to get manual allocation right, but getting it right requires
that storage management be a primary consideration in design,
if not the primary consideration. This warps class library design.
C++ programs have to obsess over who owns what, even when ownership
isn't a major functional issue for the program.
A good example is the distinction between collections and
containers. In Smalltalk, collections are a generally useful tool
for dealing with groups of objects, regardless of who owns them.
Containers include the concept of ownership; an object can't be in
two containers at once (at least for STL containers). This prevents
the use of containers for representing temporary groups of objects.
Yes, it's possible to work around this by adding an extra layer of
pointers, but it's unnecessarily messy.
John Nagle
Author: jgmorris@cs.cmu.edu (Greg Morrisett)
Date: 3 Jan 1995 17:28:27 GMT
In article <3e9fqa$8ej@jive.cs.utexas.edu>,
Paul Wilson <wilson@cs.utexas.edu> wrote:
>Anybody interested in real-time GC should be aware that this and other
>algorithms that rely on virtual memory hardware for coordination are
>not really real time (or at best, very, very soft real time).
>
>Even Baker's real-time copying algorithm may not be usefully real-time
>without specialized hardware support like Kelvin Nilsen's. Copying
>collection is hard to make real-time without a significant performance
>hit.
I disagree. See the work by Nettles and O'Toole.
They show that copying garbage collection can be done incrementally,
concurrently, and with guaranteed pause times. The basic idea is
to make a replica of the heap instead of destructively updating the
current version. Then the mutator (user's program) can access the
old replica while the collector is building the compacted replica.
The overheads are quite acceptable.
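The moving part that makes this work is the mutation log. A hedged sketch
(hypothetical names, not their actual SML runtime): the mutator keeps reading
and writing the old space, each pointer store is also logged, and the
collector replays the log against the replica until it converges, leaving
only a short, bounded final flip.

    // Replication-GC write barrier sketch: stores go to the old space
    // as usual and are also recorded for replay on the new replica.
    #include <vector>

    struct LogEntry { void** slot; void* value; };
    static std::vector<LogEntry> mutation_log;

    inline void replicating_write(void** slot, void* value) {
        *slot = value;                          // normal store, old space
        mutation_log.push_back({slot, value});  // collector replays this
    }

    // Collector side (elided): translate each logged slot and value into
    // replica addresses via forwarding pointers and apply the store there.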
Here are some references, all of which can be found at the following URL:
http://www.cs.cmu.edu:8001/afs/cs.cmu.edu/project/venari/www/home.html
Concurrent Compacting Garbage Collection of a Persistent Heap.
James O'Toole, Scott Nettles, and David Gifford. Proceedings of SOSP '93.
Replication-Based Real-Time Garbage Collection. Scott Nettles and
James O'Toole. Proceedings of Programming Language Design and
Implementation (PLDI) '93.
Replication-Based Incremental Copying Collection. Scott M. Nettles,
James W. O'Toole, David Pierce, and Nicholas Haines. Proceedings of the
SIGPLAN International Workshop on Memory Management, 1992.
Author: kelvin@cs.iastate.edu (Kelvin Nilsen)
Date: 3 Jan 95 18:30:17 GMT
jgmorris@cs.cmu.edu (Greg Morrisett) writes:
>>
>>Even Baker's real-time copying algorithm may not be usefully real-time
>>without specialized hardware support like Kelvin Nilsen's. Copying
>>collection is hard to make real-time without a significant performance
>>hit.
>I disagree. See the work by Nettles and O'Toole.
>They show that copying garbage collection can be done incrementally,
>concurrently, and with guaranteed pause times. The basic idea is
>to make a replica of the heap instead of destructively updating the
>current version. Then the mutator (user's program) can access the
>old replica while the collector is building the compacted replica.
>The overheads are quite acceptable.
The Nettles and O'Toole technique shows some promise. But I think
you are overstating their results. Last time I checked:
1. The guaranteed pause times were not really guaranteed in the analytical
sense. Rather, they were guaranteed in the sense of "tweak and tune
and measure". There are not very many developers of hard real-time
systems who are willing to trust this as a reliable guarantee. At the
very least, they would want a reliable characterization of the
parameter sensitivities upon which the worst-case latencies depend.
(If my application or the environment in which it runs changes a little
bit, how much more do I have to tweak and tune and measure in order
to reinstill confidence in its real-time behavior?)
2. Without hardware support, the guaranteed pause times are still rather
high, at least for the techniques detailed in the published papers.
The pause times are much too high to support many realistic real-time
workloads.
3. One of the main performance advantages of the Nettles, O'Toole et al
technique is that the GC bookkeeping overhead associated with every
read operation in Baker's algorithm is eliminated and replaced with
bookkeeping overhead associated with write operations in their algorithm.
In an applicative language like SML, this is a big win, because reads
are much more frequent than writes (the measurements reported so far
are based on applicative programming styles). However, it is not so
clear how this technique would perform on imperative programming
language workloads.
If you read their papers carefully, you will see that each describes
slightly different variants of an original idea. Furthermore, you will
find that none provides analytical proofs of real-time performance.
I have corresponded with Nettles and O'Toole in an attempt to nail them
down as to the worst-case behavior of their real-time garbage collector.
After going several rounds with them, they conceded that they had really
not considered their technique to be "real-time" in the "hard real-time"
sense of the word. Rather, they were thinking more in terms of
"interactive", in approximately the same sense that generational garbage
collectors are "real time".
By the way, I do not want to belittle the contributions of Nettles and
O'Toole. Their algorithms are interesting and useful. Their use of
the word "real-time" is not necessarily "wrong". It is consistent with
the use of the word by other researchers, especially those in the
"real-time garbage collection" community. I don't even want to state
that appropriate variants of their algorithm cannot be shown to be
"hard real-time". I simply want to point out that as of today:
1. Their techniques have not yet been shown to be hard real-time, and
doing so is a nontrivial problem.
2. Their techniques have not yet been empirically evaluated on workloads
that are representative of the imperative programming world (e.g. C++).
Author: wilson@cs.utexas.edu (Paul Wilson)
Date: 3 Jan 1995 13:29:59 -0600
In article <hbaker-0201951249410001@192.0.2.1>,
Henry Baker <hbaker@netcom.com> wrote:
>>
>> Paul is correct in saying that the known copying algorithms may not be
>> useful for many or most commercial applications labelled "real-time".
>
>I believe that Kaleida's ScriptX uses a version of my 'Treadmill'
>RT GC algorithm. I don't know what its latencies are, but it is
>supposedly good enough for multimedia. Although it doesn't copy, it is
>isomorphic to a copying algorithm.
Kaleida's non-copying collector (due to Wade Hennessey) is basically a copy
of ours, using a snapshot write barrier rather than an incremental update
write barrier. (We hope to do some experiments with ours Real Soon, to
quantify the costs and benefits of both approaches.)
Our collector is, in turn, partly inspired by your treadmill algorithm,
generalized to deal with multiple object sizes, and using either Dijkstra's
or Steele's incremental update write barrier (it's configurable to do
either) rather than a read barrier. (Thomas Wang had an earlier "fake
copying" GC for C, but it wasn't real-time.)
--
| Paul R. Wilson, Comp. Sci. Dept., U of Texas @ Austin (wilson@cs.utexas.edu)
| Recent papers on garbage collection, memory hierarchies, persistence and
| Scheme interpreters and compilers available via ftp from cs.utexas.edu
| (128.83.139.9), in pub/garbage or http://www.cs.utexas.edu/oops/)
Author: wilson@cs.utexas.edu (Paul Wilson)
Date: 3 Jan 1995 13:53:34 -0600
In article <3e9k7t$suu@news.parc.xerox.com>,
John Ellis <ellis@parc.xerox.com> wrote:
>
>A clarification: The only difference in real-time behavior between
>Baker's real-time copying algorithm and a VM-synchronized copying
>algorithm is a constant factor related to the page size. Both
>algorithms have the property that pauses between program steps are
>bounded by a constant, and both algorithms can slow the program's
>progress for long, essentially unpredictable periods of time as the
>program references data in from-space that must be copied to to-space.
>The maximum pause between individual program steps in Baker's
>algorithm is smaller, but the overall decrease in program progress is
>essentially the same.
This is true for the worst case, making either algorithm unsuitable
for most hard real-time applications. If you only need soft real-time,
though, Baker's original copying algorithm probably comes out ahead
of the VM-assisted one, because of that constant related to the page
size. Sensitivity to inscrutable page-granularity locality problems
makes the VM-assisted approaches significantly more likely to run into
bad cases, IMHO.
If you can accept that kind of potential slowdown, though, the Appel-Ellis-Li
algorithm has some very nice features, like being able to work with standard
compilers and achieve reasonable efficiency.
--
| Paul R. Wilson, Comp. Sci. Dept., U of Texas @ Austin (wilson@cs.utexas.edu)
| Recent papers on garbage collection, memory hierarchies, persistence and
| Scheme interpreters and compilers available via ftp from cs.utexas.edu
| (128.83.139.9), in pub/garbage or http://www.cs.utexas.edu/oops/)
Author: boehm@parc.xerox.com (Hans Boehm)
Date: 4 Jan 1995 01:26:53 GMT
kelvin@cs.iastate.edu (Kelvin Nilsen) writes:
...
>3. One of the main performance advantages of the Nettles, O'Toole et al
> technique is that the GC bookkeeping overhead associated with every
> read operation in Baker's algorithm is eliminated and replaced with
> bookkeeping overhead associated with write operations in their algorithm.
> In an applicative language like SML, this is a big win, because reads
> are much more frequent than writes (the measurements reported so far
> are based on applicative programming styles). However, it is not so
> clear how this technique would perform on imperative programming
> language workloads.
I believe their collector is closely related to the collectors described
in the 1991 PLDI paper by Demers, Shenker, and myself. The major
differences are:
1) We only implemented the noncopying version. They only implemented
the copying version.
2) We used a VM write barrier. They used a compiler-implemented write
barrier. (A VM-write barrier with the copying algorithm actually
3) We implemented the concurrent version first. They implemented the
incremental version first. Each group subsequently implemented the other
variant.
4) We have a sweep phase that's not hard real-time, but irrelevant
in practice.
5) They did some other useful work that's not relevant to this discussion.
6) They did their experiments with ML. We used Cedar and C.
Based on our experience, I would think that both approaches should also
work reasonably with imperative languages. A compiler-implemented
write barrier should reduce collector pauses appreciably below what we
observed with dirty bits on 4K pages, though you may have to rethink
how you track updates. Certainly even our VM-based version is a big
improvement for large Cedar images.
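For concreteness, the VM flavor of the barrier amounts to something like the
following POSIX sketch (hypothetical, and not the Cedar collector's actual
code): write-protect the heap, take one fault on the first store to each
page, record the dirty page, then unprotect it and let the store retry.

    // Page-granularity VM write barrier sketch using mprotect + SIGSEGV.
    #include <csignal>
    #include <cstddef>
    #include <cstdint>
    #include <sys/mman.h>

    const std::size_t kPage = 4096;
    extern char* heap;                   // page-aligned heap base
    extern bool  dirty[];                // one flag per heap page

    static void on_fault(int, siginfo_t* si, void*) {
        char* page = reinterpret_cast<char*>(
            reinterpret_cast<std::uintptr_t>(si->si_addr) & ~(kPage - 1));
        dirty[(page - heap) / kPage] = true;            // note for rescan
        mprotect(page, kPage, PROT_READ | PROT_WRITE);  // retry the store
    }

    void arm_write_barrier(std::size_t heap_bytes) {
        struct sigaction sa = {};
        sa.sa_flags = SA_SIGINFO;
        sa.sa_sigaction = on_fault;
        sigaction(SIGSEGV, &sa, nullptr);
        mprotect(heap, heap_bytes, PROT_READ);  // first write per page faults
    }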
I don't see much hope for making either collector hard real-time. In the
worst case, essentially all of the data structure can be mistaken for dead
until you finally stop to finish up. (If you don't finish up atomically,
there is no termination guarantee for the collection.) Of course, the
probability of this happening is lower than that of your winning the lottery.
I think this whole idea is orthogonal to generational garbage collection.
At least in our environment, generational collection doesn't suffice
to get interactive response. Survival rates for young objects are
too high, and you are still left with moderately frequent full collections.
(Generational collection has other benefits, of course.) This style
of incremental/concurrent collection is sufficient, and combines well
with generational collection. But it's only a technique for getting
good interactive response, not for getting guaranteed real-time
bounds. There are other algorithms that guarantee that (e.g. one of
the Baker algorithms or a snapshot algorithm).
Hans-J. Boehm
(boehm@parc.xerox.com)
Standard disclaimer ...
Author: nettles+@cs.cmu.edu (Scott Nettles)
Date: 4 Jan 1995 16:12:33 GMT
In article <nagleD1tFDC.JDr@netcom.com>, John Nagle <nagle@netcom.com> wrote:
> Many of the existing C/C++ storage allocators that don't use
>garbage collection don't have hard upper bounds on response time,
>either.
Do you, or does anyone, know of any malloc/free implementations which do
give a firm bound? How about ones which do so and also avoid some
fragmentation-related problems? I'd be very interested in cites. Email would
be nice; I'm in thesis mode and I'm only reading this thread once today.
Thanks!
Scott
Author: dhanley@matisse.eecs.uic.edu (David Hanley)
Date: Wed, 4 Jan 1995 17:00:55 +0000
Henry Baker (hbaker@netcom.com) wrote:
: > Do you, or does anyone, know of any malloc/free implementations which do
: > give a firm bound? How about ones which do so and also avoid some
: > fragmentation-related problems? I'd be very interested in cites. Email
: > would be nice; I'm in thesis mode and I'm only reading this thread once
: > today. Thanks!
: I'm not aware of any malloc/free implementations which have firm bounds
: of less than O(log S), where S is the size of the storage being managed.
: This may be good enough for many purposes, since S is usually more-or-less
: constant.
Won't a BSD allocator typically do better than this?
------------------------------------------------------------------------------
| David James Hanley -- dhanley@lac.eecs.uic.edu -- C++, OOD, martial arts |
| Laboratory for advanced computing | My employer barely KNOWS me. |
------------------------------------------------------------------------------
"There are trivial truths & there are great truths. The opposite of a
trivial truth is plainly false."
"The opposite of a great truth is also true."
-Niels Bohr
Author: nettles+@cs.cmu.edu (Scott Nettles)
Date: 4 Jan 1995 16:48:03 GMT
I think Kelvin's remarks about our work are fair and accurate, although I do
think the published papers make it clear that our work isn't real-time in
the strong sense. We have used the term real-time in keeping with the usage
of the GC community, although now I think this is a mistake and that the GC
community should find a new term for bounded-pause-time collectors which are
not "real-time". For most applications (like most interactive programs),
bounded pause times are all you need.
A few specific comments and then back to the thesis.
In article <kelvin.789157817@kickapoo>,
Kelvin Nilsen <kelvin@cs.iastate.edu> wrote:
>1. Their techniques have not yet been shown to be hard real-time, and
> doing so is a nontrivial problem.
Agreed completely. This is an area which I intend to push on once my thesis
is finished.
>2. Their techniques have not yet been empirically evaluated on workloads
> that are representative of the imperative programming world (e.g. C++).
Also true, but not of as much interest to me personally. If anyone wanted to
work on this, I'd be glad to help, but my focus is almost certainly going to
stay on languages like SML and Scheme. Also note that this doesn't bear
on the real-time question: something can be slow and still real-time. The
key issue in real-time is predictability, not speed.
Scott
nettles@cs.cmu.edu