Thread

Topic: C++0x Wish list (memory management)

Author: Hyman Rosen <hyrosen@mail.com>
Date: Fri, 28 Jun 2002 21:40:01 GMT Raw View

Garry Lancaster wrote:
> There is no resurrection problem here. The C++
> programmer who wrote the class has avoided it
> by following simple rules, just as we avoid all the
> other undefined behaviour in the language day in,
> day out. This is not rocket science.

I don't think the rules to follow will be simple.

No one seems to have responded to my finalizer proposal,
so I'll repeat it again:

struct FileCloser : GC_finalizer
{
 int fd;
 FileCloser(fd);
 void operator()() { close fd; }
};
struct FileUser
{
 FileUser(char *file) : fd(open(file))
 {
  GC_finalize(this, new FileCloser(fd));
 }
private:
 int fd;
 ~FileUser();
};
int main()
{
 new FileUser("joe.txt");
 GC_collect();
}

Finalizer functors are reachable as long as they are attached
to an uncollected object.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Ryjek <ryjek@cox.net>
Date: Fri, 28 Jun 2002 21:43:34 GMT Raw View

Garry Lancaster wrote:
> Here is how to use it within the kind of GC system I
> have in mind.
>
> int main() {
>     gc_ptr<File> pFile = gc_new<File>();
>     // Calls to other member functions of File.
> }

I can't see a problem with allowing both approaches at the same time:

1) Objects allocated with "new" and leaked are garbage collected without
calling anything. This does not change the behavior of existing programs
and nicely legalizes memory leaks.

2) Objects allocated with "gc_new" have their destructor invoked upon
collection.

Why do you think it is necessary to have a special pointer for garbage
collected objects ?

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Mon, 1 Jul 2002 16:51:49 GMT Raw View

Garry Lancaster wrote:
> > Here is how to use it within the kind of GC system I
> > have in mind.
> >
> > int main() {
> >     gc_ptr<File> pFile = gc_new<File>();
> >     // Calls to other member functions of File.
> > }

Ryjek:
> I can't see a problem with allowing both approaches at the same time:
>
> 1) Objects allocated with "new" and leaked are garbage collected without
> calling anything. This does not change the behavior of existing programs
> and nicely legalizes memory leaks.

This is how the Boehm et al. system works of course.

> 2) Objects allocated with "gc_new" have their destructor invoked upon
> collection.
>
> Why do you think it is necessary to have a special pointer for garbage
> collected objects ?

It's not necessary but it allows different trade-offs in how the GC
system works. One example: a GC system that works using
raw pointers has to trace through all raw pointers in the sweep
phase of a mark-and-sweep collection whereas one that
uses special GC pointers only has to trace through the special
GC pointers. The latter implies a precise or mostly-precise
approach and is not suitable for Boehm et al.'s system which,
at least usually, employs a conservative approach.

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Mon, 1 Jul 2002 16:52:04 GMT Raw View

> Garry Lancaster:
> > There is no resurrection problem here. The C++
> > programmer who wrote the class has avoided it
> > by following simple rules, just as we avoid all the
> > other undefined behaviour in the language day in,
> > day out. This is not rocket science.

Hyman Rosen:
> I don't think the rules to follow will be simple.

The rule is, as I have mentioned several times now, don't
use a GC pointer in a destructor. To me this seems simpler
than the alternative solutions.

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Alexander Terekhov <terekhov@web.de>
Date: Mon, 1 Jul 2002 17:51:14 GMT Raw View

Garry Lancaster wrote:
>
> > Garry Lancaster:
> > > There is no resurrection problem here. The C++
> > > programmer who wrote the class has avoided it
> > > by following simple rules, just as we avoid all the
> > > other undefined behaviour in the language day in,
> > > day out. This is not rocket science.
>
> Hyman Rosen:
> > I don't think the rules to follow will be simple.
>
> The rule is, as I have mentioned several times now, don't
> use a GC pointer in a destructor.

Yup.

> To me this seems simpler than the alternative solutions.

It might be worth a couple of EUROs to allow use of GC
pointers [even resurrecting GC-objects; this including]
in a *pre-*destructor: a *finalizer* [with post-ctors/
resurrectors bringing back whatever was 'striped' by
corresponding pre-d-tors/finalizers in the hierarchy
of classes in the object's 'dynamic type'], I think. ;-)

regards,
alexander.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Hyman Rosen <hyrosen@mail.com>
Date: Mon, 1 Jul 2002 21:02:22 GMT Raw View

Garry Lancaster wrote:
> The rule is, as I have mentioned several times now, don't
> use a GC pointer in a destructor.

Unfortunately, this is not a local rule. A destructor
can call arbitrary functions, and you don't necessarily
know whether those functions will access GC pointers.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: brangdon@cix.co.uk (Dave Harris)
Date: Wed, 26 Jun 2002 16:55:41 GMT Raw View

williamkempf@hotmail.com (William E. Kempf) wrote (abridged):
> > It doesn't have anything like SuppressDestructor() or
> > WaitForPendingDestructors().
>
> Nor should it.  Or was that your point?

My point is that, although in some places it calls them destructors, in
other places it calls them finalisers. That it sometimes calls them
finalisers is support for my claim that "finalisers" is the widely-agreed
term. That it is inconsistent suggests to me it is not a good model to
follow anyway.

  Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
      brangdon@cix.co.uk      |   And close your eyes with holy dread,
                              |  For he on honey dew hath fed
 http://www.bhresearch.co.uk/ |   And drunk the milk of Paradise."

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: brangdon@cix.co.uk (Dave Harris)
Date: Wed, 26 Jun 2002 17:00:12 GMT Raw View

glancaster@ntlworld.com (Garry Lancaster) wrote (abridged):
> This solution is more complex than either destruction
> or any finalisation system I am aware of. It's worse
> than it seems too since you haven't shown the required
> polling code, how you manage the FileManager or
> how you initialise the File and FileResource. Put them
> all together and I think it will be clearer why destructors
> are really a simpler solution.

I agree it is more code. I claim it is conceptually simpler, though,
because it avoids the various problems I have raised with destructors.


> This is basically destruction semantics without using
> destructors.

The difference is that it avoids issues with loops and resurrection. There
is less scope for undefined behaviour.

Maybe a compromise would be for GC to magically NULL out all pointers in
objects before their destructor is run?

This would be hard to implement with a conservative GC, and might fall
foul of the "don't pay for what you don't use" rule, but would be
relatively safe. Perhaps it should at least be permitted as QoI.


> I suggest that my version is clearer because it has fewer lines
> of code and works in the same way as non-GC clean-up (which also
> means it can be used outside the GC system).

I am suggesting that GC objects be allowed to have destructors. These
would work exactly the same whether GC is present or not. Thus my GC
objects can also be used outside the GC system. For example:

    int main() {
        File file( "joe.txt" );
    }

    int main() {
        File *pFile = new File( "joe.txt" );
        pFile->close();
    }

    int main() {
        File *pFile = new File( "joe.txt" );
    }

All three of these would be unaffected by GC. The first two would be
correct in that the file would be closed. The last would leave the file
handle unclosed. Your version is equivalent to my last version, except
that the file gets closed if GC is present and leaks otherwise. Thus the
correctness of your code depends on whether we have GC, which I see as an
portability issue.

  Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
      brangdon@cix.co.uk      |   And close your eyes with holy dread,
                              |  For he on honey dew hath fed
 http://www.bhresearch.co.uk/ |   And drunk the milk of Paradise."

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Wed, 26 Jun 2002 19:20:28 GMT Raw View

In article <memo.20020626175332.58781L@brangdon.madasafish.com>,
brangdon@cix.co.uk wrote:

>I am still struggling to see what GC'd destructors /should/ be used for.
>
>They shouldn't be used to free externally visible resources, such as file
>handles, database locks, window handles etc, because other programs may be
>waiting for them to become free.

The main use would be for making sure C++ class objects are properly
deleted, probably mainly active such objects (also see below).

There is no problem to use destructors to free locks and other resources
if the GC is deterministic. If there is a need for a certain resource to
be deleted at a certain specific point in time after that the last
reference has died, one should use a special type of GC, especially
designed for that:

Perhaps one needs a special C++ construct to indicate that the destruction
should take place immediately after the last reference has gone, say that
the keyword "immediate" is applied to the destructor or reference:
  class A {
  public:
    immediate ~A() {...}
  };
or
  immediate ref<X> x(...);

If the GC chosen can't handle this, one should get a compile error if the
GC-choice is static, and an exception, if the GC-choice is dynamic
(runtime).

>They can't touch pointers to GC'd objects.
...
> Likewise
>static objects - static object have to be destroyed after the program
>terminates and before the GC'd objects they point to are collected, so
>GC'd destructors cannot touch static objects either.

This is not a problem, if the destructor checks that the references it
uses are alive before using them. In C++, one would expect that if one
tries to use a dead reference, an exception is thrown. Thus, the
destructor would need a
  try {
    ...
  } catch (std::reference<GC>& err) { ... }
clause.

> They also can't touch pointers
>to the stack, because the object might out-live the stack frame.

This is not a problem, because one would expect that the GC hides away to
the user where it puts an object. The C++ will only use a C++
"GC-reference", a new type of C++ object.

For example, an intelligent GC might put an object on the stack, but if
the stack frame dies and there are still references to it, then the object
is moved by the GC elsewhere. The GC must then be constructed so that the
user is using references that can handle such a move: Essentially a handle
that can point to both stack and heap. Also, when the object is put on the
stack, it must be able to signal its death to the GC.

>So that leaves heap-allocated objects.

And "registers", used for temporary objects: The GC can put the objects
wherever it wants. The user will be using a "GC-reference", some kind of
new C++ language construct that hides away the real location of the
object. (Think for example of a URL -- in distributed programming, the
object may not be even on the same computer.)

>GC'd destructors are so complex and dangerous,

Actually, this does not seem to be the case; one merely avoids calling the
destructor twice of any object:

The GC merely calls the destructors of objects not reachable from the root
set in an order it deems suitable. The delete request is forwarded to
subobjects which are unreachable from the root set and have not yet had
their destructor initiated.

If there is a need for that certain subobjects to be kept alive before the
main object is deleted, then one should have new C++ construct making it
possible to indicate such a relation. If one has indicated such
"keep-alive" relationships  in a loop, and the GC encounters such a
situation when calling the destructors, the GC should ideally throw an
exception, even though this is not the case of a reference count GC.

> Why should a GC'd object allocate
>and then manually delete objects from the non-GC heap? Wouldn't it be
>better to use GC for those too?
...
>I would hope there is some
>really compelling reason to add them to the language. What is it?

My guess is that if objects are composed only of passive memory resources
and references, then one will not need destructors. If one has objects
that somehow need to be active in the destruction process, then those
destructors will be needed. This could as simple as an object that wants
to signal its destruction, or a file that needs to be closed.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Thu, 27 Jun 2002 20:26:57 GMT Raw View

<Dave Harris's Proposed Solution>
    struct File;

    struct FileResource {
       int fd;
       weak_ptr<File> pFile;
    };

    struct File {
        FileResource *pResource;
        // Use pResource->fd here.
    };

    struct FileManager : private vector<FileResource> {
        void CleanUpResources() {
            for (iterator i = begin(); i != end(); ++i)
                if (i->pFile == NULL)
                    close( fd );
        }
    };
</Dave Harris's Proposed Solution>

Garry Lancaster:
> > This solution is more complex than either destruction
> > or any finalisation system I am aware of. It's worse
> > than it seems too since you haven't shown the required
> > polling code, how you manage the FileManager or
> > how you initialise the File and FileResource. Put them
> > all together and I think it will be clearer why destructors
> > are really a simpler solution.

Dave Harris:
> I agree it is more code. I claim it is conceptually simpler,
> though, because it avoids the various problems I have
> raised with destructors.

Here is the destructor based version:

class File {
    int fd;
public:
    File() : fd( open() ) {}
    ~File() { close( fd ); }
    // Other member functions use fd.
};

Here is how to use it within the kind of GC system I
have in mind.

int main() {
    gc_ptr<File> pFile = gc_new<File>();
    // Calls to other member functions of File.
}

There is no resurrection problem here. The C++
programmer who wrote the class has avoided it
by following simple rules, just as we avoid all the
other undefined behaviour in the language day in,
day out. This is not rocket science.

Are you really claiming the convoluted solution you
have presented is conceptually simpler? If you really
think so, please provide a demonstration of how to
use your GC File correctly, thus providing a complete
comparison so we can all judge for ourselves.

[snip]

> Maybe a compromise would be for GC to magically NULL out all pointers in
> objects before their destructor is run?

> This would be hard to implement with a conservative GC, and might fall
> foul of the "don't pay for what you don't use" rule, but would be
> relatively safe. Perhaps it should at least be permitted as QoI.

Indeed this is one option that I have experimented with.
Note it is only really sensible if, as I suggest, GC pointers
are differentiated from normal raw pointers, otherwise it
is too restrictive. It's the kind of thing I'd like to see
happening in my debug builds but probably not in a
release build. (If C++ were a language that otherwise
chose protecting the programmer from themselves over
efficiency, I would feel differently perhaps.)

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Hyman Rosen <hyrosen@mail.com>
Date: Fri, 21 Jun 2002 18:06:28 GMT Raw View

Dave Harris wrote:
> Agreed. And they have some of the problems I mentioned.

I have an idea for finalizers which gets around all of
the problems we've been talking about.

The collector should allow a separate finalizer function
object to be registered for an allocated object, which
will be called when the object's storage is reclaimed.
The finalizer is itself a reachable object, so if it
holds a pointer to the collectable object, the object
will never be collected. But it can be used to deallocate
resources held by the object. Like this:

struct close_file : GC_finalizer<close_file>
{
 close_file(int fd) : fd(fd) { }
 void operator()() { close(fd); }
private:
 int fd;
};

struct file_user
{
 file_user(char *file) : fd(open(file))
 { GC_register_finalizer(this, new close_file(fd)); }
private:
 int fd;
 ~file_user() { }
};

int main()
{
 new file_user("joe");
 GC_collect();
}

I would make the finalizer objects themselves reference counted.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Hyman Rosen <hyrosen@mail.com>
Date: Fri, 21 Jun 2002 18:22:54 GMT Raw View

Hyman Rosen wrote:
> I have an idea for finalizers which gets around all of
> the problems we've been talking about.

If I'm not mistaken, I believe the Boehm collector allows
something like this, which is where my idea came from.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Fri, 21 Jun 2002 19:52:23 GMT Raw View

In article <memo.20020621181414.4409A@brangdon.madasafish.com>,
brangdon@cix.co.uk wrote:

>remove.haberg@matematik.su.se (Hans Aberg) wrote (abridged):
>> [To Moderators: This is a re-post, as the copy I posted Tuesday has not
>> yet arrived.]

>Both copies arrived here, eventually. My reply to your earlier article
>(which asked what the problems were) took a day or two too. I don't think
>articles are being lost. It's probably best to be patient, at least if you
>got the RECEIVED notification.

Actually, this was a note to the moderators, so that they could check if
the old copy had appeared and not post the new one if that was the case.
There appeared to be a bottle-neck somehow, because for while nothing was
posted, and in a flush there was a batch of them. Sometimes such a note
can help the moderators to detect such a bottleneck, but it can be
difficult for them to check what has been posted and what has not.

>> The reference count is still deterministic, as one knows exactly when
>> the objects are deleted, but on the other hand, the destruction does
>> not follow what is directly induced by the C++ syntax. So
>> reference counts are something in-between C++'s destructor model and
>> a fully non-deterministic GC.
>
>Agreed. And they have some of the problems I mentioned.
>
>(1) There is still no good way to handle failures in destructors, so your
>example of a destructor which flushes a buffer to disk is dubious.

The example is not dubious because one good use of references is that a
source, like a file, that one does not want or cannot to be copied should
be shared in different parts of the program.

It is simply a good question how thrown exceptions should be handled
properly. -- There is always the possibility of terminating the program
which is a crude option, but nothing wrong in it itself. The question is
how to refine it.

>(2) We still have the resurrection issue.

I do not really know what the resurrection is used for. -- If it is only
to make quick fixes, perhaps resurrection is not needed. (My impression is
that resurrection somehow arise via sloppily written Java classes, so I
did not follow that part. I have no need for it myself. But if somebody
wants to re-explain the issue, please go ahead.)

>(3) At least now destructors will be called in a timely way.

This is feature that may be wanted, but yet again may be unnecessary for
some applications. So if a fully non-deterministic GC can be more
efficient, one should be able to select that.

>(4) And we no longer depend on whether the implementation supports GC.

When I use reference counts, the whole program is designed around that
fact. So one is still going to need how the GC is working.

>(5) We still have the problem of loops.

That is one reason one wants to go beyond reference counts. Another might
be efficiency reasons: Some say reference counts are slow.

>The improvements in (3) and (4) mean that reference counted destructors
>can safely be used for some simple things.
>
>
>>   template<class A, class GC = ...>

It is not difficult to implement this in C++, but C++ still needs some
features to make its use simpler.

>Allowing for several different kinds of GC policy will create more
>opportunities for incompatibilities between code which uses different
>policies. Too many standards is no standard at all.

And C++ is the language of opportunities. :-)

You will not get away with a single GC anyway, because some code will
define its own "operator new", which will be hard-coded into it. Then it
is linked as a library with code using different standards.

The same thing will happen with the GC question: So I think one will need
to figure out how to make several GC's to co-exist. Then this probably
puts extra requirements on the GC's, so that not just any standard GC will
have success in C++ without extra adaptation.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Fri, 21 Jun 2002 19:52:33 GMT Raw View

In article <3D134FF6.2040902@cox.net>, Ryjek <ryjek@cox.net> wrote:
>> All objects belonging to a certain GC must have time stamps. If the GC
>> discovers a batch of objects to be destroyed, then it should run the
>> destructor of the oldest object, which in turn may trigger new destructors
>> to be called. When that destructor sequences has terminated, the process
>> is repeated until all objects has been deleted.

>Unfortunately nothing forces the original code to destroy objects in the
>order opposite to their construction. Consider this code:

>struct A {
>     B *b_;
>     A(B*b) : b_(b) { b_->a_ = this; }
>};

>struct B {
>     A *a_;
>     B() a_(0) {};
>     ~B() { delete a_; }
>};

>B *b = new B;
>A *a = new A(b);

>Since a was constructed after b, in your scheme it will be destroyed
>before b, leading to a problem in the B's destructor. As a result,
>adding GC to the above code will introduce undefined behavior where
>previously there was none. Similar configurations are very likely to
>appear in more complex data structures.

Actually, if you read carefully what I wrote, the oldest object should be
destroyed first, which is B.

But after I wrote it, I realized that the true destruction process is more
complicated: If one should follow the pattern of the global C++ objects,
then one should first seek out the roots in the reference tree (that is,
first essentially finding the connected components), and then destroy
perhaps the youngest object among these roots objects.

Then your example uses standard C++ pointers, which can be confusing:

In another post I used a ref<X> class. It means that one does not merely
delete a dumb C pointer, but one gets something like (following your
example above)
  class B {
    ref<A> a_;
  public:
    B() a_(0) {};
    ~B() { delete a_; }
  };
Now, the "delete a_" line would not attempt to delete the object pointed
to directly but via the class ref<A>, which will pass the problem over to
the GC. The GC then has greater chance to do things such as consider
whether the delete request should be handed over to the object itself, or
should be disregarded. -- This is what the reference count system is able
to do.

Thus, it may not be a problem to have pointers to dead objects in a class
in a destruction process if the GC is designed to detect that.

One can still not expect to write just any code, but as one get quite some
way with the reference count system, it seems that one should be able to
design some system.

-- I will have to think more about this, but it seems one could achieve
something playing around with these ideas.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Ryjek <ryjek@cox.net>
Date: Fri, 21 Jun 2002 20:50:59 GMT Raw View

Hans Aberg wrote:
> In article <3D134FF6.2040902@cox.net>, Ryjek <ryjek@cox.net> wrote:
>
>>>All objects belonging to a certain GC must have time stamps. If the GC
>>>discovers a batch of objects to be destroyed, then it should run the
>>>destructor of the oldest object, which in turn may trigger new destructors
>>>to be called. When that destructor sequences has terminated, the process
>>>is repeated until all objects has been deleted.
>>
>
>>Unfortunately nothing forces the original code to destroy objects in the
>>order opposite to their construction. Consider this code:
>
>
>>struct A {
>>    B *b_;
>>    A(B*b) : b_(b) { b_->a_ = this; }
>>};
>
>
>>struct B {
>>    A *a_;
>>    B() a_(0) {};
>>    ~B() { delete a_; }
>>};
>
>
>>B *b = new B;
>>A *a = new A(b);
>
>
>>Since a was constructed after b, in your scheme it will be destroyed
>>before b, leading to a problem in the B's destructor. As a result,
>>adding GC to the above code will introduce undefined behavior where
>>previously there was none. Similar configurations are very likely to
>>appear in more complex data structures.
>
>
> Actually, if you read carefully what I wrote, the oldest object should be
> destroyed first, which is B.

Yeah, I got everything backwards. What I really wanted to say is that
the order of construction of objects on the heap does not imply the
order in which they should be collected. Let me write my example again,
but slightly changed:

struct A {
     B *b_;
     A(B*b) : b_(b) { b_->a_ = this; }
     ~A() { delete b_; }
};

struct B {
     A *a_;
     B() a_(0) {};
};

B *b = new B;
A *a = new A(b);

Now b is the oldest object and if you collect b first then A's
destructor will try to delete a nonexisting object.

> But after I wrote it, I realized that the true destruction process is more
> complicated: If one should follow the pattern of the global C++ objects,
> then one should first seek out the roots in the reference tree (that is,
> first essentially finding the connected components), and then destroy
> perhaps the youngest object among these roots objects.
>
> Then your example uses standard C++ pointers, which can be confusing:

In order for the GC to be used successfully, it should be capable of
collecting standard C++ pointers. Otherwise you will find all existing
APIs to be unusable with garbage-collected objects.

> In another post I used a ref<X> class. It means that one does not merely
> delete a dumb C pointer, but one gets something like (following your
> example above)
>   class B {
>     ref<A> a_;
>   public:
>     B() a_(0) {};
>     ~B() { delete a_; }
>   };
> Now, the "delete a_" line would not attempt to delete the object pointed
> to directly but via the class ref<A>, which will pass the problem over to
> the GC. The GC then has greater chance to do things such as consider
> whether the delete request should be handed over to the object itself, or
> should be disregarded. -- This is what the reference count system is able
> to do.

The order of invocation of destructors *is* important and C++ code
depends on that order in too many places. With GC there is no way to
ensure the order.

I can extend your example with a modified destructor:

~B() {
     globalFunc( a_->getSomething() );
     delete a_;
}

Now the call to globalFunc results in an error. Note that in order to
maintain program correctness you would have to avoid referencing
anything outside of the object being deleted. The same applies to
finalizers btw.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Hyman Rosen <hyrosen@mail.com>
Date: Sat, 22 Jun 2002 00:34:59 GMT Raw View

Hans Aberg wrote:
> This will not guarantee that all programmed code is safe, but it will make
> it possible to write GC code that calls destructors and is safe.

You are heaping complexity on complexity, and I still don't
understand what the user code is going to look like. Here's
some code where a pair of objects reference each other in
destructors. What should happen?

struct B;
struct A { B *b; void notify() { } ~A(); };
struct B { A *a; void notify() { } ~B(); };
A::~A() { b->notify(); }
B::~B() { a->notify(); }
int main()
{
 A *a = new A; B *b = new B;
 a->b = b;     b->a = a;
}

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Sat, 22 Jun 2002 00:36:14 GMT Raw View

Dave Harris:
> > > We should keep a clear distinction between "finaliser"
> > > and "destructor". There are two concepts here which need
> > > separate names. They are not synonyms.

Garry Lancaster:
> > This seems to be a very commonly held opinion.
> > The only snag is there is no general agreement
> > over what the clear distinction might be.

Dave Harris:
> The distinction I refer to is that one of them destroys the object and the
> other doesn't. In C++ the thing which destroys is the destructor. Perhaps
> you would like to propose some term in mind for the other thing? I
> recommend we use "finaliser".

The problem is that the terms destructor and finaliser
are overloaded already by their uses in other languages
and systems, some of which you may not initially have
been aware of. On a C++ newsgroup I think we are all
happy that destructor means a C++ destructor, but the
situation for finaliser is not so clear.

I think I understand the distinction you want to make.
But since you are actually suggesting a GC without
any form of automatic clean-up function, we probably
don't need to pick a term at all I guess.

> > > The easiest solution is for GC never to run finalisers or
> > > destructors, or any code at all on an object which has
> > > become unreachable.
> >
> > If you are prepared to be that restrictive on what
> > programmers can do you can solve the problem
> > for that GC system. But that doesn't necessarily
> > mean the problem is solved throughout your
> > program since other systems will be needed to fulfil
> > the needs you haven't satisfied with your restricted
> > GC system.
>
> Agreed. A couple of other mechanisms have been
> suggested: onexit(), and weak pointers. I think both
> of these are needed anyway.
>
> What problems do you see GC destructor/finalisers
> solving that cannot be solved using these other means?

onexit isn't standard: do you mean atexit? This is fine
if we are happy with keeping everything allocated until
program termination. Not so good if we want something
more aggressive.

Weak pointers? These are not an alternative to GC
but an adjunct to it. If the normal GC pointers cannot
control objects that do automatic clean-up neither can
the weak pointers. (The way a weak pointer works, or
at least should work, is that it yields a, possibly null,
GC pointer to the thing it points to.)

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net





---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Sat, 22 Jun 2002 17:06:07 GMT Raw View

In article <3D135B67.8000503@mail.com>, Hyman Rosen <hyrosen@mail.com> wrote:

>> This will not guarantee that all programmed code is safe, but it will make
>> it possible to write GC code that calls destructors and is safe.
>
>You are heaping complexity on complexity, and I still don't
>understand what the user code is going to look like.

I will have think these over, even though I think that by playing around
with the ideas, one might arise at a solution. -- Thus, I will have to
leave you to find your own solutions, just indicating some possible ideas.
:-)

> Here's
>some code where a pair of objects reference each other in
>destructors. What should happen?
>
>struct B;
>struct A { B *b; void notify() { } ~A(); };
>struct B { A *a; void notify() { } ~B(); };
>A::~A() { b->notify(); }
>B::~B() { a->notify(); }
>int main()
>{
>        A *a = new A; B *b = new B;
>        a->b = b;     b->a = a;
>}

First, replace with pseudo-code references, so we get away from the
traditional C/C++ dumb pointers:

  struct B;
  struct A { ref<B> b; void notify() { } ~A(); };
  struct B { ref<A> a; void notify() { } ~B(); };
  A::~A() { b->notify(); }
  B::~B() { a->notify(); }
  int main()
  {
    ref<A> a(new A);
    ref<B> b(new B);
    a->b = b;
    b->a = a;
  }

Now, it will hinge on how you design your GC: For example, your references
could be URL's. What happens when you experience a dead link on the net?
-- Your program may indicated an error to you, but the program continues
running. Sometimes this may be acceptable, but if the resource referenced
is essential for the programming running, the program must be taken down.

So the question of what will happen when trying to delete dead links will
depend on what type of resource and what type of program you are using.
The main thing will be that C++ can accommodate such uses.

Then it depends on the type of GC you are using. In the example above, GC
will happen sometime after main() has terminated.

If you use the traditional C++ syntax imposition of "delete", then as at
the point main terminates, objects will be destroyed in the reverse order
they were created, so first the reference to the B object should be
destroyed, and then the reference to the A object.

When you destroy the B object, its destructor will now execute first(?)
a->notify(), which will do nothing, and then
    delete a;
which will be handled by the GC ref<A> object. Then the GC will have to
decide what to about that. If the GC is intelligent, it may discover that
the A object now has no B reference, as the B object has started to
execute its destructor, and thus should be considered dead.

But the A object still has a reference from the C++ syntax deletion model.
So therefore it should probably decide that the A object should be kept
alive. Therefore the A::~A should not be executed at this point time.

After the B object has been deleted, the C++ syntax deletion model
proceeds with deleting the A object. Then you execute b->notify(), a sign
of poorly written code and the program crashes. -- You should have made a
check that the object is alive before accessing it. So here, C++ may need
an extension of some sort in order to properly handle attempted access to
dead objects, for example:
  try {
    b->notify();
  } catch (std::reference_error<B>& r) { ... }

Disregarding this problem of access to dead objects, the C++ syntax
deletion model will proceed with the deletion of A contained objects, that
is,
    delete b;
which will be handled by the GC ref<A> object. The GC must the be able to
check that the B object is dead, which means that the "delete" request
should not be executed.

Now turn to the question of a non-deterministic GC: After main() has
terminated, the GC will find that both the A and B objects are not
reachable from the root set. Say it decides to first delete B. Then I skip
the problem with the execution of notify(), because that is no problem if
the code is properly written. So we arrive at
    delete a;
which will be handed over to the GC ref<A> object. The GC can, for
example, decide that the delete request should be handed over to the a
object, as it is not reachable from the root set. Then, when deleting the
A object, one will arrive at
    delete b;
which will be handed over to the GC ref<B> object. This object then
notices that the B object is dead, in view of that its destructor has
started to execute. So here the delete forwarding process stops.

After this, the GC may have to rescan for unreachable objects in the batch
found, but then both A and B will be marked dead, as their destructors
have been initiated (and terminated) executing.

So this seems to work then, at least as an outlined idea.

In addition to the stuff above, one may want to impose an order of
deleting, in order to arrive at something similar to that of the
destruction of global objects.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Mon, 24 Jun 2002 01:05:37 GMT Raw View

> Hans Aberg wrote:
> >>>All objects belonging to a certain GC must have time stamps. If the GC
> >>>discovers a batch of objects to be destroyed, then it should run the
> >>>destructor of the oldest object, which in turn may trigger new
destructors
> >>>to be called. When that destructor sequences has terminated, the
process
> >>>is repeated until all objects has been deleted.

This approach is known as topologically ordered
destruction/finalisation. It is not a perfect solution, for the
reasons Ryjek has mentioned, but it is useful enough that
it is an option provided by the Boehm et al. collector. See:

http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html

[snip further discussion of topologically ordered destruction]

Ryjek:
> In order for the GC to be used successfully, it should be capable of
> collecting standard C++ pointers. Otherwise you will find all existing
> APIs to be unusable with garbage-collected objects.

This is true for those APIs that create or destroy objects internally,
but for the large set of APIs that don't there is no problem, provided
temporary raw pointers can be obtained from GC pointers. For
example there is no problem with this:

void foo(gc_ptr<foo> f)
{
    // f.get() obtains a temporary raw pointer.
    // IAmAnAPICall doesn't create or delete its argument.
    IAmAnAPICall( f.get() );
}

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Mon, 24 Jun 2002 01:05:46 GMT Raw View

Dave Harris <brangdon@cix.co.uk> wrote in message
news:memo.20020621144855.29569B@brangdon.madasafish.com...
> allan_W@my-dejanews.com (Allan W) wrote (abridged):
Dave Harris:
> > > Part of the job of GC is to avoid dangling pointers, and it
> > > cannot do that if it runs destructors.

Allan W.
> > Unfair. That same problem exists today without GC.
> >
> >     myClass *myObject = 0;
> >     void CallMeEveryTenSeconds() {
> >         if (myObject) myObject->doSomething();
> >     }
> >     myClass::~myClass() {
> >         // Keep using this object after destruction!
> >         myObject = this;
> >     }
> >
> > Isn't this asking for trouble, GC or not?

Dave Harris:
> Yes. The point remains: GC is no longer able to do its job. The guarantees
> offered by GC are weakened. This is not such a problem for systems without
> GC, because they don't even try to offer those guarantees.
>
> Reasonable men may differ about what "the job" of GC is, and I suppose
> that is what is happening here.

I think this is a fair summary. Because I know there is no
perfect solution to resurrection with destructors or any other
kind of automatic clean-up function (and I don't think I could
accept the limitations of a GC system without this) it doesn't
seem to me as if GC could guarantee one. I also think we
are a little guilty of making assumptions about the kind of GC
we are talking about, as Gerald Squelart has pointed out.

A solution where GC pointers and raw pointers co-exist
and are easily differentiated for example, but not necessarily,
using a GC smart pointer, makes writing GC-safe destructors
- ones that will not cause resurrection - a lot easier than a GC
where there is no differentiation. Furthermore, since no existing
code will be using the new GC pointers, no existing destructor
code will be broken by the advent of GC.

My impression is that those who are most against running
destructors or some kind of finaliser are thinking of a system
where raw pointers and GC pointers are undifferentiated,
as with the Boehm et al. system. In that case the problems
are very clear: the usually tolerable play-it-safe rule that you
should never access GC pointers within a destructor becomes
very hard to enforce, in the worst case meaning you can't
access *any* pointers, since you don't know which are GC
controlled and which are not. This is too restrictive and it
affects existing code.

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: brangdon@cix.co.uk (Dave Harris)
Date: Mon, 24 Jun 2002 01:06:22 GMT Raw View

glancaster@ntlworld.com (Garry Lancaster) wrote (abridged):
> > C++ is the only language I know with destructors.
>
> As I previously mentioned, C# has them and they
> behave like what you call finalisers. So now you know
> of at least one other.

I looked into this. It turns out that C# calls them finalisers too. That
is, Object, the root class which all types inherit from, has a method
called Finalize(). This is the finaliser called by the garbage collector
just before the object is reclaimed. If you want to free resources, you
have to override the Finalise() method.

There is a twist. I quote from Archer and Whitechapel's "Inside C#":

    Let's be clear on this. The C# language doesn't strictly
    support destructors. On the other hand, it does support
    overriding the /Object.Finalize/ method. The only twist in
    the tale is that to override /Finalise/, you must write a
    method that's syntactically identical to a destructor.

So I suppose we are both right. Microsoft sometimes pretends finalisers
are destructors. I don't see this as an argument against the "finaliser"
terminology, partly because deep down Microsoft knows what they really
are, and partly because C# is surely too muddled to be a good model to
follow.

Given that it has finalisation, the underlying GC system seems pretty
good. It has methods like SuppressFinalize(), ReRegisterForFinalize(),
WaitForPendingFinalizers(), Dispose(), a special interface for
disposable objects, two kinds of weak pointer, as well as Finalize()
itself. I am sure all this complexity is needed. If we want finalisation,
we could do worse than copy Microsoft's core system.

It doesn't have anything like SuppressDestructor() or
WaitForPendingDestructors().

  Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
      brangdon@cix.co.uk      |   And close your eyes with holy dread,
                              |  For he on honey dew hath fed
 http://www.bhresearch.co.uk/ |   And drunk the milk of Paradise."

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Mon, 24 Jun 2002 02:24:53 GMT Raw View

In article <3D139014.6040101@cox.net>, Ryjek <ryjek@cox.net> wrote:

> Let me write my example again,
>but slightly changed:

I will have to think these things over, and return some other time.

>> Then your example uses standard C++ pointers, which can be confusing:
>
>In order for the GC to be used successfully, it should be capable of
>collecting standard C++ pointers. Otherwise you will find all existing
>APIs to be unusable with garbage-collected objects.

This is what I think will happen: Code will have to be rewritten for use
with various types of specialized GC's. If one should be able to replace
the traditional "operator new" directly with a GC, so that old code can be
run, that will only happen in some special cases of old C++ code and
special types of GC's.

Thus the GC-collectable pointer should be a new C++ object I think: It
could be a reference, then also admitting a null reference.

Then dead references is not a big problem, because the GC can take of it.
Note also that in say distributed programming, a URL might be viewed as a
distributed reference, and it is quite common to allow dead references in
such cases.

Thus, the dead reference question should not be much of problem with a
proper C++ syntax and properly designed GC's.

Also, the pseudo-code syntax I used for reference counts that I used
    ref<std::ofstream> ofs(new std::ofstream("out.txt"));
is insufficient, because the
    new std::ofstream("out.txt")
part also belongs to the GC. -- The reference count GC is special n that
it operates on top of the "operator new" memory allocation. But this is
slow, so in a full GC implementation, one should be able to avoid it.

>The order of invocation of destructors *is* important and C++ code
>depends on that order in too many places. With GC there is no way to
>ensure the order.

Again, this one additional reason that I think that old C++ code will have
to be rewritten so that it can work with certain classes of GC's: If the
code should be run with a totally non-deterministic GC, then the code must
be written so that the destructor order does not matter in the cases that
the GC may do different choices.

This may not be big problem, because the C++'s GC supporting semantic will
be such that it helps it up as much as possible. -- After all, exceptions
and threading requires C++ code to be rewritten, and that will probably
happen with GC, too.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: brangdon@cix.co.uk (Dave Harris)
Date: 24 Jun 2002 02:30:18 GMT Raw View

glancaster@ntlworld.com (Garry Lancaster) wrote (abridged):
> But since you are actually suggesting a GC without
> any form of automatic clean-up function, we probably
> don't need to pick a term at all I guess.

I'm not at all confident of winning that argument :-)

I suspect we're going to end up with an immensely complex system, similar
to Microsoft's C#, only with more potential for undefined behaviour. It'll
be difficult to design, implement, test and teach. And no quality programs
will actually use it.


> onexit isn't standard: do you mean atexit? This is fine
> if we are happy with keeping everything allocated until
> program termination. Not so good if we want something
> more aggressive.

In that case we should not use finalisers, because they might be delayed
until the program termination anyway.

But the basic idea here is not keep "everything" allocated. Just to keep
track of things which need to release resources. Atexit() is the resource
manager of last resort.


> Weak pointers? These are not an alternative to GC
> but an adjunct to it.

Of course. I am not looking for an alternative to GC here. I am looking
for alternatives to finalisation.


> If the normal GC pointers cannot control objects that do
> automatic clean-up neither can the weak pointers. (The way a
> weak pointer works, or at least should work, is that it
> yields a, possibly null, GC pointer to the thing it points to.)

Weak-pointers include a mechanism for discovering when an object has been
reclaimed. When it has been reclaimed, its resources can be freed. For
example:

    struct File;

    struct FileResource {
       int fd;
       weak_ptr<File> pFile;
    };

    struct File {
        FileResource *pResource;
        // Use pResource->fd here.
    };

    struct FileManager : private vector<FileResource> {
        void CleanUpResources() {
            for (iterator i = begin(); i != end(); ++i)
                if (i->pFile == NULL)
                    close( fd );
        }
    };

Hyman Rosen had a similar idea. He provided the GC with a separate
call-back object, which gives a more timely release and avoids the need
for a polling function like CleanUpResources(). I am not too bothered
about the details.

Both this scheme and atexit() have in common that they both register the
object with some kind of table which manages resources explicitly. I see
them as working together. We should probably register CleanUpResources()
with atexit(), for example.

  Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
      brangdon@cix.co.uk      |   And close your eyes with holy dread,
                              |  For he on honey dew hath fed
 http://www.bhresearch.co.uk/ |   And drunk the milk of Paradise."

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Mon, 24 Jun 2002 15:47:10 GMT Raw View

Dave Harris:
> > > C++ is the only language I know with destructors.

Garry Lancaster:
> > As I previously mentioned, C# has them and they
> > behave like what you call finalisers. So now you know
> > of at least one other.

Dave Harris:
> I looked into this. It turns out that C# calls them finalisers too.

Not really. If you look at the relevant ECMA standards C#
(ECMA-334) calls them destructors and, as I pointed out
previously, these map to .net (strictly "Common Language
Infrastructure") finalizers (ECMA-335).

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net



---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Mon, 24 Jun 2002 15:47:35 GMT Raw View

Garry Lancaster:
> > But since you are actually suggesting a GC without
> > any form of automatic clean-up function, we probably
> > don't need to pick a term at all I guess.

Dave Harris:
> I'm not at all confident of winning that argument :-)
>
> I suspect we're going to end up with an immensely complex
> system, similar to Microsoft's C#, only with more potential for
> undefined behaviour. It'll be difficult to design, implement, test
> and teach. And no quality programs will actually use it.

I agree that good GC systems are difficult to design,
implement and test. This is typically because they have
to do some very efficient handling of multiple threads of
execution, rather than the basic GC algorithms being at
all complex. (A not-particularly-efficient single-threaded GC
is, in contrast, not difficult.) As far as I know there is no
reason that a destructor-calling GC is significantly more
difficult to implement than a finaliser-calling one.

I am not a teacher, but my experience as a normal
programmer is that the GC in C# greatly simplifies
program design and implementation. I would imagine
that would make the system straightforward to teach.
The main thing it doesn't have that I consider important
for C++ is C++-like destruction.

> > onexit isn't standard: do you mean atexit? This is fine
> > if we are happy with keeping everything allocated until
> > program termination. Not so good if we want something
> > more aggressive.
>
> In that case we should not use finalisers, because they
> might be delayed until the program termination anyway.

Might be but won't be forced to be: with most GC algorithms
the object can be collected at any point after it becomes
unreachable, so many objects end up collected before
program termination (and the longer the program runs,
the truer this becomes). That's an important practical
difference.

There is no reason I can see that atexit-managed destruction
shouldn't be viewed as a degenerate form of GC and possibly
even one that standard requirements should permit. But it
certainly isn't going to win any awards.

[snip]

> > Weak pointers? These are not an alternative to GC
> > but an adjunct to it.
>
> Of course. I am not looking for an alternative to GC here. I am looking
> for alternatives to finalisation.
>
>
> > If the normal GC pointers cannot control objects that do
> > automatic clean-up neither can the weak pointers. (The way a
> > weak pointer works, or at least should work, is that it
> > yields a, possibly null, GC pointer to the thing it points to.)
>
> Weak-pointers include a mechanism for discovering when an object has been
> reclaimed. When it has been reclaimed, its resources can be freed. For
> example:
>
>     struct File;
>
>     struct FileResource {
>        int fd;
>        weak_ptr<File> pFile;
>     };
>
>     struct File {
>         FileResource *pResource;
>         // Use pResource->fd here.
>     };
>
>     struct FileManager : private vector<FileResource> {
>         void CleanUpResources() {
>             for (iterator i = begin(); i != end(); ++i)
>                 if (i->pFile == NULL)
>                     close( fd );
>         }
>     };

This solution is more complex than either destruction
or any finalisation system I am aware of. It's worse
than it seems too since you haven't shown the required
polling code, how you manage the FileManager or
how you initialise the File and FileResource. Put them
all together and I think it will be clearer why destructors
are really a simpler solution.

> Hyman Rosen had a similar idea. He provided the GC
> with a separate call-back object, which gives a more
> timely release and avoids the need for a polling function
> like CleanUpResources(). I am not too bothered about
> the details.

This is basically destruction semantics without using
destructors. His example would be much simpler just
written like so:

struct UsesFileDescriptor
{
int fd;
UsesFileDescriptor(char *file) : fd(open(file)) {}
~UsesFileDescriptor() { close( fd ); }
};
int main()
{
new UsesFileDescriptor("joe.txt");
GC_collect();
}

And with a destructor-calling GC, would behave in
exactly the same way. I suggest that my version is clearer
because it has fewer lines of code and works in the same
way as non-GC clean-up (which also means it can be used
outside the GC system).

[snip]

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Mon, 24 Jun 2002 15:49:29 GMT Raw View

In article <TwhR8.11242$L86.1738405@news11-gui.server.ntli.net>, "Garry
Lancaster" <glancaster@ntlworld.com> wrote:

>>All objects belonging to a certain GC must have time stamps. If the GC
>>discovers a batch of objects to be destroyed, then it should run the
>>destructor of the oldest object, which in turn may trigger newdestructors
>>to be called. When that destructor sequences has terminated, the process
>>is repeated until all objects has been deleted.

>This approach is known as topologically ordered
>destruction/finalisation. It is not a perfect solution, for the
>reasons Ryjek has mentioned, but it is useful enough that
>it is an option provided by the Boehm et al. collector.

>From the perhaps superficial analyzing I made in another post, such an
ordering does not seem to be required for a GC that is invoking
destructors, but more of a way to convenience the programmer. Instead, the
main requirement seems to be that the GC carefully keeps track of what has
initiated its destructor, in order to make sure it isn't applied twice.

Then objects must be written relative the type of GC in question so that
there is not a problem with the destruction order. Imposing conditions
such as topological ordering might then help the programmer with writing
for destruction order interdependence that would otherwise not be
possible.

But I do not know if that is really needed with properly written objects:

For example, with the reference count GC, there is not really a problem
with that the destruction order is not ordered in any particular way, if
the code is properly written. Instead, the effect of the reference count
GC is that the destruction request forwarding terminates if an object
should be kept alive because others are referencing it.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "William E. Kempf" <williamkempf@hotmail.com>
Date: Mon, 24 Jun 2002 15:50:46 GMT Raw View

"Dave Harris" <brangdon@cix.co.uk> wrote in message
> So I suppose we are both right. Microsoft sometimes pretends finalisers
> are destructors. I don't see this as an argument against the "finaliser"
> terminology, partly because deep down Microsoft knows what they really
> are, and partly because C# is surely too muddled to be a good model to
> follow.

Not that I think the .NET stuff is necessarily the shining star we should
look to for inspiration here, but there's more to this then just C# (what
ever you think of that language).  MC++ (managed C++, the .NET bridge for
C++) translates destructors to finalizers as well.

class __gc Foo
{
public:
   ~Foo() { ... } // this is the .NET GC finalizer
};

The big difference between this and, say the Boehm collector, is that only
__gc classes can be allocated on the GC heap (and that's the only place they
can be allocated).  Thus the danger of coding an improper
destructor/finalizer is vastly reduced, because you know that in the above
code it's always a finalizer and subject to those rules.  I could live with
such a system, though it means more work to wrap non-gc types if you need to
put them on the GC heap.

> Given that it has finalisation, the underlying GC system seems pretty
> good. It has methods like SuppressFinalize(), ReRegisterForFinalize(),
> WaitForPendingFinalizers(), Dispose(), a special interface for
> disposable objects, two kinds of weak pointer, as well as Finalize()
> itself. I am sure all this complexity is needed. If we want finalisation,
> we could do worse than copy Microsoft's core system.

There are some novel things they've included here that I've not seen in
other systems, so looking at this is surely worth while.  I don't have the
knowledge or experience to say whether or not what they've done is "good
enough" for a C++ standard, but I'd love to hear some discussion on this.
It may be that MS does have a model that solves most of the problems people
are concerned about.

> It doesn't have anything like SuppressDestructor() or
> WaitForPendingDestructors().

Nor should it.  Or was that your point?

Bill Kempf

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Thu, 20 Jun 2002 17:29:52 GMT Raw View

On this question what a GC might call, destructors, finalizers, whatever,
let me bring in an explicit example, a reference count GC:

The reference count is still deterministic, as one knows exactly when the
objects are deleted, but on the other hand, the destruction does not
follow what is directly induced by the C++ syntax. So reference counts are
something in-between C++'s destructor model and a fully non-deterministic
GC.

This may then help the analyzing when trying to figure out how a fully
non-deterministic GC should work in C++.

So assume that we have general C++ reference class
  template<class A, class GC = ...>
  class ref {
    ...
  };
admitting different types of GC's; the default GC is a reference count class.

The syntax when using the ref class would be something like:
  ref<std::ofstream> ofr1; // Becomes the 0 reference.
  ...
  { // open a file in a local C++ context:
    ref<std::ofstream> ofr2(new std::ofstream("out.txt"));
    ...
    ofr1 = ofr2; // Let ofr1 reference the same file.
    ...
  } // ofr2 gets out of context and its reference is deleted; as the file
    // is still referenced by ofr1, it remains open.
  ...
  return ofr1; // Keep file open if somebody bothers to take care
               // of the return reference.

It might be more complicated, by including referenced data in a class:
  class A {
    ref<std::ofstream> ofr_; // Maintain an output file.
  public:
    A() : ofr_(new std::ofstream("out.txt")) {}
    ...
    ref<std::ofstream> operator*() {   return ofr_; }
  };
Which then in turn may be used as a reference:
  ref<std::ofstream> ofr; // 0 reference
  {
    ref<A> ar(new A());  // Create a reference to an object A().
       ...
    ofr = *ar; // Set ofr to reference file that reference of object ar
               // is referencing.
  } // Object that ar is referencing goes out of scope, as not any longer
    // referenced by anyone; hence kill it.
    // However, the file that this ar referenced object is
    // referencing is now referenced by ofr; thus this file should
    // still be kept alive.

To sum up: An object of type A creates a reference to an open file. Then a
reference of this file may be handed over to object external to the object
of type A, and thus the file should be kept alive even if all reference to
it via the original object of type A is destroyed.

This makes the keeping track of the destruction process complicated. But
the reference count system can keep track of it, and the open file should
eventually have its destructor called when all references to it are
destroyed.

Now replace the reference count GC with a wholly non-deterministic GC: If
this GC should have any success in C++, it should be able to do at least
what one can do with reference counts, plus the ability to determine which
objects should be removed say by tracing the reachable elements from the
root system.

What one can say, it seems me though, is that if this GC is determining
which objects that are reachable from the root system, that way labelled
alive, then the GC must be somewhat more careful when deleting the objects
it finds are not reachable:

If I follow the example of the reference count GC above, then, when an
object is found to not be reachable (like the A object above), it should
have its destructor called. If this object has another ref<X> object in
its definition, then the destructor signal is sent to the ref<X> object,
which will in its turn determine what to do:

First one should consider objects that have had their destructors starting
running to be dead (like the A object above containing the ref<X> object).
After that, if this ref<X> object is found to be alive, then it should not
have its destructor run, but if it not reachable, then it should have its
destructor initiated.

This is at least what seem to happen in the reference count GC example.

One should note that if the GC is doing a batch search from the root set,
then it is not clear that one can start running the destructor at any
object at will. In the example above, it would be if the GC found that
both the A object and the std::ofstream object to not be reachable; which
destructor should one start running, that of the A object or the
std::ofstream object:

If one starts running the std::ofstream destructor, then when one starts
running the A destructor, it will reference an object which already has
had its destructor being run, and thus is dead. Thus, one must either
ensure that objects are written so that it is OK for not potentially dead
GC object to reference dead objects.

Or, one might introduce a notion of "locally reachable objects", which
would mean that among the potentially dead objects, one first runs the
destructors on the objects that are not referenced by other potentially
dead objects (i.e., the roots in the reference tree among the objects that
the GC finds are not reachable from the root set). This then starts to
become a bit complicated, and further does not have the capacity to remove
loops.

But both models may be important: If say the A object references an open
file, and the A destructor writes some stuff into this file before
attempting to close this possibly multiply referenced file, then it will
be imperative that the file is not closed before the object A runs its
destructor (unless the A destructor code is specially written in order to
handle that case).

But if one should adhere to this model, one must have a way to tell how
the destructors should be run for objects in a loop.

My conclusions:
- C++ should probably have just the ordinary destructors, at least for
regular user classes.
- If there will be "finalizers", then perhaps that is a tool needed for
the implementing GC's (just a hunch).
- Not just any GC can have success in a C++ environment: It must be
carefully written to be able to understand the C++ destructor model.
- Special attention must be given to the order destructors should be
called in the case a set of unreachable objects are discovered by the GC
via a batch search.

Also note that one can reword some ideas like "GC classes should have no
destructors" to "for some special types of GC's, the destructor must be
trivial", etc. Then that translates basically into saying that the
destruction of an object can only involve GC maintained resources (like
memory) but not object related resources (like the closing of a file). --
That seems to be too restrictive for general C++ programming, though, as
one can do better even with  a reference count GC.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: John Nagle <nagle@animats.com>
Date: Thu, 20 Jun 2002 17:30:22 GMT Raw View

Alexander Terekhov wrote:

> Garry Lancaster wrote:
>
>>Dave Harris:
>>
>>>We should keep a clear distinction between "finaliser"
>>>and "destructor". There are two concepts here which need
>>>separate names. They are not synonyms.


    No, they're not.  And if they're treated
as such, a whole new class of subtle, intermittent bugs will be
created.


>>This seems to be a very commonly held opinion.
>>The only snag is there is no general agreement
>>over what the clear distinction might be.


    I've been arguing that whatever a GC system
invokes should be viewed as a "resetter".
It can be called more than once, and might
not get called at all.

    John Nagle
    Animats

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: brangdon@cix.co.uk (Dave Harris)
Date: Thu, 20 Jun 2002 17:31:47 GMT Raw View

remove.haberg@matematik.su.se (Hans Aberg) wrote (abridged):
> Could some experts explain why one should not call the destructor when
> using a non-deterministic GC?
> [...]
> It then seems that in this process, the destructor std::~ofstream should
> be called, in order to make sure that the file is being closed. It then
> does not look like there is any difference between "destructors" and
> "finalizers".

May I ask if you think that would be good style?

Running user code from GC raises a few issues. The first is, what happens
if the code throws an exception? I imagine we will end up in
std::terminate(). In other words, once you have dropped the last reference
to your stream, your program is liable to be closed down at any moment,
without warning, and irrespective of any no-throw regions. Doesn't this
make you feel uneasy?

Secondly, what happens if the destructor (or code called from the
destructor) creates a reachable reference to the stream? As the stream is
now reachable, it shouldn't be collected. But we already started the
destructor! Part of the job of GC is to avoid dangling pointers, and it
cannot do that if it runs destructors.

Thirdly, you are presumably creating the file for someone else to read.
How long will they have to wait before they can start reading? The close
could be delayed indefinitely.

Fourthly, we are proposing that GC be optional, yet the semantics of your
code depends on whether GC is present. If GC is not present, the stream
will never be properly closed at all. Wouldn't it be better to write code
that works portably across all versions of the C++0x standard?


> Code that decides to not use the new C++ GC oriented reference construct
> would be unaffected.

Even so, there is a cost in language complexity. Why should compiler
vendors have to spend time on this, which will only help people who use
the feature, when they could be working on features (such as
optimisations) which will benefit everyone?

I suspect there will also be a run-time cost. And I'll always have a
sneaking suspicion that even though my code does not use the feature, some
library code does, and I'll have to sort out the problems.

In my view the burden of proof is on the advocates of GC-calls-destructors
to demonstrate that the benefits out weigh the costs. I find it hard to
see any benefit at all. What code /can/ safely be put into a GC'd
destructor?

  Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
      brangdon@cix.co.uk      |   And close your eyes with holy dread,
                              |  For he on honey dew hath fed
 http://www.bhresearch.co.uk/ |   And drunk the milk of Paradise."

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: brangdon@cix.co.uk (Dave Harris)
Date: Thu, 20 Jun 2002 17:31:59 GMT Raw View

glancaster@ntlworld.com (Garry Lancaster) wrote (abridged):
> > We should keep a clear distinction between "finaliser"
> > and "destructor". There are two concepts here which need
> > separate names. They are not synonyms.
>
> This seems to be a very commonly held opinion.
> The only snag is there is no general agreement
> over what the clear distinction might be.

The distinction I refer to is that one of them destroys the object and the
other doesn't. In C++ the thing which destroys is the destructor. Perhaps
you would like to propose some term in mind for the other thing? I
recommend we use "finaliser".


> > The easiest solution is for GC never to run finalisers or
> > destructors, or any code at all on an object which has
> > become unreachable.
>
> If you are prepared to be that restrictive on what
> programmers can do you can solve the problem
> for that GC system. But that doesn't necessarily
> mean the problem is solved throughout your
> program since other systems will be needed to fulfil
> the needs you haven't satisfied with your restricted
> GC system.

Agreed. A couple of other mechanisms have been suggested: onexit(), and
weak pointers. I think both of these are needed anyway.

What problems do you see GC destructor/finalisers solving that cannot be
solved using these other means?

  Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
      brangdon@cix.co.uk      |   And close your eyes with holy dread,
                              |  For he on honey dew hath fed
 http://www.bhresearch.co.uk/ |   And drunk the milk of Paradise."

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: brangdon@cix.co.uk (Dave Harris)
Date: Thu, 20 Jun 2002 17:32:00 GMT Raw View

remove.haberg@matematik.su.se (Hans Aberg) wrote (abridged):
> Could some experts explain why one should not call the destructor when
> using a non-deterministic GC?

I'm sorry - I missed an issue from my previous post. I'll just add it
here.

Fifthly, how can we prevent the stream's buffer being destroyed before the
stream's destructor uses it? Although there is a pointer from stream to
buffer, this pointer is not enough to prevent the buffer from being
collected because the stream itself is unreachable. Also, it is possible
(in general) for the buffer to have a pointer back to the stream. There
might be a complicated network of links between several objects. Indeed,
it is when ownership is unclear/complex that we most want to use GC. How
is the GC supposed to know which object it should destroy first?

Finally I should say that I am not claiming to be an expert here. Not many
real experts had replied to date, so I thought I'd write something anyway.
Also, I don't claim these 5 issues are insurmountable problems. They do
complicate the model, though, and they need to be addressed by GC
proposals.

  Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
      brangdon@cix.co.uk      |   And close your eyes with holy dread,
                              |  For he on honey dew hath fed
 http://www.bhresearch.co.uk/ |   And drunk the milk of Paradise."

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Thu, 20 Jun 2002 17:32:51 GMT Raw View

[To Moderators: This is a re-post, as the copy I posted Tuesday has not
yet arrived.]

On this question what a GC might call, destructors, finalizers, whatever,
let me bring in an explicit example, a reference count GC:

The reference count is still deterministic, as one knows exactly when the
objects are deleted, but on the other hand, the destruction does not
follow what is directly induced by the C++ syntax. So reference counts are
something in-between C++'s destructor model and a fully non-deterministic
GC.

This may then help the analyzing when trying to figure out how a fully
non-deterministic GC should work in C++.

So assume that we have general C++ reference class
  template<class A, class GC = ...>
  class ref {
    ...
  };
admitting different types of GC's; the default GC is a reference count class.

The syntax when using the ref class would be something like:
  ref<std::ofstream> ofr1; // Becomes the 0 reference.
  ...
  { // open a file in a local C++ context:
    ref<std::ofstream> ofr2(new std::ofstream("out.txt"));
    ...
    ofr1 = ofr2; // Let ofr1 reference the same file.
    ...
  } // ofr2 gets out of context and its reference is deleted; as the file
    // is still referenced by ofr1, it remains open.
  ...
  return ofr1; // Keep file open if somebody bothers to take care
               // of the return reference.

It might be more complicated, by including referenced data in a class:
  class A {
    ref<std::ofstream> ofr_; // Maintain an output file.
  public:
    A() : ofr_(new std::ofstream("out.txt")) {}
    ...
    ref<std::ofstream> operator*() {   return ofr_; }
  };
Which then in turn may be used as a reference:
  ref<std::ofstream> ofr; // 0 reference
  {
    ref<A> ar(new A());  // Create a reference to an object A().
       ...
    ofr = *ar; // Set ofr to reference file that reference of object ar
               // is referencing.
  } // Object that ar is referencing goes out of scope, as not any longer
    // referenced by anyone; hence kill it.
    // However, the file that this ar referenced object is
    // referencing is now referenced by ofr; thus this file should
    // still be kept alive.

To sum up: An object of type A creates a reference to an open file. Then a
reference of this file may be handed over to object external to the object
of type A, and thus the file should be kept alive even if all reference to
it via the original object of type A is destroyed.

This makes the keeping track of the destruction process complicated. But
the reference count system can keep track of it, and the open file should
eventually have its destructor called when all references to it are
destroyed.

Now replace the reference count GC with a wholly non-deterministic GC: If
this GC should have any success in C++, it should be able to do at least
what one can do with reference counts, plus the ability to determine which
objects should be removed say by tracing the reachable elements from the
root system.

What one can say, it seems me though, is that if this GC is determining
which objects that are reachable from the root system, that way labelled
alive, then the GC must be somewhat more careful when deleting the objects
it finds are not reachable:

If I follow the example of the reference count GC above, then, when an
object is found to not be reachable (like the A object above), it should
have its destructor called. If this object has another ref<X> object in
its definition, then the destructor signal is sent to the ref<X> object,
which will in its turn determine what to do:

First one should consider objects that have had their destructors starting
running to be dead (like the A object above containing the ref<X> object).
After that, if this ref<X> object is found to be alive, then it should not
have its destructor run, but if it not reachable, then it should have its
destructor initiated.

This is at least what seem to happen in the reference count GC example.

One should note that if the GC is doing a batch search from the root set,
then it is not clear that one can start running the destructor at any
object at will. In the example above, it would be if the GC found that
both the A object and the std::ofstream object to not be reachable; which
destructor should one start running, that of the A object or the
std::ofstream object:

If one starts running the std::ofstream destructor, then when one starts
running the A destructor, it will reference an object which already has
had its destructor being run, and thus is dead. Thus, one must either
ensure that objects are written so that it is OK for not potentially dead
GC object to reference dead objects.

Or, one might introduce a notion of "locally reachable objects", which
would mean that among the potentially dead objects, one first runs the
destructors on the objects that are not referenced by other potentially
dead objects (i.e., the roots in the reference tree among the objects that
the GC finds are not reachable from the root set). This then starts to
become a bit complicated, and further does not have the capacity to remove
loops.

But both models may be important: If say the A object references an open
file, and the A destructor writes some stuff into this file before
attempting to close this possibly multiply referenced file, then it will
be imperative that the file is not closed before the object A runs its
destructor (unless the A destructor code is specially written in order to
handle that case).

But if one should adhere to this model, one must have a way to tell how
the destructors should be run for objects in a loop.

My conclusions:
- C++ should probably have just the ordinary destructors, at least for
regular user classes.
- If there will be "finalizers", then perhaps that is a tool needed for
the implementing GC's (just a hunch).
- Not just any GC can have success in a C++ environment: It must be
carefully written to be able to understand the C++ destructor model.
- Special attention must be given to the order destructors should be
called in the case a set of unreachable objects are discovered by the GC
via a batch search.

Also note that one can reword some ideas like "GC classes should have no
destructors" to "for some special types of GC's, the destructor must be
trivial", etc. Then that translates basically into saying that the
destruction of an object can only involve GC maintained resources (like
memory) but not object related resources (like the closing of a file). --
That seems to be too restrictive for general C++ programming, though, as
one can do better even with  a reference count GC.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: allan_W@my-dejanews.com (Allan W)
Date: Thu, 20 Jun 2002 19:46:32 CST Raw View

brangdon@cix.co.uk (Dave Harris) wrote
> what happens if the destructor (or code called from the
> destructor) creates a reachable reference to the stream? As the stream is
> now reachable, it shouldn't be collected. But we already started the
> destructor! Part of the job of GC is to avoid dangling pointers, and it
> cannot do that if it runs destructors.

Unfair. That same problem exists today without GC.

    myClass *myObject = 0;
    void CallMeEveryTenSeconds() {
        if (myObject) myObject->doSomething();
    }
    myClass::~myClass() {
        // Keep using this object after destruction!
        myObject = this;
    }

Isn't this asking for trouble, GC or not?

Remember that (ignoring the "disguised pointer" issue) GC is only
supposed to delete the object when there were NO remaining pointers
or references to the object. That means that nobody was going to do
ANYTHING to it. If the only possible place to "resurrect" the object
is in the destructor, then the code is pretty goofy already. GC might
change the symptoms some, but not the cause of the problem.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Fri, 21 Jun 2002 15:37:11 GMT Raw View

In article <3D0E51DB.7020902@mail.com>, Hyman Rosen <hyrosen@mail.com> wrote:

>Hans Aberg wrote:
>> Could some experts explain why one should not  call
>> the destructor when using a non-deterministic GC?

>Because objects hold pointers to other objects,
>and you could access an already destructed object.

>struct A { int a; A(int a) : a(a) { } };
>struct B { A *p; B(A *p) p(p) { } ~B() { ++p->a; } };
>int main(int c, char **v) { new B(new A(c)); }

>In this code, if A is collected before B, B's destructor
>will access garbage.

The thing is that reference counts can handle such situations. Viewing
this as a type of GC, a "reference count GC", one would expect that C++
should be able to handle such GC's, even though it may be the case that
some GC's, in the name of efficiency, may not admit it. (Also see my other
post, treating reference counts as GC's, which now has appeared.)

I have an idea for how to resolve this problem:

All objects belonging to a certain GC must have time stamps. If the GC
discovers a batch of objects to be destroyed, then it should run the
destructor of the oldest object, which in turn may trigger new destructors
to be called. When that destructor sequences has terminated, the process
is repeated until all objects has been deleted.

I go this idea when reading the thread "Order of destruction across
files": C++ guarantees that global objects be destroyed in the reverse
order they are created. My suggestion is merely an extension of this to
dynamic objects controlled by a GC.

This will not guarantee that all programmed code is safe, but it will make
it possible to write GC code that calls destructors and is safe.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: brangdon@cix.co.uk (Dave Harris)
Date: Fri, 21 Jun 2002 15:37:47 GMT Raw View

allan_W@my-dejanews.com (Allan W) wrote (abridged):
> > Part of the job of GC is to avoid dangling pointers, and it
> > cannot do that if it runs destructors.
>
> Unfair. That same problem exists today without GC.
>
>     myClass *myObject = 0;
>     void CallMeEveryTenSeconds() {
>         if (myObject) myObject->doSomething();
>     }
>     myClass::~myClass() {
>         // Keep using this object after destruction!
>         myObject = this;
>     }
>
> Isn't this asking for trouble, GC or not?

Yes. The point remains: GC is no longer able to do its job. The guarantees
offered by GC are weakened. This is not such a problem for systems without
GC, because they don't even try to offer those guarantees.

Reasonable men may differ about what "the job" of GC is, and I suppose
that is what is happening here. I certainly agree this is not a killer
objection to GC-destructors. Especially as there are other ways to create
dangling pointers:

    Item *p = new Item;
    delete p;
    p->doSomething();

Will we still be allowed to write code like this for GC'd items? If so,
then the GC cannot do its job anyway. Resurrection is just one more
loop-hole.

My feeling is that, even if resurrection is not a problem in practice, it
complicates the model in theory. It is a kind of catch-22. The object dies
only if there are no references to it, but killing it creates a reference.
If this contradiction gets into the language, it will consume the time of
students and teachers of C++ forever.

  Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
      brangdon@cix.co.uk      |   And close your eyes with holy dread,
                              |  For he on honey dew hath fed
 http://www.bhresearch.co.uk/ |   And drunk the milk of Paradise."

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Fri, 21 Jun 2002 15:36:55 GMT Raw View

In article <memo.20020619232849.1523G@brangdon.madasafish.com>,
brangdon@cix.co.uk wrote:
>> It then seems that in this process, the destructor std::~ofstream should
>> be called, in order to make sure that the file is being closed....
...
>Running user code from GC raises a few issues.

On these issues, one should first note that it is never possible in a
language to implement features that guarantees that the output is safe.
This is even more true so in C++, where one wants to be able to write code
that, in the name of efficiency, is safe only by the efforts of the
programmer. What one wants in C++, is to be able to build increasingly
safe high level features, ideally so that one, if optimization is needed
in certain places, can crawl down the structurization ladder.

>Finally I should say that I am not claiming to be an expert here. Not many
>real experts had replied to date, so I thought I'd write something anyway.
>Also, I don't claim these 5 issues are insurmountable problems. They do
>complicate the model, though, and they need to be addressed by GC
>proposals.

Right, I also do not think that these problems are insurmountable, but
complications: I think that C++ should admit "high level" GC's which admit
not only control over memory resources or a very limited set of resources,
but a more general semantic set of resources, as though controlled via
C++'s destructors.

> The first is, what happens
>if the code throws an exception? I imagine we will end up in
>std::terminate(). In other words, once you have dropped the last reference
>to your stream, your program is liable to be closed down at any moment,
>without warning, and irrespective of any no-throw regions. Doesn't this
>make you feel uneasy?

This is an interesting comment: I figure that the problem is only if the
std::~ofstream is throwing an exception while the GC is trying to deleted
it. If the program should not terminate, some of the code currently
executing must catch it.

The function parameter stack at GC time will, in effect, be: The regular
code so far executing at the bottom, then the GC parameters are instead on
top of that; and of top of that any code initiated by the destructors of
the deleted object. When a destructor is throwing an exception, it might
be possible for another destructor on top of that to catch it.

But if that does not happen, this suggests that one should be able to
attack "catch" phrases to the GC itself: Then the GC can catch the
exception, and return to normal execution.

If the GC is not able to catch the exception, then the GC execution is
randomly inserted into the other code, but it may still be inserted at
certain localized programming regions. So if one admits that GC time can
only happen in certain specified programming regions, then those
programming regions can catch the exceptions that the GC or destructor
cannot handle.

>Secondly, what happens if the destructor (or code called from the
>destructor) creates a reachable reference to the stream? As the stream is
>now reachable, it shouldn't be collected. But we already started the
>destructor! Part of the job of GC is to avoid dangling pointers, and it
>cannot do that if it runs destructors.

(Also see my other post.) One of my ideas is that one extends the C++
destructor behavior of global objects to GC controlled objects:

All objects belonging to a certain GC must have time stamps (i.e., the GC
must be able to somehow keep track of the creation order). If the GC
discovers a batch of objects to be destroyed, then it should run the
destructor of the oldest object, which in turn may trigger new destructors
to be called. When that destructor sequences has terminated, the process
is repeated until all objects has been deleted.

The programmer must program against this model in order to make sure the
resulting code is safe. But this still opens up possibilities of using
destructors in connection with GC's.

Thus, in your example, the programmer must be sure to create references
that are safely handled by the programming model that C++ is eventually
going to use.

>Thirdly, you are presumably creating the file for someone else to read.
>How long will they have to wait before they can start reading? The close
>could be delayed indefinitely.

This is not really a big problem: In the example, on preemptive
multitasking OS, which includes all PC's by now, it is possible several
users can open the same file and manipulate it. By contrast, this is not
possible on more simple OS's.

If one wants to make sure that the file is closed as soon as the last
reference is removed, then one can use say a reference count GC.

So the answer to this point is that it depends in part of type of resource
one is using, how it is implemented on a particular platform, and what
type of GC one is using. If one needs more control over when an object is
going to be removed, then one should use a GC that admits that, but there
might be a penalty in performance.

>Fourthly, we are proposing that GC be optional, yet the semantics of your
>code depends on whether GC is present. If GC is not present, the stream
>will never be properly closed at all. Wouldn't it be better to write code
>that works portably across all versions of the C++0x standard?

The thing is that if one is implementing say a computer language (for
example, a functional one) or something else that requires certain dynamic
behavior, then one ends up on the equation that one must be guaranteed of
a certain GC behavior.

Therefore, the picture I am working is where one writes new code that is
attached to a GC, where the GC has a certain specified behavior.

I do not at all believe in the idea that GC's should be somehow
"automatically" inserted into old C++ code in a way that old code suddenly
works automatically with a wide range of GC's. This does not need to mean
that rewriting old code in a GC safe way need to be difficult or complex:
Compare this with the situation of writing exception or thread safe code.

>Fifthly, how can we prevent the stream's buffer being destroyed before the
>stream's destructor uses it?

I indicated this above by the idea of using time stamps on GC controlled
objects. One then cannot program code in just any way and expect it to be
GC-safe, but this opens up the possibility of writing quite some GC safe
code. Or so I hope. :-)

>> Code that decides to not use the new C++ GC oriented reference construct
>> would be unaffected.
>
>Even so, there is a cost in language complexity. Why should compiler
>vendors have to spend time on this, which will only help people who use
>the feature, when they could be working on features (such as
>optimisations) which will benefit everyone?

It is really the increased need for dynamic programming that drives it,
which in its turn hangs together with the fact that generic programming
techniques saves increasingly expensive programming time on the expense of
increasingly cheaper computer time and memory.

>In my view the burden of proof is on the advocates of GC-calls-destructors
>to demonstrate that the benefits out weigh the costs. I find it hard to
>see any benefit at all. What code /can/ safely be put into a GC'd
>destructor?

The benefit shows up for example when implementing a functional language.
One way to do that is to make use of C++'s class construction features to
create the function closures. This is very convenient, because C++
transparently puts temporary objects into "registers" in a way that the
programmer does not need to worry about it.

But if one is doing that, current C++ does not give any way to attach
those objects to a GC and then have them conveniently destroyed.
Therefore, one ends up on the equation to write explicit code for those
temporary object, and one can just as well switch to C, as the whole C++
user class construction idea goes down the drain.

One might think that a whole functional language does not affect you
really: But if you use say the Bison parser generator to first create a
simple calculator, and then wants to have some functions that can admit
variables whose values are assigned later, bang you are there, creating
such function closures! In C++, this is not so difficult, but right now
the only possibility is to use a reference count.

Thus, if one should go beyond the reference count GC, this is needed also
on a quite elementary programming level.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Ryjek <ryjek@cox.net>
Date: Fri, 21 Jun 2002 16:52:55 GMT Raw View

Hans Aberg wrote:
> In article <3D0E51DB.7020902@mail.com>, Hyman Rosen <hyrosen@mail.com> wrote:
>>Hans Aberg wrote:
>>
>>>Could some experts explain why one should not  call
>>>the destructor when using a non-deterministic GC?
>>
>>Because objects hold pointers to other objects,
>>and you could access an already destructed object.
>
>>struct A { int a; A(int a) : a(a) { } };
>>struct B { A *p; B(A *p) p(p) { } ~B() { ++p->a; } };
>>int main(int c, char **v) { new B(new A(c)); }
>
>>In this code, if A is collected before B, B's destructor
>>will access garbage.
>
> I have an idea for how to resolve this problem:
>
> All objects belonging to a certain GC must have time stamps. If the GC
> discovers a batch of objects to be destroyed, then it should run the
> destructor of the oldest object, which in turn may trigger new destructors
> to be called. When that destructor sequences has terminated, the process
> is repeated until all objects has been deleted.

Unfortunately nothing forces the original code to destroy objects in the
order opposite to their construction. Consider this code:

struct A {
     B *b_;
     A(B*b) : b_(b) { b_->a_ = this; }
};

struct B {
     A *a_;
     B() a_(0) {};
     ~B() { delete a_; }
};

B *b = new B;
A *a = new A(b);

Since a was constructed after b, in your scheme it will be destroyed
before b, leading to a problem in the B's destructor. As a result,
adding GC to the above code will introduce undefined behavior where
previously there was none. Similar configurations are very likely to
appear in more complex data structures.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: brangdon@cix.co.uk (Dave Harris)
Date: Fri, 21 Jun 2002 17:18:32 GMT Raw View

remove.haberg@matematik.su.se (Hans Aberg) wrote (abridged):
> [To Moderators: This is a re-post, as the copy I posted Tuesday has not
> yet arrived.]

Both copies arrived here, eventually. My reply to your earlier article
(which asked what the problems were) took a day or two too. I don't think
articles are being lost. It's probably best to be patient, at least if you
got the RECEIVED notification.


> The reference count is still deterministic, as one knows exactly when
> the objects are deleted, but on the other hand, the destruction does
> not follow what is directly induced by the C++ syntax. So
> reference counts are something in-between C++'s destructor model and
> a fully non-deterministic GC.

Agreed. And they have some of the problems I mentioned.

(1) There is still no good way to handle failures in destructors, so your
example of a destructor which flushes a buffer to disk is dubious.

(2) We still have the resurrection issue.

(3) At least now destructors will be called in a timely way.

(4) And we no longer depend on whether the implementation supports GC.

(5) We still have the problem of loops.

The improvements in (3) and (4) mean that reference counted destructors
can safely be used for some simple things.


>   template<class A, class GC = ...>

Allowing for several different kinds of GC policy will create more
opportunities for incompatibilities between code which uses different
policies. Too many standards is no standard at all.

  Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
      brangdon@cix.co.uk      |   And close your eyes with holy dread,
                              |  For he on honey dew hath fed
 http://www.bhresearch.co.uk/ |   And drunk the milk of Paradise."

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: John Nagle <nagle@animats.com>
Date: Mon, 17 Jun 2002 21:15:23 GMT Raw View

Hans Aberg wrote:

> Could some experts explain why one should not call the destructor when
> using a non-deterministic GC? -- Here is an example:
....
> It then seems that in this process, the destructor std::~ofstream should
> be called, in order to make sure that the file is being closed. It then
> does not look like there is any difference between "destructors" and
> "finalizers".


     This why we need to make a distinction between destructors and
finalizers, assuming we allow GC.  Even the people posting here
don't all get it.

     John Nagle
     Animats

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Tue, 18 Jun 2002 16:45:11 GMT Raw View

Dave Harris:
> We should keep a clear distinction between "finaliser"
> and "destructor". There are two concepts here which need
> separate names. They are not synonyms.

This seems to be a very commonly held opinion.
The only snag is there is no general agreement
over what the clear distinction might be.

> C++ is the only  language I know with destructors.

As I previously mentioned, C# has them and they
behave like what you call finalisers. So now you know
of at least one other.

Garry Lancaster:
> > For example in Boehm et. al system, where what you call
> > a finalizer typically calls a destructor.
>
> You mean like:
>
>     void Object::finalise() {
>         delete this;
>     }
> as a choice of the programmer? Or do you mean that Boehm's
> system calls destructors directly instead of finalisers?

As his system is C-based, I believe there must be a
choice. If you want to know for sure how it works, read
the docs.

[snip]

> The easiest solution
> is for GC never to run finalisers or destructors, or any
> code at all on an object which has become unreachable.

If you are prepared to be that restrictive on what
programmers can do you can solve the problem
for that GC system. But that doesn't necessarily
mean the problem is solved throughout your
program since other systems will be needed to fulfil
the needs you haven't satisfied with your restricted
GC system.

It is worth pointing out that, all other things being equal,
for those objects with trivial destructors, a GC system that
permits non-trivial destructors is always going to be just
as safe as one that doesn't. The true safety comparison
is really between non-GC objects with non-trivial
destructors (your only choice if your GC system doesn't
permit non-trivial destructors) and GC objects with
non-trivial destructors.

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Alexander Terekhov <terekhov@web.de>
Date: Tue, 18 Jun 2002 18:17:13 GMT Raw View

Garry Lancaster wrote:
>
> Dave Harris:
> > We should keep a clear distinction between "finaliser"
> > and "destructor". There are two concepts here which need
> > separate names. They are not synonyms.
>
> This seems to be a very commonly held opinion.
> The only snag is there is no general agreement
> over what the clear distinction might be.

In my rather UNEDUCATED opinion:

a) finalizer(s)/pre-d-tor(s) can resurrect 'this' object
   (and/or other non-this objects they have access to),
   but d-tors can't resurrect 'this';

b) resurrector(s)/post-c-tor(s) can easily 'undo' whatever
   was 'done' by corresponding finalizer(s)/pre-d-tor(s)
   [but should better NOT 'undo' resurrection(s) ;-)], but
   nothing can 'undo' or stop object destruction once it's
   started;

c) there might be infinite number of resurrection cycles
   manifested via finalizer(s)/pre-d-tor(s) and
   resurrector(s)/post-c-tor(s) invocation chain(s), but
   construction and destruction could happen only once.

> > C++ is the only  language I know with destructors.
>
> As I previously mentioned, C# has them and they
> behave like what you call finalisers. So now you know
> of at least one other.

Well, but is it really worth even mentioning it? To me,
it's the same 'quality' as:

http://groups.google.com/groups?threadm=3D0B3459.7A7D7998%40web.de
(Subject: Re: C# threading model?)

[...]
> > The easiest solution
> > is for GC never to run finalisers or destructors, or any
> > code at all on an object which has become unreachable.
>
> If you are prepared to be that restrictive on what
> programmers can do you can solve the problem
> for that GC system. But that doesn't necessarily
> mean the problem is solved throughout your
> program since other systems will be needed to fulfil
> the needs you haven't satisfied with your restricted
> GC system.
>
> It is worth pointing out that, all other things being equal,
> for those objects with trivial destructors, a GC system that
> permits non-trivial destructors is always going to be just
> as safe as one that doesn't. The true safety comparison
> is really between non-GC objects with non-trivial
> destructors (your only choice if your GC system doesn't
> permit non-trivial destructors) and GC objects with
> non-trivial destructors.

Yup; and the only difference [in addition to sometimes-less-
expensive garbage 'detection'] is that GC system can be really
helpful to 'easily handle' resurrection and all kinds of deps.
cycles w.r.t. GC objects, AFAICS.

regards,
alexander.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: John Nagle <nagle@animats.com>
Date: Thu, 20 Jun 2002 17:25:15 GMT Raw View

John Levon wrote:

> Hans Aberg wrote:
>
>> In article <Yq%N8.18694$sC4.2529865@news6-win.server.ntlworld.com>,
>> "Garry
>> Lancaster" <glancaster@ntlworld.com> wrote:
>>
>>> Just to check we understand some common GC terms.
>>>
>>> conservative: assumes any data that might be a pointer
>>> is a pointer. Boehm et al.'s collector works conservatively.
>>
>>
>> In my mind a "conservative GC" is merely a GC that collects the dead
>> objects at a point in time it determines suitable.



> This is at odds with the standard terminology as described by Garry. In
> fact, I have never seen anything use your terminology. Please don't
> muddy the waters with non-standard terminology - this area has enough
> problems with terminology already.


     Agreed.

     See the Memory Management Glossary at

http://www.memorymanagement.org/glossary/c.html#conservative.garbage.collection

     John Nagle
     Animats

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Hyman Rosen <hyrosen@mail.com>
Date: Thu, 20 Jun 2002 17:28:36 GMT Raw View

Hans Aberg wrote:
> Could some experts explain why one should not  call
> the destructor when using a non-deterministic GC?

Because objects hold pointers to other objects,
and you could access an already destructed object.

struct A { int a; A(int a) : a(a) { } };
struct B { A *p; B(A *p) p(p) { } ~B() { ++p->a; } };
int main(int c, char **v) { new B(new A(c)); }

In this code, if A is collected before B, B's destructor
will access garbage.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Sat, 15 Jun 2002 16:25:53 GMT Raw View

Garry Lancaster:
> > Object  A is not reachable and contains a pointer to object B.
> > Object B is reachable and contains a pointer that doesn't
> > initially point to A.
> >
> > A is not reachable so is finalized. It's finalizer sets the
> > pointer within object B to point to A.
> >
> > B's pointer now points to the bag of bits which used
> > to be object A.

Dave Harris:
> No; A is still a proper object even after its finaliser has been run. This
> is the crucial difference between a finaliser and a destructor.
>
> Destructors destroy. They change the dynamic type of their object
> successively from most-derived to base, and end up with raw memory. To use
> an object after it has been destroyed yields undefined behaviour. The
> vtable is no longer there.
>
> Finalisers don't do that. Finalisation doesn't change the type of the
> object. It's still well-defined, at least at the language level, to call
> virtual functions of a finalised object. It may even be finalised twice or
> more times.

I think you will find that this is not generally true. For
example in Boehm et. al system, where what you call
a finalizer typically calls a destructor. And in C#, where
a destructor behaves like what you call a finalizer. Your
definitions are very Java-centric.

> > As far as I know this can happen in Java, C# or with
> > the Boehm et al. collector.
>
> Java behaves as above.

OK. I accept that I have a C++-centric view. In Java, I
believe it is as you say. It doesn't solve the underlying
problem of resurrection for the programmer though.
It's a bit like "solving" the problem of burglary by legalising
it (and expecting anyone at risk to fit iron bars to their
windows).

> Some objects may be written to not-work after they've been finalised, but
> that's a choice of the object's author. For example:
>
>     void File::read( void *buf, int len ) {
>         assert( isOpen() );
>         // ...
>     }
>
>     void File::finalise() {
>         if (isOpen()) {
>             close();
>         }
>     }
>
> so that calling read() after finalise() does no good. However, this isn't
> much different to calling read() after a manual close(). It's an issue of
> the class design. Many of Java's library classes are, alas, poorly written
> with regard to finalisation. That's not the language's fault, though.

It's not a language defect in a strictly defined sense
because the behaviour is specified and the specification
is self-consistent. Whether it's a desirable behaviour is
another question, but as I wrote earlier, resurrection is
an issue that all GC systems struggle with. I don't think
there is a perfect solution.

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Sat, 15 Jun 2002 16:26:06 GMT Raw View

Garry Lancaster:
> >Just to check we understand some common GC terms.
> >
> >conservative: assumes any data that might be a pointer
> >is a pointer. Boehm et al.'s collector works conservatively.

Hans Aberg:
> In my mind a "conservative GC" is merely a GC that collects the dead
> objects at a point in time it determines suitable.
>
> This might be done say by tracing the root system, marking down the live
> objects, then removing those not live. (And if objects does not have
> finalizers, one can just forget about them.)
>
> The conservative GC still needs to know what is a pointer and what is not.
> The compiler can aid in this by supplying information of where the root
> system is located, which current C++ does not. It can also help by
> supplying information about which pointer to trace, and which one should
> not trace, also not provided by C++.
>
> Therefore, the Boehm conservative GC goes in on a layer between the C++
> compiler and the OS to find the root system, and trace the pointers. But
> this is not very efficient and one cannot target the GC to just certain
> classes of objects.
>
> -- This is what I thought. Have I misunderstood something?

I'm afraid so. You have misunderstood what is meant
by  "conservative". Check out the definitions page of David
Chase's GC FAQ:

http://www.iecc.com/gclist/GC-algorithms.html

Most of the other points you have raised on this thread
concerning GC are unaffected: they are still valid.

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Alexander Terekhov <terekhov@web.de>
Date: Sat, 15 Jun 2002 16:26:53 GMT Raw View

Dave Harris wrote:
[...]
> Destructors destroy. They change the dynamic type of their object
> successively from most-derived to base, and end up with raw memory. To use
> an object after it has been destroyed yields undefined behaviour. The
> vtable is no longer there.

Bingo!

> Finalisers don't do that. Finalisation doesn't change the type of the
> object. It's still well-defined, at least at the language level, to call
> virtual functions of a finalised object. It may even be finalised twice or
> more times.

Yup -- after each resurrection... and 'resurrectors' of all 'already finalized'
classes in the object's dynamic type hierarchy should basically 'undo' whatever
was done by the corresponding 'finalizers'; that has nothing to do with object
destruction and construction. Post-constructors and pre-destructors RULE! ;-)

> > As far as I know this can happen in Java, C# or with
> > the Boehm et al. collector.
>
> Java behaves as above.

Nah, Java is 'brain-dead' all around it [e.g. it also lacks post[1]/pre stuff],
sorry. ;-)

regards,
alexander.

[1] http://www.cs.umd.edu/~pugh/java/memoryModel/archive/1040.html
    ("> or, even better: ....")

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Sat, 15 Jun 2002 16:26:39 GMT Raw View

In article <yKiO8.14030$gm5.2287816@news11-gui.server.ntli.net>, "Garry
Lancaster" <glancaster@ntlworld.com> wrote:
>> Then in your example, the finalizer of object A sends a request to object
>> B to set its pointer to object A. Then object B must be able to determine
>> whether this is safe or not, discovering that A is finalizing.
>
>So, you are banning finalizers from calling member
>functions of any other object? Or is there some more
>subtle criteria for determining the safety of the call?

The object methods, which are the valid "requests" then, should be
developed so that calling them are safe. Usually, when one breaks
"safeness", it depends on a too liberal view of what manipulations that
are admitted of an object.

In your case, to admitted object A to, "imperative" style, to go into the
interiors of object B as it pleases, namely to set a pointer to itself,
despite the fact it has already started finalizing. It looks as though
this problem can be solved that B first checks where the object it is
asked to first check whether the object exists. When doing so, it will
discover that A is finalizing, which suggests that one should not allow,
in general, on the user level, to set references to objects that are
finalizing (or probably have had their destructors initiated running).

Then, again, it suggests that one designs a high-level type of references
that are safe in this respect: They check that the object pointed to
exists, and objects that have had their destructors and/or finalizers
initiated are objects that are safe anymore, as parts of their expected
structure is in the process of being destroyed. So, when discovering that,
this type of reference should throw an exception, as the problem probably
isn't detectable statically via the compiler.

You would probably not always want to use this type of reference, as it
isn't effective: For example, if the reference is a URL (and supporting
distributed programming is a task of the next C++ revision), then one
would have to check the existence of the object over the Internet, which
takes time.

So one ends up on a tradeoff between safeness and effectiveness, and C++
should usually be able to supply a wide range of possible tradeoff
solutions in this respect.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Sat, 15 Jun 2002 21:16:16 GMT Raw View

In article <_UCO8.69$Oj3.18589@news8-gui.server.ntli.net>, "Garry
Lancaster" <glancaster@ntlworld.com> wrote:
>> -- This is what I thought. Have I misunderstood something?
>
>I'm afraid so. You have misunderstood what is meant
>by  "conservative". Check out the definitions page of David
>Chase's GC FAQ:
>
>http://www.iecc.com/gclist/GC-algorithms.html
>
>Most of the other points you have raised on this thread
>concerning GC are unaffected: they are still valid.

Yes, when I used the word "conservative", I really meant (when speaking
about my own ideas) "non-deterministic" as opposed to C++ deterministic
collection.

-- I misunderstood the "conservative" of the Boehm collector:

In my mind, I thought that a collector would scan the root set, tracing
forward like in Tarjan's SCC algorithm (which is determining the strongly
connected components (SCC's) of a graph):
  http://www1.ics.uci.edu/~eppstein/161/960220.html
(This algorithm scans each edge exactly once, which is the minimum if one
would want to determine the reachable data exactly. -- This is the most
effective way to compute FIRST and FOLLOW used in parsing algorithms,
which is how I encountered it.)

For this to work, one would need to know the root set, and from that, with
respect to a chosen GC, which pointers to trace forward. (For example, if
the "pointer" is a URL, one would normally not trace it.)

I see now the point of Boehm's conservative GC. -- My hunch is that such
techniques will be too crude for C++, as one then cannot blend garbage
collection techniques.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: John Levon <moz@compsoc.man.ac.uk>
Date: Sun, 16 Jun 2002 05:48:14 GMT Raw View

Hans Aberg wrote:
> In article <Yq%N8.18694$sC4.2529865@news6-win.server.ntlworld.com>, "Garry
> Lancaster" <glancaster@ntlworld.com> wrote:
>>Just to check we understand some common GC terms.
>>
>>conservative: assumes any data that might be a pointer
>>is a pointer. Boehm et al.'s collector works conservatively.
>
> In my mind a "conservative GC" is merely a GC that collects the dead
> objects at a point in time it determines suitable.

This is at odds with the standard terminology as described by Garry. In
fact, I have never seen anything use your terminology. Please don't
muddy the waters with non-standard terminology - this area has enough
problems with terminology already.

regards
john

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: brangdon@cix.co.uk (Dave Harris)
Date: Sun, 16 Jun 2002 05:49:30 GMT Raw View

glancaster@ntlworld.com (Garry Lancaster) wrote (abridged):
> Your definitions are very Java-centric.

Also Smalltalk-centric. Probably Lisp-centric and many other languages too
:-)

We should keep a clear distinction between "finaliser" and "destructor".
There are two concepts here which need separate names. They are not
synonyms. C++ is the only language I know with destructors.


> For example in Boehm et. al system, where what you call
> a finalizer typically calls a destructor.

You mean like:

    void Object::finalise() {
        delete this;
    }

as a choice of the programmer? Or do you mean that Boehm's system calls
destructors directly instead of finalisers?


> It doesn't solve the underlying problem of resurrection for
> the programmer though. It's a bit like "solving" the problem of
> burglary by legalising it (and expecting anyone at risk to fit
> iron bars to their windows).

It puts the means for a solution into the hands of the programmer.

This is the mid-way course. Programmers who don't need finalisation at all
can just never write finalisers. Hopefully implementations will be smart
enough to optimise away most of the overhead, and they won't be paying
much for something they don't use.

Programmers who want sane resurrection semantics can implement it
themselves, if necessary by using the finalise() method to set a flag in
their object which other methods can test. (Probably there should be an
"unfinalise" method which tells an object when it has been resurrected.)

Programmers who want deletion semantics can call delete in their finalise
method, as above.

There are two other courses. The first, as I've advocated elsewhere, is
for GC not to call either finalisers or destructors. I prefer this because
it is simplest and makes least change to the current semantics, and
because I believe there is little useful that can be done in a finaliser
anyway.

The last is for GC to invoke destructors instead of finalisers. I see
nothing good about this. It makes the resurrection problem worse, because
now we have pointers to destructed objects. It is a bigger change in
semantics; it changes the observable behaviour of leaks. I suspect it will
be harder to write objects which work both with and without GC because of
it.


> [...] resurrection is an issue that all GC systems struggle with.
> I don't think there is a perfect solution.

And not just GC systems. We already have the resurrection problem. An
object is already able to create dangling pointers to itself in its
destructor. There is an argument for saying that if you write code like:

    Object::~Object() {
        global_ptr = this;
    }

you deserve everything you get.

GC doesn't create the problem. GC gives us a means to solve it. The
easiest solution is for GC never to run finalisers or destructors, or any
code at all on an object which has become unreachable.

  Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
      brangdon@cix.co.uk      |   And close your eyes with holy dread,
                              |  For he on honey dew hath fed
 http://www.bhresearch.co.uk/ |   And drunk the milk of Paradise."

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Mon, 17 Jun 2002 18:02:28 GMT Raw View

Could some experts explain why one should not call the destructor when
using a non-deterministic GC? -- Here is an example:

I write a class ref<X> which maintains references (pointers) which will
become collected at a time by a GC when if feels for it. I use it in order
to open a file:
    ref<new std::ofstream("out.txt")> ofr;
    ...
I will use this reference as a pointer which I do not delete, for example
    ref<std::ofstream*> ofr1 = ofr;
    ...
    ref<std::ofstream*> ofr2 = ofr1;
    ...
    return ofr2;
and so on, in functions calling each other.

At some point in time all these references becomes unreachable from the
root set. My non-deterministic GC wakes up at some point in time, and
decide to remove the resource created by the line
    new std::ofstream("out.txt")
It then seems that in this process, the destructor std::~ofstream should
be called, in order to make sure that the file is being closed. It then
does not look like there is any difference between "destructors" and
"finalizers".

-- C++ would suffice that one merely writes new code with a suitable
ref<X> construct, making it easy to insert into newly written code. Code
that decides to not use the new C++ GC oriented reference construct would
be unaffected.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Wed, 12 Jun 2002 18:25:48 GMT Raw View

Hans Aberg:
> But ref counts are no substitute for a true conservative GC, because ref
> counts cannot reclaim loops.

Since we are talking about standardizing GC, we
can be bolder. The main reason Boehm et al.'s
GC is conservative is because it gets no help from
the compiler in identifying pointer locations. With
compiler support we can have a precise GC or,
probably more likely, a mostly-precise GC that knows
the locations of pointers in the heap but treats registers
(and, perhaps, the stack) conservatively.

Boehm et al.'s stuff is good and there is a lot to learn from
there in any case, but standardizing a purely conservative
GC algorithm would be somewhat of a missed
opportunity in my opinion.

Of course, the algorithm itself is unlikely to be standardized.
What we would end up with would be, as normal, a set of
basic requirements that can be met by a variety of different
implementations. But we need to bear in mind some likely
implementations when setting those requirements.

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net



---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Wed, 12 Jun 2002 19:16:33 GMT Raw View

In article <qNFN8.1385$69.53010@newsfep1-win.server.ntli.net>, "Garry
Lancaster" <glancaster@ntlworld.com> wrote:
>> But ref counts are no substitute for a true conservative GC, because ref
>> counts cannot reclaim loops.
>
>Since we are talking about standardizing GC, we
>can be bolder. The main reason Boehm et al.'s
>GC is conservative is because it gets no help from
>the compiler in identifying pointer locations.

No, this is not the reason, I think. Evidently, he wanted to implement a
conservative GC in C++, and a found a (rather ineffective) way around the
compiler to identify the pointers via the OS.

One would want GC awareness in part in order to make more effective GC
implementations possible, and in part because there are situations where a
GC is needed which can reclaim all dynamically created dead objects.

> With
>compiler support we can have a precise GC or,
>probably more likely, a mostly-precise GC that knows
>the locations of pointers in the heap but treats registers
>(and, perhaps, the stack) conservatively.

Or so is the hope:

In real life, multiple GC's must be synchronized and be aware if each
other if one wants pointers belonging to one allocation/deallocation
method to be traceable by another such method.

>Boehm et al.'s stuff is good and there is a lot to learn from
>there in any case, but standardizing a purely conservative
>GC algorithm would be somewhat of a missed
>opportunity in my opinion.

See the other thread "Boehm Garbage Collection Questions..." by Hans Boehm
for a list of options. Current Boehm is about (2) in the list mentioned
there. But for implementation of functional languages and the like, one
would need at least (3) of that list.

>Of course, the algorithm itself is unlikely to be standardized.
>What we would end up with would be, as normal, a set of
>basic requirements that can be met by a variety of different
>implementations. But we need to bear in mind some likely
>implementations when setting those requirements.

GC technology is evolving, so I figure that one would want C++ to make
possible for GC libraries to be implemented, so that one can choose the
particular method that is best for you application.

As one then can call libraries using different allocation/deallocation
methods, it seems to call for that multiple GC's must be present.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: John Nagle <nagle@animats.com>
Date: Wed, 12 Jun 2002 19:56:59 GMT Raw View

Garry Lancaster wrote:

> Hans Aberg:
>
>>But ref counts are no substitute for a true conservative GC, because ref
>>counts cannot reclaim loops.


    Ref counts can reclaim loops if at least one
pointer in the loop is "weak" (in the Perl, not
the Java, sense.)  If tree algorithms are coded
with weak pointers to parents, tree structures
will be collected properly.  That's the most
common case.

    It's possible to design systems which guarantee
at compile time that there are no strong pointer
loops, but that requires global analysis.


> Since we are talking about standardizing GC, we
> can be bolder. The main reason Boehm et al.'s
> GC is conservative is because it gets no help from
> the compiler in identifying pointer locations. With
> compiler support we can have a precise GC or,
> probably more likely, a mostly-precise GC that knows
> the locations of pointers in the heap but treats registers
> (and, perhaps, the stack) conservatively


    It's worth noting that there's a scaling issue
with conservative GC.  If the
address space is close to fully populated with
memory (for 32-bit machines, we're there now),
a random bit pattern is very likely to appear to be a
valid pointer to something.  Conservative GC looked
like a better idea twenty years ago, when we had
4MB VAXes with 4GB address spaces.

    As a security issue, a denial-of-service attack
may be possible by passing around content which
looks like pointers.

    John Nagle
    Animats

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Thu, 13 Jun 2002 16:25:08 GMT Raw View

> > Hans Aberg:
> >>But ref counts are no substitute for a true conservative GC,
> >>because ref counts cannot reclaim loops.

John Nagle:
>     Ref counts can reclaim loops if at least one
> pointer in the loop is "weak" (in the Perl, not
> the Java, sense.)  If tree algorithms are coded
> with weak pointers to parents, tree structures
> will be collected properly.  That's the most
> common case.
>
>     It's possible to design systems which guarantee
> at compile time that there are no strong pointer
> loops, but that requires global analysis.

The point is that ref counts alone are not sufficient
for the job.

Garry Lancaster:
> > Since we are talking about standardizing GC, we
> > can be bolder. The main reason Boehm et al.'s
> > GC is conservative is because it gets no help from
> > the compiler in identifying pointer locations. With
> > compiler support we can have a precise GC or,
> > probably more likely, a mostly-precise GC that knows
> > the locations of pointers in the heap but treats registers
> > (and, perhaps, the stack) conservatively

John Nagle:
>     It's worth noting that there's a scaling issue
> with conservative GC.  If the
> address space is close to fully populated with
> memory (for 32-bit machines, we're there now),
> a random bit pattern is very likely to appear to be a
> valid pointer to something.  Conservative GC looked
> like a better idea twenty years ago, when we had
> 4MB VAXes with 4GB address spaces.

Good point. However, once 64 bit machines become
the norm, conservative GC will again benefit from a
lower incidence of false positives (at least until we start
commonly using up significant chunks of a 64 bit
address space: sounds mad but we haven't seen
Windows 2020 yet, have we ;-)

>     As a security issue, a denial-of-service attack
> may be possible by passing around content which
> looks like pointers.

I think this is more questionable. In most mark-and-
sweep systems, pointers are usually followed at most
once per collection cycle, so it would have to be a very
large block of dodgy data to cause anything more
than a slight slowdown of the collector. Is there any
concrete evidence to suggest a problem?

One point that people do tend to forget is that without
at least some non-standard compiler guarantees
conservative GC algorithms aren't 100% guaranteed to
work correctly since, at least in theory, C++ compilers
can generate optimized code that hides pointers.

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Thu, 13 Jun 2002 16:25:50 GMT Raw View

In article <3D07A310.5060500@animats.com>, John Nagle <nagle@animats.com> wrote:
>>>But ref counts are no substitute for a true conservative GC, because ref
>>>counts cannot reclaim loops.
...
>    Ref counts can reclaim loops if at least one
>pointer in the loop is "weak" (in the Perl, not
>the Java, sense.)

One can clearly reclaim everything that is not a true, unstructured loop.

One way to reclaim all lost memory is to put it into a program, and take
down the program every once in a while, replacing with a new program.

So if the memory leakage problem is not too big, relative program life
span, one can live with it.

>    It's possible to design systems which guarantee
>at compile time that there are no strong pointer
>loops, but that requires global analysis.

>From you description, it sounds as though one will have to do pretty much
the same thing as a conservative GC, tracing from the root system: The
first encountered pointer of a structure will be the "weak one".

It sounds as just another GC technique to me: Keeping track of this extra
information takes, so it is a question of when one wants to do it.

>    It's worth noting that there's a scaling issue
>with conservative GC.  If the
>address space is close to fully populated with
>memory (for 32-bit machines, we're there now),

I think that one will have to switch to 64-bit address machines fairly soon;

>a random bit pattern is very likely to appear to be a
>valid pointer to something.

then this problem goes away. -- But I am not sure what is has to do with
the GC issue:

>  Conservative GC looked
>like a better idea twenty years ago, when we had
>4MB VAXes with 4GB address spaces.

Conservative GC's do generate a certain overhead, and one might not want
to use it for such a reason.

>    As a security issue, a denial-of-service attack
>may be possible by passing around content which
>looks like pointers.

I am not sure what you mean here: Security issues can be solved by
properly encapsulating the objects and sending requests between them,
rather allowing a global, "imperative" manipulation of objects from the
outside, which is commonly done in the name of efficiency. Ensuring
security takes CPU time, that is clear already on the OS implementation
level.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Thu, 13 Jun 2002 16:27:25 GMT Raw View

> > Garry Lancaster wrote:
> > > However, there is an issue here and one which all
> > > GC systems struggle with: what to do when a finalizer
> > > resurrects an object (not necessarily the finalizer's
> > > object) by setting a reference to it outside the finalizing
> > > object. In languages that are supposed to offer total
> > > pointer safety this is a real issue,

Hans-J. Boehm:
> Why?  This doesn't violate pointer-safety in any way, unless you
> combine it with explicitly deallocated memory (which is itself
> unsafe).

This is not true. For example:

Object  A is not reachable and contains a pointer to object B.
Object B is reachable and contains a pointer that doesn't
initially point to A.

A is not reachable so is finalized. It's finalizer sets the
pointer within object B to point to A.

B's pointer now points to the bag of bits which used
to be object A. Object A's de-allocation may not be
performed in the same cycle as its finalization, but even
so we still have a reachable non-NULL pointer to memory
that is no longer a fully constructed object. An unsafe
pointer in other words.

As far as I know this can happen in Java, C# or with
the Boehm et al. collector.

>  And it's occasionally useful if the object contains state
> that's difficult to reconstruct, but is likely to be reusable for a
> new object of the same type.

Finalizer resurrection may be sufficient to help out here
but I don't believe it is *necessary*.

> > > but in C++ there is
> > > a long tradition of allowing the programmer to shoot
> > > themselves in the foot if they really want to. In fact,
> > > unlike C#, in C++ I favour a system that destructs and
> > > de-allocates an object in the same collection cycle,
> > > partly because it will draw more of these types of errors
> > > out into the open during testing, through GPFs or similar.

> What about objects referenced by finalizable objects?

Those too. The approach I am currently experimenting with
is to say that using a GC pointer within what I still call a
destructor is not permitted. This works quite well for a
smart pointer system where GC pointers are different from
raw pointers. It wouldn't work very well for your system.

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: John Nagle <nagle@animats.com>
Date: Thu, 13 Jun 2002 17:30:32 GMT Raw View

Garry Lancaster wrote:

>John Nagle wrote
>>    As a security issue, a denial-of-service attack
>>may be possible by passing around content which
>>looks like pointers.
>
> I think this is more questionable. In most mark-and-
> sweep systems, pointers are usually followed at most
> once per collection cycle, so it would have to be a very
> large block of dodgy data to cause anything more
> than a slight slowdown of the collector. Is there any
> concrete evidence to suggest a problem?


    Has anybody ever deployed conservative GC in a server
deployed in any quantity?

    It's not a slowdown issue.  It's a memory leak issue.


> One point that people do tend to forget is that without
> at least some non-standard compiler guarantees
> conservative GC algorithms aren't 100% guaranteed to
> work correctly since, at least in theory, C++ compilers
> can generate optimized code that hides pointers.


    That's more of a user problem.  The best known
code that violates the requirements of conservative GC
is "Numerical Recipies in C".  NR in C is really
NR in FORTRAN, transliterated.  The subscripts
still count from 1, like FORTRAN.  NR in C supports
such subscripts with an allocation library that offsets
the beginning of each array so that the subscripts
count from 1.  A similar hack is provided for 2D
arrays.  Thus, the pointer to an array points somewhere
previous to the array.  NR in C does this pervasively
through its entire subroutine library.  Conservative
GC won't consistently recognize such arrays as in use.

    John Nagle
    Animats

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Daniel Miller <daniel.miller@tellabs.com>
Date: Thu, 13 Jun 2002 20:18:12 GMT Raw View

Hans Aberg wrote:

> In article <3D07A310.5060500@animats.com>, John Nagle <nagle@animats.com> wrote:

[...snip...]

>>It's worth noting that there's a scaling issue
>>with conservative GC.  If the
>>address space is close to fully populated with
>>memory (for 32-bit machines, we're there now),
>>
>
> I think that one will have to switch to 64-bit address machines fairly soon;

[...snip...]
 > then this problem goes away.

   not in the realm of embedded systems, which still use 8-bit and
16-bit controllers and where 32-bit processors are sometimes considered
a luxury

   Furthermore, Intel's Itanium isn't the only 64-bit processor which is
possible in a future computer near you.  AMD's state-of-the-art
8th-generation SledgeHammer/Opteron & ClawHammer/Athlon64 have both a
64-bit mode and a 32-bit mode.

   Furthermore, the size of sum-total address-space addressable by the
processor's MMU is not what is at issue with regard to C++, but rather
the size of address-space of a single process (in the UNIX & Windows NT
sense) as presented by the combination of the operating system and the
underlying processor.  Some people would say that, when the median PC is
equipped with greater than 4GiB of RAM, 32-bit processors' days are near
an end in the PC market.  But this ignores the capabilities of segmented
memory.  The size of a pointer in a process on a given processor does
not imply any restriction on the total addressable address-space by that
processor's MMU.  It would be possible for a processor &
operating-system which presents a 32-bit pointer (and thus 32-bit
address space) as a flat memory model within each process to run more
than one such process as each process's address-space is mapped into a
larger 48-bit address-space, such as is possible in in the 386 processor
architecture family.  This application of segmented memory permits
larger sum-total address-spaces in the MMU than the per-process 4GiB
limit on some 32-bit processors.  Thus it is not until the median
address-space usage within *each* *process* approaches some significant
fraction of 4GiB that 32-bit pointers are nearing the end of their
mainstream lifetime.

   Thus 32-bit address-spaces will likely be alive & kicking for far far
into the foreseeable future in embedded systems and within each
4GiB-addressable process on 32-bit processors.  Because of this
continued longevity of 32-bit address-spaces it would be thoroughly
unacceptable for a C++ standardized GC to impose even a soft
efficiency-oriented expectation (let alone an overt requirement) that
the address-space of each process is 64-bits in size.

   [ GLOSSARY ]

   GiB = gibibyte, IEC's replacement for gigabyte
   MMU = memory management unit

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Thu, 13 Jun 2002 23:55:13 GMT Raw View

Hans Aberg:
> >> But ref counts are no substitute for a true conservative GC, because
ref
> >> counts cannot reclaim loops.

Garry Lancaster:
> >Since we are talking about standardizing GC, we
> >can be bolder. The main reason Boehm et al.'s
> >GC is conservative is because it gets no help from
> >the compiler in identifying pointer locations.

Hans Aberg:
> No, this is not the reason, I think. Evidently, he wanted
> to implement a conservative GC in C++, and a found a
> (rather ineffective) way around the compiler to identify the
> pointers via the OS.

I'm not sure I understand you. If the system knew about
pointer locations it wouldn't be conservative.

Just to check we understand some common GC terms.

conservative: assumes any data that might be a pointer
is a pointer. Boehm et al.'s collector works conservatively.

precise: knows which data is a pointer and which is not.

mostly-precise: most memory areas are treated precisely
(typically the heap), but some (typically registers and stack(s) )
are treated conservatively.

mark-and-sweep: one of the main GC algorithms that is
able to deal correctly with cyclic references. Can be used by
conservative, precise or mostly-precise collectors. This
is the algorithm used by Boehm et al.'s collector.

Unless a mark-and-sweep collector can obtain pointer
location information from somewhere it has to be conservative.

You are not confusing "conservative" with "mark-and-sweep"
by any chance?

> One would want GC awareness in part in order to make more effective GC
> implementations possible, and in part because there are situations where a
> GC is needed which can reclaim all dynamically created dead objects.

> > With
> >compiler support we can have a precise GC or,
> >probably more likely, a mostly-precise GC that knows
> >the locations of pointers in the heap but treats registers
> >(and, perhaps, the stack) conservatively.
>
> Or so is the hope:
>
> In real life, multiple GC's must be synchronized and be
> aware if each other if one wants pointers belonging to
> one allocation/deallocation method to be traceable by
> another such method.

This point is not related to the conservative vs. precise
discussion: I'm only suggesting that the GC system
supported by the compiler could be (mostly-)precise.

[snip]

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Thu, 13 Jun 2002 23:55:44 GMT Raw View

John Nagle:
> >     I'd recommend total pointer safety.  We may be
> >better off insisting that GC objects can't have
> >destructors or finalizers.  If you want cleanup,
> >you should have to use explicit allocation or reference
> >counts.  That rule yields semantics you can
> >explain to the typical new C++ programmer.

Francis Glassborow:
> Indeed, the problem with dtors and finalizers being 'called' during the
> process of GC is that your program acquires non-deterministic behaviour.
>
> Just releasing memory is one thing but running code at
> some arbitrary time makes any attempt at 'program proving' futile.

Actually, GC does not require any non-deterministic
behaviour. It is quite possible to create a GC which is
completely deterministic. boost::shared_ptr is one
such example (at least when running single-threaded),
but the non-deterministic approach may be used by the
more powerful mark-and-sweep algorithm and, as far
as I know, the copying algorithm also.

The destruction order of objects governed by these
entirely deterministic GCs may be a little tricky for a human
programmer to predict, but that is because of the complexity
of the interaction between the GC algorithm and the data
structure linkage, not non-deterministism.

Most real-life GCs tend to run at least partially on a
separate thread. The truth of the matter is that it is
typical OS multi-threading scheduling algorithms that
cause the non-determinism of these GCs, not the
garbage collection itself. (Scheduling also makes most
multi-threaded programs that don't use GC non-
deterministic so it is wrong to single out GC for special
blame.)

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Thu, 13 Jun 2002 23:57:47 GMT Raw View

In article <o31O8.20162$sC4.2616859@news6-win.server.ntlworld.com>, "Garry
Lancaster" <glancaster@ntlworld.com> wrote:

>Object  A is not reachable and contains a pointer to object B.
>Object B is reachable and contains a pointer that doesn't
>initially point to A.
>
>A is not reachable so is finalized. It's finalizer sets the
>pointer within object B to point to A.
>
>B's pointer now points to the bag of bits which used
>to be object A.

Such problems can be resolved by a runtime model I developed:

In this model, object bounce requests between each other which each object
safely can handle, but one does not allow for any structure to
"imperatively" step into another object and manipulate it. From this
model, one can then try to find a more efficient implementation.

Then in your example, the finalizer of object A sends a request to object
B to set its pointer to object A. Then object B must be able to determine
whether this is safe or not, discovering that A is finalizing.

I'm not sure such problems have a general solution, but one could design
special pointers that cannot be set to an object that is finalizing, or
even have had its destructor initiated (assuming that one in C++ will make
use of both destructors and finalizers). Then this would be a high level
type of reference, different from the ones that may be used in
implementing a GC, which may need more brute ways to handle pointers.

The main point is that this line of thought leads to a classification of
a  series of safe runtime objects.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Thu, 13 Jun 2002 23:57:55 GMT Raw View

John Nagle <nagle@animats.com> wrote in message
news:3D08D560.9040009@animats.com...
> Garry Lancaster wrote:
>
> >John Nagle wrote
> >>    As a security issue, a denial-of-service attack
> >>may be possible by passing around content which
> >>looks like pointers.

Garry Lancaster:
> > I think this is more questionable. In most mark-and-
> > sweep systems, pointers are usually followed at most
> > once per collection cycle, so it would have to be a very
> > large block of dodgy data to cause anything more
> > than a slight slowdown of the collector. Is there any
> > concrete evidence to suggest a problem?

John Nagle:
>     Has anybody ever deployed conservative GC in a server
> deployed in any quantity?
>
>     It's not a slowdown issue.  It's a memory leak issue.

There will always be false positives in conservative
collection. Once the dodgy data is processed and there
are no more reachable references to it, any allocations
it was keeping reachable stand the same chance as any
other non-live conservative GC allocation of being collected.
If you think delayed collection == leak, you are never going
to be happy with any GC system I think, but especially not
with a conservative one.

> > One point that people do tend to forget is that without
> > at least some non-standard compiler guarantees
> > conservative GC algorithms aren't 100% guaranteed to
> > work correctly since, at least in theory, C++ compilers
> > can generate optimized code that hides pointers.
>
>
>     That's more of a user problem.  The best known
> code that violates the requirements of conservative GC
> is "Numerical Recipies in C".  NR in C is really
> NR in FORTRAN, transliterated.  The subscripts
> still count from 1, like FORTRAN.  NR in C supports
> such subscripts with an allocation library that offsets
> the beginning of each array so that the subscripts
> count from 1.  A similar hack is provided for 2D
> arrays.  Thus, the pointer to an array points somewhere
> previous to the array.  NR in C does this pervasively
> through its entire subroutine library.  Conservative
> GC won't consistently recognize such arrays as in use.

This may be true, but it is not the problem I was referring
to. No especially tricky code is required to demonstrate
the problem. An example given in a paper by Ellis and
Detlefs, "Safe, Efficient Garbage Collection for C++",
goes like so:

char* a = new char [ 10 ];
int i = 20, j = 19;
....a[ i - j ]...

might generate the following assembly

# a is in register $a, i in $i, j in $j
# result will be placed in $r
addu $a, $i
subu $a, $j
lb $r, 0($a)

After the first instruction $a points past the end of
the array, so the array is not reachable even though it
is still live. Were a collection to occur asynchronously
at this point the array could be collected prematurely.

Were GC to be standardized, the standard would
obviously not permit premature collection so compiler
vendors would have to author their GCs and/or modify
their optimizers accordingly.

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Fri, 14 Jun 2002 14:15:50 GMT Raw View

In article <3D08FDA7.3030102@tellabs.com>, Daniel Miller
<daniel.miller@tellabs.com> wrote:

>Hans Aberg wrote:

>> I think that one will have to switch to 64-bit address machines fairly soon;
...
> > then this problem goes away.

>   not in the realm of embedded systems, which still use 8-bit and
>16-bit controllers and where 32-bit processors are sometimes considered
>a luxury

As for the GC question, I tend to think about large ones like in Hugs
http://haskell.org/hugs, which is a two-space copier, I think.

It is not only too big for embedded systems but too big for ordinary programs.

Perhaps languages like Java admits one to use GC's in smaller programs,
but in general, I think of it as for places where one has more amply of
memory.

>   Furthermore, the size of sum-total address-space addressable by the
>processor's MMU is not what is at issue with regard to C++, but rather
>the size of address-space of a single process (in the UNIX & Windows NT
>sense) as presented by the combination of the operating system and the
>underlying processor.

In the past, when computers grow bigger and more powerful, so has OS's
changed accordingly: Why would the situation be anything different now?

>  Some people would say that, when the median PC is
>equipped with greater than 4GiB of RAM, 32-bit processors' days are near
>an end in the PC market.  But this ignores the capabilities of segmented
>memory.

I think that one would switch to 64-bit overall when technology has
developed so far that isn't costly anymore. At that point in time, it
becomes more costly trying to stick to hybrid methods, as programmer time
is more costly.

-- Essentially, one ends up with complicated OS's (and MMU techniques)
that are difficult and costly to maintain. And then it is cheaper to use
64-bit.

Otherwise, one can always push for methods that squeezes the bits a bit extra.

>   Thus 32-bit address-spaces will likely be alive & kicking for far far
>into the foreseeable future in embedded systems and within each
>4GiB-addressable process on 32-bit processors.  Because of this
>continued longevity of 32-bit address-spaces it would be thoroughly
>unacceptable for a C++ standardized GC to impose even a soft
>efficiency-oriented expectation (let alone an overt requirement) that
>the address-space of each process is 64-bits in size.

To me it is clear that C++ should not impose any model such conservative
GC requirements, but have it as an option in code that needs it: some code
though expects such requirements like GC. Then one wants the code to be
portable to a particular set of sufficiently powerful machines:

I have in my mind a program like Hugs: In the beginning of the nineties, I
think it was aimed at mainly 16-bit machines. There might have been even a
precursor that ran on 8-bit machines. But then in the mid-nineties, one
started to switch to implement it for mainly 32-bit computers.

I did the Mac port, and at that time both 16 and 32 bit machines were
common on the Mac side. But I was unable to compile Hugs for 16-bit
machines (due to compile segmentation problems etc), and I recall that
no-body else did: I think it was compilable only on some of the latest
32-bit computers.

At this point in time (about 2001), on the Mac side, one switched from
cooperative multitasking to a true Mach/UNIX based OS, which requires even
more powerful RISC processors (G3/G4) than old cooperative MacOS. At this
point, my old Mac port work is essentially outdated, because it is not
worth the programmer effort to support it anymore: It is better to make
sure to have an up-to-date computer and an up-to-date OS, and then compile
the Hugs program using that setup.

But one can see the pattern here: The overall Hugs development hangs on a
about the same level in computer capacity, the PC level, even though this
capacity significantly increases over the years. It is not worth the
programmer effort trying to implement this program in its up to date
version on smaller computers, even though it once in earlier versions ran
on much smaller computers.

Now to the C++ GC question: At the beginning to mid-nineties, dynamic
programming languages could not fit into those commonly available (say
student sized) PC's, only those that computer scientists beefed up with a
lot of money. (This happened with say the NJ SML Mac port.)

But this is not the situation now: Right now every PC sold has very ample
of memory available for such dynamic language implementations. And on this
particular computer powerfulness level, GC's are a necessity.

But it is pretty much impossible to implement a GC in C++ proper, because
one does not get any help in tracing the C++ created objects. (Recall,
Boehm's GC is not within C++, but goes underneath, on a layer between the
C++ compiler and the OS, which is also not as efficient as one would want
it to be.) -- One can just as well switch to C, doing all the stuff that
C++ does automatically for you (like temporary stuff into "registers") by
hand in C.

But then, the need for a GC transforms something that could have been an
inexpensive single man programming project into an expensive
multiprogrammer project.

So it seems ridiculous that one should be able to use C++ for implementing
dynamic structures as they are expected to be.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Fri, 14 Jun 2002 14:16:00 GMT Raw View

In article <521O8.20155$sC4.2615891@news6-win.server.ntlworld.com>, "Garry
Lancaster" <glancaster@ntlworld.com> wrote:
>Actually, GC does not require any non-deterministic
>behaviour. It is quite possible to create a GC which is
>completely deterministic. boost::shared_ptr is one
>such example (at least when running single-threaded),
>but the non-deterministic approach may be used by the
>more powerful mark-and-sweep algorithm and, as far
>as I know, the copying algorithm also.

I use to distinguish out conservative GC's (which might be called a CGC
then), a class which contains the non-deterministic mark and sweep GC's
you are mentioning.

Strictly speaking, every C++ "delete" is a form of GC. -- One wants to
extends this C++ syntactic deterministic GC to include also the
non-deterministic CGC'c when the need arises. (I suspect that C++ code
must somehow be made CGC aware for this to work properly.)

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: John Nagle <nagle@animats.com>
Date: Sat, 15 Jun 2002 06:17:17 GMT Raw View

    If we're going to have GC, I suggest the
following semantics for "finalization".

    1. GC objects cannot have destructors.  But
 they can have a "reset_object" function,
 which is called by the GC system.

    2.   An object is guaranteed that its "reset_object"
 function will be called at least once before
 garbage collection.  The "reset_object" function
 may be called more than once.

    3. No object will not be collected while any pointer to
 it exists, even if that pointer is created from a
 "reset_object" function.  GC is thus prohibited from
 ever creating a dangling pointer situation.

    4. An object is not guaranteed that its "reset_object"
 function will be called on program exit.

    5. The "reset_object" function will only be called on
 objects that have completed construction.  If a
 constructor aborts via an exception, the "reset_object"
 function must not be called.  Thus, the "reset_object"
 function can assume that the object is fully constructed.

This is safe with regard to dangling pointers.  It handles
"resurrection", where an object being finalized links itself
to a live object.  (This is rare, but should be handled
correctly.) It's consistent with existing constructor and
destructor semantics.

The name "reset_object" is arbitrary.  The idea is that
the object is being reset to some ground state, something
that can happen more than once, and might not happen at all.

     John Nagle
     Animats

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Sat, 15 Jun 2002 06:17:55 GMT Raw View

Garry Lancaster:
> >Object  A is not reachable and contains a pointer to object B.
> >Object B is reachable and contains a pointer that doesn't
> >initially point to A.
> >
> >A is not reachable so is finalized. It's finalizer sets the
> >pointer within object B to point to A.
> >
> >B's pointer now points to the bag of bits which used
> >to be object A.

Hans Aberg:
> Such problems can be resolved by a runtime model I developed:
>
> In this model, object bounce requests between each other which each object
> safely can handle, but one does not allow for any structure to
> "imperatively" step into another object and manipulate it. From this
> model, one can then try to find a more efficient implementation.
>
> Then in your example, the finalizer of object A sends a request to object
> B to set its pointer to object A. Then object B must be able to determine
> whether this is safe or not, discovering that A is finalizing.

So, you are banning finalizers from calling member
functions of any other object? Or is there some more
subtle criteria for determining the safety of the call?

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Sat, 15 Jun 2002 06:18:04 GMT Raw View

In article <Yq%N8.18694$sC4.2529865@news6-win.server.ntlworld.com>, "Garry
Lancaster" <glancaster@ntlworld.com> wrote:
>Just to check we understand some common GC terms.
>
>conservative: assumes any data that might be a pointer
>is a pointer. Boehm et al.'s collector works conservatively.

In my mind a "conservative GC" is merely a GC that collects the dead
objects at a point in time it determines suitable.

This might be done say by tracing the root system, marking down the live
objects, then removing those not live. (And if objects does not have
finalizers, one can just forget about them.)

The conservative GC still needs to know what is a pointer and what is not.
The compiler can aid in this by supplying information of where the root
system is located, which current C++ does not. It can also help by
supplying information about which pointer to trace, and which one should
not trace, also not provided by C++.

Therefore, the Boehm conservative GC goes in on a layer between the C++
compiler and the OS to find the root system, and trace the pointers. But
this is not very efficient and one cannot target the GC to just certain
classes of objects.

-- This is what I thought. Have I misunderstood something?

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: brangdon@cix.co.uk (Dave Harris)
Date: Sat, 15 Jun 2002 06:18:28 GMT Raw View

glancaster@ntlworld.com (Garry Lancaster) wrote (abridged):
> Object  A is not reachable and contains a pointer to object B.
> Object B is reachable and contains a pointer that doesn't
> initially point to A.
>
> A is not reachable so is finalized. It's finalizer sets the
> pointer within object B to point to A.
>
> B's pointer now points to the bag of bits which used
> to be object A.

No; A is still a proper object even after its finaliser has been run. This
is the crucial difference between a finaliser and a destructor.

Destructors destroy. They change the dynamic type of their object
successively from most-derived to base, and end up with raw memory. To use
an object after it has been destroyed yields undefined behaviour. The
vtable is no longer there.

Finalisers don't do that. Finalisation doesn't change the type of the
object. It's still well-defined, at least at the language level, to call
virtual functions of a finalised object. It may even be finalised twice or
more times.


> As far as I know this can happen in Java, C# or with
> the Boehm et al. collector.

Java behaves as above.

Some objects may be written to not-work after they've been finalised, but
that's a choice of the object's author. For example:

    void File::read( void *buf, int len ) {
        assert( isOpen() );
        // ...
    }

    void File::finalise() {
        if (isOpen()) {
            close();
        }
    }

so that calling read() after finalise() does no good. However, this isn't
much different to calling read() after a manual close(). It's an issue of
the class design. Many of Java's library classes are, alas, poorly written
with regard to finalisation. That's not the language's fault, though.

  Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
      brangdon@cix.co.uk      |   And close your eyes with holy dread,
                              |  For he on honey dew hath fed
 http://www.bhresearch.co.uk/ |   And drunk the milk of Paradise."

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Hans_Boehm@hp.com (Hans-J. Boehm)
Date: Tue, 11 Jun 2002 15:02:23 GMT Raw View

John Nagle <nagle@animats.com> wrote in message news:<3D043B10.8040600@animats.com>...
> Garry Lancaster wrote:
>
> > However, there is an issue here and one which all
> > GC systems struggle with: what to do when a finalizer
> > resurrects an object (not necessarily the finalizer's
> > object) by setting a reference to it outside the finalizing
> > object. In languages that are supposed to offer total
> > pointer safety this is a real issue,
Why?  This doesn't violate pointer-safety in any way, unless you
combine it with explicitly deallocated memory (which is itself
unsafe).  And it's occasionally useful if the object contains state
that's difficult to reconstruct, but is likely to be reusable for a
new object of the same type.

> > but in C++ there is
> > a long tradition of allowing the programmer to shoot
> > themselves in the foot if they really want to. In fact,
> > unlike C#, in C++ I favour a system that destructs and
> > de-allocates an object in the same collection cycle,
> > partly because it will draw more of these types of errors
> > out into the open during testing, through GPFs or similar.
What about objects referenced by finalizable objects?

I would argue that the point of the collector is in large part to
eliminate references to dangling pointers.  Why reintroduce them here?
>
>
>       It's worth considering that GC-related bugs are
> inherently expensive to find, because they're not usually
> brought out by simple test cases.  You have to run out
> of memory, which requires stress testing with
> a reduced amount of memory available.  That's a type
> of testing not usually required for C++ programs.
It depends on what you mean by "run out of memory".  A reasonable GC
in a non-embedded environment will garbage-collect long before it runs
out of swap space.  If it's potentially not the only running process,
it won't wait until it's out of primary memory either.  That would be
disastrous for the other processes.  Ours typically starts with
something like 128 KB heap and expands that when it allocates too
little memory (e.g. < 1/3 of the heap size) between collections.

I think any memory-allocation or memory-overwrite related bugs are
expensive to find.  Memory smashes may also not show up in small test
cases.  I've seen duplicate and improper calls to free() that didn't
show up until the malloc/free implementation in a dynamic C library
was replaced.

Garbage collectors make the debugging issues a bit worse (mostly since
memory overwrite errors can smash GC data structures), but are likely
to dramatically decrease their frequency.
>
>       I'd recommend total pointer safety.
That's one reasonable solution.  But other techniques (relying on STL
containers instead of raw arrays and pointer arithmetic, Purify-like
tools) get you part-way there.

> We may be
> better off insisting that GC objects can't have
> destructors or finalizers.  If you want cleanup,
> you should have to use explicit allocation or reference
> counts.  That rule yields semantics you can
> explain to the typical new C++ programmer.

But for large systems, it may mostly negate the benefit of the
collector.  If you have a large complicated data structure, that may
contains one or objects holding a system resource, you potentially
need to reference count the whole complex structure in order to manage
the system resources.

Hans

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: John Nagle <nagle@animats.com>
Date: Tue, 11 Jun 2002 16:12:08 GMT Raw View

Daniel Miller wrote:

>   Remember that I define the term "GC" to be applicable to
> finalizer-based collectors.



> By retrofitting different end-of-lifetime
> semantics via finalizers into a language which has since its early years
> revolved around dtors for its end-of-lifetime semantics, there is much
> opportunity for C++0x standardization to dangerously fly by the seat of
> the pants (in auto_ptr style or in export keyword style or in
> internationalization/locale/wchar_t style) with just-in-time incomplete
> research, rather than standardize long-standing time-honored practices.


     Agreed.


>   This is the case for smart pointers which invoke dtors (i.e., without
> finalizers).  Many C++ source-code bases in the world are once again
> either rolling their own, using Loki's, using Boost's, or using GNU's.
> There is obvious pent-up demand for smart pointers without finalizers.
> Also, there is much pent-up criticism of a canonical smart-pointer's
> inability to deal with cycles.  More research is now needed into
> cycle-capable dtor-invoking smart-pointers.  We must not jump too
> quickly to standardizing a finalizer philosophy which is foreign to C++
> (which would have been analogous to standardizing Smalltalk-style
> containers, pinching off STL research too soon).


     Agreed.  There are many smart-pointer libraries, most of
which work if used carefully.  Care is required because the
smart pointer systems depend on the programmer not doing certain
things, and because smart pointers seem to have to expose raw pointers
to get anything done.  Perhaps the language and compiler can check to
see that the programmer isn't doing those things.


>   The case for finalizer-based GC is much weaker than C++'s containers
> and much weaker than smart pointers.


     I tend to agree, although I don't feel as strongly about this
as I used to.  I could live with GC for destructorless objects.

     Strostrup writes (C.9.1.3, Destructors) that a garbage
collector should never call a destructor.  This simulates
infinite memory, he says.  That's a good starting point.

     Beyond basic philosophy, if we have more than one type
of memory management (reference counted, garbage collected,
manual) where are they distingushed?  In the class
declaration, or at the call to "new"?


     John Nagle
     Animats

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: John Nagle <nagle@animats.com>
Date: Fri, 7 Jun 2002 18:01:59 GMT Raw View

Hans-J. Boehm wrote:

> More importantly I claim that, unlike for stack allocated objects,
> running destructors for heap objects synchronously can lead to very
> unexpected behaviour and intermittent program failures.  A more
> detailed discussion of this is at
> http://www.hpl.hp.com/personal/Hans_Boehm/gc/det_destr.html .
> Thus I claim you should insist on asynchronous destruction for heap
> objects (e.g. by a separate thread), so that the destructors can
> safely acquire locks.  (The Java specification had this part right
> from the beginning, though I think it took years before any of the
> implementations did.)


      Hans has a good point here.  The biggest trouble spot
is the calling of a destructor which needs a lock but
can't get it because the same thread has that lock.
This deadlocks.  This is not likely to be a common
situation, but it is possible.  It's well-defined;
when it happens, you know what happened, unlike a race
condition or a bad pointer.  And it's likely to occur
in a deterministic situation, so it can be detected during
debugging.

      Asynchronous finalizers, on the other hand, lead
to concurrency in what the programmer may have thought
was a single thread program.  Failure to lock leads to
intermittent errors very difficult to find.  This tends
not to come up in Java because, as Hans points out,
finalizers aren't used much.  But C++ has a history of
heavily using destructors.  That's the main incompatibility.

      Related items:

      C++ traditionally doesn't make much of a distinction
between heap objects and stack objects.  Should it?

      Who catches an exception from a finalizer?

      C++, unlike Java, allows objects to be members of
other objects.  Those inner objects have to have their
destructors run when the outer object goes, so they
partake of some of the semantics of stack objects.

      Finalizers can lead to some very wierd semantics.
See "http://msdn.microsoft.com/msdnmag/issues/1100/GCI/GCI.asp",
which describes how finalization works for Microsoft Managed
Objects for C++.  See especially the section on "resurrection".
Microsoft allows finalizers to run more than once in some
situations.

    John Nagle
    Animats

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Sun, 9 Jun 2002 16:30:30 GMT Raw View

> Hans-J. Boehm wrote:
> > More importantly I claim that, unlike for stack allocated objects,
> > running destructors for heap objects synchronously can lead to very
> > unexpected behaviour and intermittent program failures.  A more
> > detailed discussion of this is at
> > http://www.hpl.hp.com/personal/Hans_Boehm/gc/det_destr.html .
> > Thus I claim you should insist on asynchronous destruction for heap
> > objects (e.g. by a separate thread), so that the destructors can
> > safely acquire locks.  (The Java specification had this part right
> > from the beginning, though I think it took years before any of the
> > implementations did.)

John Nagle:
>       Hans has a good point here.  The biggest trouble spot
> is the calling of a destructor which needs a lock but
> can't get it because the same thread has that lock.
> This deadlocks.

Not necessarily. Some synchronization primitives allow
nested same-thread locks and unlocks. In the example
Hans gives on his web page there is still a problem even
if it nested locks are supported because although the
finalizer runs to completion, when considered in the
context of the non-finalizer code that it interrupted, it
calls some functions that rely on global state in the
wrong order.

It is disingenuous to blame this problem on
synchronous destruction alone. At least part of the
blame rests on the tricky global-state-reliant interface
that is being used by both the destructor and
normal code and on the programmer for not realising
how the system they are using actually works.

> This is not likely to be a common
> situation, but it is possible.  It's well-defined;
> when it happens, you know what happened, unlike a race
> condition or a bad pointer.  And it's likely to occur
> in a deterministic situation, so it can be detected during
> debugging.
>
>       Asynchronous finalizers, on the other hand, lead
> to concurrency in what the programmer may have thought
> was a single thread program.

True: either way there are issues. Unless a GC system
is limited to multi-threaded C++ platforms it needs to at
least offer synchronous destruction as an option. And
even on multi-threaded systems many C++ programs
are single-threaded.

> Failure to lock leads to
> intermittent errors very difficult to find.  This tends
> not to come up in Java because, as Hans points out,
> finalizers aren't used much.

And to be fair, because finalizers that actually cause
problems to occur are probably rarer than those that
don't. For a start, any finalizer that restricts its activities
to the non-shared state of the object it is finalizing will
be OK.

> But C++ has a history of
> heavily using destructors.  That's the main incompatibility.
>
>       Related items:
>
>       C++ traditionally doesn't make much of a distinction
> between heap objects and stack objects.  Should it?

No.

>       Who catches an exception from a finalizer?

At the moment I reckon this should be a fatal error, but
I'm open to persuasion. If we do go with asynchronous
destruction/finalization I think this is the only feasible
option.

>       C++, unlike Java, allows objects to be members of
> other objects.  Those inner objects have to have their
> destructors run when the outer object goes, so they
> partake of some of the semantics of stack objects.

What do you mean by that?

>       Finalizers can lead to some very wierd semantics.
> See "http://msdn.microsoft.com/msdnmag/issues/1100/GCI/GCI.asp",
> which describes how finalization works for Microsoft Managed
> Objects for C++.  See especially the section on "resurrection".
> Microsoft allows finalizers to run more than once in some
> situations.

Actually, it's not quite as bad as you make it sound.
As the article makes clear, finalizers are only run more
than once at the programmer's explicit request (if they
call ReRegisterForFinalize).

However, there is an issue here and one which all
GC systems struggle with: what to do when a finalizer
resurrects an object (not necessarily the finalizer's
object) by setting a reference to it outside the finalizing
object. In languages that are supposed to offer total
pointer safety this is a real issue, but in C++ there is
a long tradition of allowing the programmer to shoot
themselves in the foot if they really want to. In fact,
unlike C#, in C++ I favour a system that destructs and
de-allocates an object in the same collection cycle,
partly because it will draw more of these types of errors
out into the open during testing, through GPFs or similar.

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: John Nagle <nagle@animats.com>
Date: Sun, 9 Jun 2002 23:29:06 CST Raw View

Garry Lancaster wrote:

> There is a confusion here between garbage
> collection and pointer safety. The source of that
> confusion is undoubtedly Java, which provides both.


     That's a good point.

     Garbage collection without pointer safety is
scary.  Trouble is likely to appear in the form
of intermittent, very difficult to find bugs.
However, C# and Microsoft's "C++ managed
objects for .NET" are currently exploring that space.
What are we hearing from over there?

     I continue to argue for safe reference counting,
with some optimizations to get unnecessary reference
count updates out of inner loops, and weak pointers
to allow the expression of back pointers.  The
semantics are predictable.  It works well for Perl.
Most of the work can be done in template libraries,
but some language support is needed.

         John Nagle
    Animats


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: John Nagle <nagle@animats.com>
Date: Mon, 10 Jun 2002 00:54:59 CST Raw View

Garry Lancaster wrote:

> John Nagle wrote
>>      Finalizers can lead to some very wierd semantics.
>>See "http://msdn.microsoft.com/msdnmag/issues/1100/GCI/GCI.asp",
>>which describes how finalization works for Microsoft Managed
>>Objects for C++.  See especially the section on "resurrection".
>>Microsoft allows finalizers to run more than once in some
>>situations.
>>
>
> Actually, it's not quite as bad as you make it sound.
> As the article makes clear, finalizers are only run more
> than once at the programmer's explicit request (if they
> call ReRegisterForFinalize).
>
> However, there is an issue here and one which all
> GC systems struggle with: what to do when a finalizer
> resurrects an object (not necessarily the finalizer's
> object) by setting a reference to it outside the finalizing
> object. In languages that are supposed to offer total
> pointer safety this is a real issue, but in C++ there is
> a long tradition of allowing the programmer to shoot
> themselves in the foot if they really want to. In fact,
> unlike C#, in C++ I favour a system that destructs and
> de-allocates an object in the same collection cycle,
> partly because it will draw more of these types of errors
> out into the open during testing, through GPFs or similar.


      It's worth considering that GC-related bugs are
inherently expensive to find, because they're not usually
brought out by simple test cases.  You have to run out
of memory, which requires stress testing with
a reduced amount of memory available.  That's a type
of testing not usually required for C++ programs.

      I'd recommend total pointer safety.  We may be
better off insisting that GC objects can't have
destructors or finalizers.  If you want cleanup,
you should have to use explicit allocation or reference
counts.  That rule yields semantics you can
explain to the typical new C++ programmer.

    John Nagle
    Animats

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Francis Glassborow <francis.glassborow@ntlworld.com>
Date: Mon, 10 Jun 2002 07:07:27 CST Raw View

In article <3D043B10.8040600@animats.com>, John Nagle
<nagle@animats.com> writes
>     I'd recommend total pointer safety.  We may be
>better off insisting that GC objects can't have
>destructors or finalizers.  If you want cleanup,
>you should have to use explicit allocation or reference
>counts.  That rule yields semantics you can
>explain to the typical new C++ programmer.

Indeed, the problem with dtors and finalizers being 'called' during the
process of GC is that your program acquires non-deterministic behaviour.
Just releasing memory is one thing but running code at some arbitrary
time makes any attempt at 'program proving' futile.

--
Francis Glassborow      ACCU
64 Southfield Rd
Oxford OX4 1PA          +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Mon, 10 Jun 2002 07:07:41 CST Raw View

> Garry Lancaster wrote:
> > There is a confusion here between garbage
> > collection and pointer safety. The source of that
> > confusion is undoubtedly Java, which provides both.

John Nagle:
>      That's a good point.
>
>      Garbage collection without pointer safety is
> scary. Trouble is likely to appear in the form
> of intermittent, very difficult to find bugs.

Pointer safety is not a yes/no thing though. It's a
continuum. Java is at one end of the continuum
but if you try hard enough you can still do some
unpleasant things by resurrecting objects within
finalizers.

Most C++ smart pointers provide a weak form of
encapsulation, providing certain operations, such
as get(), that if used inappropriately can cause
trouble. I don't think a GC smart pointer with this
level of pointer safety is really any scarier than (or
likely to lead to worse bugs than) std::auto_ptr or
boost::shared_ptr. Indeed, by simplifying/removing
source code related to memory management, GC
is likely to make avoiding some types of bug much
easier.

It would be possible to encapsulate more tightly so
that most unsafe operations were not permitted, but
you would lose the ability to interoperate efficiently (and
in some cases, at all) with current code that deals,
however briefly, with raw pointers. As long as common
APIs continue to require some arguments to be passed
by raw pointer this is unlikely to be a very popular
approach.

> However, C# and Microsoft's "C++ managed
> objects for .NET" are currently exploring that space.
> What are we hearing from over there?

Yes, they are about half way between Java and normal
C++ in terms of pointer safety. They permit the potentially
unsafe operations but the required syntax draws more
attention to them, making them harder to write by accident.

>      I continue to argue for safe reference counting,
> with some optimizations to get unnecessary reference
> count updates out of inner loops, and weak pointers
> to allow the expression of back pointers.  The
> semantics are predictable.  It works well for Perl.
> Most of the work can be done in template libraries,
> but some language support is needed.

Note that weak pointers are potentially unsafe. If you
want a GC algorithm that can deal correctly with cyclic
dependencies without relying on potentially unsafe
constructs, reference counting is not sufficient.

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Mon, 10 Jun 2002 07:07:57 CST Raw View

In article <3D043B10.8040600@animats.com>, John Nagle <nagle@animats.com> wrote:

>      I'd recommend total pointer safety.  We may be
>better off insisting that GC objects can't have
>destructors or finalizers.  If you want cleanup,
>you should have to use explicit allocation or reference
>counts.  That rule yields semantics you can
>explain to the typical new C++ programmer.

The way I achieve pointer safety in my dynamic programming is to write
classes that wraps the usual pointers up. Then I use only these safe
classes.

Therefore I tend to think that C++ should not put restrictions on this low
level stuff, but instead one first writes suitable high level constructs
simplifying the use of such low level constructs. Then, with that
experience in hand, C++ may be in the need of some new constructs in order
to simplify the use of these high level constructs.

In order to illustrate this idea on at least something explicit, there is
a common request that C++ should provide safe null pointers. Now, in high
level polymorphic programming, one may write something like:
  class object_root { ... };  // Root class of polymorphic hierarchy.

  class object {        // Polymorphic variable class
    object_root data_;  // maintaining a pointer to the polymorphic object,
  public:
    object() : data_(0) {} // which by default is set to 0.
    ...
  };

Now, if one wants to implement safe null pointers in C++, one can merely write
  const object null();
or something. -- The class "object" can then be built to implement the
safe handling of null polymorphic pointers.

>From this idea, when developing the C++ language, it then seems clear that
one should concentrate on facilitate the use of such a class "object". --
It then becomes unimportant what happens with the old T* pointers, as
these will normally not be needed in polymorphic programming. But it
simplifies not having to deal with these old heritage concepts.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Mon, 10 Jun 2002 16:25:54 GMT Raw View

In article <3D00E7EC.9080203@animats.com>, John Nagle <nagle@animats.com> wrote:
>     I continue to argue for safe reference counting,
>with some optimizations to get unnecessary reference
>count updates out of inner loops, and weak pointers
>to allow the expression of back pointers.  The
>semantics are predictable.

I have used reference counting extensively, and I think that C++ should
include some libraries for that. -- There are some variations, for
example, if object should be able to self-mutate to an object of another
size, one should use handles. But if such self-mutation is not needed,
there is no point in using such handles.

But ref counts are no substitute for a true conservative GC, because ref
counts cannot reclaim loops. Thus, if one implements something big with
sufficiently general dynamic structure (like in a functional language),
then one should use a GC. Also, some say a true GC should be faster than a
ref count.

But again, if one does not need a true conservative GC, then it might be
unnecessary with all the overhead it generates. Then a ref count is a very
good alternative.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Alexander Terekhov <terekhov@web.de>
Date: Mon, 10 Jun 2002 16:25:49 GMT Raw View

John Nagle wrote:
[...]
>       Finalizers can lead to some very wierd semantics.
> See "http://msdn.microsoft.com/msdnmag/issues/1100/GCI/GCI.asp",
> which describes how finalization works for Microsoft Managed
> Objects for C++.  See especially the section on "resurrection".

Well, "resurrection" isn't really 'entirely BAD idea', I think.
Here is an example [sort-of pseudo-C++-code, just an illustration]
even without full-blown 'garbage collectors' -- using simple reference
counting smart ptrs and object 'caching'... including 'zombies' ;-)

http://groups.google.com/groups?threadm=3D021EA4.E2217C09%40web.de
(Subject: Re: Objects in container: creation and deletion details)

"....

 ~ChunkPtr() throw() // smart 'thread shared object' ptr
 {
   Chunk::ZombieKeyHolder zkHolder; // uninitialized storage

   // Wanna 'kill yet another zombie'?! ;-)
   if ( m_pChunk->unref( zkHolder ) ) {

     // Destroy zombie (kill'm if he's still 'alive' and NOT 'in use' ;-))
     delete Chunk::s_registry.removeZombie( zkHolder.key() );

     // Zombie's key cleanup
     zkHolder.key().~Key();

   }
 }

 bool Chunk::unref( Chunk::ZombieKeyHolder& zkHolder ) throw()
 {
   Guard guard( m_mutex );

   // Let's strictly follow 4.10-rules... ;-)
   if ( 0 != --m_count )
     return false;

   // Save zombie's key -- should better NOT throw! ;-)
   new( &zkHolder.key() ) Key( m_key,zkHolder );

   // Return 'zombie' status
   return true;
 }

 Chunk* Chunk::Registry::removeZombie( const Chunk::Key& key ) throw()
 {
   Guard guard1( m_mutex );

   // Is our 'zombie' still 'alive'?
   if ( !m_coll.locateElementWithKey( key,m_cursor ) )
     return 0; // Schade [both ger. && eng.] :-(

   Chunk* pZombie = m_cursor.element();

   // 'Swap' semantics -- now guard1 'owns' the zombie's lock
   Guard guard2( guard1,pZombie->m_mutex );

   // Is it really 'zombie'?
   if ( 0 != pZombie->m_count )
     return 0; // God rules -- I just wish to have the same 'resurrection' fate! ;-)

   // Unregister zombie
   m_coll.removeElementAt( m_cursor );

   // Unlock registry FIRST and afterwards unlock zombie
   return pZombi;
 }

 Are there any 'bugs' [races, etc.] and/or some things to 'improve', folks?
 ...."

regards,
alexander.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Hans_Boehm@hp.com (Hans-J. Boehm)
Date: Tue, 4 Jun 2002 15:25:35 GMT Raw View

John Nagle <nagle@animats.com> wrote in message news:<3CF5388D.2080401@animats.com>...

>      The overhead of reference counting can be brought down
> down.  I've made some proposals in this area.  The key is hoisting
> reference count maintenance out of loops.  This can be done
> either automatically or manually.
>
I agree.  But there seem to be some costs to doing so:

1. Since reference count updates touch memory, I think it's hard to
optimize a smart pointer-based reference count implementation that the
compiler doesn't know about.  (This is especially true in the
multithreaded case.  For something like a smart pointer implementation
in g++, the actual reference count updates will involve inline
assembly code. See below.)

2. Manual optimizations risk the same kind of memory management bugs
that reference counts were designed to protect against.

3. The fastest reference count collectors (cf. the paper by Levanoni
and Petrank in OOPSLA 2001, or the one by Bacon et al in PLDI 2001)
defer reference count updates.  Thus their performance characteristics
(e.g. the time until an object is actually deallocated) start to look
more like tracing collectors.

In general, I think we have a reasonable idea of what simple reference
counts cost.  There are single-threaded timings of various candidate
Boost smart pointer implementations at
http://www.boost.org/libs/smart_ptr/smarttests.htm.  In addition to
that, you need atomicity in 2 or 3 places for a simple pointer
assignments if the target objects may be shared:

1) If the pointer itself may be shared, and if you want to preserve
atomicity of pointer asignment, (not guaranteed by pthreads or on some
old hardware, but guaranteed by Java, necessary for type-safety, and
very useful for things like caches that don't need to be locked to be
read) you need to do the actual pointer replacement with an atomic
exchange.

2) Atomic increment of reference count.

3) Atomic decrement and test of reference count.

On most modern hardware any sort of atomic memory update costs on the
order of 20-30 cycles.  And you need 2 or 3 of those operations.  If
you compare that to the cycle or two (or less) that might be taken by
a raw pointer update, it's clear that the cost is substantial.  If you
need portable code and acquire some locks instead, the costs are
generally appreciably higher, at least if you want the code to be
multiprocessor-safe.

The above assumes all cache hits.  But reference counting also has a
significant effect on cache behavior, since additional memory
locations must be written, even for something that used to be purely a
register operation.  (Clearly tracing also has substantial costs in
this area.  But they're proportional to the amount of allocation, not
pointer updates.)

Even simple reference counts are still very competitive for large
objects, if they're not updated too often, such as in a file system.
But they're not cheap if the objects are small and pointer assignments
are frequent.

Hans

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: James Kanze <kanze@alex.gabi-soft.de>
Date: Tue, 4 Jun 2002 15:56:24 GMT Raw View

John Nagle <nagle@animats.com> writes:

|>       The basic problem with adding GC to C++ is that, in order to
|>  make it work right, you have to make C++ be more like Java or C#.

Why?  In practice, garbage collection works with C++ already.  There
are a few things that one can officially do which will cause it to
fail, but in practice, for various other reasons, programmers don't to
those things anyway.

--=20
James Kanze                                mailto:kanze@gabi-soft.de
Do you need real expertise in C++? in Java? in OO design?
                       I am available, see my CV at www.gabi-soft.de
Ziegelh=FCttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)69 63198627

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: John Nagle <nagle@animats.com>
Date: Sat, 1 Jun 2002 00:07:21 GMT Raw View

     The basic problem with adding GC to C++ is that, in order
to make it work right, you have to make C++ be more like Java
or C#.  It's not clear that the result would be an improvement
over either of those alternatives.

     Reference counting, though, is more consistent with the
C++ way of doing things.  The language retains deterministic
destructors, so all those "resource allocation is initialization"
idioms continue to work.  Destructors get called when you expect
them to be called.

     The main problem with reference counting is performance.
Let's focus on that for a bit.

     I'd argue that it's possible to hoist reference count updates
out of most inner loops automatically, and out of many other
constructs with some programmer help.  Comments from people
who currently implement compilers (my compiler days were years
ago) would be helpful here.

     John Nagle
     Animats

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: John Nagle <nagle@animats.com>
Date: Sun, 2 Jun 2002 00:59:52 GMT Raw View


Hans-J. Boehm wrote:

> So would you care to propose a smart pointer scheme that
>
> a) Doesn't complicate the user's job, e.g. by requiring that I have
> enough global understanding of the whole program to avoid pointer
> cycles, or requiring me to understand lots of extra usage rules that
> don't apply to regular pointers.


    Perl achieves that, although the performance is
unimpressive.

    What would you think of reference counting for objects
with destructors, and garbage collection for objects
without destructors?

     John Nagle
     Animats

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: John Nagle <nagle@animats.com>
Date: Sun, 2 Jun 2002 15:43:29 GMT Raw View

James Kanze wrote:

> "Garry Lancaster" <glancaster@ntlworld.com> writes:
>
> |>  People looking for a C++ GC system shouldn't overlook
> |>  boost::shared_ptr. It can't easily cope with cyclic references
> |>  (although at a pinch there are various ways to break the cycle)
> |>  but if you can live with that limitation it may do everything you
> |>  want.
>
> Not really.  It doesn't work in quite a number of real programs.  The
> problem is simple: in C++, you *have* raw pointers.  You can't avoid
> it, because "this" is a raw pointer.  So you inevitably end up with
> some raw pointers that are involved in ownership.  With some counted
> pointers, you can convert these into a counted pointer without
> problems, but with the Boost shared_ptr, you must convert the raw
> pointer exactly once; converting it a second time will lead to
> prematurely freeing the memory.


      Exactly.  I went down this road some time back.  See

        http://www.animats.com/papers/languages/index.html

I proposed an attribute for raw pointers which prevented
making a copy of the pointer that could outlive the original.
This makes safe, low-overhead reference-counted pointers possible,
at some cost in work for the programmer.

Doing this with total backwards compatibility turned out
to be hard, but if the concept seems reasonable, it might
be worth another look at the problem.

In C and C++, some pointers own memory, and some don't.
This is an important distinction, and the language provides
no help in dealing with it.  It probably should.

    John Nagle
    Animats

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Ross Smith <r-smith@ihug.co.nz>
Date: Mon, 3 Jun 2002 04:11:32 GMT Raw View

John Nagle wrote:
>
>      The main problem with reference counting is performance.
> Let's focus on that for a bit.

A big win here would be move semantics, so that putting smart pointers
into containers didn't require reference counting.

--
Ross Smith ...................................... Auckland, New Zealand
r-smith@ihug.co.nz ....................................................
       "...and to destroy the earth, type 'make destroy_earth'"
                                            -- ClanLib makefile

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Alexander Terekhov <terekhov@web.de>
Date: Mon, 3 Jun 2002 09:53:21 GMT Raw View

John Nagle wrote:
[...]
>      The overhead of reference counting can be brought down down.

Yup, I think so too. Perhaps somewhat relevant stuff w.r.t. this
topic/discussion:

http://groups.google.com/groups?selm=3CF8C9B2.FBD02BF%40web.de
(Subject: Re: Object access synchronization using Reference Counting)

and, perhaps also:

http://groups.yahoo.com/group/Boost-Users/message/278
("On the other hand placing a memory barrier in shared_ptr...")

regards,
alexander.

P.S. Hey boost clients.. 'alert' (well 'sort-of' ;-)):

     http://lists.boost.org/MailArchives/boost/msg30294.php
     (Subject: [boost] Re: boost::shared_ptr and atomic_count_linux.hpp)

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "William E. Kempf" <williamkempf@hotmail.com>
Date: Mon, 3 Jun 2002 15:00:47 GMT Raw View

"John Nagle" <nagle@animats.com> wrote in message
news:3CF6A5BD.4090409@animats.com...
>      The basic problem with adding GC to C++ is that, in order
> to make it work right, you have to make C++ be more like Java
> or C#.  It's not clear that the result would be an improvement
> over either of those alternatives.

Why?  Are you familiar with the Boehm collector
(http://www.hpl.hp.com/personal/Hans_Boehm/gc/)? In what ways does this
implementation either make C++ more like Java/C# or not work right?

>      Reference counting, though, is more consistent with the
> C++ way of doing things.  The language retains deterministic
> destructors, so all those "resource allocation is initialization"
> idioms continue to work.  Destructors get called when you expect
> them to be called.

Is it the determinism that you think makes GC bad for C++?  Why can't we
have both deterministic management for the cases where it matters and we can
do it (i.e. most of our code) but GC in the cases where determinism isn't
important but cyclic references are?  The Boehm collector allows for this,
and even allows you to mix-n-match (i.e. it allows you to delete objects
allocated on the GC heap).

Bill Kempf

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Mon, 3 Jun 2002 17:05:01 GMT Raw View

Garry Lancaster:
> > |>  People looking for a C++ GC system shouldn't overlook
> > |>  boost::shared_ptr. It can't easily cope with cyclic references
> > |>  (although at a pinch there are various ways to break the cycle)
> > |>  but if you can live with that limitation it may do everything you
> > |>  want.

James Kanze:
> > Not really.  It doesn't work in quite a number of real programs.  The
> > problem is simple: in C++, you *have* raw pointers.  You can't avoid
> > it, because "this" is a raw pointer.  So you inevitably end up with
> > some raw pointers that are involved in ownership.  With some counted
> > pointers, you can convert these into a counted pointer without
> > problems, but with the Boost shared_ptr, you must convert the raw
> > pointer exactly once; converting it a second time will lead to
> > prematurely freeing the memory.

John Nagle:
>       Exactly.  I went down this road some time back.  See
>
>         http://www.animats.com/papers/languages/index.html
>
> I proposed an attribute for raw pointers which prevented
> making a copy of the pointer that could outlive the original.
> This makes safe, low-overhead reference-counted pointers possible,
> at some cost in work for the programmer.
>
> Doing this with total backwards compatibility turned out
> to be hard, but if the concept seems reasonable, it might
> be worth another look at the problem.
>
> In C and C++, some pointers own memory, and some don't.
> This is an important distinction, and the language provides
> no help in dealing with it.  It probably should.

This work is interesting (especially the idea that we
could have a form of temporary raw pointer with scope
control that it would not be possible to call delete upon)
but I don't see how it relates to my point that
boost::shared_ptr is useful as a form of "GC-lite".
There is a confusion here between garbage
collection and pointer safety. The source of that
confusion is undoubtedly Java, which provides both.

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Hans_Boehm@hp.com (Hans-J. Boehm)
Date: Mon, 3 Jun 2002 17:56:09 CST Raw View

John Nagle <nagle@animats.com> wrote in message news:<3CF6A5BD.4090409@animats.com>...
>
>      Reference counting, though, is more consistent with the
> C++ way of doing things.  The language retains deterministic
> destructors, so all those "resource allocation is initialization"
> idioms continue to work.  Destructors get called when you expect
> them to be called.
>
My problem is that this sort of synchronous destruction for heap
objects doesn't work, at least not in general.  Either tracing garbage
collectors or smart-pointer-based reference counting is designed to
allow you to ignore deallocation times.  Things go away when you drop
the last reference to them.  Dropping the last reference to an A
object may transitively cause objects of class B to be deallocated
eventhough your code knowns nothing about class B. The class B objects
just happened to be referenced by a long pointer chain from the object
you dropped.

Thus from the programmer's perspective, logically B's destructor runs
asynchrously; you have no practical way of knowing what might be
deallocated in response to a simple pointer assignment.  Once you make
that step, you can't correctly run the destructors synchrously in a
thread that may already hold locks.  Destructors must be able to
acquire locks, and running them synchronously in this way leads to
deadlocks.

This is completely and inherently different from stack allocated
objects, where synchronous destruction makes perfect sense.

Thus I think there is no reason to go out of your way to run
destructors at a precise program point by e.g. adding a reference
counting collector.  All that does is replace asynchronous
finalization, which may be late to reclaim resources, with a mechanism
that's fundamentally incorrect and can lead to a variety of
intermittent failures.

I tried to write this up at
http://www.hpl.hp.com/personal/Hans_Boehm/gc/det_destr.html.  This is
subtle enough that I may try to do a more complete job in the future.

Hans

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: John Nagle <nagle@animats.com>
Date: Wed, 29 May 2002 20:55:34 GMT Raw View

Ross Smith wrote:

 > As far as I know, C++ is the _only_ language -- certainly the only
> widely used one -- that has deterministic destructors. Doing away with

> that would drop the number of languages in which it's possible to do
> reliable resource management from one to zero.


    Perl has deterministic destructors.  Perl is reference-counted,
with strong and weak pointers.  This works out reasonably well.
When the reference count goes to 0, the "destructor" is called.
Immediately.  Thus, destructor execution is deterministic.

    I have many other disagreements with Perl, but I think they
are headed in the right direction on memory management.

    Perl's weak pointers are a recent addition, but provide a
way to avoid memory leaks.  As long as at least one pointer in
a cycle is weak, the memory involved will be released at
some point.

     The overhead of reference counting can be brought down
down.  I've made some proposals in this area.  The key is hoisting
reference count maintenance out of loops.  This can be done
either automatically or manually.

     John Nagle
     Animats



---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "David Abrahams" <david.abrahams@rcn.com>
Date: Wed, 29 May 2002 21:33:50 GMT Raw View

"John Nagle" <nagle@animats.com> wrote in message
news:3CF5388D.2080401@animats.com...
> Ross Smith wrote:
>
>  > As far as I know, C++ is the _only_ language -- certainly the only
> > widely used one -- that has deterministic destructors. Doing away with
>
> > that would drop the number of languages in which it's possible to do
> > reliable resource management from one to zero.
>
>
>     Perl has deterministic destructors.  Perl is reference-counted,
> with strong and weak pointers.  This works out reasonably well.
> When the reference count goes to 0, the "destructor" is called.
> Immediately.  Thus, destructor execution is deterministic.
>
>     I have many other disagreements with Perl, but I think they
> are headed in the right direction on memory management.
>
>     Perl's weak pointers are a recent addition, but provide a
> way to avoid memory leaks.  As long as at least one pointer in
> a cycle is weak, the memory involved will be released at
> some point.

Python has all that, plus garbage collection of reference cycles.
However, it's a little bit undermined by the fact that an exception carries
the entire stack backtrace including the execution context at the point the
exception was raised. Destructors don't actually execute until the backtrace
is released.

>      The overhead of reference counting can be brought down
> down.  I've made some proposals in this area.  The key is hoisting
> reference count maintenance out of loops.  This can be done
> either automatically or manually.

That won't help with the reference-counting that goes on at function
boundaries, which I think is the real killer.

-Dave



---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Dan Smart <cppdan@dansmart.com>
Date: Thu, 30 May 2002 14:54:51 GMT Raw View

"David Abrahams" <david.abrahams@rcn.com> wrote in
news:ad3gc3$bt8$1@bob.news.rcn.net:
> "John Nagle" <nagle@animats.com> wrote in message
> news:3CF5388D.2080401@animats.com...
> Python has all that, plus garbage collection of reference cycles.
> However, it's a little bit undermined by the fact that an exception
> carries the entire stack backtrace including the execution context at
> the point the exception was raised. Destructors don't actually execute
> until the backtrace is released.
>
In general, Python's memory managment is schweeet. The cost of reference
counting is over-rated, I'll elaborate on just how efficient it can be in
email if anyone is interested. The handling of exceptions in python is
pretty sucky though...
>>      The overhead of reference counting can be brought down
>> down.  I've made some proposals in this area.  The key is hoisting
>> reference count maintenance out of loops.  This can be done
>> either automatically or manually.
>
> That won't help with the reference-counting that goes on at function
> boundaries, which I think is the real killer.
>
This can be alleviated by distinguishing "const reference to handle to non-
const object" from "any reference to handle to const object". Syntactically
this is pretty ugly today, but I'm finding more elegant solutions...

> -Dave
>
Dan "Is that a gun const*, a gun *const, or are you just pleased to see me"
Smart


--
Dan Smart. C++ Programming and Mentoring.
cppdan@dansmart.com
ADDvantaged

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "David Abrahams" <david.abrahams@rcn.com>
Date: Thu, 30 May 2002 12:20:00 CST Raw View

"Dan ``Is that a gun const*, a gun *const, or are you just pleased to see
me'' Smart" <cppdan@dansmart.com> wrote in message
news:Xns921E3AC0678FMe@167.206.112.134...

> "David Abrahams" <david.abrahams@rcn.com> wrote in
> news:ad3gc3$bt8$1@bob.news.rcn.net:

> This can be alleviated by distinguishing "const reference to handle
> to non-const object" from "any reference to handle to const object".

Not for return values, though...

-Dave



---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]