Thread

Topic: "strict" mode for C++0x - early draft

Author: John Nagle <nagle@animats.com>
Date: 26 Jul 2001 09:00:12 -0400

Hans Boehm wrote:
> The arguments about finalizers seem to get complicated.  Finalizers
> themselves don't seem very complicated to me, assuming you have threads to
> start with, and you use something similar to the Modula-3 semantics, i.e.
> you finalize objects after the ones they depend on.  They have the big
> advantage that typically a tiny fraction of code needs to deal with them.

    The problem is that C++ destructors are widely used in ways
that assume the destructor gets executed when the program is done
with the object, not at some later time.  Finalizers are fine in
languages that only have finalizers, like Java.
But mixing finalizers and destructors in the same language
creates a mess.  Microsoft tried this with .NET managed objects,
and the semantics are both painfully complicated and unsafe.

    There's nothing wrong with GC in principle; it's retrofitting
it to C++ as generally used that's painful.
> >
> >     It's also worth noting that conservative garbage collection
> > needs an address space much larger than physical memory,

> I have yet to find the perfect memory management technique.   Conservative
> garbage collectors are definitely not perfect, especially not in dense
> address spaces.
>
> But I disagree that reference counting has a better track record.
> Otherwise it has some serious disadvantages that we probably all know:
>
> 1) It slows down pointer assignments and the like.  In the case of portable
> multithreaded code, and if you require that deallocations happen at a
> predictable point, I would claim it slows them down enough that the result
> isn't very usable in many contexts.   What used to be a register to register
> move may now become a 200+ cycle sequence, in the best case.  A
> machine-specific implementation might get that down to 50-60 cycles.  If you
> carry out reference count updates in a separate thread, you can do much
> better, but you lose any sort of promptness guarantees.  (See the paper by
> Bacon et al in this year's PLDI Proceedings.)

     How well can we do for reference counting on common platforms?
What does the BOOST crowd say?  Is atomic INC and DEC enough for
thread-safe reference counts?  Would compiler support help?
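
     As a rough illustration of what "atomic INC and DEC" amounts to,
here is a minimal sketch using std::atomic, which was only standardized
in C++11, well after this post.  RefCounted, retain and release are
hypothetical names, not taken from Boost or from the proposal.  The sketch
keeps the count itself consistent across threads; it does not make it safe
for two threads to reassign the same pointer variable at once, which would
need more than a bare increment and decrement.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical intrusive reference count, for illustration only.
struct RefCounted {
    std::atomic<long> refs{1};   // the creator holds the first reference
};

// Atomic INC: relaxed ordering is enough here, because a new reference
// is always taken from an existing one and publishes no data.
inline void retain(RefCounted& obj) {
    obj.refs.fetch_add(1, std::memory_order_relaxed);
}

// Atomic DEC: returns true when the caller dropped the last reference
// and is therefore responsible for destroying the object.
inline bool release(RefCounted& obj) {
    long previous = obj.refs.fetch_sub(1, std::memory_order_acq_rel);
    assert(previous > 0);        // a release without a matching retain
    return previous == 1;
}
```

Every copy of a counted pointer pays for one retain and one release, which
is the per-assignment overhead under discussion.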

     The "auto scope" mechanism I've proposed is basically a way
to cut reference counting overhead, by obtaining a temporary pointer
to an object that can't outlive the object.  This allows
pulling reference count updates out of most inner loops.
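
     The effect can be approximated in later standard C++ (std::shared_ptr
is C++11, so this is an assumption layered on the discussion, not the
proposed "auto" syntax).  In the sketch below, sum_borrowed is a
hypothetical function that borrows a plain reference once, so the loop
itself performs no count updates.  Unlike the proposal, nothing in the
language checks that the borrowed reference cannot outlive the object;
here that holds only because the caller's pointer stays alive for the
duration of the call.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Sums a vector held through a counted pointer.  The reference `v`
// plays the role of the temporary pointer: it is taken once and lives
// only inside this call, while the caller's shared_ptr keeps the
// vector alive.
inline long sum_borrowed(const std::shared_ptr<std::vector<int>>& owner) {
    assert(owner);                        // must point at something
    const std::vector<int>& v = *owner;   // no count update here
    long total = 0;
    for (int x : v) {
        total += x;                       // inner loop touches no count
    }
    return total;
}
```

Passing the shared_ptr by const reference also avoids a count update at
the call itself.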

> 2) Without some much more complex backup mechanism, it doesn't reclaim
> cycles.  Unlike the data structures that can cause problems with
> conservative (or some generational) collectors, cycles are very common in
> real code, and painful to avoid or handle separately.  Cycles can be created
> across objects allocated by different modules; they're hard to avoid with
> any sort of systematic methodology.

     True.  I'm suggesting that weak pointers (a la Perl, not
a la Java) be provided to make it possible to avoid cycles.
This is not airtight; memory leaks are still possible.  But
they can be avoided.
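
     A minimal sketch of the idea, using std::weak_ptr from C++11 as a
stand-in for the weak pointers suggested here (the proposal's own weak
pointer may differ in detail).  Node and ring_is_reclaimed are
hypothetical names.  The back edge is weak, so the two nodes do not keep
each other alive.

```cpp
#include <cassert>
#include <memory>

struct Node {
    std::shared_ptr<Node> next;   // strong: owns the next node
    std::weak_ptr<Node>   prev;   // weak: does not keep the previous node alive
};

// Builds a two-node ring a <-> b and reports whether both nodes were
// reclaimed once the last outside pointers went away.
inline bool ring_is_reclaimed() {
    std::weak_ptr<Node> watch_a, watch_b;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;      // strong edge a -> b
        b->prev = a;      // weak back edge b -> a breaks the cycle
        watch_a = a;
        watch_b = b;
        assert(!watch_a.expired() && !watch_b.expired());
    }
    return watch_a.expired() && watch_b.expired();
}
```

If prev were a std::shared_ptr as well, each node would hold a count on the
other and neither would be freed, which is the leak that remains possible
when a programmer picks the wrong edge to make weak.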

     There's been work on detecting potential cycles by static
analysis, but that's too researchy for C++.  It might show up
in CASE tools in time, though.

     John Nagle
     Animats
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]

Author: "Hans Boehm" <Hans_Boehm@hp.com>
Date: 25 Jul 2001 05:36:48 GMT

I'm not arguing against John's proposal.  I agree that there are cases in
which this approach makes sense.  But I disagree with some of the details in
the justification:

John Nagle <nagle@animats.com> wrote in message
news:3B5A4ECB.FD3F529B@animats.com...
>
>     I'll get the complexity of the programmer-level explanation
> down in the next rewrite.  From a user perspective, it's
> relatively simple, far more so than anything involving
> finalizers.
The arguments about finalizers seem to get complicated.  Finalizers
themselves don't seem very complicated to me, assuming you have threads to
start with, and you use something similar to the Modula-3 semantics, i.e.
you finalize objects after the ones they depend on.  They have the big
advantage that typically a tiny fraction of code needs to deal with them.
>
>     I knew Detlefs and Boehm when they were working on Safe
> C++ at Xerox PARC.  I liked many of their ideas.  PARC and
> DEC SRL did some great work in language design.  But Safe
> C++ never went anywhere, and it's been a decade.
But I think this argument is a bit circular.  No part of this approach was
ever integrated into the standard, largely because the timing didn't fit the
standards process.  Anything not sanctioned by the standard tends to get
limited use, which isn't always a bad thing.
>
>     It's also worth noting that conservative garbage collection
> needs an address space much larger than physical memory, or the
> probability that a random bit pattern is a valid pointer becomes
> high.  Now that it's quite possible to populate 4GB of address
> space with physical memory, in a big program any random number
> is likely to be considered a pointer to something.  That
> idea looked better when memory was smaller.  It may again look
> good if 64-bit machines ever take over.  But right now,
> the odds aren't always good.

I personally would give 64 bit machines a good chance of taking over sooner
rather than later (except for embedded applications).  The price of 4GB RAM
seems to be down to < $1000 at your local discount memory store, at least
for some kinds of memory.  It's nice to have a machine that basically
doesn't need to touch the disk.  Once you routinely have machines with that
much memory, my guess is that people will find reasons to address it all at
once.

A smaller address space clearly causes more memory retention for a
conservative collector.  The paper by Hirzel and Diwan in the last ISMM
measures that effect.  Nonetheless, I'm not sure that makes it useless.
Wentworth's 1990 paper on "Pitfalls of Conservative Garbage Collection"
actually seems to get somewhat reasonable results in a dense (16 bit)
address space, except with lazily evaluated potentially infinite data
structures.  Newer techniques would have improved things further.  We now
know that you shouldn't use those with conservative garbage collectors, and
you should be careful with those in combination with generational
collectors.  They also seem to be fairly unique in provoking these problems.

I have yet to find the perfect memory management technique.   Conservative
garbage collectors are definitely not perfect, especially not in dense
address spaces.

But I disagree that reference counting has a better track record.  It's more
common with C++.  But my impression is that a reasonable number of projects
have gotten into serious trouble by going down that road.

It currently has the advantage that you can do it entirely in a user program
in a standard-conforming way.  That's why it's more common.  But I'm not
sure how relevant that is in the context of changing the standard.  It also
sometimes works well with very large objects, and may be the right solution
in very space-constrained situations.

Otherwise it has some serious disadvantages that we probably all know:

1) It slows down pointer assignments and the like.  In the case of portable
multithreaded code, and if you require that deallocations happen at a
predictable point, I would claim it slows them down enough that the result
isn't very usable in many contexts.   What used to be a register to register
move may now become a 200+ cycle sequence, in the best case.  A
machine-specific implementation might get that down to 50-60 cycles.  If you
carry out reference count updates in a separate thread, you can do much
better, but you lose any sort of promptness guarantees.  (See the paper by
Bacon et al in this year's PLDI Proceedings.)

This overhead isn't limited to programs that allocate a lot.  And it appears
to me that thread-unsafe code is becoming less and less interesting.

2) Without some much more complex backup mechanism, it doesn't reclaim
cycles.  Unlike the data structures that can cause problems with
conservative (or some generational) collectors, cycles are very common in
real code, and painful to avoid or handle separately.  Cycles can be created
across objects allocated by different modules; they're hard to avoid with
any sort of systematic methodology.

Hans




Author: warrens@seanet.com (Warren)
Date: 21 Jul 2001 15:17:58 GMT

The whole proposal approaches several areas: memory safety,
ease-of-use, and variations in memory paradigm.

You seem to have taken a good look at smart pointers and come up with
suggestions.  Overall, your proposal is in the right direction but too
complex.  One of the purposes of smart pointers and GC is reducing the
burden on the programmer.

GC has become very popular in a very short amount of time.  If it's a
fad it's been a long time building.  There is more demand for GC than
reference counting. We need to look more closely at GC efforts for C++
(Safe C++ and Great Circle) and see if there was some limitation in
the C++ language that prevented success.

The choice between reference counting and GC and standard ctor/dtor
usage is not something that I want to cement into my application or my
language or my library if I can help it.  That is, we'll need as much
interoperability as possible. How much is possible?  Threading and
synchronization issues interact with memory management.

If the Boost library uses one model and an OS interface library uses
another, how do I design my app? It is necessary to have some
flexibility with regard to memory management models.

On the other hand, it probably isn't realistic to write both
destructors and finalizers for the same class, "Just in case".
Does the memory management model need to be encoded in the decorated
names of objects to prevent nasty mis-matches?  Can the language
prevent me from putting a pointer to your reference-counted object
into my GC object?  My delayed de-allocation will mess up the
reference counted facility. Can we add memory-type safety to the
existing type-safety and const-correctness?  Should we?

OTHER ISSUES:

The algorithms for reference-counting and for GC must not be fixed by
the compiler.  Programmers must be able to override these algorithms
much as they override new and delete.
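
For comparison, this is roughly what overriding new and delete for a
single class looks like in C++ today.  It is only a sketch of the existing
mechanism the paragraph above refers to; Tracked and its live counter are
hypothetical, and nothing here says how an override for reference counting
or GC would actually be spelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical class that routes its own allocations through a
// counting allocator instead of the global operator new/delete.
struct Tracked {
    static int live;              // objects currently allocated
    int payload = 0;

    static void* operator new(std::size_t size) {
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        ++live;
        return p;
    }

    static void operator delete(void* p) noexcept {
        if (p) {
            assert(live > 0);     // every delete matches an earlier new
            --live;
            std::free(p);
        }
    }
};

int Tracked::live = 0;
```

A plain "new Tracked" and "delete" on the result go through these two
functions without any change at the call site.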

Microquill.com makes a living selling memory management libraries
optimized for paged-memory environments and multi-thread and multi-CPU
installations.  The presence of these argues that the solutions should
not be built into the compiler in a permanent way.



      [ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
      [ about comp.lang.c++.moderated. First time posters: do this! ]


Author: John Nagle <nagle@animats.com>
Date: 22 Jul 2001 16:33:44 GMT

Warren wrote:
> ReSent-From: Steve Clamage <clamage@eng.sun.com>

    Is this Steve Clamage posting?

[ No, it's not. It must be an artifact of the moderation software. -sdc ]

> The whole proposal approaches several areas, memory safety,
> ease-of-use, variations in memory paradigm.
>
> You seem to have taken a good look at smart pointers and come up with
> suggestions.  Overall, your proposal is in the right direction but too
> complex.

    I'll get the complexity of the programmer-level explanation
down in the next rewrite.  From a user perspective, it's
relatively simple, far more so than anything involving
finalizers.

> One of the purposes of smart pointers and GC is reducing the
> burden on the programmer.

> We need to look more closely at GC efforts for C++
> (Safe C++ and Great Circle) and see if there was some limitation in
> the C++ language that prevented success.

    I knew Detlefs and Boehm when they were working on Safe
C++ at Xerox PARC.  I liked many of their ideas.  PARC and
DEC SRL did some great work in language design.  But Safe
C++ never went anywhere, and it's been a decade.

    It's also worth noting that conservative garbage collection
needs an address space much larger than physical memory, or the
probability that a random bit pattern is a valid pointer becomes
high.  Now that it's quite possible to populate 4GB of address
space with physical memory, in a big program any random number
is likely to be considered a pointer to something.  That
idea looked better when memory was smaller.  It may again look
good if 64-bit machines ever take over.  But right now,
the odds aren't always good.
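
    A back-of-the-envelope version of that argument, as a sketch: if a
word is treated as uniformly random (a simplification; real data is not
uniform, and real collectors also check alignment and object boundaries),
the chance that it points into the heap is the heap size divided by the
size of the address space.  The function name and parameters below are
hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Fraction of all possible word values that land inside the heap,
// assuming the word is uniformly random.
inline double false_pointer_probability(double heap_bytes, int address_bits) {
    assert(heap_bytes >= 0.0 && address_bits > 0);
    return heap_bytes / std::ldexp(1.0, address_bits);   // heap / 2^bits
}
```

Under that assumption a 1GB heap in a 32-bit address space gives 2^30 / 2^32,
or one word in four, while the same heap in a 64-bit address space gives
2^30 / 2^64, which is about 6 in 100 billion.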

> The choice between reference counting and GC and standard ctor/dtor
> usage is not something that I want to cement into my application or my
> language or my library if I can help it.  That is, we'll need as much
> interoperability as possible. How much is possible?

> If the Boost library uses one model and an OS interface library uses
> another, how do I design my app? It is necessary to have some
> flexibility with regard to memory management models.

    My current draft on "strict C++" requires that in strict mode,
you use some library that encapsulates allocation.  But you can
use several different smart pointer libraries in the same
application.  If the libraries reference count correctly and
return "auto" scoped pointers rather than raw pointers, the
allocation system should be safe.

> Threading and
> synchronization issues interact with memory management.

    The C++ standard deals with threading and synchronization
by chanting "it's an operating system issue".  I consider
that a lack, but don't want to fight that battle this week.

    If a smart pointer class is thread-safe, then programs
that use it get thread-safe allocation.

> On the other hand, it probably isn't realistic to write both
> destructors and finalizers for the same class, "Just in case".

     Questions like that convince me more than ever that
you don't want destructors and finalizers in the same language.

> Does the memory management model need to be encoded in the decorated
> names of objects to prevent nasty mis-matches?

     No, ordinary type enforcement handles that.  If you created
"smart_ptr<foo> p;", type enforcement operates as usual.  If you
create "auto foo* q = p;", you lose the memory management
model info.  But that's OK, because you can't do anything
with "q" that involves allocation.
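
     A sketch of that point in later standard C++, with assumptions
stated: std::shared_ptr stands in for "smart_ptr", gc_ptr is a
hypothetical handle type for a collected object, and a plain const pointer
stands in for the proposed "auto" pointer (which standard C++ does not
have in this sense).

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

struct foo { int value = 0; };

// Hypothetical handle for a garbage-collected object; a stand-in only.
template <typename T>
struct gc_ptr {
    explicit gc_ptr(T* p) : raw(p) {}
    T* raw;
};

// The two management models are distinct types, so confusing them is a
// compile-time error and needs no help from name decoration.
static_assert(!std::is_convertible<std::shared_ptr<foo>, gc_ptr<foo>>::value,
              "a counted pointer does not silently become a GC handle");
static_assert(!std::is_convertible<gc_ptr<foo>, std::shared_ptr<foo>>::value,
              "a GC handle does not silently become a counted pointer");

// A borrowed pointer drops the model information, much as
// "auto foo* q = p;" would; this function only reads through it.
inline int read_through(const foo* q) {
    assert(q != nullptr);
    return q->value;
}
```

The part the proposal adds, and this sketch cannot show, is the compiler
refusing any allocation or deallocation through the borrowed pointer.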

> Can the language
> prevent me from putting a pointer to your reference-counted object
> into my GC object?

    I don't recommend mixing the two, but if you put a smart
pointer into a garbage collected object, the object pointed
to by the smart pointer has to stay around until GC destroys
the garbage collected object.  If an object managed by smart
pointers points to a GC object, then the garbage-collector's
marker should find the object via the smart pointer.

     John Nagle
     Animats



Author: John Nagle <nagle@animats.com>
Date: 07 Jul 01 06:46:16 GMT

Jonathan Thornburg wrote:
>
> In article <3B40D4F8.67223DE@animats.com>,
> John Nagle  <nagle@animats.com> wrote:
> >    In response to requests for a more detailed writeup, here's
> >an early draft:
> >
> >    http://www.animats.com/papers/languages/index.html
> >
> >This provides something concrete to comment on.
>
> Thanks for posting this!
>
> One generic issue which I'd be interested to see addressed, is how
> to handle the *implementation* of STL containers and other features.
> Presumably the implementation will use "unsafe" raw pointers, arrays,
> etc, which is ok as it's separately compiled in "unstrict mode".
>
> The issue arises when we consider inlined implementations which live
> (in significant part) in header files.  We'd need a mechanism to mark
> these header files as "unstrict mode" even when #included from within
> a "strict mode" compilation unit.

    Good point.  How should "strict mode" be turned on and off?

    Some options:

 unsafe {
  // code
 };

 extern "unsafe" {
 };

It's not a key issue right now.

    What I'd like people to focus on is 1) will the model I outlined
work, and 2) is it tolerable to use?  There are many smart pointer
class libraries, but none have reached the level of acceptance of
the STL.  Experience with those indicates that it can't be done right
without some language-level support.

     John Nagle
     Animats


Author: John Nagle <nagle@animats.com>
Date: 03 Jul 01 17:29:03 GMT

    In response to requests for a more detailed writeup, here's
an early draft:

    http://www.animats.com/papers/languages/index.html

This provides something concrete to comment on.

     John Nagle
     Animats


Author: jthorn@galileo.thp.univie.ac.at (Jonathan Thornburg)
Date: 5 Jul 2001 15:15:26 -0400

In article <3B40D4F8.67223DE@animats.com>,
John Nagle  <nagle@animats.com> wrote:
>    In response to requests for a more detailed writeup, here's
>an early draft:
>
>    http://www.animats.com/papers/languages/index.html
>
>This provides something concrete to comment on.

Thanks for posting this!

One generic issue which I'd be interested to see addressed, is how
to handle the *implementation* of STL containers and other features.
Presumably the implementation will use "unsafe" raw pointers, arrays,
etc, which is ok as it's separately compiled in "unstrict mode".

The issue arises when we consider inlined implementations which live
(in significant part) in header files.  We'd need a mechanism to mark
these header files as "unstrict mode" even when #included from within
a "strict mode" compilation unit.  Similarly, we'd need to allow STL
templates to use raw pointers, even if instantiated from "strict mode"
code.

This implies that a compiler needs to be able to switch modes at a
fairly fine grain.

--
-- Jonathan Thornburg <jthorn@thp.univie.ac.at>
   Max-Planck-Institut fuer Gravitationsphysik (Albert-Einstein-Institut),
   Golm, Germany             http://www.thp.univie.ac.at/~jthorn/home.html
   "It's every man for himself, said the elephant as he danced on the
    anthill!" -- T. C. "Tommy" Douglas's description of ((you guess))
---