Topic: How volatile should work (was: Singleton Pattern Problem on clcm)


Author: James Kanze <James.Kanze@dresdner-bank.de>
Date: 2000/11/15
Gerhard Menzl wrote:

> James Kanze wrote:

> > The most frequent use I've seen of volatile *has* been memory
> > mapped IO.  The second most frequent has been for spin locks
> > waiting for an interrupt (effectively a separate thread started by
> > hardware).  I've never used volatile in application code running
> > in user mode.  It does have some uses in multithreaded
> > applications, but for the most part, its granularity is too low,
> > and generally, you need an atomic load/modify/store which volatile
> > doesn't guarantee anyway.  For that matter, volatile doesn't even
> > guarantee an atomic load, except for the type sig_atomic_t.

> Does it, really? And if it does, doesn't it also guarantee an atomic
> write? All the standard has to say about it is that

> "When the processing of the abstract machine is interrupted by
> receipt of a signal, the values of objects with type other than
> volatile sig_atomic_t are unspecified, and the value of any object
> not of volatile sig_atomic_t that is modified by the handler becomes
> undefined." (1.9/9)

> On the one hand, this subclause is about signal handlers, not about
> threads, so we cannot safely assume that volatile sig_atomic_t also
> guarantees atomic reads and writes in multithreaded programs. On the
> other hand, I find it hard to imagine how an implementer could
> achieve the former without accidentally providing the latter as
> well. I raised this question several months ago, but none of the
> resident compiler gurus took the bait. Maybe this time?

To begin with, neither the C nor the C++ standard acknowledges the
existence of threads.  So *anything* to do with threads is
implementation defined.

In practice, of course, it would be almost impossible for an
implementation to guarantee the atomicity of sig_atomic_t without it
also being atomic for threads.  In practice, too, there are bound to
be many other types with atomic access.  Your point is well taken,
however, that pointers are not bound to be among those types; on older
machines, char* and void* often were *not* atomic.  I know of no
modern, or even half-modern large machine where a class T* would not
be atomic, but C/C++ also targets small, 8 bit embedded processors,
where often, no pointer accesses will be atomic.  (In practice, I
don't think you can reasonably expect to write applications which are
portable between 8 bit embedded processors and the largest
mainframes.)
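
For comparison, the one case that the quoted clause (1.9/9) does
address, a flag written by a signal handler and read by the
interrupted code, might look like the sketch below.  The names
got_signal, on_signal and signal_was_seen are invented for the
example, and std::raise is used only to keep it self-contained; it
says nothing about threads.

```cpp
#include <csignal>

// The only kind of object the standard promises can be written in a
// signal handler and then read by the interrupted code.
volatile std::sig_atomic_t got_signal = 0;

extern "C" void on_signal(int)
{
    got_signal = 1;   // a plain store, no read-modify-write
}

// Installs the handler, sends SIGINT to this process, restores the
// default handler, and reports whether the flag was set.
bool signal_was_seen()
{
    got_signal = 0;
    std::signal(SIGINT, on_signal);
    std::raise(SIGINT);              // returns only after the handler has run
    std::signal(SIGINT, SIG_DFL);
    return got_signal == 1;
}
```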

> In want of a truly portable solution, the best I have been able to
> come up with so far is:

Since a truly portable solution can't use threads, the solution is
trivial:-).

>    class Singleton
>    {
>    public:
>       static Singleton& Instance ();

>    private:
>       Singleton () {}
>       static Singleton* InstanceImpl ();
>
>       static Singleton* theInstance;
>       static volatile sig_atomic_t initialized;
>       static Lock lock;      // ultimately OS-dependent, of course
>    };

>    Singleton* Singleton::theInstance = 0;
>    volatile sig_atomic_t Singleton::initialized = 0;
>    Lock Singleton::lock;

>    Singleton& Singleton::Instance ()
>    {
>       if (initialized == 0)
>       {
>          Guard g (lock);
>
>          if (initialized == 0)
>          {
>             if ((theInstance = InstanceImpl () ) != 0)
>                initialized = 1;
>          }
>       }

>       return *theInstance;
>    }

>    Singleton& Singleton::InstanceImpl ()

Singleton*, of course, as the return value.

>    {
>       static Singleton instance;
>       return &instance;
>    }

> Note that I have separated the flag from the pointer, that I have
> moved the actual instantiation of the single instance into a
> separate function, and that I test its return value (although this
> is unnecessary from a purely logical point of view). The question
> is: could a conforming compiler still rearrange writes in such a way
> that the flag would be set before the pointer? I would think that,
> for all practical purposes, it couldn't, but I stand to be
> corrected.

Of course it can.  Strictly speaking, there is nothing that would
prevent the compiler from deferring the writes in the constructor until
well after the return from instance.

In practice, it can only defer the writes as far as it can prove that
no other code (in the thread) accesses the variables.  As an absolute
limit, it must stop at a system call, since it has no access to the
code for the system call.  As I said before, either it doesn't know
the semantics of the call, and thus must suppose that the call could
access this data, or it knows the semantics, and since it knows that
freeing a mutex has repercussions on threading, it will insert a write
barrier.

None of which helps here, however, since there is no hidden code
(system calls, etc.) between the constructor and setting the flag.
*IF* the constructor is physically in another translation unit, most
compilers will not defer the writes, but this is simply because most
compilers don't do inter-module optimization (although I know of at
least one exception).

You've solved the problem of the non-atomicity of the write to the
pointer, but in practice, this isn't a problem, and in fact, I hadn't
considered it.  You've still done nothing (from a standards point of
view) to prevent the writes in the constructor from being deferred.
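
The standard under discussion has no way to express the ordering that
is missing here.  Purely as an illustration, and assuming the
std::atomic and std::mutex facilities of C++11 (which postdate this
thread), the required ordering could be sketched as follows.  The
value_ member is a hypothetical payload standing in for whatever the
constructor writes; the other names follow the quoted code.

```cpp
#include <atomic>
#include <mutex>

class Singleton
{
public:
    static Singleton& Instance();
    int value() const { return value_; }

private:
    Singleton() : value_(42) {}

    static std::atomic<Singleton*> theInstance;
    static std::mutex lock;
    int value_;   // hypothetical payload written by the constructor
};

std::atomic<Singleton*> Singleton::theInstance(nullptr);
std::mutex Singleton::lock;

Singleton& Singleton::Instance()
{
    // Acquire: a thread that sees a non-null pointer also sees every
    // write the constructor made before the matching release store.
    Singleton* p = theInstance.load(std::memory_order_acquire);
    if (p == nullptr)
    {
        std::lock_guard<std::mutex> g(lock);
        p = theInstance.load(std::memory_order_relaxed);
        if (p == nullptr)
        {
            p = new Singleton;   // never deleted in this sketch
            // Release: the constructor's writes may not be deferred
            // past this store.
            theInstance.store(p, std::memory_order_release);
        }
    }
    return *p;
}
```

With the pointer itself atomic, the separate sig_atomic_t flag is no
longer needed in this version.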

> By the way, this problem is not confined to the Singleton pattern -
> it is a general issue with lazy initialization. The regularity with
> which it comes up and our seeming inability to provide a portable,
> bullet-proof solution have convinced me that at least guaranteed
> atomic reads and writes in the presence of multiple threads should
> be part of the next C++ standard.

I agree that some form of support for threading should be added to the
standard, or at least, that the standard should recognize multi-threading
and specify the behavior in a multi-threaded context.  Don't expect
too much, however.  As long as C/C++ aim to target small, embedded
processors, there will be definite restrictions as to what can or
cannot be required to be atomic.

Perhaps the solution lies along the lines of an additional qualifier,
atomic, something like const or volatile.  It would work as a sort of
a super-volatile: atomic implies volatile, but also that the compiler
must arrange that all accesses be atomic, and that an access to an
atomic introduces an implicit read guard before and write guard
after.
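
Something along these lines did later appear in C++11, as the library
template std::atomic rather than as a qualifier; unlike the qualifier
imagined above, it does not imply volatile.  A minimal sketch of the
atomic load/modify/store that volatile does not provide, assuming a
C++11 compiler with thread support; the function name
count_with_threads is made up for the example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Starts `threads` threads; each adds 1 to a shared counter
// `per_thread` times.  Returns the final count.
int count_with_threads(int threads, int per_thread)
{
    std::atomic<int> counter(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
    {
        pool.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                counter.fetch_add(1);   // one indivisible load/modify/store
        });
    }
    for (std::thread& th : pool)
        th.join();
    return counter.load();
}
```

Because each fetch_add is indivisible, no increment can be lost,
whereas a volatile int incremented with ++ carries no such guarantee.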

--
James Kanze                               mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627


      [ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
      [ about comp.lang.c++.moderated. First time posters: do this! ]

[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]





Author: Gerhard Menzl <gerhard.menzl@sea.ericsson.se>
Date: 2000/11/14
James Kanze wrote:

> The most frequent use I've seen of volatile *has* been memory mapped
> IO.  The second most frequent has been for spin locks waiting for an
> interrupt (effectively a separate thread started by hardware).  I've
> never used volatile in application code running in user mode.  It does
> have some uses in multithreaded applications, but for the most part,
> its granularity is too low, and generally, you need an atomic
> load/modify/store which volatile doesn't guarantee anyway.  For that
> matter, volatile doesn't even guarantee an atomic load, except for the
> type sig_atomic_t.

Does it, really? And if it does, doesn't it also guarantee an atomic
write? All the standard has to say about it is that

"When the processing of the abstract machine is interrupted by receipt
of a signal, the values of objects with type other than volatile
sig_atomic_t are unspecified, and the value of any object not of
volatile sig_atomic_t that is modified by the handler becomes
undefined." (1.9/9)

On the one hand, this subclause is about signal handlers, not about
threads, so we cannot safely assume that volatile sig_atomic_t also
guarantees atomic reads and writes in multithreaded programs. On the
other hand, I find it hard to imagine how an implementer could achieve
the former without accidentally providing the latter as well. I raised
this question several months ago, but none of the resident compiler
gurus took the bait. Maybe this time?

In want of a truly portable solution, the best I have been able to come
up with so far is:

   class Singleton
   {
   public:
      static Singleton& Instance ();

   private:
      Singleton () {}
      static Singleton* InstanceImpl ();

      static Singleton* theInstance;
      static volatile sig_atomic_t initialized;
      static Lock lock;      // ultimately OS-dependent, of course
   };

   Singleton* Singleton::theInstance = 0;
   volatile sig_atomic_t Singleton::initialized = 0;
   Lock Singleton::lock;

   Singleton& Singleton::Instance ()
   {
      if (initialized == 0)
      {
         Guard g (lock);

         if (initialized == 0)
         {
            if ((theInstance = InstanceImpl () ) != 0)
               initialized = 1;
         }
      }

      return *theInstance;
   }

   Singleton& Singleton::InstanceImpl ()
   {
      static Singleton instance;
      return &instance;
   }

Note that I have separated the flag from the pointer, that I have moved
the actual instantiation of the single instance into a separate
function, and that I test its return value (although this is unnecessary
from a purely logical point of view). The question is: could a
conforming compiler still rearrange writes in such a way that the flag
would be set before the pointer? I would think that, for all practical
purposes, it couldn't, but I stand to be corrected.

By the way, this problem is not confined to the Singleton pattern - it
is a general issue with lazy initialization. The regularity with which
it comes up and our seeming inability to provide a portable,
bullet-proof solution have convinced me that at least guaranteed atomic
reads and writes in the presence of multiple threads should be part of
the next C++ standard.

Gerhard Menzl






Author: Francis Glassborow <francis.glassborow@ntlworld.com>
Date: 15 Nov 00 02:04:40 GMT
In article <3A0ED155.137FA647@wizard.net>, James Kuyper
<kuyper@wizard.net> writes
>kanze@gabi-soft.de wrote:
>...
>> That sounds like a very good idea.  I wonder even, can you declare a
>> constructor volatile.  (I don't think so.  I know you cannot declare it
>> const.)
>
>9.3.2p5: "Constructors ... shall not be declared ... volatile ..."

However some of us are considering proposing const constructors as a
future extension. I guess that will allow volatile ones as well.


Francis Glassborow      Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA          +44(0)1865 246490
All opinions are mine and do not represent those of any organisation





Author: Todd Greer <todd.greer@ni.com>
Date: 2000/11/11
James Kanze <James.Kanze@dresdner-bank.de> writes:

> > My serious question is still whether that rearrangement is legal if
> > _instance is volatile qualified.
>
> Yes.  It would not be legal if both instance and the members of the
> object were volatile, however; there is a sequence point at the return
> from the constructor, and any accesses to volatile within the
> constructor must occur before that sequence point.  (The fact that the
> constructor is inline doesn't affect the semantics.)

It sounds as if this is saying that the double-checked pattern is safe
according to the C++ standard if you make all of the shared data
volatile.  If you have a class whose only purpose is to be used as a
singleton, perhaps making everything volatile would be reasonable.

Or, perhaps the ctor could do nothing other than call a volatile
init() method, which would (I think) cause all members to be treated
as volatile, thus forcing the insertion of memory barriers.
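
The language mechanics of that idea can be sketched as below,
assuming C++11 for the static_assert and decltype check; Point and its
members are hypothetical names.  The sketch shows only that the
members are accessed as volatile lvalues inside init().  Whether that
is enough to order the writes as seen by other threads, or to make any
hardware barrier appear, is not something it demonstrates.

```cpp
#include <type_traits>

class Point
{
public:
    // Calling a volatile member function on a non-volatile object is
    // allowed: `this` converts from Point* to volatile Point*.
    Point() { init(); }

    int x() const { return x_; }
    int y() const { return y_; }

private:
    // Inside a volatile member function `this` is a volatile Point*,
    // so x_ and y_ are accessed as volatile lvalues.
    void init() volatile
    {
        static_assert(
            std::is_same<decltype((x_)), volatile int&>::value,
            "members are volatile lvalues inside a volatile member function");
        x_ = 1;
        y_ = 2;
    }

    int x_;
    int y_;
};
```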

OTOH, I've heard that all previous debates about the double-checked
pattern have concluded that it cannot be portably made safe, so
perhaps there's something wrong with my reasoning?

--
Todd Greer <todd.greer@ni.com>






Author: kanze@gabi-soft.de
Date: 2000/11/12
Todd Greer <todd.greer@ni.com> writes:

|>  Or, perhaps the ctor could do nothing other than call a volatile
|>  init() method, which would (I think) cause all members to be treated
|>  as volatile, thus forcing the insertion of memory barriers.

That sounds like a very good idea.  I wonder even, can you declare a
constructor volatile.  (I don't think so.  I know you cannot declare it
const.)

Of course, this is only valid for simple classes -- you have no
influence over the constructor of a base class, for example.  In
practice, however, as soon as the constructor becomes a bit complicated,
the compiler will not effectively be able to verify that reordering the
writes is safe, so you should be OK.

|>  OTOH, I've heard that all previous debates about the double-checked
|>  pattern have concluded that it cannot be portably made safe, so
|>  perhaps there's something wrong with my reasoning?

Well, almost by definition, the presence of volatile means
non-portable:-).  Seriously, however, I've never heard this suggestion,
so I don't know if anyone has considered it.

--
James Kanze                               mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627






Author: James Kuyper <kuyper@wizard.net>
Date: 2000/11/12
kanze@gabi-soft.de wrote:
...
> That sounds like a very good idea.  I wonder even, can you declare a
> constructor volatile.  (I don't think so.  I know you cannot declare it
> const.)

9.3.2p5: "Constructors ... shall not be declared ... volatile ..."






Author: James.Kanze@dresdner-bank.com
Date: 2000/11/10
In article <3A0ABDEE.85D20EEC@nortelnetworks.com>,
  "John Hickin" <hickin@nortelnetworks.com> wrote:
> James Kanze wrote:

> > Note that the compiler optimization is applied to the function
> > *after* any expansion of inline's.  This means that the double
> > lock pattern will be safe with most compilers if and only if the
> > constructor of the object is not inlined.  Not declared inline,
> > but really inlined; a

> I thought that a compiler, when inlining a function, was not allowed
> to elide any sequence points that may have been present in the code
> as the programmer wrote it. Does this have any bearing on the
> problem?

Only if everything is volatile.  Otherwise, the compiler can optimize
to its heart's content.

Typically, most good compilers do rearrange, or even completely
suppress writes.  They do it across the borders of inlined
functions.  One of the big advantages of inlining is that it gives the
compiler the chance to do these optimizations.

All that the standard requires is that the observable behavior (system
calls and accesses to volatile variables) be unchanged by the
rearrangement.
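
A small illustration of that rule, with hypothetical function names.
Both functions return the same value, but only in the second is each
store part of the observable behavior.  What a particular optimizer
actually does with the first is not something the sketch can show.

```cpp
// Non-volatile: only the final value can affect observable behavior,
// so a compiler is free to drop the first two stores entirely.
int plain_result()
{
    int x;
    x = 1;
    x = 2;
    x = 3;
    return x;
}

// Volatile: each of the three stores is an access to a volatile
// object and must be performed, in this order.
int volatile_result()
{
    volatile int x;
    x = 1;
    x = 2;
    x = 3;
    return x;   // this read is a volatile access as well
}
```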

--
James Kanze                               mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                  Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627








Author: "Balog Pal" <pasa@lib.hu>
Date: 2000/11/07
"James Kanze" <James.Kanze@dresdner-bank.de> wrote in message

> > >>        Guard guard(lock);
> > >>         if (!_instance)
> > >>         {
> > >>              _instance = new myobject;
> > >>          }

> There is a requirement that the abstract machine finish all side
> effects preceding a sequence point before it begins evaluating
> anything after that sequence point.  All that the standard requires,
> however, is that the implementation give the same observable behavior
> as the abstract machine.  In this case, there is no observable
> behavior, so the compiler is free to do what it wishes.

Well, that leads to the question of what "observable behavior" really is. I
always thought that the compiler could have a lot of fun with locals
(which are hidden from other observers, unless I take the address of
something), but that the freedom stops for most other stuff. Modifications going
to objects in an outer scope can be deferred, but should happen before an
actual call to an outside function takes place.

> This is no abstract consideration, either.  Optimizers *do* rearrange
> writes, extensively.  The freedom to do so, in the general case, is
> essential for good optimization.

Yes, definitely. I remember even early msc compilers optimized multiple
pages of code to a single return 0; if nothing used the result. Instruction
reordering also happens in the code generation phase, unfortunately not
always without bugs. I recently found a bug in msvc5 code generation, when
it moved some instructions to a place between a check and a conditional jump
killing the flags. :-o

> > Especially if _instance is volatile, what it must be in order to
> > make the thing work.

Well, that should have been phrased without 'especially'. :)

> Making _instance volatile doesn't really help.  It means that the
> write to _instance is an observable behavior, and *must* take place
> before any observable behavior after the next sequence point, but the
> writes within the constructor can still be deferred.

Is that so? I tried to find some solid description of exactly what the
requirements around volatile should be, without much success. There was a thread
about it, but it couldn't make me wise either. But I think the obligation
should be more than what you write: the compiler should (better ;) fix
sequence points both before and after an access to the volatile variable, not
just one after. If it only influences the next point, we'd need code like:

   _instance; // force flush
   _instance = new myobject; // now works correctly (?)

Looks quite stupid to me.

I'd be glad if someone could state which behavior the standard requires
(and if the standard is ambiguous here, it should be fixed as a defect).

> The problem could take place because the hardware reordered the
> writes.  But most often, it will be the optimizer.

I didn't think about the optimizer, since I believed it was forbidden to
optimize around volatiles.

> Consider the
> simple case where the object consists of two int's, both initialized
> to 0.  On a typical RISC architecture, with no indexed addressing, the
> optimal sequence for _instance = new MyType would be:
>
>     call     operator_new
>     or       #8, g0, o0     ; actually executes before the call
>     store    o0, _instance  ; really three instructions on a Sparc
>     or       g0, g0, @o0    ; g0 always contains 0
>     add      o0, #4, o0
>     or       g0, g0, @o0

That's indeed the optimal code. (However in practice I'd expect something
like
     call     operator_new
     or       #8, g0, o0
     or       g0, g0, @o0
     add      o0, #4, o1
     or       g0, g0, @o1
     store    o0, _instance

I have observed that good register allocation was never a strong point of
optimizers. Certainly, assuming an environment with all registers in use,
including the o line, and a really good optimizer, we could see it.)

My serious question is still whether that rearrangement is legal if
_instance is volatile qualified.

....
> The problem of actually issuing memory barriers (as hardware
> instructions) is a real one, but in this case, you first have the
> problem of defining where they must occur, and preventing some
> reorderings of writes.

Yes, definitely. I thought volatile is the language element in C++ that
serves that exact purpose. If not, what is it good for? (Well,
memory-mapped I/O is mentioned so often, but I think that might be a
fraction of a percent of use.)

Paul

















Author: James Kanze <James.Kanze@dresdner-bank.de>
Date: 2000/11/08
Balog Pal wrote:

> "James Kanze" <James.Kanze@dresdner-bank.de> wrote in message

> > > >>        Guard guard(lock);
> > > >>         if (!_instance)
> > > >>         {
> > > >>              _instance = new myobject;
> > > >>          }

> > There is a requirement that the abstract machine finish all side
> > effects preceding a sequence point before it begins evaluating
> > anything after that sequence point.  All that the standard
> > requires, however, is that the implementation give the same
> > observable behavior as the abstract machine.  In this case, there
> > is no observable behavior, so the compiler is free to do what it
> > wishes.

> Well, that leads to the question what "observable behavior" really is.

Observable behavior is a system call, or an access to a volatile
variable.  According to the standard.

> I
> always thought that the compiler could have a lot of fun with locals
> (which are hidden from other observers, unless I take the address of
> something), but that the freedom stops for most other stuff.

This is a side effect of the fact that most compilers optimize a
function at a time, without looking further.  This describes the
actual situation with a lot (but not all) of current compilers.

Note that the compiler optimization is applied to the function *after*
any expansion of inline's.  This means that the double lock pattern
will be safe with most compilers if and only if the constructor of the
object is not inlined.  Not declared inline, but really inlined; a
number of common compilers, including, I think, g++, will inline
functions when the definition is visible, even if the function is not
declared inline.  To prevent inlining with such compilers, it is
necessary to place the function in a separate compilation unit.  And
at least one compiler I know of *will* optimize across function and
even module boundaries -- if the constructor is simple enough, the
compiler will inline it regardless of where it is defined.  (The one
compiler I know of, however, only does this depending on profiling
data, when the function is frequently called.  So it wouldn't affect
this case.)

> Modifications going
> to objects in an outer scope can be deferred, but should happen before an
> actual call to an outside function takes place.

That's not what the standard says, and it isn't what the state of the
art compilers do.  It is what most compilers do if (but only if) the
function isn't inlined.

> > This is no abstract consideration, either.  Optimizers *do*
> > rearrange writes, extensively.  The freedom to do so, in the
> > general case, is essential for good optimization.

> Yes, definitely. I remember even early msc compilers optimized
> multiple pages of code to a single return 0; if nothing used the
> result.

Liveness analysis is a standard optimization technique.  I remember
similar results when we tried the original Dhrystone benchmark with
one C compiler -- it basically generated the same code as if we'd
written a single puts with the results, and nothing else.  The
Dhrystone was modified shortly thereafter to avoid this.

All good benchmarks make sure that all calculations participate
somehow in a result that is output, in order to foil this.

> Instruction reordering also happens in the code generation
> phase, unfortunately not always without bugs. I recently found a bug
> in msvc5 code generation, when it moved some instructions to a place
> between a check and a conditional jump killing the flags. :-o

Optimization is tricky.  It's a common source of bugs in a compiler.

> > > Especially if _instance is volatile, what it must be in order to
> > > make the thing work.

> Well, that should have been phrased without 'especially'. :)

> > Making _instance volatile doesn't really help.  It means that the
> > write to _instance is an observable behavior, and *must* take
> > place before any observable behavior after the next sequence
> > point, but the writes within the constructor can still be
> > deferred.

> Is that so? I tried to find some solid description of exactly what the
> requirements around volatile should be, without much success.

Not surprising.  The exact semantics of volatile are pretty much
implementation dependent, although the intent is clear.  From the C
standard: "An object that has volatile-qualified type may be modified
in ways unknown to the implementation, or have other unknown side
effects.  Therefore any expression referring to such an object shall
be evaluated strictly according to the rules of the abstract machine,
as described in 5.1.2.3.  Furthermore, at every sequence point the
value last stored in the object shall agree with that prescribed by
the abstract machine, except as modified by the unknown factors
mentioned previously.  What constitutes an access to an object that
has volatile-qualified type is implementation defined."

The standard also says that access to a volatile object is "observable
behavior."

> There was a thread about it, but it couldn't make me any wiser
> either. But I think the obligation should be more than what you
> write: the compiler should (better ;) fix sequence points both
> before and after access to the volatile variable, not just one
> after.

The volatile qualifier only affects the object it qualifies.
Optimizations (deferral of a write, for example) concerning other
objects are still permitted.  Declaring the pointer volatile ensures
that it will be written before the sequence point ending the full
expression.  It does not ensure that the writes to the non-volatile
constructed object will have taken place.
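
A small sketch of the distinction, assuming a hypothetical two-int class (Pair, instancePtr and publish are names invented for the illustration): only the pointer is volatile-qualified, so only the store to the pointer is constrained.

```cpp
// Hypothetical type: two ints, both initialized to 0.
struct Pair {
    int first;
    int second;
    Pair() : first(0), second(0) {}
};

// The pointer itself is volatile; the Pair it points to is not.
Pair* volatile instancePtr = 0;

Pair* publish()
{
    // The store to instancePtr is an access to a volatile object and
    // must be complete at the end of this full expression.  The
    // stores to first and second are ordinary writes; nothing in the
    // volatile rules says when they reach memory relative to the
    // pointer store, as observed from another thread.
    instancePtr = new Pair;
    return instancePtr;
}
```

In a single thread the function behaves as expected; the issue only shows when another thread reads instancePtr and then dereferences it.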

> If it only influences the next point, we'd need code like:

>    _instance; // force flush
>    _instance = new myobject; // now works correctly (?)

> Looks quite stupid to me.

I'm not sure why the initial flush.

In addition to volatile, most compilers do implement some sort of
write guards around certain function calls; if they cannot trace all
of the effects of the function call, they must do so for any variables
which may be accessible from the function, which generally means
anything static or whose address has been taken.

This is important to begin with.  Consider the code in question:

    T*
    T::instance()
    {
        if ( myInstance == NULL ) {
            Guard m( aLock ) ;
            if ( myInstance == NULL ) {
                myInstance = new T ;
            }
        }
        return myInstance ;
    }

There are at least two function calls here that the compiler
presumably cannot trace: the constructor and the destructor for
Guard.  It cannot trace them because at some level, they involve a
system call.  Either the compiler knows the semantics of the system
call, in which case, it knows that a write guard is necessary, or it
doesn't, in which case, it cannot know that the system call does not
modify myInstance.  In both cases, it must implement a write barrier.
(Since the newly constructed object is accessible through myInstance,
the write barrier must also affect its members.)

Without the write barrier, most compilers would suppress the second if
completely, since it must give the same results as the first.

In order for this idiom to work correctly, it is necessary to create a
second write guard between the constructor of T and the assignment to
myInstance.  If the compiler cannot trace the constructor of T
(because it is not inlined, and the compiler does not do
interfunctional analysis, for example), it must insert a write guard,
since the constructor of T could presumably access myInstance, and it
must see the value before the assignment.  (An interesting idea: I'll bet
that adding a std::cout.flush() to the end of the constructor of T
will make the idiom work for almost all, if not all, compilers.)
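
As an illustration only of where that second barrier sits: the sketch below assumes the <atomic> and <mutex> headers of C++11, which postdate this discussion and are not something either standard offered at the time.  The member myValue and the use of std::lock_guard in place of Guard are assumptions made for the example.

```cpp
#include <atomic>
#include <mutex>

class T {
public:
    static T* instance();
    int value() const { return myValue; }
private:
    T() : myValue(42) {}            // hypothetical member, for the test
    int myValue;
    static std::atomic<T*> myInstance;
    static std::mutex aLock;
};

std::atomic<T*> T::myInstance(nullptr);
std::mutex T::aLock;

T* T::instance()
{
    // Acquire load: if a non-null pointer is seen, the constructor's
    // writes published by the release store below are visible too.
    T* p = myInstance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> m(aLock);
        // The lock already orders this second check against other
        // threads that went through the same critical section.
        p = myInstance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new T;              // all writes of the constructor
            // Release store: the writes above may not be moved after
            // this assignment.  This is the "second write guard".
            myInstance.store(p, std::memory_order_release);
        }
    }
    return p;
}
```

The construction goes through a local pointer first, so the assignment to myInstance is a separate, explicitly ordered step rather than part of the new expression.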

> I'd be glad if someone could state which behavior is going by the
> standard, (and if the standard is ambiguous here, it should be fixed
> as a defect).

The standard is intentionally ambiguous with regards to volatile.
This is not a defect, but a feature.  (It sounds like I forgot a
smiley, but I'm serious.  The goal is to not restrict what a volatile
object can actually be or do.)

> > The problem could take place because the hardware reordered the
> > writes.  But most often, it will be the optimizer.

> I didn't think about the optimizer believing it's forbidden to
> optimize around volatiles.

> > Consider the
> > simple case where the object consists of two int's, both initialized
> > to 0.  On a typical RISC architecture, with no indexed addressing, the
> > optimal sequence for _instance = new MyType would be:
> >
> >     call     operator_new
> >     or       #8, g0, o0     ; actually executes before the call
> >     store    o0, _instance  ; really three instructions on a Sparc
> >     or       g0, g0, @o0    ; g0 always contains 0
> >     add      o0, #4, o0
> >     or       g0, g0, @o0

> That's indeed the optimal code. (However in practice I'd expect something
> like
>      call     operator_new
>      or       #8, g0, o0
>      or       g0, g0, @o0
>      add      o0, #4, o1
>      or       g0, g0, @o1
>      store    o0, _instance

You'd be surprised what a good peep-hole optimizer can do.  It's true
that in this case, there's no real point to the optimization unless
there is heavy register pressure, which wouldn't be the case in the
instance function.  But I could imagine a peep-hole optimizer making
it systematically, rather than first evaluating whether it improves
things.  (It can't make them worse.)

> I observed good register allocation was never a strong point in the
> optimizers. Certainly assuming an environment with all registers
> stuffed including the o line and a really good optimizer, we could
> see it.)

> My serious question is still if that rearrange is legal if _instance
> is volatile qualified.

Yes.  It would not be legal if both instance and the members of the
object were volatile, however; there is a sequence point at the return
from the constructor, and any accesses to volatile within the
constructor must occur before that sequence point.  (The fact that the
constructor is inline doesn't affect the semantics.)

> ....
> > The problem of actually issuing memory barriers (as hardware
> > instructions) is a real one, but in this case, you first have the
> > problem of defining where they must occur, and preventing some
> > reorderings of writes.

> Yes, definitely. I thought volatile is the language element in C++
> that serves that exact purpose. If not, what is it good for? (Well,
> that memory-mapped I/O is mentioned so often, but I think that might
> be a fragment of a percent of use.)

The most frequent use I've seen of volatile *has* been memory mapped
IO.  The second most frequent has been for spin locks waiting for an
interrupt (effectively a separate thread started by hardware).  I've
never used volatile in application code running in user mode.  It does
have some uses in multithreaded applications, but for the most part,
its granularity is too low, and generally, you need an atomic
load/modify/store which volatile doesn't guarantee anyway.  For that
matter, volatile doesn't even guarantee an atomic load, except for the
type sig_atomic_t.  On many machines, loading a double will entail two
memory accesses -- if a thread switch can occur between those
accesses, then the double could be modified between the two accesses.
I don't know the modern i80x86 that well, but on the OS I wrote for
the 8086, a thread switch *could* occur between those two accesses.
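
For the one case the standard does cover, here is a minimal sketch of a flag of type volatile sig_atomic_t set from a signal handler.  The names interruptSeen, onInterrupt and raiseAndCheck are hypothetical, and std::raise stands in for a real asynchronous interrupt so that the example is deterministic.

```cpp
#include <csignal>

// The only object type a signal handler may portably write to.
volatile std::sig_atomic_t interruptSeen = 0;

extern "C" void onInterrupt(int)
{
    interruptSeen = 1;              // a single, uninterruptible store
}

bool raiseAndCheck()
{
    interruptSeen = 0;
    std::signal(SIGINT, onInterrupt);
    std::raise(SIGINT);             // handler returns before raise does
    std::signal(SIGINT, SIG_DFL);   // restore the default disposition
    // Because the flag is volatile, this read may not be replaced by
    // the 0 stored at the top of the function.
    return interruptSeen == 1;
}
```

Replacing sig_atomic_t by double here would lose the guarantee: nothing then ensures that the handler cannot run between the two halves of a load or store.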

Threading is part of Java; I'd recommend carefully reading the chapter
of the Java Specification concerning threading (chapter 17).  It's
interesting, because with regards to threading, Java takes the
classical C/C++ attitude of caveat emptor: there's undefined behavior
galore.  The description of what is and is not guaranteed is fairly
good, and corresponds to what most C++ implementations probably do in
practice.

--
James Kanze                               mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627


      [ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
      [ about comp.lang.c++.moderated. First time posters: do this! ]

[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]





Author: "John Hickin" <hickin@nortelnetworks.com>
Date: 2000/11/09
James Kanze wrote:

>
> Note that the compiler optimization is applied to the function *after*
> any expansion of inline's.  This means that the double lock pattern
> will be safe with most compilers if and only if the constructor of the
> object is not inlined.  Not declared inline, but really inlined; a

I thought that a compiler, when inlining a function, was not allowed to
elide any sequence points that may have been present in the code as the
programmer wrote it. Does this have any bearing on the problem?


Regards, John.
