Topic: Threading issue in next standard


Author: "DocInverter" <google@icemx.net>
Date: Thu, 14 Sep 2006 19:07:43 CST
John Nagle wrote:
> James Dennett wrote:
> > DocInverter wrote:
[...]
> >> I believe that it is not necessary for the language/compiler to know
> >> about lock/unlock operations being special:
[...]
> > But compilers can know more, and can do inlining between
> > different translation units.
[...]
> >  Relying on quirks of simple
> > implementations isn't good enough; if we want to ensure
> > that code isn't moved across certain function calls, that
> > needs to be specified by the standard.
>
>     Yes.  This needs to be done both correctly and efficiently.
> With multiprocessors becoming far more common, hand-waving
> isn't good enough any more.

Hm, all my thread-safe primitives are GCC "asm volatile" inlines that
clobber "memory", which is safer than hiding them in another translation
unit and works well enough for me (8-way Sparc).

Which is to say: I fully agree with your original statement; I'm so
used to GCC extensions that I sometimes forget they are not standard
(and of course GCC's "asm volatile" is not even perfect for the job).

Regards, Colin

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: n1135837634.ch@chch.demon.co.uk (Charles Bryant)
Date: Fri, 15 Sep 2006 00:10:17 GMT
In article <n_OMg.10003$yO7.3500@newssvr14.news.prodigy.com>,
John Nagle <nagle@animats.com> wrote:
>Joe Seigh wrote:
>> John Nagle wrote:
>>>     Take a look at our "http://www.overbot.com/public/qnx/mutexlock.h",
>>
>> Is there a possible race condition in atomic_incmod_value?  It seems
>> to assume that loc will never get larger than 2*mod - 1 before the
>> decrement occurs.
>
>     Yes, that's a problem, but because that function is only used
>for the bounded buffer code, which uses a Semaphore to limit the
>number of items in the queue, that's not a bug.  But I'm not
>happy with it.

The code in question is:

inline unsigned atomic_incmod_value(volatile unsigned* loc,
                                    unsigned mod)
{
    unsigned oldval = atomic_add_value(loc, 1); // add 1, return old value, atomic operation
    if (oldval >= mod)                          // if overflow
    {
        if (oldval == mod)                      // if exactly at overflow
        { atomic_sub(loc, mod); }               // must reduce by one cycle
        oldval %= mod;                          // must reduce result
    }
    return oldval;
}

I believe the following would be more correct, in the sense that it
avoids the problem of hitting 2*mod:

inline unsigned selfatomic_incmod_value(unsigned *loc, unsigned mod)
{
 unsigned oldval = atomic_add_value(loc, 1);
 if (oldval >= mod) { // if overflow
  do {  // very unlikely to loop
   oldval -= mod;
  } while (oldval >= mod);
  if (!oldval) atomic_sub(loc,mod);
 }
 return oldval;
}

The idea is that the value must be reduced by 'mod' once for every time it
hits a multiple of 'mod'. Every time it hits a multiple of 'mod', this
must be because it was incremented there by some thread, so we get
that thread to do the corresponding reduction.

However, there is a remaining theoretical problem. If sufficient
threads execute this concurrently such that *loc wraps, then it fails
unless mod is a power of two because the result is 2^B + oldval which
is not equivalent to oldval modulo mod. However, since that would
require at least 65536 threads, it is really only a problem in theory.

Neither function is fully atomic - meaning that its effect is either
completely observable or completely not. An intermediate value of the
location being incremented can be observed, even if read with an 'atomic
read' operation.  The functions could be called self-atomic - meaning
that concurrent invocations of the same function behave as if they
were executed sequentially. One of the things that makes atomic
primitives so difficult to use correctly is the fact that the
composition of atomic operations is very rarely atomic.
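For readers without these primitives to hand: below is a sketch (my
transcription, not code from this thread) that models atomic_add_value
and atomic_sub on the std::atomic interface that was standardized
later, so the function above can actually be compiled and exercised.

```cpp
#include <atomic>

// Models of the primitives assumed in this thread (my definitions):
// atomic_add_value returns the value *before* the add.
inline unsigned atomic_add_value(std::atomic<unsigned>* loc, unsigned n)
{ return loc->fetch_add(n); }

inline void atomic_sub(std::atomic<unsigned>* loc, unsigned n)
{ loc->fetch_sub(n); }

// The selfatomic version above, transcribed against the modeled primitives.
inline unsigned selfatomic_incmod_value(std::atomic<unsigned>* loc,
                                        unsigned mod)
{
    unsigned oldval = atomic_add_value(loc, 1);
    if (oldval >= mod) {            // if overflow
        do {                        // very unlikely to loop
            oldval -= mod;
        } while (oldval >= mod);
        if (!oldval) atomic_sub(loc, mod);
    }
    return oldval;
}
```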






Author: jseigh_01@xemaps.com (Joe Seigh)
Date: Fri, 15 Sep 2006 17:51:22 GMT
Charles Bryant wrote:
> The code in question is:
>
> inline unsigned atomic_incmod_value(volatile unsigned* loc,
>                 unsigned mod)
> { unsigned oldval = atomic_add_value(loc, 1);// add 1, return value, atomic operation
>  if (oldval >= mod)// if overflow
>  { if (oldval == mod)// if exactly at overflow
>   { atomic_sub(loc,mod); }// must reduce by one cycle
>   oldval %= mod;// must reduce result
>  }
>  return(oldval);
> }
>
> I believe the following would be more correct, in the sense that it
> avoids the problem of hitting 2*mod:
>
> inline unsigned selfatomic_incmod_value(unsigned *loc, unsigned mod)
> {
>  unsigned oldval = atomic_add_value(loc, 1);
>  if (oldval >= mod) { // if overflow
>   do {  // very unlikely to loop
>    oldval -= mod;
>   } while (oldval >= mod);
>   if (!oldval) atomic_sub(loc,mod);
>  }
>  return oldval;
> }
>
> The idea is that the value must be reduced by 'mod' once for every time it
> hits a multiple of 'mod'. Every time it hits a multiple of 'mod', this
> must be because it was incremented there by some thread, so we get
> that thread to do the corresponding reduction.

You have a race condition between the "(oldval >= mod)" and "(!oldval)"
and the "atomic_sub(loc, mod)".  You need compare and swap to make the
compare and update atomic.  Otherwise use

inline unsigned selfatomic_incmod_value(unsigned *loc, unsigned mod)
{
 unsigned oldval = atomic_add_value(loc, 1) % mod;
 if (oldval == 0) { // overflow
  atomic_sub(loc, mod);
 }
 return oldval;
}

The original code appears to be trying to avoid using division in
the mainline code so compare and swap may be the way to go.
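A compare-and-swap version along those lines might look like this (a
sketch only, using the std::atomic compare_exchange interface that was
standardized later; the function name is mine), with no '%' on the
fast path:

```cpp
#include <atomic>

// Sketch: increment mod N in one atomic step via compare-and-swap,
// avoiding division in the mainline path.
inline unsigned cas_incmod_value(std::atomic<unsigned>& loc, unsigned mod)
{
    unsigned oldval = loc.load();
    for (;;) {
        unsigned newval = oldval + 1;
        if (newval == mod) newval = 0;          // wrap without '%'
        // On failure, oldval is reloaded with the current value; retry.
        if (loc.compare_exchange_weak(oldval, newval))
            return oldval;
    }
}
```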



>
> However, there is a remaining theoretical problem. If sufficient
> threads execute this concurrently such that *loc wraps, then it fails
> unless mod is a power of two because the result is 2^B + oldval which
> is not equivalent to oldval modulo mod. However, since that would
> require at least 65536 threads, it is really only a problem in theory.

The number of threads wouldn't make any difference if you have
more than one producer thread and one consumer thread.
>
> Neither function is fully atomic - meaning that its effect is either
> completely observable or completely not. An intermediate value of the
> location being incremented can be observed, even if read with an 'atomic
> read' operation.  The functions could be called self-atomic - meaning
> that concurrent invocations of the same function behave as if they
> were executed sequentially. One of the things that makes atomic
> primitives so difficult to use correctly is the fact that the
> composition of atomic operations is very rarely atomic.
>

The writes are atomic AFAICT.  Or are you referring to the fact that
reading the array contents is a separate operation from the operation
to adjust the index into the array?  If one thread pauses between
those two operations, other threads can wrap the index around and
over the index the paused thread is using, making the paused thread's
next operation incorrect.  You can only get away with
using semaphores w/o mutexes for single producer, single consumer
situations.  And in that case, you can use a distributed algorithm
to eliminate interlocked updates of the array indices.

Lock-free FIFO queues are essentially producer/consumer algorithms
and they use some tricky techniques to avoid the above problem.
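For illustration, the single-producer/single-consumer case can be
sketched as a ring buffer where each index has exactly one writer, so
no interlocked read-modify-write is needed (my code and names, using
the release/acquire atomics standardized later):

```cpp
#include <atomic>
#include <cstddef>

// One producer, one consumer: head is written only by the consumer and
// tail only by the producer, so plain ordered stores suffice.
template <typename T, size_t N>
struct SpscRing {
    T buf[N];
    std::atomic<size_t> head{0};   // written only by consumer
    std::atomic<size_t> tail{0};   // written only by producer

    bool push(const T& v) {        // producer side
        size_t t = tail.load(std::memory_order_relaxed);
        if ((t + 1) % N == head.load(std::memory_order_acquire))
            return false;          // full (capacity is N-1)
        buf[t] = v;
        tail.store((t + 1) % N, std::memory_order_release);
        return true;
    }
    bool pop(T& v) {               // consumer side
        size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire))
            return false;          // empty
        v = buf[h];
        head.store((h + 1) % N, std::memory_order_release);
        return true;
    }
};
```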


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.






Author: usenet@leapheap.co.uk
Date: Sat, 16 Sep 2006 10:22:17 CST
Joe Seigh wrote:
> Charles Bryant wrote:
> > The code in question is:
> >
> > inline unsigned atomic_incmod_value(volatile unsigned* loc,
> >                 unsigned mod)
> > { unsigned oldval = atomic_add_value(loc, 1);// add 1, return value, atomic operation
> >  if (oldval >= mod)// if overflow
> >  { if (oldval == mod)// if exactly at overflow
> >   { atomic_sub(loc,mod); }// must reduce by one cycle
> >   oldval %= mod;// must reduce result
> >  }
> >  return(oldval);
> > }
> >
> > I believe the following would be more correct, in the sense that it
> > avoids the problem of hitting 2*mod:
> >
> > inline unsigned selfatomic_incmod_value(unsigned *loc, unsigned mod)
> > {
> >  unsigned oldval = atomic_add_value(loc, 1);
> >  if (oldval >= mod) { // if overflow
> >   do {  // very unlikely to loop
> >    oldval -= mod;
> >   } while (oldval >= mod);
> >   if (!oldval) atomic_sub(loc,mod);
> >  }
> >  return oldval;
> > }
> >
> > The idea is that the value must be reduced by 'mod' once for every time it
> > hits a multiple of 'mod'. Every time it hits a multiple of 'mod', this
> > must be because it was incremented there by some thread, so we get
> > that thread to do the corresponding reduction.
>
> You have a race condition between the "(oldval >= mod)" and "(!oldval)"
> and the "atomic_sub(loc, mod)".  You need compare and swap to make the
> compare and update atomic.  Otherwise use
>
> inline unsigned selfatomic_incmod_value(unsigned *loc, unsigned mod)
> {
>  unsigned oldval = atomic_add_value(loc, 1)%mod;
>  if (oldval == 0) { // overflow
>   atomic_sub(loc, mod);
>  return oldval;
> }
[snip]

Wait-free increment modulo n?

If you can stand your buffer size being constrained to a power of 2
(say 2**a), use exchange-and-add with an addend of 2**(W-a), where W
is the number of bits in a machine word. You benefit from automatic
wraparound. Afterwards the thread translates the scaled pointer at its
leisure, safe from interaction with other threads.

Example: buffer size of 16KB on a 32-bit machine. Start the pointer at
zero and exchange-and-add 256K each time. Having obtained a pointer
value, the thread shifts right 18 bits and adds the buffer base
address to get the address of the buffer element (in this case a
single byte) in question.

Works fine on Pentium machines with the XADD instruction. Also, you
can tell when a thread has wrapped the buffer (carry flag) or reached
the halfway mark (overflow flag), which is occasionally useful.
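A sketch of the scaled-index trick in portable terms (my
transcription; fetch_add plays the role of XADD, and unsigned
arithmetic supplies the free wraparound; constants match the 16KB
example above):

```cpp
#include <atomic>
#include <cstdint>

// 2**14 = 16384 one-byte slots on a 32-bit counter: add 2**(32-14) each
// time; the counter wraps mod 2**32 for free and the top 14 bits are
// the buffer index.
const unsigned kIndexBits = 14;
const std::uint32_t kStep = std::uint32_t(1) << (32 - kIndexBits); // 256K

inline unsigned next_index(std::atomic<std::uint32_t>& scaled)
{
    std::uint32_t old = scaled.fetch_add(kStep);  // XADD equivalent
    return old >> (32 - kIndexBits);              // recover index 0..16383
}
```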

Regards,
Chris Noonan






Author: John Nagle <nagle@animats.com>
Date: Sat, 16 Sep 2006 12:17:55 CST
usenet@leapheap.co.uk wrote:
> Joe Seigh wrote:
>
>>Charles Bryant wrote:
>>>John Nagle wrote:
> Wait-free increment modulo n?

     I think we've demonstrated that atomic increment mod N is worth having
as a primitive.  It's hard to do with the existing primitives, it's
frequently needed for circular buffers, and the optimal implementation
varies with the hardware platform.  That makes it a good candidate for
standardization.

    John Nagle
    Animats






Author: n1568288018.ch@chch.demon.co.uk (Charles Bryant)
Date: Sat, 16 Sep 2006 20:50:36 GMT
In article <GNmdnVbkaKJ0H5fYnZ2dnUVZ_qadnZ2d@comcast.com>,
Joe Seigh <jseigh_01@xemaps.com> wrote:
>Charles Bryant wrote:
>> inline unsigned selfatomic_incmod_value(unsigned *loc, unsigned mod)
>> {
>>  unsigned oldval = atomic_add_value(loc, 1);
>>  if (oldval >= mod) { // if overflow
>>   do {  // very unlikely to loop
>>    oldval -= mod;
>>   } while (oldval >= mod);
>>   if (!oldval) atomic_sub(loc,mod);
>>  }
>>  return oldval;
>> }
>>
>
>You have a race condition between the "(oldval >= mod)" and "(!oldval)"
>and the "atomic_sub(loc, mod)".

I don't see a race. Can you give an example?

>You need compare and swap to make the
>compare and update atomic.

I was assuming that the only primitive available was the
atomic_add_value().

>Otherwise use
>
>inline unsigned selfatomic_incmod_value(unsigned *loc, unsigned mod)
>{
> unsigned oldval = atomic_add_value(loc, 1)%mod;
> if (oldval == 0) { // overflow
>  atomic_sub(loc, mod);
> return oldval;
>}

Isn't that exactly the same as my version, except that it uses a '%' operator?

>The original code appears to be trying to avoid using division in
>the mainline code so compare and swap may be the way to go.

The *original* code uses '%'. Mine avoids it.






Author: n552374065.ch@chch.demon.co.uk (Charles Bryant)
Date: Sun, 17 Sep 2006 01:33:40 GMT
In article <i7VOg.1736$6S3.1728@newssvr25.news.prodigy.net>,
John Nagle <nagle@animats.com> wrote:
>     I think we've demonstrated that atomic increment mod N is worth having
>as a primitive.  It's hard to do with the existing primitives, it's
>frequently needed for circular buffers, and the optimal implementation
>varies with the hardware platform.  That makes it a good candidate for
>standardization.

I disagree. The code which used the version appearing in this thread
did not need an atomic increment (with or without mod N). A mutually
atomic load and store are sufficient (ignoring memory ordering issues,
where atomic increment won't help).

I think there are some fairly major disadvantages of standardising
atomic operations, the biggest being that they seem to be an area in
which programmers greatly overestimate their abilities, because atomic
operations are extremely difficult to use correctly, yet appear quite
simple.  This leads to them being used incorrectly, producing code
which appears to work, passes lots of tests, but contains race
conditions which are difficult to track down.

Another problem is caused by variation in existing hardware. Simply
using mutexes and condition variables is adequate for almost all
tasks, while being vastly easier to get right than trying to use
atomic operations. In the few cases where this simple approach is too
slow, having standardised atomic operations is little help, because the
only way to make the code faster is to tune it to the platform. For
example, if someone wants to maintain a count for statistical
purposes, it would be wrong to just assume that an atomic increment is
the best way to write it. The platform may not have an atomic
increment, so a standard library forced to provide it would implement
it as lock/add/unlock - at least as bad as doing that yourself. Maybe
the platform has a highly efficient thread-local storage, so you could
keep a separate count per thread and add them later.






Author: jseigh_01@xemaps.com (Joe Seigh)
Date: Sun, 17 Sep 2006 18:06:05 GMT
Charles Bryant wrote:
> In article <GNmdnVbkaKJ0H5fYnZ2dnUVZ_qadnZ2d@comcast.com>,
> Joe Seigh <jseigh_01@xemaps.com> wrote:
>
>>Charles Bryant wrote:
>>
>>>inline unsigned selfatomic_incmod_value(unsigned *loc, unsigned mod)
>>>{
>>> unsigned oldval = atomic_add_value(loc, 1);
>>> if (oldval >= mod) { // if overflow
>>>  do {  // very unlikely to loop
>>>   oldval -= mod;
>>>  } while (oldval >= mod);
>>>  if (!oldval) atomic_sub(loc,mod);
>>> }
>>> return oldval;
>>>}
>>>
>>
>>You have a race condition between the "(oldval >= mod)" and "(!oldval)"
>>and the "atomic_sub(loc, mod)".
>
>
> I don't see a race. Can you give an example?
>
>
>>You need compare and swap to make the
>>compare and update atomic.
>
>
> I was assuming that the only primitive available was the
> atomic_add_value().
>
>
>>Otherwise use
>>
>>inline unsigned selfatomic_incmod_value(unsigned *loc, unsigned mod)
>>{
>> unsigned oldval = atomic_add_value(loc, 1)%mod;
>> if (oldval == 0) { // overflow
>>  atomic_sub(loc, mod);
>> return oldval;
>>}
>
>
> Isn't that exactly the same as my version except it uses a '%' operator?

You're right.  I misread the code.  I was thinking of some different
logic.  Yours doesn't have a race condition either.


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.






Author: "DocInverter" <google@icemx.net>
Date: Fri, 8 Sep 2006 10:12:53 CST
John Nagle wrote:
>     It does help if the compiler knows that some operation is
> unlocking a lock.  It's reasonable for an implementation to
> flush any non-auto variables from a register when unlocking
> any lock.  That will deal with most machine-level race
> conditions without programmer intervention.  (By that, I
> mean that if the programmer uses a mutex to protect shared
> data, the issues associated with CPU and cache coherence
> are dealt with the compiler and the hardware.  That's as
> it should be.)
>
>     But if the compiler has no knowledge of locks, then it
> either has to flush too often, or will fail to flush a register
> at a crucial moment.  So the language needs at least enough
> thread support that the compiler knows when an unlock has occured.

I believe that it is not necessary for the language/compiler to know
about lock/unlock operations being special:

Suppose you lock() a mutex, modify some shared data, and then unlock()
again. If at this point the compiler knows only the prototype of lock()
and unlock(), it must assume that any non-stack-local memory locations
are read from/written to by these functions, and therefore flush all
data or re-read all data, respectively.

(For my own spinlock implementation, which works only on GCC, I use
inline assembly and include "memory" in the list of clobbered
locations; that does the same trick as not having the implementation
of lock()/unlock() visible at the point of use.)

(It is then the responsibility of the lock() and unlock() functions to
include the appropriate memory barriers to make sure that these changes
are seen across all CPU cores; that's one of the main things missing
from volatile that renders it absolutely useless for synchronising
across threads. The only exception is something like the implementation
of lock()/unlock() itself; read Linus Torvalds's take on volatile on
LKML if you don't believe me.)

Regards, Colin






Author: "Earl Purple" <earlpurple@gmail.com>
Date: Fri, 8 Sep 2006 11:32:23 CST
Maciej Sobczak wrote:
>
> Object *p = new Object();
> // set up the new object
>
> mtx.lock();
> queue.push(p);
> mtx.unlock();
>
> above, the object in question was created and constructed *outside* of
> the scope of mutex, and the mutex itself is only guarding the queue's
> internal stuff. If you associate the mutex with the queue only (and
> that's the only thing you can statically declare), then consumers will
> see the new pointer in the queue, but not necessarily the object that it
> points to. (On the other, hand, if you try to extend the scope of mutex
> to cover also the object's construction, it might increase contention.)

I have a ProdConQueue template which is basically what you are
implementing here. Its push() method does the locking for you, so
it's pretty similar to what you have there.

Above you are pushing a pointer onto the queue. The thread that does
the pushing is not going to call delete so I don't see the problem. If
you pushed shared_ptr<Object> then it would be important that the
reference counting is thread-safe because both threads might lose their
last reference at the same time. That would be a problem, incidentally,
only after the consumer thread has popped it off the queue. As long as
its in the queue there is at least one reference to it. The locking
ensures that.

> Note that above, the pointer indirection might not be so explicit. Think
> about std::queue of std::strings. Or std::queue of Persons, where Person
> has some std::string attributes.

If the string is implemented with copy-on-write then again the problem
would occur after the consumer thread has taken it off the queue. Until
then, nobody is going to write to the string, and the queue will ensure
that it increments the reference count before the producer lets go.

If the string is not implemented with copy-on-write there should be no
threading issues at all, as both threads are working with different copies.

> Of course, we might as well drop our habits and accommodate some new
> ones. Hell, we're going to do this anyway. ;-)

Along with ProdConQueue I have a ConsumerThread class. I thought it
would be a pretty "standard" behaviour, but actually I have 3 different
behaviours. My ProdConQueue class allows multiple pushes and pops
(called insert and flush). These lock once, then add or remove multiple
items at a time. flush is commonly used where I have just one consumer
thread. Yet my two instances of threads that flush both have different
behaviour: one splits the batch into individual items and processes
them one at a time, the other processes the whole lot together.

My pop() and flush() functions also both come with timed_wait options.
Does the consumer-thread use a timed-wait or not? Again specific to
requirements.

I would be interested to see how other producer-consumer queues have
been implemented and how they compare to mine.
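For comparison, here is roughly the shape I would sketch for such a
queue with plain mutex/condvar primitives (my code and names, not the
ProdConQueue described above; the later-standardized std::mutex and
std::condition_variable stand in for whatever thread library is used):

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

// push locks once per call; flush blocks until something arrives and
// then removes the entire batch under a single lock acquisition.
template <typename T>
class ProdConQueue {
    std::mutex m;
    std::condition_variable cv;
    std::deque<T> items;
public:
    void push(const T& v) {
        { std::lock_guard<std::mutex> g(m); items.push_back(v); }
        cv.notify_one();
    }
    std::deque<T> flush() {
        std::unique_lock<std::mutex> g(m);
        cv.wait(g, [this]{ return !items.empty(); });
        std::deque<T> out;
        out.swap(items);               // take the whole batch at once
        return out;
    }
};
```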






Author: "Chris Thomasson" <cristom@comcast.net>
Date: Fri, 8 Sep 2006 13:38:09 CST
"Earl Purple" <earlpurple@gmail.com> wrote in message
news:1157724904.382816.213090@e3g2000cwe.googlegroups.com...


[...]


> I would be interested to see how other producer-consumer queues have
> been implemented and how they compare to mine.

Sure. Here is a trick you can use to implement efficient lock-free queues:


http://groups.google.ca/group/comp.programming.threads/msg/a359bfb41c68b98b

http://groups.google.ca/group/comp.programming.threads/msg/a1f155896395ffb9
(multi-producer/consumer)




http://appcore.home.comcast.net/
(my single producer/consumer)

AppCore is my demo library for lock-free algorithms... Apparently it's
fairly "popular":

http://groups.google.ca/group/comp.programming.threads/browse_frm/thread/205dcaed77941352/d154b56f0f233cef#d154b56f0f233cef
(take a look, read links)



I support timed-waits through the use of my eventcount synchronization
primitive. This is analogous to a condition variable except that it can be
used to wait on a lock-free data-structures natural conditions (e.g., queue
empty, queue full, ect...):

http://groups.google.ca/group/comp.programming.threads/browse_frm/thread/aa8c62ad06dbb380/8e7a0379b55557c0?lnk=gst&q=simple+portable+eventcount&rnum=1#8e7a0379b55557c0
(my first pseudo-code implementation)


http://appcore.home.comcast.net/appcore/src/ac_eventcount_algo1_c.html
http://appcore.home.comcast.net/appcore/include/ac_eventcount_algo1_h.html
(my first real implementation)



I would be happy to discuss all of this if you are interested!


Thank you.








Author: nagle@animats.com (John Nagle)
Date: Sat, 9 Sep 2006 13:42:48 GMT
Earl Purple wrote:
> I would be interested to see how other producer-consumer queues have
> been implemented and how they compare to mine.

     Take a look at our "http://www.overbot.com/public/qnx/mutexlock.h",
which we used on our DARPA Grand Challenge vehicle.  "class BoundedBuffer"
is the key section.  This implements a thread safe producer/consumer queue
for POSIX threads.  "atomic_inc" and "atomic_dec" are used.  Take a look
at "atomic_incmod_value", which does "add 1 mod N" without locking.
That's painful enough that it's worth having atomic modular addition
as a standard "atomic" function, since it's a basic primitive for
circular buffers.

     (There's some QNX-specific code; QNX, a real-time OS, allows timeouts on
a mutex, which is an extension to the POSIX standard).

    John Nagle






Author: jdennett@acm.org (James Dennett)
Date: Sat, 9 Sep 2006 14:41:02 GMT
DocInverter wrote:
> John Nagle wrote:
>>     It does help if the compiler knows that some operation is
>> unlocking a lock.  It's reasonable for an implementation to
>> flush any non-auto variables from a register when unlocking
>> any lock.  That will deal with most machine-level race
>> conditions without programmer intervention.  (By that, I
>> mean that if the programmer uses a mutex to protect shared
>> data, the issues associated with CPU and cache coherence
>> are dealt with the compiler and the hardware.  That's as
>> it should be.)
>>
>>     But if the compiler has no knowledge of locks, then it
>> either has to flush too often, or will fail to flush a register
>> at a crucial moment.  So the language needs at least enough
>> thread support that the compiler knows when an unlock has occured.
>
> I believe that it is not necessary for the language/compiler to know
> about lock/unlock operations being special:
>
> Suppose you lock() a mutex, modify some shared data, and then unlock()
> again. If at this point the compiler knows only the prototype of lock()
> and unlock(), it must assume that any non-stack-local memory locations
> are read from/written to by these functions, and therefore flush all
> data or re-read all data, respectively.

But compilers can know more, and can do inlining between
different translation units.  Relying on quirks of simple
implementations isn't good enough; if we want to ensure
that code isn't moved across certain function calls, that
needs to be specified by the standard.  (AFAIK, a legal
implementation of C++ could save all code generation
until link time, or even runtime, when it can know about
all included code.  C++ is carefully designed to *allow*
separate compilation, but apart from requiring certain
diagnostics doesn't specify in detail how much work has
to be done when translating a TU.)

-- James






Author: John Nagle <nagle@animats.com>
Date: Sat, 9 Sep 2006 15:59:01 CST
James Dennett wrote:
> DocInverter wrote:
>
>> John Nagle wrote:
>>
>>>     It does help if the compiler knows that some operation is
>>> unlocking a lock.  It's reasonable for an implementation to
>>> flush any non-auto variables from a register when unlocking
>>> any lock.  That will deal with most machine-level race
>>> conditions without programmer intervention.  (By that, I
>>> mean that if the programmer uses a mutex to protect shared
>>> data, the issues associated with CPU and cache coherence
>>> are dealt with the compiler and the hardware.  That's as
>>> it should be.)
>>>
>>>     But if the compiler has no knowledge of locks, then it
>>> either has to flush too often, or will fail to flush a register
>>> at a crucial moment.  So the language needs at least enough
>>> thread support that the compiler knows when an unlock has occured.
>>
>>
>> I believe that it is not necessary for the language/compiler to know
>> about lock/unlock operations being special:
[...]

>
> But compilers can know more, and can do inlining between
> different translation units.

    Some Intel compilers for embedded CPUs do that.

    Locking and unlocking should be inlineable,
for efficiency.  This typically requires
generation of special instructions.  Current high performance
solutions tend to involve inline functions with
embedded assembly code, with the attendant problems
of that approach.

>  Relying on quirks of simple
> implementations isn't good enough; if we want to ensure
> that code isn't moved across certain function calls, that
> needs to be specified by the standard.

    Yes.  This needs to be done both correctly and efficiently.
With multiprocessors becoming far more common, hand-waving
isn't good enough any more.

   John Nagle






Author: cristom@comcast.net ("Chris Thomasson")
Date: Sun, 10 Sep 2006 05:20:46 GMT
"James Dennett" <jdennett@acm.org> wrote in message
news:yhAMg.14880$lv.13789@fed1read12...
> DocInverter wrote:
>> John Nagle wrote:

[...]

> But compilers can know more, and can do inlining between

[...]

Read this proposal:


http://groups.google.com/group/comp.programming.threads/msg/1d9d4e6b888609e4


What do you think?







Author: cristom@comcast.net ("Chris Thomasson")
Date: Sun, 10 Sep 2006 05:20:07 GMT
"kanze" <kanze@gabi-soft.fr> wrote in message
news:1157532880.157500.119000@i3g2000cwc.googlegroups.com...
> Alan McKenney wrote:
>> kanze wrote:

>    p = new C;                  if ( p != NULL ) p->someFunctionInC() ;

For clarification I would define the solution:


Processor 1
-------------------

x = new C;
#StoreLoad | #StoreStore
p = x;



Processor 2
-------------------

x = p;
#LoadLoad /w Data-Dependency Hint
if (x) { x->someFunctionInC(); }





or you could do this in certain situations:


Processor 1
-------------------

x = new C;
#StoreStore
p = x;



Processor 2
-------------------

x = p;
#LoadLoad /w Data-Dependency Hint
if (x) { x->someFunctionInC(); }
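For reference, the two sketches above are release/acquire publication; in
the atomics notation that was eventually standardized (anachronistic for
this thread, but it shows the mapping) the same pattern is spelled:

```cpp
#include <atomic>

struct C {
    int v;
    C() : v(42) {}
    int someFunctionInC() const { return v; }
};

std::atomic<C*> p{nullptr};

// Processor 1: the release store plays the role of the
// #StoreStore barrier before "p = x".
void producer() {
    C* x = new C;
    p.store(x, std::memory_order_release);
}

// Processor 2: the acquire load plays the role of the #LoadLoad
// (with data-dependency hint) barrier before the dereference.
int consumer() {
    C* x = p.load(std::memory_order_acquire);
    return x ? x->someFunctionInC() : -1;
}
```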




Also, take a look at this solution:

http://groups.google.com/group/comp.programming.threads/msg/ca2f1af4552233df

This works on TSO...


Any questions?







Author: jseigh_01@xemaps.com (Joe Seigh)
Date: Sun, 10 Sep 2006 05:21:20 GMT
Raw View
John Nagle wrote:
> Earl Purple wrote:
>
>> I would be interested to see how other producer-consumer queues have
>> been implemented and how they compare to mine.
>
>
>     Take a look at our "http://www.overbot.com/public/qnx/mutexlock.h",
> which we used on our DARPA Grand Challenge vehicle.  "class BoundedBuffer"
> is the key section.  This implements a thread safe producer/consumer queue
> for POSIX threads.  "atomic_inc" and "atomic_dec" are used.  Take a look
> at "atomic_incmod_value", which does "add 1 mod N" without locking.
> That's painful enough that it's worth having atomic modular addition
> as a standard "atomic" function, since it's a basic primitive for
> circular buffers.
>
>     (There's some QNX-specific code; QNX, a real-time OS, allows
> timeouts on
> a mutex, which is an extension to the POSIX standard).
>
>                 John Nagle
>

Is there a possible race condition in atomic_incmod_value?  It seems
to assume that loc will never get larger than 2*mod - 1 before the
decrement occurs.  If it ever reached that point, loc would increase
forever without being decremented and would eventually overflow.
Compare and swap logic could take care of that situation.
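The compare-and-swap fix alluded to here might look like this (a sketch in
later std::atomic notation, not the actual overbot.com code; the function
name is hypothetical):

```cpp
#include <atomic>

// "add 1 mod N" via a CAS loop: the already-wrapped value is what gets
// stored, so the counter can never run past mod - 1 no matter how many
// threads race.  Returns the pre-increment value, like fetch_add.
unsigned atomic_incmod(std::atomic<unsigned>& loc, unsigned mod) {
    unsigned old = loc.load();
    while (!loc.compare_exchange_weak(old, (old + 1) % mod)) {
        // compare_exchange_weak reloaded 'old' for us; just retry
    }
    return old;
}
```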

There are algorithms that assume forward progress before a counter
reaches a certain size or wraps, but that value is usually large
enough to be infinity for all practical purposes, something close
to 2**64.  2**32 is too small for today's processors.

You might want to look at some fast-pathed semaphore code, for example
http://groups-beta.google.com/group/comp.programming.threads/msg/ea28d867d9cd30a3
It's for Windows, but I think I tried a version of it on Linux, which already
has a fast-pathed semaphore, and it improved its performance.  You want
to avoid doing syscalls on every semaphore wait and post if it isn't
necessary.  Note that the Windows version assumes the necessary memory
barriers on the win32 atomic ops, which is true only on Intel and not
on PowerPC (Xbox 360), so they will have to be added in those cases.
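The fast-path idea boils down to keeping the count in an atomic variable
and entering the kernel only when a waiter is (or may be) involved.  A
hedged reconstruction in later C++ notation, with a plain mutex/condvar
semaphore standing in for the kernel object; this is not Joe's actual code:

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

// Slow path: an ordinary semaphore, standing in for the kernel object.
class SlowSemaphore {
    std::mutex m_;
    std::condition_variable cv_;
    int n_ = 0;
public:
    void signal() {
        { std::lock_guard<std::mutex> lk(m_); ++n_; }
        cv_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return n_ > 0; });
        --n_;
    }
};

// Fast path: an uncontended wait or post touches only the atomic count,
// avoiding a syscall; a negative count means someone is blocked.
class FastSemaphore {
    std::atomic<int> count_;
    SlowSemaphore slow_;
public:
    explicit FastSemaphore(int initial = 0) : count_(initial) {}
    void post() {
        if (count_.fetch_add(1) < 0)   // a waiter is blocked: wake it
            slow_.signal();
    }
    void wait() {
        if (count_.fetch_sub(1) > 0)   // fast path: permit available
            return;
        slow_.wait();                  // slow path: really block
    }
};
```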


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.






Author: jseigh_01@xemaps.com (Joe Seigh)
Date: Sun, 10 Sep 2006 12:52:53 GMT
Raw View
[note: followup set to comp.programming.threads]


Joe Seigh wrote:
> John Nagle wrote:
>
>> Earl Purple wrote:
>>
>>> I would be interested to see how other producer-consumer queues have
>>> been implemented and how they compare to mine.
>>
>>
>>
>>     Take a look at our "http://www.overbot.com/public/qnx/mutexlock.h",
>> which we used on our DARPA Grand Challenge vehicle.  "class
>> BoundedBuffer"
>> is the key section.  This implements a thread safe producer/consumer
>> queue
>> for POSIX threads.  "atomic_inc" and "atomic_dec" are used.  Take a look
>> at "atomic_incmod_value", which does "add 1 mod N" without locking.
>> That's painful enough that it's worth having atomic modular addition
>> as a standard "atomic" function, since it's a basic primitive for
>> circular buffers.
>>
>>     (There's some QNX-specific code; QNX, a real-time OS, allows
>> timeouts on
>> a mutex, which is an extension to the POSIX standard).
>>
>>                 John Nagle
>>
>
> Is there a possible race condition in atomic_incmod_value?  It seems
> to assume that loc will never get larger than 2*mod - 1 before the
> decrement occurs.  If it ever reached that point, loc would increase
> forever without being decremented and would eventually overflow.
> Compare and swap logic could take care of that situation.
>
> There are algorithms that assume forward progress before a counter
> reaches a certain size or wraps, but that value is usually large
> enough to be infinity for all practical purposes, something close
> to 2**64.  2**32 is too small for today's processors.
>
> You might want to look at some fast pathed semaphore code.  This code
> http://groups-beta.google.com/group/comp.programming.threads/msg/ea28d867d9cd30a3
>
> It's for Windows, but I think I tried a version of it on Linux, which
> already
> has a fast-pathed semaphore, and it improved its performance.  You want
> to avoid doing syscalls on every semaphore wait and post if it isn't
> necessary.  Note that the Windows version assumes the necessary memory
> barriers on the win32 atomic ops, which is true only on Intel and not
> on PowerPC (Xbox 360), so they will have to be added in those cases.
>
>

There's some other potential problems that I missed on my first quick
look at the code.  Rather than discuss them on c.s.c++, I'm cross posting
with follow up set to c.p.t. where the discussion will be more on topic.
Producer/consumer has been discussed quite a bit over there.


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.






Author: nagle@animats.com (John Nagle)
Date: Sun, 10 Sep 2006 12:52:19 GMT
Raw View
Joe Seigh wrote:
> John Nagle wrote:
>
>> Earl Purple wrote:
>>
>>> I would be interested to see how other producer-consumer queues have
>>> been implemented and how they compare to mine.
>>
>>
>>
>>     Take a look at our "http://www.overbot.com/public/qnx/mutexlock.h",
>> which we used on our DARPA Grand Challenge vehicle.  "class
>> BoundedBuffer"
>> is the key section.  This implements a thread safe producer/consumer
>> queue
>> for POSIX threads.  "atomic_inc" and "atomic_dec" are used.  Take a look
>> at "atomic_incmod_value", which does "add 1 mod N" without locking.
>> That's painful enough that it's worth having atomic modular addition
>> as a standard "atomic" function, since it's a basic primitive for
>> circular buffers.
>>
>>     (There's some QNX-specific code; QNX, a real-time OS, allows
>> timeouts on
>> a mutex, which is an extension to the POSIX standard).
>>
>>                 John Nagle
>>
>
> Is there a possible race condition in atomic_incmod_value?  It seems
> to assume that loc will never get larger than 2*mod - 1 before the
> decrement occurs.  If it ever reached that point, loc would increase
> forever without being decremented and would eventually overflow.
> Compare and swap logic could take care of that situation.

     Yes, that's a problem, but because that function is only used
for the bounded buffer code, which uses a Semaphore to limit the
number of items in the queue, that's not a bug.  But I'm not
happy with it.

     If you want to use compare-and-swap without locking, you have
to have it available as a primitive.  This requires hardware
support.  Interestingly, QNX mutexes use compare-and-swap on x86,
but load/store conditional opcodes on RISC processors.  The
Linux people had some serious problems in this area; see
"http://lists.osdl.org/pipermail/robustmutexes/2003-December/000110.html"

     With current hardware, the portable primitive is a mutex;
you can't rely on having either compare-and-swap or test-and-set
available as an atomic operation.

    John Nagle






Author: "Chris Thomasson" <cristom@comcast.net>
Date: Sun, 10 Sep 2006 12:58:50 CST
Raw View
"Joe Seigh" <jseigh_01@xemaps.com> wrote in message
news:gvCdneRguvhJZp7YnZ2dnUVZ_v-dnZ2d@comcast.com...
> Joe Seigh wrote:
>> John Nagle wrote:
>>
>>> Earl Purple wrote:
>>>
>>>> I would be interested to see how other producer-consumer queues have
>>>> been implemented and how they compare to mine.

[...]


>> You might want to look at some fast pathed semaphore code.  This code
>> http://groups-beta.google.com/group/comp.programming.threads/msg/ea28d867d9cd30a3

[...]

> There's some other potential problems that I missed on my first quick
> look at the code.

[...]



http://groups.google.com/group/comp.programming.threads/msg/d07dc3508a2b1ce7


http://groups.google.com/group/comp.programming.threads/msg/77c6577e9e264f82


http://groups.google.com/group/comp.programming.threads/msg/628fd20930203be3


IIRC, that should address basically everything...







Author: cristom@comcast.net ("Chris Thomasson")
Date: Mon, 11 Sep 2006 02:13:14 GMT
Raw View
"Chris Thomasson" <cristom@comcast.net> wrote in message news:...
> "Joe Seigh" <jseigh_01@xemaps.com> wrote in message
> news:gvCdneRguvhJZp7YnZ2dnUVZ_v-dnZ2d@comcast.com...
>> Joe Seigh wrote:
>>> John Nagle wrote:
>>>> Earl Purple wrote:

[...]


>> There's some other potential problems that I missed on my first quick
>> look at the code.
>
> [...]

[...]


Never mind... I think I confused the problem(s) Joe noticed in the code:


http://www.overbot.com/public/qnx/mutexlock.h



with an old bug that was in Joe's original lock-free semaphore code...


Sorry for any confusion...







Author: no.spam@no.spam.com (Maciej Sobczak)
Date: Mon, 11 Sep 2006 14:41:51 GMT
Raw View
Earl Purple wrote:

>> Object *p = new Object();
>> // set up the new object
>>
>> mtx.lock();
>> queue.push(p);
>> mtx.unlock();
>>
>> above, the object in question was created and constructed *outside* of
>> the scope of mutex, and the mutex itself is only guarding the queue's
>> internal stuff. If you associate the mutex with the queue only (and
>> that's the only thing you can statically declare), then consumers will
>> see the new pointer in the queue, but not necessarily the object that it
>> points to. (On the other hand, if you try to extend the scope of mutex
>> to cover also the object's construction, it might increase contention.)
>
> I have a ProdConQueue template which is basically what you are
> implementing here. Now its push() method does the locking for you so
> it's pretty similar to what you have there.
>
> Above you are pushing a pointer onto the queue. The thread that does
> the pushing is not going to call delete so I don't see the problem.

Because the problem is not in lifetime, but in value visibility.
What mutex gives you is not only the mutual exclusion, but also the
guarantee that one thread will see the *value* that the other thread wrote.
Today, the typical mutex causes all memory to be synchronized and this
is necessary for the reader to really see all the new values in memory.
If we allow the mutex to be associated only with the queue, then the
mutual exclusion (as a tool that serializes execution of some threads)
will still be guaranteed, but not necessarily the visibility of writes.
The object pointed to by the pointer in the queue will really be "there",
because it was already constructed and written by the writer thread, but
the reader thread might not see its state properly, if the memory write
operations did not propagate through the system in time.
That's why associating the mutex only with some selected set of objects
(as opposed to synchronizing all memory) would break many multithreading
patterns, including the one above.
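For concreteness, here is both sides of that pattern in one sketch (the
queue is renamed q to avoid clashing with std::queue; spelled with a
std::mutex, anachronistic to this thread):

```cpp
#include <mutex>
#include <queue>

struct Object { int state = 0; };

std::mutex mtx;
std::queue<Object*> q;

// Producer: the object is constructed and written *outside* the lock;
// it is the release of mtx that makes those writes visible to the
// next thread that acquires mtx.
void produce() {
    Object* p = new Object();
    p->state = 42;                        // set up the new object
    std::lock_guard<std::mutex> lk(mtx);
    q.push(p);
}

// Consumer: acquiring the same mutex is what guarantees it sees not
// just the pointer but the fully constructed object behind it.
Object* consume() {
    std::lock_guard<std::mutex> lk(mtx);
    if (q.empty()) return nullptr;
    Object* p = q.front();
    q.pop();
    return p;
}
```

If the mutex synchronized only the queue's own members, the consumer could
legitimately observe state == 0: that is exactly the breakage described
above.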


--
Maciej Sobczak : http://www.msobczak.com/
Programming    : http://www.msobczak.com/prog/






Author: wkaras@yahoo.com
Date: Mon, 4 Sep 2006 17:17:16 CST
Raw View
kanze wrote:
> Chris Thomasson wrote:
> > <wkaras@yahoo.com> wrote in message
.
> > > The definition of "observable behavior" in the current
> > > Standard should be fleshed out, but I think it will
> > > always be somewhat open-ended.  Maybe the
> > > comparison is with the number of bits in an int,
> > > with a minimum but no max, just a requirement
> > > that you say what it is.  If a compiler asserts that
> > > it supports multi-threading of a given type, all
> > > threads must "properly see" each others
> > > observable reads/writes.
>
> > I would not use that compiler...
>
> Don't worry, nothing that constraining will be adopted.  (Such a
> policy would require barriers around just about every access.)
.

It seems that you (both) are interpreting "observable
reads/writes" to be the same as "all reads/writes".  Again
let me repeat, by observable read/write I mean the
equivalent of a read or a write of a volatile variable.






Author: "kanze" <kanze@gabi-soft.fr>
Date: Tue, 5 Sep 2006 09:24:49 CST
Raw View
wkaras@yahoo.com wrote:
> kanze wrote:
> > Chris Thomasson wrote:
> > > <wkaras@yahoo.com> wrote in message

> > > > The definition of "observable behavior" in the current
> > > > Standard should be fleshed out, but I think it will
> > > > always be somewhat open-ended.  Maybe the
> > > > comparison is with the number of bits in an int,
> > > > with a minimum but no max, just a requirement
> > > > that you say what it is.  If a compiler asserts that
> > > > it supports multi-threading of a given type, all
> > > > threads must "properly see" each others
> > > > observable reads/writes.

> > > I would not use that compiler...

> > Don't worry, nothing that constraining will be adopted.
> > (Such a policy would require barriers around just about
> > every access.)

> It seems that you (both) are interpreting "observable
> reads/writes" to be the same as "all reads/writes".  Again let
> me repeat, by observable read/write I mean the equivalent of a
> read or a write of a volatile variable.

I was.  Now that I think about it, I think what Chris was
objecting to is that this "observability" be bound to the type
of the object; typically, what is needed in a multithreaded
context is an explicitly induced point in the program where all
previous writes (regardless of the target type) become visible
(before any of the following writes become visible).

I think that Chris and I agree that volatile is not really
relevant with regards to multithreading.  There is a proposal by
Microsoft to give more teeth to volatile.  I'm not totally
against the proposal, as I think it goes in the direction of the
original intent of volatile.  But I don't think that the
proposal will give volatile any more real relevance with regards
to multithreading.  What you usually need in multithreading is a
sequencing guarantee---that (all) preceding writes will become
visible to all observers before any of the following writes.
(Note that this generally implies some sequencing actions on the
part of the observers as well.)

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34







Author: cristom@comcast.net ("Chris Thomasson")
Date: Tue, 5 Sep 2006 14:54:04 GMT
Raw View
<wkaras@yahoo.com> wrote in message
news:1157391272.941676.304590@m73g2000cwd.googlegroups.com...
>
> kanze wrote:
>> Chris Thomasson wrote:
>> > <wkaras@yahoo.com> wrote in message
> .
>> > > The definition of "observable behavior" in the current
>> > > Standard should be fleshed out, but I think it will
>> > > always be somewhat open-ended.  Maybe the
>> > > comparison is with the number of bits in an int,
>> > > with a minimum but no max, just a requirement
>> > > that you say what it is.  If a compiler asserts that
>> > > it supports multi-threading of a given type, all
>> > > threads must "properly see" each others
>> > > observable reads/writes.
>>
>> > I would not use that compiler...
>>
>> Don't worry, nothing that constraining will be adopted.  (Such a
>> policy would require barriers around just about every access.)
> .
>
> It seems that you (both) are interpreting "observable
> reads/writes" to be the same as "all reads/writes".  Again
> let me repeat, by observable read/write I mean the
> equivalent of a read or a write of a volatile variable.

That would be basically equivalent to:

ld.mf/st.mf

If they're going to define volatile as having barriers, they should let
the programmer define exactly what type of barrier is used for stores and
what kind is used for loads... I would drop volatile and prefer
something like this:

std::atomic<Type>::op<MemoryBarrier>(...);

That way I can do stuff like this:

static T pVar;


T std::atomic<T>::cas<std::StoreLoad|std::StoreStore>(&pVar, ..., ...);


T std::atomic<T>::cas<std::LoadStore|std::StoreStore>(&pVar, ..., ...);


T std::atomic<T>::cas<std::LoadStore|std::LoadLoad>(&pVar, ..., ...);


T std::atomic<T>::cas<std::StoreLoad>(&pVar, ..., ...);


T std::atomic<T>::cas<std::LoadStore>(&pVar, ..., ...);


T std::atomic<T>::cas<std::LoadLoad>(&pVar, ..., ...);


T std::atomic<T>::cas<std::StoreStore>(&pVar, ..., ...);


// naked by default
T std::atomic<T>::cas<>(&pVar, ..., ...);



The compiler shall not perform any code motion or tricky optimizations
across calls to these std::atomic functions... That takes care of
compiler ordering... The memory barrier takes care of the hardware
ordering... I would model this design after the SPARC instruction set,
like the example shows, because of its "fine granularity"...


Any thoughts?







Author: "Alan McKenney" <alan_mckenney1@yahoo.com>
Date: Tue, 5 Sep 2006 12:18:04 CST
Raw View
kanze wrote:
> wkaras@yahoo.com wrote:

>
> .... typically, what is needed in a multithreaded
> context is an explicitly induced point in the program where all
> previous writes (regardless of the target type) become visible
> (before any of the following writes become visible).

> ....  What you usually need in multithreading is a
> sequencing guarantee---that (all) preceding writes will become
> visible to all observers before any of the following writes.
> (Note that this generally implies some sequencing actions on the
> part of the observers as well.)


Does it need to be "all writes"?

Perhaps I've been corrupted by my years doing
supercomputing, but when I think of parallel processing
(and I think of multithreading as parallel processing), I
envision the model system as one with a bunch of
CPUs (possibly with local memory)
with a vast network between the CPUs and the (shared)
memory.  When a CPU updates a (shared) variable,
the update slowly "percolates" out through the network.

In this situation, waiting for "all writes" from all CPUs
to be done would require all CPUs to stop and wait
for the memory network to become quiescent.
Since this would happen every time any CPU requests
a "wait for all writes", it would cause an O(number of processors)
performance hit.

I don't know about anyone else, but when I use
mutexes, I always associate each mutex with a set
of (shared) variables that it controls, so what I would
want is to be assured that all writes to the variables
controlled by this mutex were visible to my
thread before the mutex was considered locked.

In other words, for me, synchronization always
applies to an object or set of objects, not to the
universe of objects.

If I may invent some ill-advised syntax, I'd want something
like

synchronizable group_a { int a; std::string b; MyClass c; };

lock_group( group_a );
a += 1;
unlock_group( group_a );
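Something close to that syntax can be approximated in library form: tie
the lock and the variables it controls into one object, so the data is
reachable only while the lock is held.  A sketch (all names hypothetical,
and anachronistic to this thread):

```cpp
#include <mutex>
#include <string>

// Grouped synchronization as a wrapper: with_lock() is the only way to
// reach the data, so every access happens under the group's own mutex.
template <typename Data>
class Synchronized {
    std::mutex m_;
    Data data_;
public:
    template <typename F>
    auto with_lock(F f) -> decltype(f(data_)) {
        std::lock_guard<std::mutex> lk(m_);
        return f(data_);
    }
};

// The "synchronizable group" from the post, in struct form.
struct GroupA { int a = 0; std::string b; };
```

Usage then reads much like the lock_group/unlock_group pseudo-code:
group_a.with_lock([](GroupA& g) { g.a += 1; });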

-- Alan McKenney






Author: wkaras@yahoo.com
Date: Tue, 5 Sep 2006 12:16:30 CST
Raw View
kanze wrote:
> wkaras@yahoo.com wrote:
> > kanze wrote:
> > > Chris Thomasson wrote:
> > > > <wkaras@yahoo.com> wrote in message
.
> > It seems that you (both) are interpreting "observable
> > reads/writes" to be the same as "all reads/writes".  Again let
> > me repeat, by observable read/write I mean the equivalent of a
> > read or a write of a volatile variable.
>
> I was.  Now that I think about it, I think what Chris was
> objecting to is that this "observability" be bound to the type
> of the object; typically, what is needed in a multithreaded
> context is an explicitly induced point in the program where all
> previous writes (regardless of the target type) become visible
> (before any of the following writes become visible).

Seems like overkill.  In fact, my own proposal could be
made more flexible:

namespace std
{
template <unsigned seq_id>
struct seq
  {
    template <typename T>
    static void observable_read(const T &);

    template <typename T>
    static void observable_write(const T &);
  };

template <typename T>
void observable_read(const T &x)
   { seq<0>::observable_read(x); }

template <typename T>
void observable_write(const T &x)
   { seq<0>::observable_write(x); }
}

// ...
using namespace std;
// ...

a = 5;
seq<1>::observable_write(a);
a_ready = true;
seq<1>::observable_write(a_ready);
b = 5;
seq<2>::observable_write(b);
b_ready = true;
seq<2>::observable_write(b_ready);

In this example, the observable write to a could be reordered
to follow the observable write to b_ready, as long as
the observable write to a_ready still followed the one
to a.

There may not currently be any CPU architecture that
could take advantage of this amount of flexibility.  But
the intent of the observable reads/writes is clearer,
and perhaps the code is more future-proof.

> I think that Chris and I agree that volatile is not really
> relevant with regards to multithreading.  There is a proposal by
> Microsoft to give more teeth to volatile.  I'm not totally
> against the proposal, as I think it goes in the direction of the
> original intent of volatile.  But I don't think that the
> proposal will give volatile any more real relevance with regards
> to multithreading.  What you usually need in multithreading is a
> sequencing guarantee---that (all) preceding writes will become
> visible to all observers before any of the following writes.
> (Note that this generally implies some sequencing actions on the
> part of the observers as well.)

Wasn't the original intent of volatile to allow effective
interfacing with memory-mapped peripherals?  How
could an implementation of volatile that worked properly
with memory-mapped peripherals (in the general case
where read/write order mattered) not work properly for
inter-thread data passing?






Author: nagle@animats.com (John Nagle)
Date: Tue, 5 Sep 2006 19:00:07 GMT
Raw View
kanze wrote:
> I was.  Now that I think about it, I think what Chris was
> objecting to is that this "observability" be bound to the type
> of the object; typically, what is needed in a multithreaded
> context is an explicitly induced point in the program where all
> previous writes (regardless of the target type) become visible
> (before any of the following writes become visible).

    That's probably the best that can be done for C++, given that
the compiler has no idea what a given lock protects.  There's
a performance penalty for that approach, but it's not that
large on most existing CPUs given modern cache synchronization
hardware.

    This approach maps badly to hardware which has huge lookahead
(the Itanium).  It maps badly to machines which have slow access
to shared memory and fast local memory (the Cell).  But for x86,
which is what matters, it's good enough.

    John Nagle






Author: nagle@animats.com (John Nagle)
Date: Tue, 5 Sep 2006 19:18:24 GMT
Raw View
Alan McKenney wrote:
> kanze wrote:
>
>>wkaras@yahoo.com wrote:

> Does it need to be "all writes"?
>
> Perhaps I've been corrupted by my years doing
> supercomputing, but when I think of parallel processing
> (and I think of multithreading as parallel processing), I
> envision the model system as one with a bunch of
> CPUs (possibly with local memory)
> with a vast network between the CPUs and the (shared)
> memory.  When a CPU updates a (shared) variable,
> the update slowly "percolates" out through the network.
>
> In this situation, waiting for "all writes" from all CPUs
> to be done would require all CPUs to stop and wait
> for the memory network to become quiescent.
> Since this would happen every time any CPU requests
> a "wait for all writes", it would cause an O(no of processors)
> performance hit.
>
> I don't know about anyone else, but when I use
> mutexes, I always associate each mutex with a set
> of (shared) variables that it controls, so what I would
> want is to be assured that all writes to the variables
> controlled by this mutex were visible to my
> thread before the mutex was considered locked.
>
> In other words, for me, synchronization always
> applies to an object or set of objects, not to the
> universe of objects.

     As I point out occasionally, that architecture is used on the Sony
Playstation 3.  The number of machines in existence with architectures
like that is about to increase by five or six orders of magnitude, as
loosely coupled multiprocessor architecture reaches the Toys-R-Us/WalMart
level.

     So it might be worth giving such synchronization problems more attention.

     John Nagle






Author: no.spam@no.spam.com (Maciej Sobczak)
Date: Wed, 6 Sep 2006 14:48:15 GMT
Raw View
Alan McKenney wrote:

> I don't know about anyone else, but when I use
> mutexes, I always associate each mutex with a set
> of (shared) variables that it controls, so what I would
> want is to be assured that all writes to the variables
> controlled by this mutex were visible to my
> thread before the mutex was considered locked.
>
> In other words, for me, synchronization always
> applies to an object or set of objects, not to the
> universe of objects.
>
> If I may invent some ill-advised syntax, I'd want something
> like
>
> synchronizable group_a { int a; std::string b; MyClass c; };

Good point (and I would back this idea), but this is not compatible with
current common usage patterns.
Consider for example the "producer" thread that adds some object to the
shared queue. For this pattern to work, every new object that is created
by the producer (and it might be created as a dynamic structure on the
free store) needs to be visible to consumer threads, but this visibility
will not be guaranteed if the mutex is associated with some stable
set of objects - in this case with the queue itself.
Something like this:

Object *p = new Object();
// set up the new object

mtx.lock();
queue.push(p);
mtx.unlock();

above, the object in question was created and constructed *outside* of
the scope of mutex, and the mutex itself is only guarding the queue's
internal stuff. If you associate the mutex with the queue only (and
that's the only thing you can statically declare), then consumers will
see the new pointer in the queue, but not necessarily the object that it
points to. (On the other hand, if you try to extend the scope of mutex
to cover also the object's construction, it might increase contention.)

I guess it's a common pattern, and this pattern would break.

Note that above, the pointer indirection might not be so explicit. Think
about a std::queue of std::strings, or a std::queue of Persons, where
Person has some std::string attributes.

Of course, we might as well drop our habits and accommodate some new
ones. Hell, we're going to do this anyway. ;-)


--
Maciej Sobczak : http://www.msobczak.com/
Programming    : http://www.msobczak.com/prog/






Author: "kanze" <kanze@gabi-soft.fr>
Date: Wed, 6 Sep 2006 09:50:17 CST
Raw View
wkaras@yahoo.com wrote:
> kanze wrote:
> > wkaras@yahoo.com wrote:
> > > kanze wrote:
> > > > Chris Thomasson wrote:
> > > > > <wkaras@yahoo.com> wrote in message

> > > It seems that you (both) are interpreting "observable
> > > reads/writes" to be the same as "all reads/writes".  Again let
> > > me repeat, by observable read/write I mean the equivalent of a
> > > read or a write of a volatile variable.

> > I was.  Now that I think about it, I think what Chris was
> > objecting to is that this "observability" be bound to the type
> > of the object; typically, what is needed in a multithreaded
> > context is an explicitly induced point in the program where all
> > previous writes (regardless of the target type) become visible
> > (before any of the following writes become visible).

> Seems like overkill.

It is.  But you generally do need synchronization across a set
of accesses, and not a single access.

> In fact, my own proposal could be
> made more flexible:

> namespace std
> {
> template <unsigned seq_id>
> struct seq
>   {
>     template <typename T>
>     void observable_read(const T &);

>     template <typename T>
>     void observable_write(const T &);
>   };
>
> template <typename T>
> void observable_read(const T &x)
>    { seq<0>::observable_read(x); }

> template <typename T>
> void observable_write(const T &x)
>    { seq<0>::observable_write(x); }

> // ...
> using namespace std;
> // ...

> a = 5;
> seq<1>::observable_write(a);
> a_ready = true;
> seq<1>::observable_write(a_ready);
> b = 5;
> seq<2>::observable_write(b);
> b_ready = true;
> seq<2>::observable_write(b_ready);

> In this example, the observable write to a could be reordered
> to follow the observable write to b_ready, as long as
> the observable write to a_ready still followed the one
> to a.

> There may not currently be any CPU architecture that
> could take advantage of this amount of flexibility.  But
> the intent of the observable reads/writes is more clear,
> and perhaps the code is more future safe.

This goes along somewhat with what Alan spoke of;
synchronization requests which affect sets of variables, rather
than all the variables (or just one access).

As you say, it won't buy you anything with most modern general
purpose machines today.  And it introduces a lot of extra
complexity, for which we have little existing practice to base
our ideas on.

> > I think that Chris and I agree that volatile is not really
> > relevant with regards to multithreading.  There is a proposal by
> > Microsoft to give more teeth to volatile.  I'm not totally
> > against the proposal, as I think it goes in the direction of the
> > original intent of volatile.  But I don't think that the
> > proposal will give volatile any more real relevance with regards
> > to multithreading.  What you usually need in multithreading is a
> > sequencing guarantee---that (all) preceding writes will become
> > visible to all observers before any of the following writes.
> > (Note that this generally implies some sequencing actions on the
> > part of the observers as well.)

> Wasn't the original intent of volatile to allow effective
> interfacing with memory-mapped peripherals?

That's my understanding of it.

> How could an implementation of volatile that worked properly
> with memory-mapped peripherals (in the general case where
> read/write order mattered) not work properly for inter-thread
> data passing.

The implementation of volatile in Sun CC (and in g++ for Sparc
under Solaris) doesn't work properly for memory-mapped
peripherals:-).  (At least not formally.  I suspect that what
actually happens is that the hardware recognizes the address as
one of a memory mapped peripheral, and does some synchronization
on its own, even if the Sparc architecture standard doesn't
require it.)

Even if it did, however, volatile imposes an absolute ordering
on all accesses to everything that is declared volatile.  Which
is only enough if you declare almost everything volatile.  And
then, it is far too much, imposing an enormous performance
penalty.

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34







Author: "kanze" <kanze@gabi-soft.fr>
Date: Wed, 6 Sep 2006 11:26:19 CST
Raw View
Alan McKenney wrote:
> kanze wrote:
> > wkaras@yahoo.com wrote:

> > .... typically, what is needed in a multithreaded
> > context is an explicitly induced point in the program where all
> > previous writes (regardless of the target type) become visible
> > (before any of the following writes become visible).

> > ....  What you usually need in multithreading is a
> > sequencing guarantee---that (all) preceding writes will become
> > visible to all observers before any of the following writes.
> > (Note that this generally implies some sequencing actions on the
> > part of the observers as well.)

> Does it need to be "all writes"?

Not really.  That was a major simplification on my part.  The
important point is that there are a set of writes all of which
must become visible before any of the writes in a second set,
although the order within each set is not important.  To do this
with volatile (assuming strong volatile semantics) would require
declaring all of the objects in both sets volatile, creating a
total ordering of all of the writes, which is not necessary and
which imposes an extreme performance penalty.  (Note too that
when I speak of the ordering of the writes, I really mean the
order that is seen by all observers.)

> Perhaps I've been corrupted by my years doing
> supercomputing, but when I think of parallel processing
> (and I think of multithreading as parallel processing), I
> envision the model system as one with a bunch of
> CPUs (possibly with local memory)
> with a vast network between the CPUs and the (shared)
> memory.  When a CPU updates a (shared) variable,
> the update slowly "percolates" out through the network.

> In this situation, waiting for "all writes" from all CPUs to
> be done would require all CPUs to stop and wait for the memory
> network to become quiescent.  Since this would happen every
> time any CPU requests a "wait for all writes", it would cause
> an O(no of processors) performance hit.

Yes, but only at one very specific point in time.

In fact, it's a little bit more subtle.  The "writing" processor
uses a primitive to ensure that it "exports" all of the
preceding writes before it "exports" any following writes; on
many modern processors, a store A instruction, followed by a
store B instruction, may result in B being "written" before A
unless special steps are taken.  And the "reading" processors
use a primitive to ensure that all following reads access
"later" values than all previous reads.

Consider a simple example: p is a pointer, initialized with
null:

    processor A                 processor B

    p = new C;                  if ( p != NULL ) p->someFunctionInC() ;

This doesn't work, of course, because there is no ordering
between the writes in the constructor of C, and the write to p;
the inversion of the ordering may occur when actually writing to
global memory, in processor A, or when reading from global
memory, in processor B.

In this case, there is no way to make volatile at an object
level work, no matter how strong it is made, because volatile
doesn't engage until the constructor has finished.

> I don't know about anyone else, but when I use
> mutexes, I always associate each mutex with a set
> of (shared) variables that it controls, so what I would
> want is to be assured that all writes to the variables
> controlled by this mutex were visible to my
> thread before the mutex was considered locked.

At least under Posix, you have this.  It's one of the Posix
guarantees concerning pthread_mutex_lock (and all of the other
pthread synchronization requests).  I think (hope?) it is a
foregone conclusion that any standard mutexes, regardless of the
syntax finally adopted, will provide these guarantees.  If you
wrap the two operations above in a mutex lock, there is no
problem.

The interest in atomic operations, "observe", and such, is for
lock-free algorithms.  On a Sun Sparc, for example, I don't need
a lock to make the above work; just inserting a few membar
instructions in critical places is sufficient.
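[Editorial sketch: the p = new C example above, repaired with exactly such ordering primitives. This uses later-standard std::atomic notation, which did not exist when this was written; the release store / acquire load pair corresponds to the membar placement described, and is an illustration, not the committee's eventual API.]

```cpp
#include <atomic>
#include <cstddef>

struct C {
    int data;
    C() : data(42) {}
    int someFunctionInC() const { return data; }
};

std::atomic<C*> p(NULL);   // was: a plain pointer, with no ordering at all

// processor A: the release store orders the constructor's writes *before*
// the store to p (the "membar" before publishing the pointer)
void writer() {
    p.store(new C, std::memory_order_release);
}

// processor B: the acquire load orders the reads of *q *after* the read of
// p (the "membar" after observing the pointer)
int reader() {
    C* q = p.load(std::memory_order_acquire);
    return q != NULL ? q->someFunctionInC() : -1;
}
```

The same effect can be had today, non-portably, with explicit Sparc membar instructions or GCC asm.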

> In other words, for me, synchronization always
> applies to an object or set of objects, not to the
> universe of objects.

At the design level, you are certainly correct.  In practice,
the synchronization is done by means of a system request
(pthread_mutex_lock, for example) which doesn't know what the
set of objects is, and synchronizes everything.  At least on a
Sparc (the architecture I know best), this memory
synchronization is done by means of a machine instruction
membar, and this instruction synchronizes everything.

> If I may invent some ill-advised syntax, I'd want something
> like

> synchronizable group_a { int a; std::string b; MyClass c; };

> lock_group( group_a );
> a += 1;
> unlock_group( group_a );

I don't think it would buy much on most modern processors.

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34







Author: wkaras@yahoo.com
Date: Wed, 6 Sep 2006 12:24:05 CST
Raw View
kanze wrote:
> wkaras@yahoo.com wrote:
> > kanze wrote:
> > > wkaras@yahoo.com wrote:
> > > > kanze wrote:
> > > > > Chris Thomasson wrote:
> > > > > > <wkaras@yahoo.com> wrote in message
.
> > How could an implementation of volatile that worked properly
> > with memory-mapped peripherals (in the general case where
> > read/write order mattered) not work properly for inter-thread
> > data passing.
>
> The implementation of volatile in Sun CC (and in g++ for Sparc
> under Solaris) doesn't work properly for memory-mapped
> peripherals:-).  (At least not formally.  I suspect that what
> actually happens is that the hardware recognizes the address as
> one of a memory mapped peripheral, and does some synchronization
> on its own, even if the Sparc architecture standard doesn't
> require it.)

That raises a good point.  If volatile is meant for use only with
mem-mapped peripherals, the Standard should say that
an implementation may apply it only when used for
implementation-specific distinguished addresses (for peripherals).
Maybe many compiler writers have assumed this flexibility is
implicit in the Standard.

> Even if it did, however, volatile imposes an absolute ordering
> on all accesses to everything that is declared volatile.  Which
> is only enough if you declare almost everything volatile.  And
> then, it is far too much, imposing an enormous performance
> penalty.

Exactly, that's why I made my proposal.  But I think you're
exaggerating a bit.  Only variables that are accessed by
multiple threads need to be volatile.  If that's nearly all the
variables, it would be questionable whether a multi-threaded
implementation made sense at all.






Author: "Sean Kelly" <sean@f4.ca>
Date: Wed, 6 Sep 2006 16:07:44 CST
Raw View
Alan McKenney wrote:
> kanze wrote:
> > wkaras@yahoo.com wrote:
>
> >
> > .... typically, what is needed in a multithreaded
> > context is an explicitly induced point in the program where all
> > previous writes (regardless of the target type) become visible
> > (before any of the following writes become visible).
>
> > ....  What you usually need in multithreading is a
> > sequencing guarantee---that (all) preceding writes will become
> > visible to all observers before any of the following writes.
> > (Note that this generally implies some sequencing actions on the
> > part of the observers as well.)
>
> Does it need to be "all writes"?

Typically, "all writes from the executing CPU," though it really
depends on the programmer's expectations.  For example, with processor
consistency (IA-32):

    x = y = 0;

    // CPU 1
    x = 1;

    // CPU 2
    if(x == 1)
        y = 1;

    // CPU 3
    if(y == 1)
        assert(x == 1);

The above assertion can fail, because CPU 3 may see the store from CPU
2 before it sees the store from CPU 1.  From what Alexander Terekhov
said, JSR-133 actually requires all volatile stores to be sequentially
consistent to avoid this, which has a detrimental effect on volatile
load performance.
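[Editorial sketch of the three-CPU example above under sequential consistency, where the assertion cannot fail. It uses later-standard std::atomic (whose default ordering is sequentially consistent, roughly the JSR-133 volatile guarantee mentioned); the spin loops stand in for "CPU 2/3 eventually observing the store". Illustrative only, not the IA-32 behavior under discussion.]

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x(0), y(0);   // default atomic operations: seq_cst

void cpu1() { x = 1; }

void cpu2() {
    while (x != 1) {}          // spin until CPU 1's store is visible
    y = 1;                     // then publish y
}

void cpu3() {
    while (y != 1) {}          // spin until CPU 2's store is visible
    assert(x == 1);            // seq_cst: seeing y==1 implies seeing x==1
}

void run() {
    std::thread t1(cpu1), t2(cpu2), t3(cpu3);
    t1.join(); t2.join(); t3.join();
}
```

With processor consistency (and no extra fencing), the assert in cpu3 could legitimately fire; sequential consistency rules that out at a cost in store/load performance.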

> Perhaps I've been corrupted by my years doing
> supercomputing, but when I think of parallel processing
> (and I think of multithreading as parallel processing), I
> envision the model system as one with a bunch of
> CPUs (possibly with local memory)
> with a vast network between the CPUs and the (shared)
> memory.  When a CPU updates a (shared) variable,
> the update slowly "percolates" out through the network.

Same here.  It seems easiest to think about timing issues and such
using this model.

> In this situation, waiting for "all writes" from all CPUs
> to be done would require all CPUs to stop and wait
> for the memory network to become quiescent.
> Since this would happen every time any CPU requests
> a "wait for all writes", it would cause an O(no of processors)
> performance hit.

It's just "all writes from the executing CPU," so every volatile store
could be thought to have release semantics.

> I don't know about anyone else, but when I use
> mutexes, I always associate each mutex with a set
> of (shared) variables that it controls, so what I would
> want is to be assured that all writes to the variables
> controlled by this mutex were visible to my
> thread before the mutex was considered locked.

In most instances this is enough.  But I think it really depends on
what is expected of volatile/atomic operations in C++.  It seems the
Java approach was to make them as idiot-proof as possible at the
expense of performance.  I suspect this will not be the decision for
C++, but I'm not sure where the line will be drawn.  If I had to guess
however, I would say that the language changes will be the bare minimum
and the average user's exposure to atomic operations will be through
library code.  So at the very least there must be some way to restrict
the optimizer from rearranging stores across "volatile" barriers, plus
perhaps some way to control load/store ordering (unless we're expected
to use ASM for this).


Sean






Author: "Chris Thomasson" <cristom@comcast.net>
Date: Mon, 4 Sep 2006 01:15:32 CST
Raw View
<wkaras@yahoo.com> wrote in message
news:1156863246.290976.191710@i3g2000cwc.googlegroups.com...
>
> kanze wrote:
>> wkaras@yahoo.com wrote:
> .
[...]
>> An important consideration is being able to guarantee the order in
>> which some set of writes becomes visible to other threads.  But it's
>> part of the larger problem of the memory model in general.
>
> I talked about that in the part you didn't quote.  When
> I say "observable read/write" I guess I need to clarify this
> means like a read/write of a volatile variable.  I think the
> Standard does require that the order (as determined by
> sequence points) be preserved.

volatile has nothing to do with the memory model... Well, except in
Microsoft...

http://groups.google.com/group/comp.programming.threads/msg/52fbe7472d229061?hl=en

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/e7b5fe64cb35d64d/52fbe7472d229061?&hl=en#52fbe7472d229061

I hope that volatile does not become so strict that it is rendered useless...




> The definition of "observable behavior" in the current
> Standard should be fleshed out, but I think it will
> always be somewhat open-ended.  Maybe the
> comparison is with the number of bits in an int,
> with a minimum but no max, just a requirement
> that you say what it is.  If a compiler asserts that
> it supports multi-threading of a given type, all
> threads must "properly see" each others
> observable reads/writes.

I would not use that compiler... That sounds stricter than the Java
memory model:

http://groups.google.com/group/comp.programming.threads/msg/5c24e02f54919230?hl=en

http://groups.google.com/group/comp.programming.threads/msg/a730da4289ee4e7f?hl=en

:O




> If a compiler targets
> a multi-core processor without implicit
> cache synchronization,

Huh?


> the doc should say when/
> whether it's going to generate the explicit
> cache flushes/invalidates for the cores to
> pass data between themselves.

Ahhh... Okay...

Anytime somebody talks about the cache wrt the memory model... Well:

http://groups.google.com/group/alt.winsock.programming/msg/db0139360ffbf4e2


http://groups.google.com/group/comp.programming.threads/msg/423df394a0370fa6


http://groups.google.com/group/comp.programming.threads/browse_frm/thread/29ea516c5581240e/423df394a0370fa6?#423df394a0370fa6
(read all, BTW I was SenderX)





Here is some of my quick thoughts on the subject:



http://groups.google.com/group/comp.programming.threads/msg/ca2f1af4552233df

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/6715c3e5a73c4016/08d850d47125a2b8?lnk=gst&q=chris+thomasson+memory+fences+x86&rnum=2#08d850d47125a2b8





I hope the memory model C++ finally goes with is compatible with lock-free
reader patterns:

https://coolthreads.dev.java.net/servlets/ProjectForumMessageView?forumID=1797&messageID=11068

http://groups.google.com/group/comp.programming.threads/msg/a730da4289ee4e7f
(Yikes! Not good; for me at least...)


I have grave concerns about the negative impact that "overly
strict/paranoid" memory models can have on advanced thread synchronization
techniques. IMHO, it would be really neat for C++ to have efficient and
flexible support for this kind of stuff:


http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/65c9c2673682d4cf/bc1506fb6d0e3ba7?lnk=gst&q=vzOom&rnum=1#bc1506fb6d0e3ba7


I am looking forward to seeing what C++ comes up with. One quick piece of
advice: please don't let the C++ memory model turn into a behemoth; think
of poor Java... I can't implement my vZOOM library in Java. It could be
done with Java volatiles, however IMHO it would simply be a big waste of my
time and energy. Java would force my reader threads to use memory barriers
during their various searching activities. Why bother...?

Therefore, I really hope that C++ will allow me to be in complete and total
control wrt the memory barriers that any of my code uses.



Thank you all for your time.

--
Chris Thomasson
http://appcore.home.comcast.net/
(portable lock-free data-structures)







Author: "kanze" <kanze@gabi-soft.fr>
Date: Mon, 4 Sep 2006 09:32:14 CST
Raw View
Chris Thomasson wrote:
> <wkaras@yahoo.com> wrote in message
> news:1156863246.290976.191710@i3g2000cwc.googlegroups.com...

> > kanze wrote:
> >> wkaras@yahoo.com wrote:

> [...]
> >> An important consideration is being able to guarantee the order in
> >> which some set of writes becomes visible to other threads.  But it's
> >> part of the larger problem of the memory model in general.

> > I talked about that in the part you didn't quote.  When
> > I say "observable read/write" I guess I need to clarify this
> > means like a read/write of a volatile variable.  I think the
> > Standard does require that the order (as determined by
> > sequence points) be preserved.

> volatile has nothing to do with the memory model...

The original intent was that it would.  That "memory accesses"
were guaranteed to not move across a sequence point boundary.
The (C) standard left the definition of what an access is up to
the implementation, and it is true that most implementations
today use a very useless definition: that a load or store
instruction has been executed.  (They also fail to document this,
which means that they aren't conforming---the standard requires
that implementation-defined behavior be documented.)  Arguably,
this ignores the intent of volatile, even if it meets the letter
of the law.

> Well, except in Microsoft...

> http://groups.google.com/group/comp.programming.threads/msg/52fbe7472d229061?hl=en
>
> http://groups.google.com/group/comp.programming.threads/browse_frm/thread/e7b5fe64cb35d64d/52fbe7472d229061?&hl=en#52fbe7472d229061

> I hope that volatile does not become so strict that it is
> rendered useless...

It's currently useless in most major compilers, so no proposed
change will make it worse.

> > The definition of "observable behavior" in the current
> > Standard should be fleshed out, but I think it will
> > always be somewhat open-ended.  Maybe the
> > comparison is with the number of bits in an int,
> > with a minimum but no max, just a requirement
> > that you say what it is.  If a compiler asserts that
> > it supports multi-threading of a given type, all
> > threads must "properly see" each others
> > observable reads/writes.

> I would not use that compiler...

Don't worry, nothing that constraining will be adopted.  (Such a
policy would require barriers around just about every access.)

    [...]
> Therefore, I really hope that C++ will allow me to be in
> complete and total control wrt the memory barriers that any of
> my code uses.

It will doubtlessly not prevent you from doing whatever you want
(and will not insert memory barriers arbitrarily if you don't do
something specific to provoke them).  It will, hopefully, also
provide a certain number of ready-built primitives for those who
are less expert in the field.

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34







Author: wkaras@yahoo.com
Date: Tue, 29 Aug 2006 10:41:55 CST
Raw View
kanze wrote:
> wkaras@yahoo.com wrote:
.
> > I would hope that a lot of brownie points would be awarded to
> > any proposed standard for which a portable open source
> > implementation as a layer on top of POSIX threads existed.
>
> Existing practice is certainly desirable.  (Except, of course,
> when the effect of the existing practice was to show that it was
> a bad idea:-).)  The problem is that a code base isn't a
> standard; you still have to specify what it does, and in the
> standard, you cannot specify something as "whatever the
> underlying OS happens to give you."  (To start with, in order to
> do this, you'd have to specify the mapping to the underlying OS.
> All underlying OS's.)

Yes, I was not saying or implying that an implementation is
equivalent to a standard.

> If nothing else, you've got to say that
> the compiler cannot move code around something like:
>     std::scoped_lock( someMutex ) ;
> (Posix guarantees it for C, and all of the Posix-compatible C++
> compilers extend the guarantee for C++---although this wasn't
> always true.  Boost guarantees it for a Posix platform, because
> you more or less know the implementation.  Formulating it in
> standard'ese, on the other hand, is less simple.)

"observe" (below) is intended to accomplish the purpose of
"scoped_lock", but I think in a clearer and more generally
useful way.

>
> Beyond that, it's important to realize that different people use
> multithreading for different reasons.  For what I do, a few
> modifications in Boost.threads would do the trick.  On the other
> hand, I know of more than a few people who need to be able to
> propagage an exception (of unknown type, but I think it would be
> acceptable to limit it to types derived from std::exception)
> across a join---this is impossible with Boost as it currently
> stands (and without the limitation to std::exception, probably
> impossible without explicit compiler support).  And people using
> threads to parallize numeric calculations, and such things, need
> low lever, non-locking primitives: things like CAS or
> atomic_incr.

I would go further and say that it's probably impossible to come up
with a library that handles all forms of multi-threading, including
ISRs,
interacting with an ASIC in dual port mem, etc.  So shoot for a
generally but not universally useful lib.  A "clean" version of
POSIX (e.g. releasing mutexes in destructors) seems
like a logical baseline.
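[Editorial sketch of that "clean POSIX baseline": releasing the mutex in a destructor is just RAII over pthread_mutex_t, roughly what Boost.Threads already provided. Class names are illustrative.]

```cpp
#include <pthread.h>

class Mutex {
    pthread_mutex_t m_;
public:
    Mutex()  { pthread_mutex_init(&m_, 0); }
    ~Mutex() { pthread_mutex_destroy(&m_); }
    void lock()   { pthread_mutex_lock(&m_); }
    void unlock() { pthread_mutex_unlock(&m_); }
private:
    Mutex(const Mutex&);             // non-copyable (2003-style)
    Mutex& operator=(const Mutex&);
};

class ScopedLock {
    Mutex& m_;
public:
    explicit ScopedLock(Mutex& m) : m_(m) { m_.lock(); }
    ~ScopedLock() { m_.unlock(); }   // released even on exceptional exit
private:
    ScopedLock(const ScopedLock&);
    ScopedLock& operator=(const ScopedLock&);
};
```

Usage is the familiar `{ ScopedLock guard(mtx); /* touch shared data */ }`; the destructor guarantees the unlock, which is exactly the cleanup POSIX's C interface cannot express.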

>
> > I would also suggest adding (in std namespace) the "pseudo"
> > function template:
>
> > template <typename T>
> > void observe(const T &x) { ... }
>
> > If observe is called on an object x, then for every instance i
> > (directly or indirectly) within x, i of a primitive type:
>
> > 1.  An observable write to i will fall between
> > the sequence point following a (value changing)
> > nominal write to i and the sequence point following
> > the observe(x) call.
> > 2.  An observable read of i will fall between
> > the s.p. following the observe(x) call and the
> > s.p. following a nominal read of i.
>
> I think it's a bit more complicated than that.  Observable by
> whom?  Under what conditions?
>
> An important consideration is being able to the guarantee order
> some set of writes becomes visible to other threads.  But it's
> part of the larger problem of the memory model in general.

I talked about that in the part you didn't quote.  When
I say "observable read/write" I guess I need to clarify this
means like a read/write of a volatile variable.  I think the
Standard does require that the order (as determined by
sequence points) be preserved.

The definition of "observable behavior" in the current
Standard should be fleshed out, but I think it will
always be somewhat open-ended.  Maybe the
comparison is with the number of bits in an int,
with a minimum but no max, just a requirement
that you say what it is.  If a compiler asserts that
it supports multi-threading of a given type, all
threads must "properly see" each others
observable reads/writes.  If a compiler targets
a multi-core processor without implicit
cache synchronization, the doc should say when/
whether it's going to generate the explicit
cache flushes/invalidates for the cores to
pass data between themselves.






Author: wkaras@yahoo.com
Date: Fri, 25 Aug 2006 22:06:00 CST
Raw View
kanze wrote:
> kuyper@wizard.net wrote:
> > Jiang wrote:
>
> > > I do not think Bjarne Stroustrup will change the language
> > > a lot, since he said many times that library extensions
> > > will be preferred to language extensions.
>
> > It's not Stroustrup's language anymore; it's the standards committee
> > which will decide whether and how the language is changed. I'm sure he
> > has opinions on the matter, and they're likely to be given considerable
> > weight by the committee. However, since the language was standardized,
> > all that Stroustrup can change by his own decisions are his own
> > implementation of C++, not the official C++ language itself.
>
> I think that there is a general consensus in the C++ community,
> and in the committee, that all other things being equal, a
> library solution is preferable.  I think that there is also a
> very large consensus as to the need to support threading, and a
> general recognition that this requires something in the
> language.
.

I would hope that a lot of brownie points would be awarded to
any proposed standard for which a portable open source
implementation as a layer on top of POSIX threads existed.

I would also suggest adding (in std namespace) the "pseudo"
function template:

template <typename T>
void observe(const T &x) { ... }

If observe is called on an object x, then for every instance i of a
primitive type (directly or indirectly) within x:

1.  An observable write to i will fall between
the sequence point following a (value changing)
nominal write to i and the sequence point following
the observe(x) call.
2.  An observable read of i will fall between
the s.p. following the observe(x) call and the
s.p. following a nominal read of i.

There might be a need for two flavors of observe,
one guaranteed to work when the reads/writes
to i are via aliases and/or outside the compilation
unit, and one that's not.

The assumption here is that, whatever "observable
behavior" exactly means, it's observable by all
the threads running with a common memory
space.

If you think "observe" is redundant with volatile,
try compiling some code that uses a
volatile instance of a class.  You'll find
you have to duplicate all the (non-static)
member functions and add a 'volatile'
qualifier to them.  Also, using volatile tends to
result in more memory barriers and
register flushing than is really necessary to
safely pass data between threads.
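[Editorial sketch of the duplication described above: without the volatile-qualified overloads, the calls through a volatile object simply do not compile. The class is illustrative.]

```cpp
struct Counter {
    int n;
    void bump()               { ++n; }
    void bump() volatile      { n = n + 1; }  // near-identical duplicate,
                                              //   required for volatile objects
    int  get() const          { return n; }
    int  get() const volatile { return n; }   // ditto for every accessor
};

// A volatile Counter can only call the volatile-qualified overloads:
//     volatile Counter c = {0};
//     c.bump();        // error without the "void bump() volatile" overload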






Author: "Nikolaos D. Bougalis" <nikb@webmaster.com>
Date: Sun, 27 Aug 2006 13:46:48 CST
Raw View
Earl Purple wrote:

> By the way, there is already one feature of thread-synchronicity in the
> standard language - the
> keyword "volatile". Outside of a multi-threaded environment I don't
> know what purpose the keyword
> serves.

 [I know this is bordering on off-topic, but this is a very common
misconception that must be addressed.]

 It serves plenty of purposes as outlined by the C and C++ standards -- namely
how it interacts with signal handlers and variable access across longjmp
boundaries, and the side-effects that occur during volatile _accesses_.

 But the standard says nothing about threads, so what 'volatile' does in that
context is undefined behavior. Frankly, using 'volatile' as a synchronization
mechanism almost always indicates a serious issue that's being covered up
(note, I'm talking about _defining_ variables as volatile -- not _accessing_
variables as volatile through casts).

 In a multi-threaded program, the important factor is atomicity, and neither
the C nor the C++ standard have anything to say about that. The proper
primitive for synchronization of threads is a lock of some kind and not a
volatile variable. Indeed, 'volatile' _by itself_ is no more useful in
implementing such a lock than a cucumber.

 -n






Author: nagle@animats.com (John Nagle)
Date: Mon, 28 Aug 2006 14:04:38 GMT
Raw View
Nikolaos D. Bougalis wrote:
> Earl Purple wrote:
>
>> By the way, there is already one feature of thread-synchronicity in the
>> standard language - the
>> keyword "volatile". Outside of a multi-threaded environment I don't
>> know what purpose the keyword
>> serves.

>     But the standard says nothing about threads, so what 'volatile' does
> in that context is undefined behavior. Frankly, using 'volatile' as a
> synchronization mechanism almost always indicates a serious issue that's
> being covered up (note, I'm talking about _defining_ variables as
> volatile -- not _accessing_ variables as volatile through casts).

    True.  It's also worth noting that, if the compiler interprets
"volatile" as "must flush cache to main memory on write", declaring
thread-shared data as volatile causes a sizable performance hit.
Even on multiprocessors, the inter-CPU cache interlocking is
far faster than forcing a write to memory and waiting for it.

    It does help if the compiler knows that some operation is
unlocking a lock.  It's reasonable for an implementation to
flush any non-auto variables from a register when unlocking
any lock.  That will deal with most machine-level race
conditions without programmer intervention.  (By that, I
mean that if the programmer uses a mutex to protect shared
data, the issues associated with CPU and cache coherence
are dealt with by the compiler and the hardware.  That's as
it should be.)

    But if the compiler has no knowledge of locks, then it
either has to flush too often, or will fail to flush a register
at a crucial moment.  So the language needs at least enough
thread support that the compiler knows when an unlock has occurred.

    John Nagle
    Animats






Author: "kanze" <kanze@gabi-soft.fr>
Date: Mon, 28 Aug 2006 10:21:34 CST
Raw View
wkaras@yahoo.com wrote:
> kanze wrote:
> > kuyper@wizard.net wrote:
> > > Jiang wrote:

> > > > I do not think Bjarne Stroustrup will change the language
> > > > a lot, since he said many times that library extensions
> > > > will be preferred to language extensions.

> > > It's not Stroustrup's language anymore; it's the standards
> > > committee which will decide whether and how the language
> > > is changed. I'm sure he has opinions on the matter, and
> > > they're likely to be given considerable weight by the
> > > committee. However, since the language was standardized,
> > > all that Stroustrup can change by his own decisions are
> > > his own implementation of C++, not the official C++
> > > language itself.

> > I think that there is a general consensus in the C++
> > community, and in the committee, that all other things being
> > equal, a library solution is preferable.  I think that there
> > is also a very large consensus as to the need to support
> > threading, and a general recognition that this requires
> > something in the language.

> I would hope that a lot of brownie points would be awarded to
> any proposed standard for which a portable open source
> implementation as a layer on top of POSIX threads existed.

Existing practice is certainly desirable.  (Except, of course,
when the effect of the existing practice was to show that it was
a bad idea:-).)  The problem is that a code base isn't a
standard; you still have to specify what it does, and in the
standard, you cannot specify something as "whatever the
underlying OS happens to give you."  (To start with, in order to
do this, you'd have to specify the mapping to the underlying OS.
All underlying OS's.)  If nothing else, you've got to say that
the compiler cannot move code around something like:
    std::scoped_lock lock( someMutex ) ;
(Posix guarantees it for C, and all of the Posix-compatible C++
compilers extend the guarantee for C++---although this wasn't
always true.  Boost guarantees it for a Posix platform, because
you more or less know the implementation.  Formulating it in
standard'ese, on the other hand, is less simple.)

Beyond that, it's important to realize that different people use
multithreading for different reasons.  For what I do, a few
modifications in Boost.threads would do the trick.  On the other
hand, I know of more than a few people who need to be able to
propagate an exception (of unknown type, but I think it would be
acceptable to limit it to types derived from std::exception)
across a join---this is impossible with Boost as it currently
stands (and without the limitation to std::exception, probably
impossible without explicit compiler support).  And people using
threads to parallelize numeric calculations, and such things, need
low-level, non-locking primitives: things like CAS or
atomic_incr.

> I would also suggest adding (in std namespace) the "pseudo"
> function template:

> template <typename T>
> void observe(const T &x) { ... }

> If observe is called on an object x, then for every instance i
> (directly or indirectly) within x, i of a primitive type:

> 1.  An observable write to i will fall between
> the sequence point following a (value changing)
> nominal write to i and the sequence point following
> the observe(x) call.
> 2.  An observable read of i will fall between
> the s.p. following the observe(x) call and the
> s.p. following a nominal read of i.

I think it's a bit more complicated than that.  Observable by
whom?  Under what conditions?

An important consideration is being able to guarantee the order
in which some set of writes becomes visible to other threads.  But
it's part of the larger problem of the memory model in general.

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34







Author: "kanze" <kanze@gabi-soft.fr>
Date: Tue, 22 Aug 2006 22:41:18 CST
Raw View
Jiang wrote:
> Several days ago, James Kanze said that thread issues will be
> addressed in next standard.

> http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/7900a13685a8b8c2/47b6d8f6ce67f963#47b6d8f6ce67f963

I said that it was highly likely.  We can't be 100% sure until
the standard is adopted, but the committee is definitely working
on it, and it seems to have a fairly high priority.

> If this is true, I would like to know that:

> 1. How will the language and library overlap?

That is part of what is still under discussion.  Some things
must be handled at the language level.

>    That is, threading will be purposed by library extension
>    (like boost.thread), or adding new language constructs,
>    such as keywords like "synchronized"?

Or something in between.  I'm pretty sure we won't see a
"synchronized" keyword, like the Java fiasco.  (At least I hope
so, and to date, I haven't heard anyone proposing it.)  And I'm
equally sure that the language itself will be modified to "know"
about threading in some ways---if nothing else, threading will
introduce a whole new world of possible undefined behaviors.

> 2. Is it possible that in the recent future we
>    will have a C++ binding for threading? If not, why?

I feel fairly certain that in C++0x, it will be possible to
start threads, join them, and synchronize them, possibly in
different ways (low-level, lock free and mutex or something
similar).  What it will look like is still an open topic---if I
had to guess, I'd say that at least parts of it might look
something like boost::threads.

> 3. This is maybe a little bit off-topic, but it is said
>    that Bjarne Stroustrup has been working with
>    concurrent programming for at least 20 years,
>    for what reason he (together with committee
>    members) did not add threading to the language?

He wanted to be able to implement it under Unix at the time?

Historically, concurrent programming didn't always involve what
we consider threads today.  There are (or at least were) many
models.  (What ever happened to the transputer and Occam?)

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34







Author: nagle@animats.com (John Nagle)
Date: Wed, 23 Aug 2006 14:33:47 GMT
Raw View
kanze wrote:
> Jiang wrote:

> There are (or at least were) many
> models.  (What ever happened to the transputer and Occam?)

    It turned into the Cell processor and the Playstation 3.
For which ways to run C++ are being frantically developed, hopefully
in time for the holiday shopping season.

    Yes, it's the concurrency system that wouldn't die -
message-passing between tightly coupled non-shared-memory processors.
You thought it died with the ILLIAC IV.  You thought it died with
the BBN Butterfly.  You thought it died with the Ncube.
But it LIVES! It's BACK!  And IT'S COMING FOR XMAS!

    At some point, C++ may have to deal with such machines.  I'd
suggest looking at the papers for GDC 2007 to see what's being
done in that area.  Enough brainpower is now being thrown at this
hardware that something workable will emerge.  Then again,
maybe the XBox 360 (which is a classical 3-CPU shared memory machine)
will win out, and we won't have to deal with the problem.

    Meanwhile, it would be nice if C++ had some features to
easily generate fast marshalling code.  We're going to be
seeing more interprocess and inter-CPU communication.

    John Nagle






Author: "Jiang" <goo.mail01@yahoo.com>
Date: Wed, 23 Aug 2006 11:16:23 CST
Raw View
kanze wrote:
> Jiang wrote:
> > Several days ago, James Kanze said that thread issues will be
> > addressed in next standard.
>
> > http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/7900a13685a8b8c2/47b6d8f6ce67f963#47b6d8f6ce67f963
>
> I said that it was highly likely.  We can't be 100% sure until
> the standard is adopted, but the committee is definitely working
> on it, and it seems to have a fairly high priority.
>

Sorry for my poor quote; actually I meant that the thread issues
*possibly* will be addressed in the next standard.

> > If this is true, I would like to know that:
>
> > 1. How will the language and library overlap?
>
> That is part of what is still under discussion.  Some things
> must be handled at the language level.
>
> >    That is, threading will be purposed by library extension
> >    (like boost.thread), or adding new language constructs,
> >    such as keywords like "synchronized"?
>
> Or something in between.  I'm pretty sure we won't see a
> "synchronized" keyword, like the Java fiasco.  (At least I hope
> so, and to date, I haven't heard anyone proposing it.)  And I'm
> equally sure that the language itself will be modified to "know"
> about threading in some ways---if nothing else, threading will
> introduce a whole new world of possible undefined behaviors.
>

Agreed. And without this kind of knowledge, some basic
constructs cannot work without special treatment;
for a quick example, exception handling comes to mind.


> > 2. Is it possible that in the recent future we
> >    will have a C++ binding for threading? If not, why?
>
> I feel fairly certain that in C++0x, it will be possible to
> start threads, join them, and synchronize them, possibly in
> different ways (low-level, lock free and mutex or something
> similar).  What it will look like is still an open topic---if I
> had to guess, I'd say that at least parts of it might look
> something like boost::threads.
>

That is also my understanding about the current situation.
I do not think Bjarne Stroustrup will change the language
a lot, since he said many times that library extensions
will be preferred to language extensions.


> > 3. This is maybe a little bit off-topic, but it is said
> >    that Bjarne Stroustrup has been working with
> >    concurrent programming for at least 20 years,
> >    for what reason he (together with committee
> >    members) did not add threading to the language?
>
> He wanted to be able to implement it under Unix at the time?
>

Really? I am not sure, but as far as I know, the Unix systems
of that time did not have any thread model.

> Historically, concurrent programming didn't always involve what
> we consider threads today.  There are (or at least were) many
> models.  (What ever happened to the transputer and Occam?)
>

Yes, what you said is possibly one of the most significant
reasons why we do not have a concurrent programming
model in C++.






Author: "kanze" <kanze@gabi-soft.fr>
Date: Wed, 23 Aug 2006 12:42:46 CST
Raw View
Jiang wrote:
> kanze wrote:
> > Jiang wrote:

    [...]
> > > 3. This is maybe a little bit off-topic, but it is said
> > >    that Bjarne Stroustrup has been working with
> > >    concurrent programming for at least 20 years,
> > >    for what reason he (together with committee
> > >    members) did not add threading to the language?

> > He wanted to be able to implement it under Unix at the time?

> Really? I am not sure, but Unix(s) at that time does not
> have any thread model in my mind.

Exactly.  Since the OS didn't support concurrency, he didn't put
it in the language.

I've seen some early attempts to implement threaded systems in
C++ on top of an OS which didn't support them.  They weren't
pretty.

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34







Author: kuyper@wizard.net
Date: Wed, 23 Aug 2006 12:42:15 CST
Raw View
Jiang wrote:
.
> I do not think Bjarne Stroustrup will change the language
> a lot, since he said many times that library extensions
> will be preferred to language extensions.

It's not Stroustrup's language anymore; it's the standards committee
which will decide whether and how the language is changed. I'm sure he
has opinions on the matter, and they're likely to be given considerable
weight by the committee. However, since the language was standardized,
all that Stroustrup can change by his own decisions are his own
implementation of C++, not the official C++ language itself.






Author: "Jiang" <goo.mail01@yahoo.com>
Date: Wed, 23 Aug 2006 21:04:26 CST
Raw View
SuperKoko wrote:
> Jiang wrote:
> > Several days ago, James Kanze said that thread issues will be addressed
> > in next standard.
> >
> > http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/7900a13685a8b8c2/47b6d8f6ce67f963#47b6d8f6ce67f963
> >
> > If this is true, I would like to know that:
> >
> > 1. How will the language and library overlap?
> >
> >    That is, threading will be purposed by library
> >    extension (like boost.thread), or adding new
> >    language constructs, such as keywords like
> >    "synchronized"?
> >
> I hope it'll be a library facility.
> Languages having builtin threading tend to impose a single very
> particular mean to do multithreading and it's easy to reach the limits
> of those models.
>

Maybe this is true. But if this "single very particular mean" helps,
then do we really need the freedom for threading?

For example, compared with

void foo()
{
    scoped_lock lock(mutex);   // OK, RAII used here

    // access the resource...
}

, the following function bar

void synchronized bar()
{
    // access the resource...
}

is much cleaner and better controlled, in my mind.

Here programmers do not have to remember RAII
(well, assuming synchronization is part of the language).
Compared with lock/unlock, constructor/destructor pairs
are much more reliable, but why not take a further step?

The problem is, even if we have a thread library, lots of
low-level details must be handled by our programmers.
To fight the complexity of threading, we really need
language-level abstraction, in my mind.

Just consider this: the read_write_mutex in boost.thread
was removed due to problems with deadlocks.  If even the
best experts cannot handle the low-level issues
correctly (although it's a rare case, of course), IMHO
I would rather use the single, particular, but controllable
model.  Freedom here does not benefit me much.


> > 2. Is it possible that in the recent future we
> >    will have a C++ binding for threading? If not, why?
> >
> Yes, it is possible that in C++0x we will have a C++ binding for
> threading.
>

Glad to hear that.

> I'm sure that many people would say that we already have POSIX. So, why
> would we have to add anything else.
> In practice POSIX is not omni-present and perhaps the WG21 can add a
> library layer abstracting platform differences, if possible, better
> than POSIX and in a more OO way.
> That's why I prefer programming in POSIX+C++ than in Ada or any other
> language containing an intrinsic concurrent model.
>

Yes, without a true C++ binding, threading in C++ always
invokes undefined behavior.


> > 3. This is maybe a little bit off-topic, but it is said
> >    that Bjarne Stroustrup has been working with
> >    concurrent programming for at least 20 years,
> >    for what reason he (together with committee
> >    members) did not add threading to the language?
> >    C++ predates POSIX? Lack of manpower? Or simply
> >    the complexity?
> It might be due to the fact that normalizing concurrent programming is
> hard work, and the committee thought it better to normalize the
> existing C++ implementations in 1998.
> Now for C++0x, this issue has been raised:
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1815.html
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1907.html
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1940.pdf
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2043.html
>
> Moreover, it requires "testing" that. i.e. before normalizing something
> we must have several C++ implementations successfully implementing the
> feature.
>
> To get more info about the papers currently proposed:
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/


Roger that. I read some of them but not all, thank you for your time.






Author: "kanze" <kanze@gabi-soft.fr>
Date: Thu, 24 Aug 2006 09:52:57 CST
Raw View
Jiang wrote:
> SuperKoko wrote:
> > Jiang wrote:
> > > Several days ago, James Kanze said that thread issues will be addressed
> > > in next standard.

> > > http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/7900a13685a8b8c2/47b6d8f6ce67f963#47b6d8f6ce67f963

> > > If this is true, I would like to know that:

> > > 1. How will the language and library overlap?

> > >    That is, threading will be purposed by library
> > >    extension (like boost.thread), or adding new
> > >    language constructs, such as keywords like
> > >    "synchronized"?

> > I hope it'll be a library facility.
> > Languages having builtin threading tend to impose a single very
> > particular mean to do multithreading and it's easy to reach the limits
> > of those models.

> Maybe this is true. But if this "single very particular mean"
> helps, then do we really need the freedom for threading?

> For example, compared with

> void foo()
> {
> >     scoped_lock lock(mutex);   // OK, RAII used here
>
>     // access the resource...
> }

> , the following function bar

> void synchronized bar()
> {
>     // access the resource...
> }

> is much clean and well controlled in my mind.

Except that even in Java, I've almost never found a case where
the scope of a single function corresponded exactly to the scope
necessary for a lock; in fact, the coding guidelines where I was
working ended up banning synchronized functions.

Much of the time, a synchronized block was appropriate, and of
course, scoped_lock emulates this very well---I don't see
anything to make you want to prefer one over the other.

And of course, "much of the time" isn't always.  There are cases
where you want or need a lock whose lifetime doesn't correspond
to that of a scope.  If you have a synchronized block in the
language, you need a fairly complex control object (using
condition variables) to implement it.  If you only have scoped_lock,
you need something like shared_ptr( new scoped_lock ).  And if
the locking primitives are available outside of scoped_lock,
it's even easier.

> Here programmers do not have to remember RAII,
> well, if RAII is parts of the language. Compared
> with lock/unlock, constructor/destructor is much
> reliable, but why not make a futher step?

Because it is more constraining.  And at the function level,
doesn't really work in practice.

> The problem is,  even we have a thread library, lots of
> low level details must be handled by our programmers.

Library or language won't change that.  Problems such as correct
memory synchronization exist, and we can't legislate them away.

> To fight with the complexity of threading, we really need
> necessary language level abstraction in my mind.

Higher level abstractions do help.  But many of them are
application dependent.  And I don't think that there is enough
time before the next version to standardize the few which
aren't---we'd have to make them publicly available somehow
(ideally in boost), and get people using them, so we get some
feedback.

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34







Author: "Earl Purple" <earlpurple@gmail.com>
Date: Thu, 24 Aug 2006 09:55:53 CST
Raw View
Jiang wrote:

> SuperKoko wrote:
> > I hope it'll be a library facility.
> > Languages having builtin threading tend to impose a single very
> > particular mean to do multithreading and it's easy to reach the limits
> > of those models.
>
> Maybe this is true. But if this "single very particular mean" helps,
> then do we really need the freedom for threading?
>
> For example, compared with
>
> void foo()
> {
> >     scoped_lock lock(mutex);   // OK, RAII used here
>
>     // access the resource...
> }
>
> , the following function bar
>
> void synchronized bar()
> {
>     // access the resource...
> }
>
> is much clean and well controlled in my mind.

There are 3 locking issues here:
- A mutex. Covers all situations.
- A critical section. Only one thread is allowed to execute this
section at a time.
- An atomic section of code. This effectively means that all other
threads must wait while this block of code is completed. With a
genuine dual-processor there really are two threads running at a
time, so this may be harder to enforce.

With a mutex, you have different mutexes but the same mutex could be
locked in multiple parts of
the code which clash with each other

> Here programmers do not have to remember RAII,
> well, if RAII is parts of the language. Compared
> with lock/unlock, constructor/destructor is much
> reliable, but why not make a futher step?

It's not a matter of programmers "remembering RAII". RAII is there so
that programmers don't
have to remember to release resources.

> The problem is,  even we have a thread library, lots of
> low level details must be handled by our programmers.

Low-level details would be handled by the writers of the standard
libraries rather than by the
compiler manufacturers, who may be different. I would prefer it if my
code were compiled directly
in machine code rather than the compiler having to keep compiling a
standard library of templates
time and again.

Also, if the library were to be written on a UNIX system to wrap
pthreads then would it use a header
file <cpthread> to avoid name-clashing or would the entire pthread
library be automatically "included"
into your source when you weren't actually using it directly? Similarly
on Windows with their standard
library and on any other platform that has multi-threading.

> Just consider this, the read_write_mutex in boost.thread
> was removed due to problems with deadlocks.

No matter how good a threads library you write, it is up to the
application-level programmer to ensure
that there are no deadlocks. The purpose of a library is to aid good
programming, not to prevent bad
programming. Of course you do put in some protection so that
programmers won't do the wrong thing.

The potential problem of read-write locks is writer starvation. If
there are always readers about, the
writer may never get a chance to write. This can be prevented by adding
an additional mutex - the
reader and writer both must acquire the mutex before acquiring the
read-write lock, but there is a
difference. The reader releases the mutex immediately on acquiring it,
the writer holds onto the mutex
until it gets the read-write lock. That means the mutex remains locked
if there is a writer waiting to
write, and new readers cannot read although the existing ones may
continue reading. In practice
this is not a bad thing as although the readers may wait a bit, they
will eventually read the most
updated information.

Now if you have to implement a library on top of what you already have,
then on a POSIX system you
would probably implement read-write locks on top of pthread_rwlock_init
etc. There is, as far as I'm
aware, no defined behaviour as to whether writers get priority in such a
library, so to be safe you may
well add it to your own.

>If those
> best experts can not handle the low level issues
> correctly (although it's rare case, of course), IMHO,
> I would like to use the single particular, but controllable
> one. Freedom here does not benefit too much for me.

The issue with boost is that they are trying to write a library for
all systems, but different systems
do things differently. On Windows there is no concept of an rwlock; you
have to implement the whole
thing yourself using just mutexes, semaphores, etc.

That would be the same for any "standard" library for threads. A
language feature would mean that
your code would be directly compiled to the relevant machine code.






Author: gwesp@otaku.freeshell.org (Gerhard Wesp)
Date: Thu, 24 Aug 2006 15:18:37 GMT
Raw View
John Nagle <nagle@animats.com> wrote:
>    Meanwhile, it would be nice if C++ had some features to
> easily generate fast marshalling code.  We're going to be
> seeing more interprocess and inter-CPU communication.

I advocate some form of introspection to iterate over class members.
Then we could write:

struct foo { int a; double b; string s; };

for(auto x in foo) {write_binary(x);}

And it could be used for lots of other tasks as well.

Regards
-Gerhard






Author: "kanze" <kanze@gabi-soft.fr>
Date: Thu, 24 Aug 2006 10:19:32 CST
Raw View
kuyper@wizard.net wrote:
> Jiang wrote:

> > I do not think Bjarne Stroustrup will change the language
> > a lot, since he said many times that library extensions
> > will be preferred to language extensions.

> It's not Stroustrup's language anymore; it's the standards committee
> which will decide whether and how the language is changed. I'm sure he
> has opinions on the matter, and they're likely to be given considerable
> weight by the committee. However, since the language was standardized,
> all that Stroustrup can change by his own decisions are his own
> implementation of C++, not the official C++ language itself.

I think that there is a general consensus in the C++ community,
and in the committee, that all other things being equal, a
library solution is preferable.  I think that there is also a
very large consensus as to the need to support threading, and a
general recognition that this requires something in the
language.

I don't want to speak for Stroustrup, but at the last meeting,
he definitly gave the impression of concurring with this
consensus.

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34







Author: "kanze" <kanze@gabi-soft.fr>
Date: Fri, 25 Aug 2006 09:49:57 CST
Raw View
Earl Purple wrote:
> Jiang wrote:

> > SuperKoko wrote:
> > > I hope it'll be a library facility.
> > > Languages having builtin threading tend to impose a single
> > > very particular mean to do multithreading and it's easy to
> > > reach the limits of those models.

> > Maybe this is true. But if this "single very particular
> > mean" helps, then do we really need the freedom for
> > threading?

> > For example, compared with

> > void foo()
> > {
> >     scoped_lock lock(mutex);   // OK, RAII used here
> >
> >     // access the resource...
> > }

> > , the following function bar

> > void synchronized bar()
> > {
> >     // access the resource...
> > }

> > is much cleaner and better controlled in my mind.

> There are 3 locking issues here:
> - A mutex. Covers all situations
> - A critical section. Only one thread is allowed to execute this
> section at a time
> - An atomic section of code. This effectively means that all
> other threads must wait while this block of code is completed.
> With a genuine dual-processor there really are 2 threads
> running at a time so this may be harder to enforce.

Don't confuse names with principles.  Mutex, critical section
and atomic section (as you define it---I've never heard the term
before) all mean more or less the same thing.  (Normally, I
would use critical section for the area of code being protected,
and mutex for the mechanism protecting it.  Historically, too,
there were other ways of ensuring exclusion in a critical
section---on the processors I started with, I'd disable
interrupts, for example.)

There are also atomic operations, which work without using
external protection.

> With a mutex, you have different mutexes but the same mutex
> could be locked in multiple parts of the code which clash with
> each other

I'm not sure what you're trying to say here.  But I think your
point is that two different functions need to access the same
data, and thus use the same mutex.  In the Java model of
synchronous functions, this works for member data, since all
member functions synchronize on the same data.

Of course, most of the time you use external synchronization,
you are updating several different objects, and need to ensure
that no other thread interrupts between the updates.  Which
means synchronizing with the same mutex in functions of
different objects.  (In at least one case, I ran the application
as if threads weren't preemptive.  The application had exactly
one mutex, which was only released when a thread explicitly
wanted to allow other threads through---typically, when it
needed to wait for IO or something like that.  It made thread
safety a lot, lot easier, and on a single processor machine,
total throughput was considerably higher than it would have been
with finer grained locking.)

> > Here programmers do not have to remember RAII,
> > well, if RAII is part of the language. Compared
> > with lock/unlock, constructor/destructor is much
> > more reliable, but why not make a further step?

> It's not a matter of programmers "remembering RAII". RAII is
> there so that programmers don't have to remember to release
> resources.

In the Java model, locks have to be handled by the language,
precisely because there is no RAII.  A lock is a resource where
RAII is usually an important simplification.

I'm not really sure about Jiang's point here.  Locks at the
function level don't work well, because the granularity of
locking rarely corresponds to the granularity of a
function---you end up either holding the lock too long, or
twisting your design to make the functions fit the required
locking.  And the difference between block level locking and
RAII seems very slim to me:

block locking:

    void f()
    {
        //  ...
        synchronize mutexObject {
           //  locked section...
        }
        //  ...
    }

RAII:

    void f()
    {
        //  ...
        {
            scoped_lock l( mutexObject ) ;
            //  locked section...
        }
        //  ...
    }

Block locking has the advantage that you don't need to invent a
name for the scoped_lock object.  On the other hand, in the RAII
solution, you frequently don't need the inner block, and in
some, very simple cases, you can even write things like:

    (scoped_lock( mutexObject )), whatever() ;

(This works as long as whatever is a simple expression.  Whether
it is a good idea on readability grounds is another question.  I
can't say that I really like it, but the possibility does
exist.)

More generally, the RAII idiom also allows things like:

    std::auto_ptr< scoped_lock >
    f()
    {
        std::auto_ptr< scoped_lock > l( new scoped_lock( mutexObject ) ) ;
        // ...
        return l ;
    }

Posession of the lock is not tied to function scope.

> > The problem is,  even we have a thread library, lots of
> > low level details must be handled by our programmers.

> Low-level details would be handled by the writers of the
> standard libraries rather than by the compiler manufacturers,
> who may be different. I would prefer it if my code were
> compiled directly in machine code rather than the compiler
> having to keep compiling a standard library of templates time
> and again.

Well, everything ultimately compiles to machine code.  I'm not
quite sure what your point is here---it sounds like you are
arguing against the library solution (since in the library
solution, the compiler would have to keep compiling a standard
library again and again).

> Also, if the library were to be written on a UNIX system to
> wrap pthreads then would it use a header file <cpthread> to
> avoid name-clashing or would the entire pthread library be
> automatically "included" into your source when you weren't
> actually using it directly. Similarly on Windows with their
> standard library and on any other platform that has
> multi-threading.

I don't get this.  Why would the compiler have to automatically
include anything, any more than it does for e.g. new or typeid
(both of which depend on "library" functions or classes)?

> > Just consider this, the read_write_mutex in boost.thread
> > was removed due to problems with deadlocks.

> No matter how good a threads library you write, it is up to
> the application-level programmer to ensure that there are no
> deadlocks. The purpose of a library is to aid good
> programming, not to prevent bad programming. Of course you do
> put in some protection so that programmers won't do the wrong
> thing.

> The potential problem of read-write locks is writer
> starvation. If there are always readers about, the writer may
> never get a chance to write.

That depends on how they are implemented.  Normally, I would
expect that as soon as a writer is waiting, further read
requests suspend until it has finished.

> This can be prevented by adding an addition mutex - the reader
> and writer both must acquire the mutex before acquiring the
> read-write lock, but there is a difference. The read releases
> the mutex immediately on acquiring it, the writer holds onto
> the mutex until it gets the read-write lock. That means the
> mutex remains locked if there is a writer waiting to write,
> and new readers cannot read although the existing ones may
> continue reading. In practice this is not a bad thing as
> although the readers may wait a bit, they will eventually read
> the most updated information.

There's no reason to be that complicated.  If an implementation
doesn't provide read/write locks, they can easily be simulated
with a single condition variable, regardless of the desired
locking policy.

> Now if you have to implement a library on top of what you
> already have, then on a POSIX system you would probably
> implement read-write locks on top of pthread_rwlock_init etc.
> There is, as far as I'm aware no defined behaviour as to
> whether writers get priority in such a library so to be safe
> you may well add it in to your own.

The defined behavior depends partially on the implementation. If
the Thread Execution Scheduling option is supported, read-write
locks are required to work correctly; otherwise, it is
implementation defined.  (Solaris documents them as working
correctly, and I suspect that this is true on most
implementations---pthread_rwlock_rdlock will block if there is a
thread waiting to write, even if it doesn't have the lock.)

> > If those best experts can not handle the low level issues
> > correctly (although it's rare case, of course), IMHO, I
> > would like to use the single particular, but controllable
> > one. Freedom here does not benefit too much for me.

> The issues with boost is that they are trying to write a
> library for all systems, but different systems do things
> differently. On Windows there is no concept of rwlock, you
> have to implement the whole thing yourself using just mutex
> and semaphore etc.

Which shouldn't be that difficult, since they have implemented
condition variables (which is what you need to implement a
read-write lock).

> That would be the same for any "standard" library for threads.
> A language feature would mean that your code would be directly
> compiled to the relevant machine code.

The distinction isn't that black and white; consider typeid and
std::type_info.  I don't know what C++ threading will look like
in its final version, but I do know that it will have some
language support, and that there will be library parts as well.

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34







Author: "Earl Purple" <earlpurple@gmail.com>
Date: Mon, 21 Aug 2006 05:07:49 CST
SuperKoko wrote:
> I hope it'll be a library facility.
> Languages having builtin threading tend to impose a single very
> particular mean to do multithreading and it's easy to reach the limits
> of those models.

Does a "library facility" mean a "standard" interface that is
written to wrap what is already there? In that case, your STL
will probably come with a bunch of pre-processors.

I'd rather see it implemented with a C++ runtime library rather
than simply a bunch of "wrapper" classes for pthreads or
whatever happens to be the standard C threading library on the
system.

> I'm sure that many people would say that we already have POSIX. So, why
> would we have to add anything else.

Because POSIX is a C interface and we don't want to have to
write our programs in C just because they happen to compile. A
good C++ library would:

- use RAII locks, preferably with move semantics;
- throw exceptions if thread creation fails rather than
returning error codes (no need to call "get_last_error()" or
any equivalent, because the exception will hold the reason);
- use a class method for thread creation, not a function pointer.

The only downside of course would be possibly having to drop my
own threading classes in favour of the new "standard" ones.
Actually it was fairly tricky to implement condition variables
together with my Mutex locks, because condition variables
silently modify the mutex state, so I had to couple them in
with Mutex. At least I was able to get away with this using
only a forward declaration and awarding friendship.

Whilst I would like the basic functionality to be a "language"
feature (i.e. not simply coded as a wrapper for the C library
that is already there), I would like to see a library with it.
For example, I have a very useful producer-consumer-queue
"collection" (uses std::deque), and there could possibly be
other synchronised collections.

What would possibly be good is to be able to have keywords like
"atomic" that you could put in your code. Maybe also a keyword
atomic_if, thus:

atomic_if( --x == 0 )
{
 // block
}

which would guarantee atomic decrement with result checking.

By the way, there is already one feature of thread-synchronicity
in the standard language: the keyword "volatile". Outside of a
multi-threaded environment I don't know what purpose the
keyword serves.

> That's why I prefer programming in POSIX+C++ than in Ada or any other
> language containing an intrinsic concurrent model.

The presence of language features won't prevent you from
extending the library set by writing libraries of your own.

When you think about it, all of STL and boost is no more than
extending what was already there in the language.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: "kwikius" <andy@servocomm.freeserve.co.uk>
Date: Mon, 21 Aug 2006 10:07:59 CST
Earl Purple wrote:

<...>

> By the way, there is already one feature of thread-synchronicity in the
> standard language - the
> keyword "volatile". Outside of a multi-threaded environment I don't
> know what purpose the keyword
> serves.

That's easy ... I/O ports.

regards
Andy Little






Author: "kanze" <kanze@gabi-soft.fr>
Date: Mon, 21 Aug 2006 10:09:36 CST
Earl Purple wrote:
> SuperKoko wrote:
> > I hope it'll be a library facility.

It can't be purely a library facility, and still be portable.
Even Posix makes requirements with regards to what the compilers
can do, and what user code is allowed to do.  Language
requirements.

> > Languages having builtin threading tend to impose a single
> > very particular mean to do multithreading and it's easy to
> > reach the limits of those models.

> Does a "library facility" mean a "standard" interface that is
> written to wrap what is already there?

Well, I don't think anyone is proposing something that will
require modifying the OS.  If a proposal cannot be implemented
under Unix and Windows, as they currently stand, then I don't
think it will fly.

> In that case, your STL will probably come with a bunch of
> pre-processors.

> I'd rather see it implemented with a C++ runtime library
> rather than simply a bunch of "wrapper" classes for pthreads
> or whatever happens to be the standard C threading library on
> the system.

I've mentioned Boost before.  It's certainly being considered.
I can't imagine it being adopted "as is", without any changes,
but I can't imagine it being ignored, either.

> > I'm sure that many people would say that we already have
> > POSIX. So, why would we have to add anything else.

> Because POSIX is a C interface and we don't want to have to
> write our programs in C just beause they happen to compile. A
> good C++ library would

> - use RAII locks, preferably with move-semantics.

Would support RAII locks.  You certainly don't want to impose
them; that comes down to Java's synchronize.

> - exceptions if thread-creation fails rather than returning
> error codes. No need to call "get_last_error()" or any
> equivalent because the exception will hold the reason.

I think that will largely depend on what a thread actually is,
but all in all, I suspect that the error reporting mechanism
will be exceptions.

> - use a class method for thread-creation, not a function
> pointer.

Most likely, something along the lines of boost::function, I
would guess.  The best of both worlds, so to speak.  (Or a
refusal to commit to one paradigm.)

> The only downside of course would be possibly having to drop
> my own threading classes in favour of the new "standard" ones.
> Actually it was fairly tricky to implement condition-variables
> together with my Mutex locks because condition variables
> silently modify the mutex state so I had to couple them in
> with Mutex. At least I was able to get away with this using
> only a forward declaration and awarding friendship.

Condition variables ARE tied in with Mutexes.  In every
interface I've seen.

One can, of course, conceive of higher level structures, which
encapsulate a number of things; I've got a message queue that I
use a lot in my own code, for example.  Given the time frame and
the available existing practice, I don't expect any such things
in the standard.

I'm just guessing here, and I could be way off, but I sort of
expect:

 -- a very precise definition of the memory model in a
    multi-threaded context,

 -- a set of library components more or less similar to
    boost::threads, and

 -- probably, because there seem to be a couple of experts who
    think it important, and are willing to work on it, some lower
    level synchronization primitives.

> Whilst I would like the basic functionality to be a "language"
> feature (i.e. not simply coded as a wrapper for the C library
> that is already there) I would like to see a library with it.
> For example, I have a very useful producer-consumer-queue
> "collection" (uses std::deque), and there could possibly be
> other synchronised collections.

I'd like such things too, but I think it's asking too much given
the time frame we have to work in.

> What would possibly be good is to be able to have keywords
> like "atomic" so you could put in your code. Maybe also a
> keyword atomic_if thus:

> atomic_if( --x == 0 )
> {
>  // block
> }

> which would guarantee atomic decrement with result checking.

There is a desire for some primitives along those lines, and
work is being done on them.

> By the way, there is already one feature of
> thread-synchronicity in the standard language - the keyword
> "volatile". Outside of a multi-threaded environment I don't
> know what purpose the keyword serves.

As it currently stands, it serves no purpose in a multithreaded
environment, and no purpose whatsoever as implemented in the
Sparc or PC Linux compilers I have access to.  Its original
purpose (support for memory mapped IO), of course, doesn't
concern programs in user environments on such machines; it could
be relevant for code in the kernel, however, if the compilers
actually implemented what was needed.

Microsoft has "redefined" volatile for use in multi-threading,
and has proposed that definition as a modification to the
standard.  I'm not sure what the opinion of the committee is on
it; if nothing else, their redefinition corresponds to a common
misconception concerning current practice.  (On the other hand,
they don't seem to have implemented it in the compiler in Visual
Studio 2005.)

> > That's why I prefer programming in POSIX+C++ than in Ada or
> > any other language containing an intrinsic concurrent model.

> The presence of having language features won't prevent you
> from extending the library set by writing libraries of your
> own.

> When you think about it, all of STL and boost is no more than
> extending what was already there in the language.

But you can't do threading in just the library.  There are
language level issues that must be addressed, such as memory
synchronization.

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34







Author: nagle@animats.com (John Nagle)
Date: Mon, 21 Aug 2006 16:12:41 GMT
kanze wrote:
> But you can't do threading in just the library.  There are
> language level issues that must be addressed, such as memory
> synchronization.

     True.  That really is a language issue, because it may be necessary
to use instructions, like test-and-set, or compare-and-swap where
available, that the compiler would not otherwise emit.
Current solutions in that area tend to involve assembly language
library functions or system calls, often with a performance
penalty.

    John Nagle
    Animats






Author: "Jiang" <goo.mail01@yahoo.com>
Date: Fri, 18 Aug 2006 09:58:51 CST
Several days ago, James Kanze said that thread issues will be addressed
in next standard.

http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/7900a13685a8b8c2/47b6d8f6ce67f963#47b6d8f6ce67f963

If this is true, I would like to know that:

1. How will the language and library overlap?

   That is, will threading be proposed as a library
   extension (like boost.thread), or by adding new
   language constructs, such as keywords like
   "synchronized"?

2. Is it possible that in the near future we
   will have a C++ binding for threading? If not, why?

3. This is maybe a little bit off-topic, but it is said
   that Bjarne Stroustrup has been working with
   concurrent programming for at least 20 years,
   so for what reason did he (together with committee
   members) not add threading to the language?
   C++ predates POSIX? Lack of manpower? Or simply
   the complexity?
   For this question I traced through D&E, but without
   any luck.

Any comments are welcome.






Author: "SuperKoko" <tabkannaz@yahoo.fr>
Date: Sat, 19 Aug 2006 10:03:49 CST
Jiang wrote:
> Several days ago, James Kanze said that thread issues will be addressed
> in next standard.
>
> http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/7900a13685a8b8c2/47b6d8f6ce67f963#47b6d8f6ce67f963
>
> If this is true, I would like to know that:
>
> 1. How will the language and library overlap?
>
>    That is, threading will be purposed by library
>    extension (like boost.thread), or adding new
>    language constructs, such as keywords like
>    "synchronized"?
>
I hope it'll be a library facility.
Languages having builtin threading tend to impose a single very
particular means of doing multithreading, and it's easy to reach
the limits of those models.

> 2. Is it possible that in the recent future we
>    will have a C++ binding for threading? If not, why?
>
Yes, it is possible that in C++0x we will have a C++ binding for
threading.

I'm sure that many people would say that we already have POSIX.
So, why would we have to add anything else?
In practice POSIX is not omnipresent, and perhaps WG21 can add
a library layer abstracting platform differences, if possible
better than POSIX and in a more OO way.
That's why I prefer programming in POSIX+C++ to Ada or any
other language containing an intrinsic concurrency model.

> 3. This is maybe a little bit off-topic, but it is said
>    that Bjarne Stroustrup has been working with
>    concurrent programming for at least 20 years,
>    for what reason he (together with committee
>    members) did not add threading to the language?
>    C++ predates POSIX? Lack of manpower? Or simply
>    the complexity?
It might be because standardizing concurrent programming is
hard work, and in 1998 the committee's priority was to
standardize the existing C++ implementations.
Now for C++0x, this issue has been raised:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1815.html
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1907.html
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1940.pdf
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2043.html

Moreover, it requires "testing": i.e., before standardizing
something, we should have several C++ implementations
successfully implementing the feature.

To get more info about the papers currently proposed:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/
