Topic: Proposed alternative approach to specifying required memory operation ordering


Author: "'Walt Karas' via ISO C++ Standard - Future Proposals" <std-proposals@isocpp.org>
Date: Wed, 15 Feb 2017 14:01:47 -0800 (PST)

On Wednesday, February 15, 2017 at 2:17:51 PM UTC-5, Walt Karas wrote:
>
> - In a program execution, each thread defines a nominal order of (thread
> local) memory and fence operations.
> - For any operations X and Y in thread T, either X before Y, or Y before X.
> - A program execution defines a nominal global order of global memory
> operations.
> - If X and Y are atomic global operations, then either X before Y or Y
> before X in the global order.
> - If X is a global operation, and there exists a global store S where
> neither X before S nor S before X in the global order, then the result of X
> is undefined.
>

This rule needs the additional condition that X and S affect the same
memory location.


> - A program execution defines a partial function f(T, LO) -> GO where LO
> is a local memory operation in thread T, and GO is a global operation.  If
> LO is atomic, then GO must be atomic. The result of LO is the result of
> GO.  If the result of GO is undefined, the result of LO is undefined.
> (Even if f(T, LO) is not required to exist, it none the less _may_ exist.)
> - If LO1 and LO2 are operations in thread T, and LO1 before LO2 in T, and
> f(T, LO1) and f(T, LO2) both exist, then f(T, LO2) before f(T, LO1) in the
> global order is not allowed.
> - If:
> 1.  In a thread T, X and Y are memory operations, and F is a fence
> operation.
> 2.  X before F and F before Y.
> 3.  F is sequentially consistent and both X and Y are atomic, or
> 4.  F is acquire and both X and Y are loads, or
> 5.  F is release and both X and Y are stores.
> then F is activated for X and Y.
> - In a thread T, if X and Y are memory operations (where X before Y) with
> an activated fence F, then f(T, X) must exist.
> - For every global operation GO, there must exist a local operation LO in
> some thread T where f(T, LO) -> GO.  (Assuming no intense gamma radiation.)
> - A sequentially consistent (thread local) atomic memory operation implies
> two sequentially consistent fences, one before and one after it (as well as
> a preceding release for a store, and a succeeding acquire for a load).
>
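
For comparison, here is a minimal sketch of how the release and acquire fence cases in the list above (items 4 and 5) look with the fences C++ already provides. The names `payload`, `ready`, `producer`, `consumer` and `run_message_passing` are made up for illustration, and the X/F/Y labels in the comments are only my reading of how the proposed rules would map onto the code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain (non-atomic) data being published
std::atomic<bool> ready{false};  // flag that announces the payload

void producer() {
    payload = 42;                                         // store X
    std::atomic_thread_fence(std::memory_order_release);  // release fence F
    ready.store(true, std::memory_order_relaxed);         // store Y
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {      // load X
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // acquire fence F
    return payload;                                       // load Y
}

// Runs one producer and one consumer and returns what the consumer read.
int run_message_passing() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();  // join makes the write to `seen` visible here
    assert(seen == 42);
    return seen;
}
```

Under the current wording the release fence followed by the relaxed store synchronizes with the relaxed load followed by the acquire fence, so the consumer is guaranteed to read 42 once it has seen `ready == true`.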

--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/4e917873-c4ea-4552-9208-b0a196399815%40isocpp.org.

.


Author: Nicol Bolas <jmckesson@gmail.com>
Date: Wed, 15 Feb 2017 14:14:02 -0800 (PST)
Raw View
------=_Part_2591_1883450940.1487196842791
Content-Type: multipart/alternative;
 boundary="----=_Part_2592_1251921240.1487196842791"

------=_Part_2592_1251921240.1487196842791
Content-Type: text/plain; charset=UTF-8

First, what is a "global memory operation"?

Second, why do you feel that the memory model definition needs to be
changed in the way you suggest? The C++ data race and memory model
specification was passed through the hands of a number of experts in both
compiler design and threading at the CPU level. The existing wording was
carefully crafted in this regard.

We should not completely rewrite the current wording unless there is a
significant deficiency in the current wording. So what's wrong with the
current wording?

.


Author: "'Walt Karas' via ISO C++ Standard - Future Proposals" <std-proposals@isocpp.org>
Date: Wed, 15 Feb 2017 14:32:18 -0800 (PST)

On Wednesday, February 15, 2017 at 5:14:03 PM UTC-5, Nicol Bolas wrote:
>
> First, what is a "global memory operation"?
>

A read or a write of one or more bytes at consecutive addresses.
Nominally, each thread has its own memory, and global memory is also
distinct.


>
> Second, why do you feel that the memory model definition needs to be
> changed in the way you suggest? The C++ data race and memory model
> specification was passed through the hands of a number of experts in both
> compiler design and threading at the CPU level. The existing wording was
> carefully crafted in this regard.
>

I personally have not read any description that I've understood.  That of
course is my problem, but I don't think there is any harm in suggesting the
beginnings of an alternative, which can be taken or left.  Or, if what I'm
saying is guru-blessed as equivalent to the current wording, then that
would help me and perhaps others to understand the Standard better.

>
> We should not completely rewrite the current wording unless there is a
> significant deficiency in the current wording. So what's wrong with the
> current wording?
>

Even if it's not deficient, wider understanding is desirable.

.


Author: Nicol Bolas <jmckesson@gmail.com>
Date: Wed, 15 Feb 2017 14:49:44 -0800 (PST)

On Wednesday, February 15, 2017 at 5:32:19 PM UTC-5, Walt Karas wrote:
>
> On Wednesday, February 15, 2017 at 5:14:03 PM UTC-5, Nicol Bolas wrote:
>>
>> First, what is a "global memory operation"?
>>
>
> A read or a write of one or more bytes at consecutive addresses.
> Nominally, each thread has its own memory, and global memory is also
> distinct.
>

Wait, when did a thread get its own memory? Oh yes, there are
`thread_local` variables, but last time I checked, the C++ memory model had
no problem with you making a pointer/reference to a `thread_local` variable
accessible to other threads.
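
To make that concrete, here is a small sketch (the names `counter` and `share_thread_local` are invented for this example). Each thread gets its own instance of a `thread_local` variable, but a pointer to one thread's instance can be handed to another thread and used there like any other pointer.

```cpp
#include <cassert>
#include <thread>

thread_local int counter = 0;  // one instance per thread

// Hands a pointer to the calling thread's `counter` to a second thread.
int share_thread_local() {
    counter = 1;        // the calling thread's instance
    int* p = &counter;  // address of the calling thread's instance

    std::thread other([p] {
        counter = 100;  // touches only the new thread's own instance
        *p += 41;       // modifies the calling thread's instance via the pointer
    });
    other.join();       // thread start and join provide the needed ordering

    return counter;     // the calling thread's instance, now 1 + 41
}
```

The access through `p` is not a data race here only because thread creation and `join()` order it against the accesses in the calling thread. Without that ordering it would be subject to the same rules as any other shared object.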

>> Second, why do you feel that the memory model definition needs to be
>> changed in the way you suggest? The C++ data race and memory model
>> specification was passed through the hands of a number of experts in both
>> compiler design and threading at the CPU level. The existing wording was
>> carefully crafted in this regard.
>>
>
> I personally have not read any description that I've understood. That of
> course is my problem, but I don't think there is any harm in suggesting the
> beginnings of an alternative, which can be taken or left.  Or, if what I'm
> saying is guru-blessed as equivalent to the current wording, then that
> would help me and perhaps others to understand the Standard better.
>

>> We should not completely rewrite the current wording unless there is a
>> significant deficiency in the current wording. So what's wrong with the
>> current wording?
>>
>
> Even if it's not deficient, wider understanding is desirable.
>

That is a matter best handled by teachers and language proponents, not by
standard wording.

It's one thing to say that the standard is poorly specified, in that it
doesn't explain what happens in certain circumstances. But wanting a change
simply because it makes the standard easier for the lay user to understand?
Particularly in a highly complex section of the standard like the stuff on
race conditions?

That is a highly dangerous thing to do.

.


Author: Tony V E <tvaneerd@gmail.com>
Date: Wed, 15 Feb 2017 19:41:35 -0500

Leaving the questions about actually changing the standard aside, and
focusing on understanding (which may just mean this could be a
std-discussion question instead of std-proposal),

In your model, if X and Y are relaxed atomic operations in thread T1, can
thread T2 see them as Y before X while thread T3 sees them as X before Y?

Sent from my BlackBerry portable Babbage Device

From: 'Walt Karas' via ISO C++ Standard - Future Proposals
Sent: Wednesday, February 15, 2017 2:17 PM
To: ISO C++ Standard - Future Proposals
Reply To: std-proposals@isocpp.org
Subject: [std-proposals] Proposed alternative approach to specifying required memory operation ordering

> - In a program execution, each thread defines a nominal order of (thread
> local) memory and fence operations.
> - For any operations X and Y in thread T, either X before Y, or Y before X.
> - A program execution defines a nominal global order of global memory
> operations.
> - If X and Y are atomic global operations, then either X before Y or Y
> before X in the global order.
> - If X is a global operation, and there exists a global store S where
> neither X before S nor S before X in the global order, then the result of X
> is undefined.
> - A program execution defines a partial function f(T, LO) -> GO where LO
> is a local memory operation in thread T, and GO is a global operation.  If
> LO is atomic, then GO must be atomic. The result of LO is the result of
> GO.  If the result of GO is undefined, the result of LO is undefined.
> (Even if f(T, LO) is not required to exist, it none the less _may_ exist.)
> - If LO1 and LO2 are operations in thread T, and LO1 before LO2 in T, and
> f(T, LO1) and f(T, LO2) both exist, then f(T, LO2) before f(T, LO1) in the
> global order is not allowed.
> - If:
> 1.  In a thread T, X and Y are memory operations, and F is a fence
> operation.
> 2.  X before F and F before Y.
> 3.  F is sequentially consistent and both X and Y are atomic, or
> 4.  F is acquire and both X and Y are loads, or
> 5.  F is release and both X and Y are stores.
> then F is activated for X and Y.
> - In a thread T, if X and Y are memory operations (where X before Y) with
> an activated fence F, then f(T, X) must exist.
> - For every global operation GO, there must exist a local operation LO in
> some thread T where f(T, LO) -> GO.  (Assuming no intense gamma radiation.)
> - A sequentially consistent (thread local) atomic memory operation implies
> two sequentially consistent fences, one before and one after it (as well as
> a preceding release for a store, and a succeeding acquire for a load).
.


Author: "'Walt Karas' via ISO C++ Standard - Future Proposals" <std-proposals@isocpp.org>
Date: Wed, 15 Feb 2017 18:14:24 -0800 (PST)
Raw View
------=_Part_494_1154098050.1487211264904
Content-Type: multipart/alternative;
 boundary="----=_Part_495_223697406.1487211264904"

------=_Part_495_223697406.1487211264904
Content-Type: text/plain; charset=UTF-8



On Wednesday, February 15, 2017 at 5:49:44 PM UTC-5, Nicol Bolas wrote:
>
> On Wednesday, February 15, 2017 at 5:32:19 PM UTC-5, Walt Karas wrote:
>>
>> On Wednesday, February 15, 2017 at 5:14:03 PM UTC-5, Nicol Bolas wrote:
>>>
>>> First, what is a "global memory operation"?
>>>
>>
>> A read or a write of one or more bytes at consecutive addresses.
>> Nominally, each thread has its own memory, and global memory is also
>> distinct.
>>
>
> Wait, when did a thread get its own memory? Oh yes, there are
> `thread_local` variables, but last time I checked, the C++ memory model had
> no problem with you making a pointer/reference to a `thread_local` variable
> accessible to other threads.
>

Typically threads have their own registers, and a variable may be register
cached.

Also, I think it's desirable for the Standard to allow CPU designers to
experiment with designs where there is no automatic cache coherency between
cores.

I think a model where each thread nominally has its own copy of the memory
handles both of these.
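
As an illustration of the register-caching point, here is a sketch in current C++ (the names `stop`, `spin_until_stopped` and `run_spin_demo` are made up for this example). If `stop` were a plain `bool`, the loop would be a data race and a compiler would be free to read the flag once, keep it in a register, and never notice the update. Making it a `std::atomic<bool>` is what requires every iteration to perform a real load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> stop{false};  // shared flag; atomic so it cannot be register-cached

// Spins until another thread sets `stop`, counting loop iterations.
long spin_until_stopped() {
    long iterations = 0;
    while (!stop.load(std::memory_order_acquire)) {
        ++iterations;
        std::this_thread::yield();
    }
    return iterations;
}

long run_spin_demo() {
    stop.store(false);
    long iterations = -1;
    std::thread worker([&iterations] { iterations = spin_until_stopped(); });
    stop.store(true, std::memory_order_release);  // ask the worker to finish
    worker.join();                                // makes `iterations` visible here
    assert(iterations >= 0);
    return iterations;
}
```

In terms of the "each thread has its own copy" picture, the atomic load is the point where the worker's private view has to be reconciled with what other threads have written.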


>
>>> Second, why do you feel that the memory model definition needs to be
>>> changed in the way you suggest? The C++ data race and memory model
>>> specification was passed through the hands of a number of experts in both
>>> compiler design and threading at the CPU level. The existing wording was
>>> carefully crafted in this regard.
>>>
>>
>> I personally have not read any description that I've understood. That of
>> course is my problem, but I don't think there is any harm in suggesting the
>> beginnings of an alternative, which can be taken or left.  Or, if what I'm
>> saying is guru-blessed as equivalent to the current wording, then that
>> would help me and perhaps others to understand the Standard better.
>>
>
>>> We should not completely rewrite the current wording unless there is a
>>> significant deficiency in the current wording. So what's wrong with the
>>> current wording?
>>>
>>
>> Even if it's not deficient, wider understanding is desirable.
>>
>
> That is a matter best handled by teachers and language proponents, not by
> standard wording.
>
> It's one thing to say that the standard is poorly specified, in that it
> doesn't explain what happens in certain circumstances. But wanting a change
> simply because it makes the standard easier for the lay user to understand?
> Particularly in a highly complex section of the standard like the stuff on
> race conditions?
>
> That is a highly dangerous thing to do.
>


------=_Part_495_223697406.1487211264904--

------=_Part_494_1154098050.1487211264904--

.


Author: Thiago Macieira <thiago@macieira.org>
Date: Wed, 15 Feb 2017 18:33:28 -0800
Raw View
On Wednesday, 15 February 2017 18:14:24 PST 'Walt Karas' via ISO C++
Standard - Future Proposals wrote:
> > Wait, when did a thread get its own memory? Oh yes, there are
> > `thread_local` variables, but last time I checked, the C++ memory model
> > had no problem with you making a pointer/reference to a `thread_local`
> > variable accessible to other threads.
>
> Typically threads have their own registers, and a variable may be register
> cached.

C++ does not take registers into account. A compiler may only register-cache a
variable in an "as if" situation: that is, doing so cannot alter the behaviour
and state visible from other threads.

Therefore, there is no such thing as local memory. All memory is global.
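
A minimal sketch of that point, not taken from the thread: the C++ below hands the address of a `thread_local` variable to a second thread, which writes through it. The names `counter` and `share_thread_local` are hypothetical, chosen only for illustration. The sketch relies on thread creation and `join()` for synchronization, so there is no data race.

```cpp
#include <cassert>
#include <thread>

thread_local int counter = 0;  // every thread gets its own instance

// Hypothetical helper: returns the calling thread's `counter` after a
// second thread has written to it through a pointer.
int share_thread_local() {
    counter = 1;
    int* p = &counter;  // address of the *calling* thread's instance

    std::thread worker([p] {
        counter = 100;  // the worker's own instance, unrelated to *p
        *p = 42;        // writes the caller's instance through the pointer
    });
    worker.join();  // join() synchronizes with the worker's completion

    assert(p == &counter);  // same thread, so still the same instance
    return counter;
}
```

Calling `share_thread_local()` returns 42: the storage is per-thread in lifetime and naming, but it is ordinary memory that any thread holding a pointer can reach.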

> Also, I think it's desirable for the Standard to allow CPU designers to
> experiment with designs where there is no automatic cache coherency between
> cores.

In what way does the current standard prevent this from happening?

> A model where each thread nominally has its own copy of the memory I think
> handles both of these.

Sure, so long as it starts with an empty local memory area. Adding support for
thread-specific visibility is a topic completely outside the standard today and
also outside of any threading library. It's not impossible with current
processors, but just not done.

> >>> Second, why do you feel that the memory model definition needs to be
> >>> changed in the way you suggest? The C++ data race and memory model
> >>> specification was passed through the hands of a number of experts in both
> >>> compiler design and threading at the CPU level. The existing wording was
> >>> carefully crafted in this regard.
> >>
> >> I personally have not read any description that I've understood. That of
> >> course is my problem, but I don't think there is any harm in suggesting the
> >> beginnings of an alternative, which can be taken or left.  Or, if what I'm
> >> saying is guru-blessed as equivalent to the current wording, then that
> >> would help me and perhaps others to understand the Standard better.
> >>
> >>> We should not completely rewrite the current wording unless there is a
> >>> significant deficiency in the current wording. So what's wrong with the
> >>> current wording?
> >>
> >> Even if it's not deficient, wider understanding is desirable.
> >
> > That is a matter best handled by teachers and language proponents, not by
> > standard wording.
> >
> > It's one thing to say that the standard is poorly specified, in that it
> > doesn't explain what happens in certain circumstances. But wanting a change
> > simply because it makes the standard easier for the lay user to understand?
> > Particularly in a highly complex section of the standard like the stuff on
> > race conditions?
> >
> > That is a highly dangerous thing to do.

I agree with Nicol here: you don't need to read the standard to learn the
feature. The standard text is there to specify the behaviour that the
compilers should implement, in any machine.

If you have a new proposal, let's discuss. If you have a defect (including
ambiguity in the spec), let's discuss. Refactoring, I think most people will
pass.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center


.


Author: "'Walt Karas' via ISO C++ Standard - Future Proposals" <std-proposals@isocpp.org>
Date: Wed, 15 Feb 2017 18:51:06 -0800 (PST)
Raw View
------=_Part_1155_1146937566.1487213466308
Content-Type: multipart/alternative;
 boundary="----=_Part_1156_535960125.1487213466308"

------=_Part_1156_535960125.1487213466308
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable



On Wednesday, February 15, 2017 at 7:41:40 PM UTC-5, Tony V E wrote:
>
> Leaving the questions about actually changing the standard aside, and
> focusing on understanding (which may just mean this could be a
> std-discussion question instead of std-proposal),
>
> In your model, if X and Y are relaxed atomic operations in thread T1, can
> thread T2 see them as Y before X while thread T3 sees them as X before Y?
>

No.  But f(T1, X) and f(T1, Y) may not exist, and T2 and T3 can only "see"
stores in global memory.

I think I see your point.  My notion of a nominal global order would make
good optimization too complex.
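
To make the question concrete in terms of today's C++ (not the proposed model), here is a minimal sketch. The names `Seen`, `Result`, and `run_once` are hypothetical. T1 performs two relaxed stores, and T2 and T3 read the two variables in opposite orders. Under the existing rules, relaxed operations on different atomic objects carry no cross-thread ordering guarantee, so an execution in which T2 reads `y == 1` and then `x == 0` is permitted. Whether a particular compiler and CPU ever produce that outcome is a separate matter.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Seen { int first; int second; };   // values in the order they were read
struct Result { Seen t2; Seen t3; };

// Hypothetical helper: one run of the three-thread experiment.
Result run_once() {
    std::atomic<int> x{0}, y{0};
    Result r{};

    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);  // X
        y.store(1, std::memory_order_relaxed);  // Y
    });
    std::thread t2([&] {
        r.t2.first = y.load(std::memory_order_relaxed);   // reads y, then x
        r.t2.second = x.load(std::memory_order_relaxed);
    });
    std::thread t3([&] {
        r.t3.first = x.load(std::memory_order_relaxed);   // reads x, then y
        r.t3.second = y.load(std::memory_order_relaxed);
    });

    t1.join();
    t2.join();
    t3.join();

    // After join(), both stores are guaranteed to be visible here.
    assert(x.load() == 1 && y.load() == 1);
    return r;
}
```

Each observed value is either 0 or 1. The combination of interest is `r.t2.first == 1 && r.t2.second == 0`, which would mean T2 observed Y without observing X.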

>
> Sent from my BlackBerry portable Babbage Device
> *From: *'Walt Karas' via ISO C++ Standard - Future Proposals
> *Sent: *Wednesday, February 15, 2017 2:17 PM
> *To: *ISO C++ Standard - Future Proposals
> *Reply To: *std-pr...@isocpp.org
> *Subject: *[std-proposals] Proposed alternative approach to specifying
> required memory operation ordering
>
> - In a program execution, each thread defines a nominal order of (thread
> local) memory and fence operations.
> - For any operations X and Y in thread T, either X before Y, or Y before X.
> - A program execution defines a nominal global order of global memory
> operations.
> - If X and Y are atomic global operations, then either X before Y or Y
> before X in the global order.
> - If X is a global operation, and there exists a global store S where
> neither X before S nor S before X in the global order, then the result of X
> is undefined.
> - A program execution defines a partial function f(T, LO) -> GO where LO
> is a local memory operation in thread T, and GO is a global operation.  If
> LO is atomic, then GO must be atomic. The result of LO is the result of
> GO.  If the result of GO is undefined, the result of LO is undefined.
> (Even if f(T, LO) is not required to exist, it none the less _may_ exist.)
> - If LO1 and LO2 are operations in thread T, and LO1 before LO2 in T, and
> f(T, LO1) and f(T, LO2) both exist, then f(T, LO2) before f(T, LO1) in the
> global order is not allowed.
> - If:
> 1.  In a thread T, X and Y are memory operations, and F is a fence
> operation.
> 2.  X before F and F before Y.
> 3.  F is sequentially consistent and both X and Y are atomic, or
> 4.  F is acquire and both X and Y are loads, or
> 5.  F is release and both X and Y are stores.
> then F is activated for X and Y.
> - In a thread T, if X and Y are memory operations (where X before Y) with
> an activated fence F, then f(T, X) must exist.
> - For every global operation GO, there must exist a local operation LO in
> some thread T where f(T, LO) -> GO.  (Assuming no intense gamma radiation.)
> - A sequentially consistent (thread local) atomic memory operation implies
> two sequentially consistent fences, one before and one after it (as well as
> a preceding release for a store, and a succeeding acquire for a load).
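
For comparison with the fence rules quoted above, here is a minimal sketch of how fences are used in existing C++ with `std::atomic_thread_fence`. It illustrates the current standard's behaviour only; whether it corresponds exactly to the "activated fence" notion in the proposal is an assumption. The name `fence_message_passing` is hypothetical. A release fence sits between two stores in the producer, and an acquire fence sits between two loads in the consumer.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: passes a plain int between threads using fences.
int fence_message_passing() {
    int data = 0;                    // ordinary, non-atomic payload
    std::atomic<bool> ready{false};  // flag accessed with relaxed operations
    int seen = -1;

    std::thread producer([&] {
        data = 42;                                            // store before the fence
        std::atomic_thread_fence(std::memory_order_release);  // release fence
        ready.store(true, std::memory_order_relaxed);         // store after the fence
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {      // load before the fence
            std::this_thread::yield();
        }
        std::atomic_thread_fence(std::memory_order_acquire);  // acquire fence
        seen = data;                                          // load after the fence
    });

    producer.join();
    consumer.join();
    assert(seen == 42);  // guaranteed by the fence-to-fence synchronization
    return seen;
}
```

Calling `fence_message_passing()` returns 42. Without the two fences, the relaxed flag alone would not make the read of `data` race-free.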

------=_Part_1156_535960125.1487213466308--

------=_Part_1155_1146937566.1487213466308--

.


Author: "'Walt Karas' via ISO C++ Standard - Future Proposals" <std-proposals@isocpp.org>
Date: Thu, 16 Feb 2017 07:05:27 -0800 (PST)
Raw View
------=_Part_2876_1731179396.1487257527502
Content-Type: multipart/alternative;
 boundary="----=_Part_2877_180357580.1487257527503"

------=_Part_2877_180357580.1487257527503
Content-Type: text/plain; charset=UTF-8



On Wednesday, February 15, 2017 at 9:33:38 PM UTC-5, Thiago Macieira wrote:
>
> On Wednesday, 15 February 2017 18:14:24 PST 'Walt Karas' via ISO C++
> Standard - Future Proposals wrote:
> > > Wait, when did a thread get its own memory? Oh yes, there are
> > > `thread_local` variables, but last time I checked, the C++ memory model
> > > had no problem with you making a pointer/reference to a `thread_local`
> > > variable accessible to other threads.
> >
> > Typically threads have their own registers, and a variable may be register
> > cached.
>
> C++ does not take registers into account. A compiler may only register-cache a
> variable in an "as if" situation: that is, doing so cannot alter the behaviour
> and state visible from other threads.
>
> Therefore, there is no such thing as local memory. All memory is global.
>
> > Also, I think it's desirable for the Standard to allow CPU designers to
> > experiment with designs where there is no automatic cache coherency between
> > cores.
>
> In what way does the current standard prevent this from happening?
>

Sorry, I did not mean "allow" in a strict sense.

In practice, on current, widely used architectures, multi-core threading
does rely on per-thread partial copying of "global", shared base memory.
Intuitively, a model that reflects this more explicitly might be clearer
and cleaner.


>
> > A model where each thread nominally has its own copy of the memory I think
> > handles both of these.
>
> Sure, so long as it starts with an empty local memory area. Adding support for
> thread-specific visibility is a topic completely outside the standard today and
> also outside of any threading library. It's not impossible with current
> processors, but just not done.
>
> > >>> Second, why do you feel that the memory model definition needs to be
> > >>> changed in the way you suggest? The C++ data race and memory model
> > >>> specification was passed through the hands of a number of experts in both
> > >>> compiler design and threading at the CPU level. The existing wording was
> > >>> carefully crafted in this regard.
> > >>
> > >> I personally have not read any description that I've understood. That of
> > >> course is my problem, but I don't think there is any harm in suggesting the
> > >> beginnings of an alternative, which can be taken or left.  Or, if what I'm
> > >> saying is guru-blessed as equivalent to the current wording, then that
> > >> would help me and perhaps others to understand the Standard better.
> > >>
> > >>> We should not completely rewrite the current wording unless there is a
> > >>> significant deficiency in the current wording. So what's wrong with the
> > >>> current wording?
> > >>
> > >> Even if it's not deficient, wider understanding is desirable.
> > >
> > > That is a matter best handled by teachers and language proponents, not by
> > > standard wording.
> > >
> > > It's one thing to say that the standard is poorly specified, in that it
> > > doesn't explain what happens in certain circumstances. But wanting a change
> > > simply because it makes the standard easier for the lay user to understand?
> > > Particularly in a highly complex section of the standard like the stuff on
> > > race conditions?
> > >
> > > That is a highly dangerous thing to do.
>
> I agree with Nicol here: you don't need to read the standard to learn the
> feature. The standard text is there to specify the behaviour that the
> compilers should implement, in any machine.
>
> If you have a new proposal, let's discuss. If you have a defect (including
> ambiguity in the spec), let's discuss. Refactoring, I think most people will
> pass.
>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>    Software Architect - Intel Open Source Technology Center
>
>


------=_Part_2877_180357580.1487257527503
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><br><br>On Wednesday, February 15, 2017 at 9:33:38 PM UTC-=
5, Thiago Macieira wrote:<blockquote class=3D"gmail_quote" style=3D"margin:=
 0;margin-left: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;">On qu=
arta-feira, 15 de fevereiro de 2017 18:14:24 PST &#39;Walt Karas&#39; via I=
SO C++=20
<br>Standard - Future Proposals wrote:
<br>&gt; &gt; Wait, when did a thread get its own memory? Oh yes, there are
<br>&gt; &gt; `thread_local` variables, but last time I checked, the C++ me=
mory model
<br>&gt; &gt; had
<br>&gt; &gt; no problem with you making a pointer/reference to a `thread_l=
ocal`
<br>&gt; &gt; variable
<br>&gt; &gt; accessible to other threads.
<br>&gt;=20
<br>&gt; Typically threads have their own registers, and a variable may be =
register
<br>&gt; cached.
<br>
<br>C++ does not take registers into account. A compiler may only register-=
cache a=20
<br>variable in an &quot;as if&quot; situation: that is, doing so cannot al=
ter the behaviour=20
<br>and state visible from other threads.
<br>
<br>Therefore, there is no such thing as local memory. All memory is global=
..
<br>
<br>&gt; Also, I think it&#39;s desirable for the Standard to allow CPU des=
igners to
<br>&gt; experiment with designs where there is no automatic cache coherenc=
y between
<br>&gt; cores.
<br>
<br>In what way does the current standard prevent this from happening?
<br></blockquote><div><br></div><div>Sorry, I did not mean &quot;allow&quot=
; in a strict sense.</div><div><br></div><div>In reality, in the current, w=
idely used architectures, multi-core threading does rely on per-thread part=
ial copying of &quot;global&quot;, shared base memory. =C2=A0Intuitively, i=
t&#39;s possible that a model that more explicitly reflects this might be c=
learer and cleaner.</div><div>=C2=A0</div><blockquote class=3D"gmail_quote"=
 style=3D"margin: 0;margin-left: 0.8ex;border-left: 1px #ccc solid;padding-=
left: 1ex;">
<br>&gt; A model where each thread nominally has its own copy of the memory=
 I think
<br>&gt; handles both of these.
<br>
<br>Sure, so long as it starts with an empty local memory area. Adding supp=
ort for=20
<br>thread-specific visibility is a topic completely outside the standard t=
oday and=20
<br>also outside of any threading library. It&#39;s not impossible with cur=
rent=20
<br>processors, but just not done.
<br>
<br>&gt; &gt; Second, why do you feel that the memory model definition need=
s to be
<br>&gt; &gt;&gt;&gt; changed in the way you suggest? The C++ data race and=
 memory model
<br>&gt; &gt;&gt;&gt; specification was passed through the hands of a numbe=
r of experts in
<br>&gt; &gt;&gt;&gt; both
<br>&gt; &gt;&gt;&gt; compiler design and threading at the CPU level. The e=
xisting wording was
<br>&gt; &gt;&gt;&gt; carefully crafted in this regard.
<br>&gt; &gt;&gt;=20
<br>&gt; &gt;&gt; I personally have not read any description that I&#39;ve =
understood. That of
<br>&gt; &gt;&gt; course is my problem, but I don&#39;t think there is any =
harm in suggesting
<br>&gt; &gt;&gt; the
<br>&gt; &gt;&gt; beginnings of an alternative, which can be taken or left.=
 =C2=A0Or, if what
<br>&gt; &gt;&gt; I&#39;m
<br>&gt; &gt;&gt; saying is guru-blessed as equivalent to the current wordi=
ng, then that
<br>&gt; &gt;&gt; would help me and perhaps others to understand the Standa=
rd better.
<br>&gt; &gt;&gt;=20
<br>&gt; &gt;&gt;&gt; We should not completely rewrite the current wording =
unless there is a
<br>&gt; &gt;&gt;&gt; significant deficiency in the current wording. So wha=
t&#39;s wrong with the
<br>&gt; &gt;&gt;&gt; current wording?
<br>&gt; &gt;&gt;=20
<br>&gt; &gt;&gt; Even if it&#39;s not deficient, wider understanding is de=
sirable.
<br>&gt; &gt;=20
<br>&gt; &gt; That is a matter best handled by teachers and language propon=
ents, not by
<br>&gt; &gt; standard wording.
<br>&gt; &gt;=20
<br>&gt; &gt; It&#39;s one thing to say that the standard is poorly specifi=
ed, in that it
<br>&gt; &gt; doesn&#39;t explain what happens in certain circumstances. Bu=
t wanting a
<br>&gt; &gt; change
<br>&gt; &gt; simply because it makes the standard easier for the lay user =
to
<br>&gt; &gt; understand?
<br>&gt; &gt; Particularly in a highly complex section of the standard like=
 the stuff on
<br>&gt; &gt; race conditions?
<br>&gt; &gt;=20
<br>&gt; &gt; That is a highly dangerous thing to do.
<br>
<br>I agree with Nicol here: you don&#39;t need to read the standard to lea=
rn the=20
<br>feature. The standard text is there to specify the behaviour that the=
=20
<br>compilers should implement, in any machine.
<br>
<br>If you have a new proposal, let&#39;s discuss. If you have a defect (in=
cluding=20
<br>ambiguity in the spec), let&#39;s discuss. Refactoring, I think most pe=
ople will=20
<br>pass.
<br>
<br>--=20
<br>Thiago Macieira - thiago (AT) <a href=3D"http://macieira.info" target=
=3D"_blank" rel=3D"nofollow" onmousedown=3D"this.href=3D&#39;http://www.goo=
gle.com/url?q\x3dhttp%3A%2F%2Fmacieira.info\x26sa\x3dD\x26sntz\x3d1\x26usg\=
x3dAFQjCNEswDUBNCNanbu7euhqLn_62FW8ag&#39;;return true;" onclick=3D"this.hr=
ef=3D&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fmacieira.info\x26sa\x=
3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEswDUBNCNanbu7euhqLn_62FW8ag&#39;;return t=
rue;">macieira.info</a> - thiago (AT) <a href=3D"http://kde.org" target=3D"=
_blank" rel=3D"nofollow" onmousedown=3D"this.href=3D&#39;http://www.google.=
com/url?q\x3dhttp%3A%2F%2Fkde.org\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNH=
GRJdo5_JYG1DowztwAHAKs80XSA&#39;;return true;" onclick=3D"this.href=3D&#39;=
http://www.google.com/url?q\x3dhttp%3A%2F%2Fkde.org\x26sa\x3dD\x26sntz\x3d1=
\x26usg\x3dAFQjCNHGRJdo5_JYG1DowztwAHAKs80XSA&#39;;return true;">kde.org</a=
>
<br>=C2=A0 =C2=A0Software Architect - Intel Open Source Technology Center
<br>
<br></blockquote></div>

------=_Part_2877_180357580.1487257527503--

------=_Part_2876_1731179396.1487257527502--

.


Author: "'Walt Karas' via ISO C++ Standard - Future Proposals" <std-proposals@isocpp.org>
Date: Thu, 16 Feb 2017 07:11:44 -0800 (PST)
Raw View
------=_Part_983_192763592.1487257904677
Content-Type: multipart/alternative;
 boundary="----=_Part_984_161356715.1487257904677"

------=_Part_984_161356715.1487257904677
Content-Type: text/plain; charset=UTF-8

On Wednesday, February 15, 2017 at 5:01:47 PM UTC-5, Walt Karas wrote:
>
> On Wednesday, February 15, 2017 at 2:17:51 PM UTC-5, Walt Karas wrote:
>>
>> - In a program execution, each thread defines a nominal order of (thread
>> local) memory and fence operations.
>> - For any operations X and Y in thread T, either X before Y, or Y before
>> X.
>> - A program execution defines a nominal global order of global memory
>> operations.
>> - If X and Y are atomic global operations, then either X before Y or Y
>> before X in the global order.
>> - If X is a global operation, and there exists a global store S where
>> neither X before S nor S before X in the global order, then the result of X
>> is undefined.
>>
>
> This rule needs the additional condition that X and S affect the same
> memory location.
>
>
>> - A program execution defines a partial function f(T, LO) -> GO where LO
>> is a local memory operation in thread T, and GO is a global operation.  If
>> LO is atomic, then GO must be atomic. The result of LO is the result of
>> GO.  If the result of GO is undefined, the result of LO is undefined.
>> (Even if f(T, LO) is not required to exist, it none the less _may_ exist.)
>> - If LO1 and LO2 are operations in thread T, and LO1 before LO2 in T, and
>> f(T, LO1) and f(T, LO2) both exist, then f(T, LO2) before f(T, LO1) in the
>> global order is not allowed.
>> - If:
>> 1.  In a thread T, X and Y are memory operations, and F is a fence
>> operation.
>> 2.  X before F and F before Y.
>> 3.  F is sequentially consistent and both X and Y are atomic, or
>> 4.  F is acquire and both X and Y are loads, or
>> 5.  F is release and both X and Y are stores.
>> then F is activated for X and Y.
>> - In a thread T, if X and Y are memory operations (where X before Y) with
>> an activated fence F, then f(T, X) must exist.
>> - For every global operation GO, there must exist a local operation LO in
>> some thread T where f(T, LO) -> GO.  (Assuming no intense gamma radiation.)
>> - A sequentially consistent (thread local) atomic memory operation
>> implies two sequentially consistent fences, one before and one after it (as
>> well as a preceding release for a store, and a succeeding acquire for a
>> load).
>>
>
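As a reading aid, the fence-activation rule quoted above (conditions 3 to 5)
can be sketched as a small predicate.  This is only a sketch: FenceKind, MemOp
and fence_activated are hypothetical names that appear nowhere in the
proposal, and conditions 1 and 2 (same thread, X before F before Y) are
assumed to have been checked by the caller.

```cpp
#include <cassert>

// Hypothetical types, introduced only for illustration.
enum class FenceKind { SeqCst, Acquire, Release };

struct MemOp
  {
    bool is_load;    // true for a load, false for a store
    bool is_atomic;  // true if the operation is atomic
  };

// Encodes conditions 3-5 of the quoted rule.  Conditions 1-2 (X and Y in
// the same thread, X before F, F before Y) are assumed to hold already.
bool fence_activated(FenceKind f, const MemOp &x, const MemOp &y)
  {
    switch (f)
      {
      case FenceKind::SeqCst:  return x.is_atomic && y.is_atomic;
      case FenceKind::Acquire: return x.is_load && y.is_load;
      case FenceKind::Release: return !x.is_load && !y.is_load;
      }
    return false;
  }
```

For example, under this reading a release fence between two stores is
activated, while a release fence between a load and a store is not.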
It's clear that, if this is to have any worth, global memory ordering must
be more partial and less tightly coupled to nominal thread orderings.  I
will explore improving it.  But I fear it may end up having no advantage
over the current model in simplicity or ease of understanding.

It's my impression that there's disappointment in the level of optimization
that currently available compilers apply to code using <atomic>.  Doesn't
that indicate it's possibly worthwhile to make changes, perhaps major ones?
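
To make the optimization question concrete, here is a minimal sketch (the
function name bump_twice is hypothetical).  As far as I understand the
current model, nothing in it forbids merging the two relaxed increments into
a single fetch_add(2), since no other thread is guaranteed to observe the
intermediate value.  Whether a given compiler actually performs that merge is
an implementation matter, and I am not asserting what any particular compiler
does.

```cpp
#include <atomic>

// Hypothetical helper: two adjacent relaxed increments of the same counter.
// A compiler would, on my reading, be allowed to treat these as one
// fetch_add(2); the observable result for a single caller is the same.
unsigned bump_twice(std::atomic<unsigned> &counter)
  {
    counter.fetch_add(1, std::memory_order_relaxed);
    counter.fetch_add(1, std::memory_order_relaxed);
    return counter.load(std::memory_order_relaxed);
  }
```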

--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/b8d56564-18d4-40ab-b217-4aa22c7a47cd%40isocpp.org.

------=_Part_984_161356715.1487257904677--

------=_Part_983_192763592.1487257904677--

.


Author: Thiago Macieira <thiago@macieira.org>
Date: Thu, 16 Feb 2017 10:28:42 -0800
Raw View
On Thursday, 16 February 2017 07:05:27 PST 'Walt Karas' via ISO C++
Standard - Future Proposals wrote:
> Sorry, I did not mean "allow" in a strict sense.
>
> In reality, in the current, widely used architectures, multi-core threading
> does rely on per-thread partial copying of "global", shared base memory.
>  Intuitively, it's possible that a model that more explicitly reflects this
> might be clearer and cleaner.

You may want to discuss this in the context of transactional memory.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center

.


Author: Thiago Macieira <thiago@macieira.org>
Date: Thu, 16 Feb 2017 10:29:36 -0800
Raw View
On Thursday, 16 February 2017 07:11:44 PST 'Walt Karas' via ISO C++
Standard - Future Proposals wrote:
> It's my impression that there's disappointment in the level of optimization
> by currently-available compilers of code using <atomic>.  Doesn't that
> indicate it's possibly worthwhile to make changes, perhaps major ones?

To the compilers? Yes.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center

.


Author: "'Walt Karas' via ISO C++ Standard - Future Proposals" <std-proposals@isocpp.org>
Date: Fri, 17 Feb 2017 11:54:24 -0800 (PST)
Raw View
------=_Part_370_1300324002.1487361264631
Content-Type: multipart/alternative;
 boundary="----=_Part_371_1583845232.1487361264632"

------=_Part_371_1583845232.1487361264632
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Wednesday, February 15, 2017 at 7:41:40 PM UTC-5, Tony V E wrote:
>
> Leaving the questions about actually changing the standard aside, and
> focusing on understanding (which may just mean this could be a
> std-discussion question instead of std-proposal),
>
> In your model, if X and Y are relaxed atomic operations in thread T1, can
> thread T2 see them as Y before X while thread T3 sees them as X before Y?
>
> Sent from my BlackBerry portable Babbage Device
> *From: *'Walt Karas' via ISO C++ Standard - Future Proposals
> *Sent: *Wednesday, February 15, 2017 2:17 PM
> *To: *ISO C++ Standard - Future Proposals
> *Reply To: *std-pr...@isocpp.org
> *Subject: *[std-proposals] Proposed alternative approach to specifying
> required memory operation ordering
>
> - In a program execution, each thread defines a nominal order of (thread
> local) memory and fence operations.
> - For any operations X and Y in thread T, either X before Y, or Y before X.
> - A program execution defines a nominal global order of global memory
> operations.
> - If X and Y are atomic global operations, then either X before Y or Y
> before X in the global order.
> - If X is a global operation, and there exists a global store S where
> neither X before S nor S before X in the global order, then the result of X
> is undefined.
> - A program execution defines a partial function f(T, LO) -> GO where LO
> is a local memory operation in thread T, and GO is a global operation.  If
> LO is atomic, then GO must be atomic. The result of LO is the result of
> GO.  If the result of GO is undefined, the result of LO is undefined.
> (Even if f(T, LO) is not required to exist, it none the less _may_ exist.)
> - If LO1 and LO2 are operations in thread T, and LO1 before LO2 in T, and
> f(T, LO1) and f(T, LO2) both exist, then f(T, LO2) before f(T, LO1) in the
> global order is not allowed.
> - If:
> 1.  In a thread T, X and Y are memory operations, and F is a fence
> operation.
> 2.  X before F and F before Y.
> 3.  F is sequentially consistent and both X and Y are atomic, or
> 4.  F is acquire and both X and Y are loads, or
> 5.  F is release and both X and Y are stores.
> then F is activated for X and Y.
> - In a thread T, if X and Y are memory operations (where X before Y) with
> an activated fence F, then f(T, X) must exist.
> - For every global operation GO, there must exist a local operation LO in
> some thread T where f(T, LO) -> GO.  (Assuming no intense gamma radiation.)
> - A sequentially consistent (thread local) atomic memory operation implies
> two sequentially consistent fences, one before and one after it (as well as
> a preceding release for a store, and a succeeding acquire for a load).
>
Here is some code that illustrates some points of confusion I have with the
memory model:

#include <atomic>

#if 1

#if 1
using Cardinal = unsigned; // Homage to Niklaus Wirth
#else
using Cardinal = volatile unsigned;
#endif

#else

// Atomic access only.
class Cardinal
  {
    std::atomic<unsigned> i;

    static const std::memory_order Read_order = std::memory_order_relaxed;
    static const std::memory_order Write_order = std::memory_order_relaxed;

    unsigned load() const { return(i.load(Read_order)); }

  public:

    Cardinal(unsigned i_) : i(i_) { }
    operator unsigned () const { return(load()); }
    void operator ++ () { i.store(load() + 1, Write_order); }
  };

#endif

// Increment my number, then if necessary wait for my identical twin thread
// to catch up with incrementing its number.
//
template <Cardinal *Mine, Cardinal *Twins_number>
void twin_thread()
  {
    unsigned equals_mine = 0;

    while (equals_mine < 100)
      {
        ++(*Mine);

        std::atomic_thread_fence(std::memory_order_release);

        ++equals_mine;

        while (*Twins_number < equals_mine)
          ;
      }
  }

Cardinal i1(0), i2(0);

void thread1() { twin_thread<&i1, &i2>(); }
void thread2() { twin_thread<&i2, &i1>(); }

#include <iostream>
#include <thread>
#include <chrono>

int main()
  {
    std::thread t1(thread1);
    std::thread t2(thread2);

    std::this_thread::sleep_for(std::chrono::seconds(3));

    if (i1 == 100)
      t1.join();
    else
      std::cout << i1 << '\n';

    if (i2 == 100)
      t2.join();
    else
      std::cout << i2 << '\n';

    return(0);
  }

(I tested this with gcc (6.3.1), x86, Linux.)

When I don't enable optimization, this code works fine, whether access to
the thread-shared variables is atomic or non-atomic.  When I enable
optimization, the threads get into infinite loops if the access to the
thread-shared variables is non-atomic.  But atomic access, even with
relaxed order, causes the code to work fine with optimization.

I don't see how atomicity of access should matter to whether or not the
threads here get into infinite loops.  I suspect it only matters in
practice here because atomic FUD has caused gcc to take an "abandon all
optimization ye who use atomic" approach.  The problem with this code is
not atomicity of memory access, nor order of visibility of accesses in
different threads.  The problem is *lack* of visibility, due to the
reality that threads have (partial) private copies of memory.  This is why
I suspect that the memory model would be better with explicit
thread-private copies of memory.

If there's a valid reason that relaxed-order atomic access should cause the
threads to avoid infinite looping, then I'll have to cop to being an
atomic-naive wanker who's cluttering the proposal forum.
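
For comparison, here is a minimal sketch of the same lockstep idea with the
shared counters declared as std::atomic<unsigned>.  It is not the code above:
the names c1, c2, twin and run_twins are hypothetical, and the release/acquire
orders are one possible choice.  The relevant point, under the current model,
is that an unsynchronized non-atomic read racing with a write is a data race
(undefined behaviour), so the optimizer may load the twin's number once and
spin on a register, whereas each atomic load in the loop below is a real load.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical names; a sketch under the assumptions stated above.
std::atomic<unsigned> c1{0}, c2{0};

// Publish my count, then spin until the twin's count has caught up.
void twin(std::atomic<unsigned> &mine, std::atomic<unsigned> &twins)
  {
    for (unsigned n = 1; n <= 100; ++n)
      {
        // Only this thread writes `mine`, so storing n is equivalent to
        // incrementing it.
        mine.store(n, std::memory_order_release);

        // Each iteration performs an atomic load, so the loop can observe
        // the twin's store once it becomes visible.
        while (twins.load(std::memory_order_acquire) < n)
          std::this_thread::yield();
      }
  }

// Runs both twins to completion and returns the final counter values.
std::pair<unsigned, unsigned> run_twins()
  {
    c1 = 0;
    c2 = 0;
    std::thread t1(twin, std::ref(c1), std::ref(c2));
    std::thread t2(twin, std::ref(c2), std::ref(c1));
    t1.join();
    t2.join();
    return {c1.load(), c2.load()};
  }
```

Neither thread can get more than one step ahead of the other, so both loops
finish and both counters end at 100.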

(Tangentially, I'm surprised that gcc will emit a "jump to here" like this
with no warning:

.L7:
        jmp     .L7

even with -Wall -Wextra --pedantic.)

------=_Part_371_1583845232.1487361264632
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">On Wednesday, February 15, 2017 at 7:41:40 PM UTC-5, Tony =
V E wrote:<blockquote class=3D"gmail_quote" style=3D"margin: 0;margin-left:=
 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;"><div lang=3D"en-US" =
style=3D"background-color:rgb(255,255,255);line-height:initial">           =
                                                                           =
<div style=3D"width:100%;font-size:initial;font-family:Calibri,&#39;Slate P=
ro&#39;,sans-serif,sans-serif;color:rgb(31,73,125);text-align:initial;backg=
round-color:rgb(255,255,255)">Leaving the questions about =E2=80=8Eactually=
 changing the standard aside, and focusing on understanding (which may just=
 mean this could be a std-discussion question instead of std-proposal),</di=
v><div style=3D"width:100%;font-size:initial;font-family:Calibri,&#39;Slate=
 Pro&#39;,sans-serif,sans-serif;color:rgb(31,73,125);text-align:initial;bac=
kground-color:rgb(255,255,255)"><br></div><div style=3D"width:100%;font-siz=
e:initial;font-family:Calibri,&#39;Slate Pro&#39;,sans-serif,sans-serif;col=
or:rgb(31,73,125);text-align:initial;background-color:rgb(255,255,255)">In =
your model, if X and Y are relaxed atomic operations in thread T1, can thre=
ad T2 see them as Y before X while thread T3 sees them as =E2=80=8EX before=
 Y?</div>                                                                  =
                                                                   <div sty=
le=3D"width:100%;font-size:initial;font-family:Calibri,&#39;Slate Pro&#39;,=
sans-serif,sans-serif;color:rgb(31,73,125);text-align:initial;background-co=
lor:rgb(255,255,255)"><br style=3D"display:initial"></div>                 =
                                                                           =
                                                                           =
                            <div style=3D"font-size:initial;font-family:Cal=
ibri,&#39;Slate Pro&#39;,sans-serif,sans-serif;color:rgb(31,73,125);text-al=
ign:initial;background-color:rgb(255,255,255)">Sent=C2=A0from=C2=A0my=C2=A0=
BlackBerry=C2=A0<wbr>portable=C2=A0Babbage=C2=A0Device</div>               =
                                                                           =
                                                                           =
             <table width=3D"100%" style=3D"background-color:white;border-s=
pacing:0px"> <tbody><tr><td colspan=3D"2" style=3D"font-size:initial;text-a=
lign:initial;background-color:rgb(255,255,255)">                           =
<div style=3D"border-style:solid none none;border-top-color:rgb(181,196,223=
);border-top-width:1pt;padding:3pt 0in 0in;font-family:Tahoma,&#39;BB Alpha=
 Sans&#39;,&#39;Slate Pro&#39;;font-size:10pt">  <div><b>From: </b>&#39;Wal=
t Karas&#39; via ISO C++ Standard - Future Proposals</div><div><b>Sent: </b=
>Wednesday, February 15, 2017 2:17 PM</div><div><b>To: </b>ISO C++ Standard=
 - Future Proposals</div><div><b>Reply To: </b><a href=3D"javascript:" targ=
et=3D"_blank" gdf-obfuscated-mailto=3D"tXWA2xVCBQAJ" rel=3D"nofollow" onmou=
sedown=3D"this.href=3D&#39;javascript:&#39;;return true;" onclick=3D"this.h=
ref=3D&#39;javascript:&#39;;return true;">std-pr...@isocpp.org</a></div><di=
v><b>Subject: </b>[std-proposals] Proposed alternative approach to specifyi=
ng required memory operation ordering</div></div></td></tr></tbody></table>=
<div style=3D"border-style:solid none none;border-top-color:rgb(186,188,209=
);border-top-width:1pt;font-size:initial;text-align:initial;background-colo=
r:rgb(255,255,255)"></div><br><div><div dir=3D"ltr"><div>- In a program exe=
cution,=C2=A0each thread defines a nominal order of (thread local) memory a=
nd fence operations.</div><div>- For any operations X and Y in thread T, ei=
ther X before Y, or Y before X.</div><div>- A program execution defines a n=
ominal global order of global memory operations.</div><div>- If X and Y are=
 atomic global operations, then either X before Y or Y before X in the glob=
al order.</div><div>- If X is a global operation, and there exists a global=
 store S where neither X before S nor S before X in the global order, then =
the result of X is undefined.</div><div>- A program execution defines a par=
tial function f(T, LO) -&gt; GO where LO is a local memory operation in thr=
ead T, and GO is a global operation.=C2=A0 If LO is atomic, then GO must be=
 atomic. The result of LO is the result of GO.=C2=A0 If the result of GO is=
 undefined, the result of LO is undefined.=C2=A0 (Even if f(T, LO) is not r=
equired to exist, it none the less _may_ exist.)</div><div>- If LO1 and LO2=
 are operations in thread T, and LO1 before LO2 in T, and f(T, LO1) and f(T=
, LO2) both exist, then f(T, LO2) before f(T, LO1) in the global order is n=
ot allowed.</div><div>- If:</div><div>1.=C2=A0 In a thread T, X and Y are m=
emory operations, and F is a fence operation.</div><div>2.=C2=A0 X before=
=C2=A0F and=C2=A0F before Y.</div><div>3.=C2=A0 F is sequentially consisten=
t and both X and Y are atomic, or</div><div>4.=C2=A0 F is acquire and both =
X and Y are loads, or</div><div>5.=C2=A0 F is release and both X and Y are =
stores.</div><div>then F is activated for X and Y.</div><div>-=C2=A0In a th=
read T, if X and Y are memory operations (where X before Y) with an activat=
ed fence F, then f(T, X) must exist.</div><div>- For every global operation=
 GO, there must exist a local operation LO in some thread T where f(T, LO) =
-&gt; GO.=C2=A0 (Assuming no intense gamma radiation.)</div><div>- A sequen=
tially consistent (thread local) atomic memory operation implies two sequen=
tially consistent fences, one before and one after it (as well as a precedi=
ng release for a store, and a succeeding acquire for a load).</div></div>

<p></p>

-- <br>
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br>
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"javascript:" target=3D"_blank" gdf-obfuscated-mailto=3D"=
tXWA2xVCBQAJ" rel=3D"nofollow" onmousedown=3D"this.href=3D&#39;javascript:&=
#39;;return true;" onclick=3D"this.href=3D&#39;javascript:&#39;;return true=
;">std-proposal...@<wbr>isocpp.org</a>.<br>
To post to this group, send email to <a href=3D"javascript:" target=3D"_bla=
nk" gdf-obfuscated-mailto=3D"tXWA2xVCBQAJ" rel=3D"nofollow" onmousedown=3D"=
this.href=3D&#39;javascript:&#39;;return true;" onclick=3D"this.href=3D&#39=
;javascript:&#39;;return true;">std-pr...@isocpp.org</a>.<br>
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/a/isocpp.org/d/msgid/std-proposals/d0fdabcd-f04a-46e6-b93a-0b9ea85cb5bd%=
40isocpp.org?utm_medium=3Demail&amp;utm_source=3Dfooter" target=3D"_blank" =
rel=3D"nofollow" onmousedown=3D"this.href=3D&#39;https://groups.google.com/=
a/isocpp.org/d/msgid/std-proposals/d0fdabcd-f04a-46e6-b93a-0b9ea85cb5bd%40i=
socpp.org?utm_medium\x3demail\x26utm_source\x3dfooter&#39;;return true;" on=
click=3D"this.href=3D&#39;https://groups.google.com/a/isocpp.org/d/msgid/st=
d-proposals/d0fdabcd-f04a-46e6-b93a-0b9ea85cb5bd%40isocpp.org?utm_medium\x3=
demail\x26utm_source\x3dfooter&#39;;return true;">https://groups.google.com=
/a/<wbr>isocpp.org/d/msgid/std-<wbr>proposals/d0fdabcd-f04a-46e6-<wbr>b93a-=
0b9ea85cb5bd%40isocpp.org</a><wbr>.<br>
<br></div></div></blockquote><div><br></div><div>Here is some code that ill=
ustrates some points of confusion I have with the memory model:</div><div><=
br></div><div><font color=3D"#666600">#include &lt;atomic&gt;</font><div cl=
ass=3D"subprettyprint"><font color=3D"#666600" face=3D"monospace"><br></fon=
t></div><div class=3D"subprettyprint"><font color=3D"#666600" face=3D"monos=
pace">#if 1</font></div><div class=3D"subprettyprint"><font color=3D"#66660=
0" face=3D"monospace"><br></font></div><div class=3D"subprettyprint"><font =
color=3D"#666600" face=3D"monospace">#if 1</font></div><div class=3D"subpre=
ttyprint"><font color=3D"#666600" face=3D"monospace">using Cardinal =3D uns=
igned; // Homage to Niklaus Wirth</font></div><div class=3D"subprettyprint"=
><font color=3D"#666600" face=3D"monospace">#else</font></div><div class=3D=
"subprettyprint"><font color=3D"#666600" face=3D"monospace">using Cardinal =
=3D volatile unsigned;</font></div><div class=3D"subprettyprint"><font colo=
r=3D"#666600" face=3D"monospace">#endif</font></div><div class=3D"subpretty=
print"><font color=3D"#666600" face=3D"monospace"><br></font></div><div cla=
ss=3D"subprettyprint"><font color=3D"#666600" face=3D"monospace">#else</fon=
t></div><div class=3D"subprettyprint"><font color=3D"#666600" face=3D"monos=
pace"><br></font></div><div class=3D"subprettyprint"><font color=3D"#666600=
" face=3D"monospace">// Atomic access only.</font></div><div class=3D"subpr=
ettyprint"><font color=3D"#666600" face=3D"monospace">class Cardinal</font>=
</div><div class=3D"subprettyprint"><font color=3D"#666600" face=3D"monospa=
ce">=C2=A0 {</font></div><div class=3D"subprettyprint"><font color=3D"#6666=
00" face=3D"monospace">=C2=A0 =C2=A0 std::atomic&lt;unsigned&gt; i;</font><=
/div><div class=3D"subprettyprint"><font color=3D"#666600" face=3D"monospac=
e"><br></font></div><div class=3D"subprettyprint"><font color=3D"#666600" f=
ace=3D"monospace">=C2=A0 =C2=A0 static const std::memory_order Read_order =
=3D std::memory_order_relaxed;</font></div><div class=3D"subprettyprint"><f=
ont color=3D"#666600" face=3D"monospace">=C2=A0 =C2=A0 static const std::me=
mory_order Write_order =3D std::memory_order_relaxed;</font></div><div clas=
s=3D"subprettyprint"><font color=3D"#666600" face=3D"monospace"><br></font>=
</div><div class=3D"subprettyprint"><font color=3D"#666600" face=3D"monospa=
ce">=C2=A0 =C2=A0 unsigned load() const { return(i.load(Read_order)); }</fo=
nt></div><div class=3D"subprettyprint"><font color=3D"#666600" face=3D"mono=
space"><br></font></div><div class=3D"subprettyprint"><font color=3D"#66660=
0" face=3D"monospace">=C2=A0 public:</font></div><div class=3D"subprettypri=
nt"><font color=3D"#666600" face=3D"monospace"><br></font></div><div class=
=3D"subprettyprint"><font color=3D"#666600" face=3D"monospace">=C2=A0 =C2=
=A0 Cardinal(unsigned i_) : i(i_) { }</font></div><div class=3D"subprettypr=
int"><font color=3D"#666600" face=3D"monospace">=C2=A0 =C2=A0 operator unsi=
gned () const { return(load()); }</font></div><div class=3D"subprettyprint"=
><font color=3D"#666600" face=3D"monospace">=C2=A0 =C2=A0 void operator ++ =
() { i.store(load() + 1, Write_order); }</font></div><div class=3D"subprett=
yprint"><font color=3D"#666600" face=3D"monospace">=C2=A0 };</font></div><d=
iv class=3D"subprettyprint"><font color=3D"#666600" face=3D"monospace"><br>=
</font></div><div class=3D"subprettyprint"><font color=3D"#666600" face=3D"=
monospace">#endif</font></div><div class=3D"subprettyprint"><font color=3D"=
#666600" face=3D"monospace"><br></font></div><div class=3D"subprettyprint">=
<font color=3D"#666600" face=3D"monospace">// Increment my number, then if =
necessary wait for my identical twin thread</font></div><div class=3D"subpr=
ettyprint"><font color=3D"#666600" face=3D"monospace">// to catch up with i=
ncrementing its number.</font></div><div class=3D"subprettyprint"><font col=
or=3D"#666600" face=3D"monospace">//</font></div><div class=3D"subprettypri=
nt"><font color=3D"#666600" face=3D"monospace">template &lt;Cardinal *Mine,=
 Cardinal *Twins_number&gt;</font></div><div class=3D"subprettyprint"><font=
 color=3D"#666600" face=3D"monospace">void twin_thread()</font></div><div c=
lass=3D"subprettyprint"><font color=3D"#666600" face=3D"monospace">=C2=A0 {=
</font></div><div class=3D"subprettyprint"><font color=3D"#666600" face=3D"=
monospace">=C2=A0 =C2=A0 unsigned equals_mine =3D 0;</font></div><div class=
=3D"subprettyprint"><font color=3D"#666600" face=3D"monospace"><br></font><=
/div><div class=3D"subprettyprint"><font color=3D"#666600" face=3D"monospac=
e">=C2=A0 =C2=A0 while (equals_mine &lt; 100)</font></div><div class=3D"sub=
prettyprint"><font color=3D"#666600" face=3D"monospace">=C2=A0 =C2=A0 =C2=
=A0 {</font></div><div class=3D"subprettyprint"><font color=3D"#666600" fac=
e=3D"monospace">=C2=A0 =C2=A0 =C2=A0 =C2=A0 ++(*Mine);</font></div><div cla=
ss=3D"subprettyprint"><font color=3D"#666600" face=3D"monospace"><br></font=
></div><div class=3D"subprettyprint"><font color=3D"#666600" face=3D"monosp=
ace">=C2=A0 =C2=A0 =C2=A0 =C2=A0 std::atomic_thread_fence(std::memory_order=
_release);</font></div><div class=3D"subprettyprint"><font color=3D"#666600=
" face=3D"monospace"><br></font></div><div class=3D"subprettyprint"><font c=
olor=3D"#666600" face=3D"monospace">=C2=A0 =C2=A0 =C2=A0 =C2=A0 ++equals_mi=
ne;</font></div><div class=3D"subprettyprint"><font color=3D"#666600" face=
=3D"monospace"><br></font></div><div class=3D"subprettyprint"><font color=
=3D"#666600" face=3D"monospace">=C2=A0 =C2=A0 =C2=A0 =C2=A0 while (*Twins_n=
umber &lt; equals_mine)</font></div><div class=3D"subprettyprint"><font col=
or=3D"#666600" face=3D"monospace">=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;</fon=
t></div><div class=3D"subprettyprint"><font color=3D"#666600" face=3D"monos=
pace">=C2=A0 =C2=A0 =C2=A0 }</font></div><div class=3D"subprettyprint"><fon=
t color=3D"#666600" face=3D"monospace">=C2=A0 }</font></div><div class=3D"s=
ubprettyprint"><font color=3D"#666600" face=3D"monospace"><br></font></div>=
<div class=3D"subprettyprint"><font color=3D"#666600" face=3D"monospace">Ca=
rdinal i1(0), i2(0);</font></div><div class=3D"subprettyprint"><font color=
=3D"#666600" face=3D"monospace"><br></font></div><div class=3D"subprettypri=
nt"><font color=3D"#666600" face=3D"monospace">void thread1() { twin_thread=
&lt;&amp;i1, &amp;i2&gt;(); }</font></div><div class=3D"subprettyprint"><fo=
nt color=3D"#666600" face=3D"monospace">void thread2() { twin_thread&lt;&am=
p;i2, &amp;i1&gt;(); }</font></div><div class=3D"subprettyprint"><font colo=
r=3D"#666600" face=3D"monospace"><br></font></div><div class=3D"subprettypr=
int"><font color=3D"#666600" face=3D"monospace">#include &lt;iostream&gt;</=
font></div><div class=3D"subprettyprint"><font color=3D"#666600" face=3D"mo=
nospace">#include &lt;thread&gt;</font></div><div class=3D"subprettyprint">=
<font color=3D"#666600" face=3D"monospace">#include &lt;chrono&gt;</font></=
div><div class=3D"subprettyprint"><font color=3D"#666600" face=3D"monospace=
">=C2=A0</font></div><div class=3D"subprettyprint"><font color=3D"#666600" =
face=3D"monospace">int main()</font></div><div class=3D"subprettyprint"><fo=
nt color=3D"#666600" face=3D"monospace">=C2=A0 {</font></div><div class=3D"=
subprettyprint"><font color=3D"#666600" face=3D"monospace">=C2=A0 =C2=A0 st=
d::thread t1(thread1);</font></div><div class=3D"subprettyprint"><font colo=
r=3D"#666600" face=3D"monospace">=C2=A0 =C2=A0 std::thread t2(thread2);</fo=
nt></div><div class=3D"subprettyprint"><font color=3D"#666600" face=3D"mono=
space">=C2=A0 =C2=A0=C2=A0</font></div><div class=3D"subprettyprint"><font =
color=3D"#666600" face=3D"monospace">=C2=A0 =C2=A0 std::this_thread::sleep_=
for(std::chrono::seconds(3));</font></div><div class=3D"subprettyprint"><fo=
nt color=3D"#666600" face=3D"monospace"><br></font></div><div class=3D"subp=
rettyprint"><font color=3D"#666600" face=3D"monospace">=C2=A0 =C2=A0 if (i1=
 =3D=3D 100)</font></div><div class=3D"subprettyprint"><font color=3D"#6666=
00" face=3D"monospace">=C2=A0 =C2=A0 =C2=A0 t1.join();</font></div><div cla=
ss=3D"subprettyprint"><font color=3D"#666600" face=3D"monospace">=C2=A0 =C2=
=A0 else</font></div><div class=3D"subprettyprint"><font color=3D"#666600" =
face=3D"monospace">=C2=A0 =C2=A0 =C2=A0 std::cout &lt;&lt; i1 &lt;&lt; &#39=
;\n&#39;;</font></div><div class=3D"subprettyprint"><font color=3D"#666600"=
 face=3D"monospace"><br></font></div><div class=3D"subprettyprint"><font co=
lor=3D"#666600" face=3D"monospace">=C2=A0 =C2=A0 if (i2 =3D=3D 100)</font><=
/div><div class=3D"subprettyprint"><font color=3D"#666600" face=3D"monospac=
e">=C2=A0 =C2=A0 =C2=A0 t2.join();</font></div><div class=3D"subprettyprint=
"><font color=3D"#666600" face=3D"monospace">=C2=A0 =C2=A0 else</font></div=
><div class=3D"subprettyprint"><font color=3D"#666600" face=3D"monospace">=
=C2=A0 =C2=A0 =C2=A0 std::cout &lt;&lt; i2 &lt;&lt; &#39;\n&#39;;</font></d=
iv><div class=3D"subprettyprint"><font color=3D"#666600" face=3D"monospace"=
><br></font></div><div class=3D"subprettyprint"><font color=3D"#666600" fac=
e=3D"monospace">=C2=A0 =C2=A0 return(0);</font></div><div class=3D"subprett=
yprint"><font color=3D"#666600" face=3D"monospace">=C2=A0 }</font></div><di=
v><br></div>(I tested this with gcc (6.3.1), x86, Linux.)</div><div><br></d=
iv><div>When I don&#39;t enable optimization, this code works fine, whether=
 access to the thread-shared variables is atomic or non-atomic. =C2=A0When =
I enable optimization, the threads get into infinite loops if the access to=
 the thread-shared variables is non-atomic. =C2=A0But atomic access, even w=
ith relaxed order, causes the code to work fine with optimization.</div><di=
v><br></div><div>I don&#39;t see how atomicity of access should matter to w=
hether or not the threads here get into infinite loops. =C2=A0I suspect it =
only matters in practice here because atomic FUD has caused gcc to take a &=
quot;abandon all optimization ye who use atomic&quot; approach. =C2=A0The p=
roblem with this code is not atomicity of memory access, nor order of visib=
ility of accesses in different threads. =C2=A0The problem is <i>lack</i>=C2=
=A0of visibility. =C2=A0Due to the reality that threads have (partial) priv=
ate copies of memory. =C2=A0This is why I suspect that the memory model wou=
ld be better with explicit thread-private copies of memory.</div><div><br><=
/div><div>If there&#39;s a valid reason that relaxed-order atomic access sh=
ould cause the threads to avoid infinite looping, then I&#39;ll have to cop=
 to being an atomic-naive wanker who&#39;s cluttering the proposal forum.</=
div><div><br></div><div>(Tangentially, I&#39;m surprised that gcc will emit=
 a &quot;jump to here&quot; like this with no warning:</div><div><br></div>=
<div>L7:<div>=C2=A0 =C2=A0 =C2=A0 =C2=A0 jmp =C2=A0 =C2=A0 .L7</div><div><b=
r></div></div><div>even with -Wall -Wextra --predantic .)</div><div><br><br=
><br></div><div>=C2=A0</div></div>

<p></p>

-- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
.org">std-proposals@isocpp.org</a>.<br />
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/a/isocpp.org/d/msgid/std-proposals/8575c41d-1c8e-4e6f-9f12-a07bb70a7703%=
40isocpp.org?utm_medium=3Demail&utm_source=3Dfooter">https://groups.google.=
com/a/isocpp.org/d/msgid/std-proposals/8575c41d-1c8e-4e6f-9f12-a07bb70a7703=
%40isocpp.org</a>.<br />

------=_Part_371_1583845232.1487361264632--

------=_Part_370_1300324002.1487361264631--

.


Author: inkwizytoryankes@gmail.com
Date: Fri, 17 Feb 2017 14:06:14 -0800 (PST)
Raw View
------=_Part_470_342774786.1487369174789
Content-Type: multipart/alternative;
 boundary="----=_Part_471_353028752.1487369174789"

------=_Part_471_353028752.1487369174789
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable



On Friday, February 17, 2017 at 8:54:24 PM UTC+1, Walt Karas wrote:
>
> On Wednesday, February 15, 2017 at 7:41:40 PM UTC-5, Tony V E wrote:
>>
>> Leaving the questions about actually changing the standard aside, and
>> focusing on understanding (which may just mean this could be a
>> std-discussion question instead of std-proposal),
>>
>> In your model, if X and Y are relaxed atomic operations in thread T1, can
>> thread T2 see them as Y before X while thread T3 sees them as X before Y?
>>
>> Sent from my BlackBerry portable Babbage Device
>> *From: *'Walt Karas' via ISO C++ Standard - Future Proposals
>> *Sent: *Wednesday, February 15, 2017 2:17 PM
>> *To: *ISO C++ Standard - Future Proposals
>> *Reply To: *std-pr...@isocpp.org
>> *Subject: *[std-proposals] Proposed alternative approach to specifying
>> required memory operation ordering
>>
>> - In a program execution, each thread defines a nominal order of (thread
>> local) memory and fence operations.
>> - For any operations X and Y in thread T, either X before Y, or Y before
>> X.
>> - A program execution defines a nominal global order of global memory
>> operations.
>> - If X and Y are atomic global operations, then either X before Y or Y
>> before X in the global order.
>> - If X is a global operation, and there exists a global store S where
>> neither X before S nor S before X in the global order, then the result of X
>> is undefined.
>> - A program execution defines a partial function f(T, LO) -> GO where LO
>> is a local memory operation in thread T, and GO is a global operation.  If
>> LO is atomic, then GO must be atomic. The result of LO is the result of
>> GO.  If the result of GO is undefined, the result of LO is undefined.
>> (Even if f(T, LO) is not required to exist, it none the less _may_ exist.)
>> - If LO1 and LO2 are operations in thread T, and LO1 before LO2 in T, and
>> f(T, LO1) and f(T, LO2) both exist, then f(T, LO2) before f(T, LO1) in the
>> global order is not allowed.
>> - If:
>> 1.  In a thread T, X and Y are memory operations, and F is a fence
>> operation.
>> 2.  X before F and F before Y.
>> 3.  F is sequentially consistent and both X and Y are atomic, or
>> 4.  F is acquire and both X and Y are loads, or
>> 5.  F is release and both X and Y are stores.
>> then F is activated for X and Y.
>> - In a thread T, if X and Y are memory operations (where X before Y) with
>> an activated fence F, then f(T, X) must exist.
>> - For every global operation GO, there must exist a local operation LO in
>> some thread T where f(T, LO) -> GO.  (Assuming no intense gamma radiation.)
>> - A sequentially consistent (thread local) atomic memory operation
>> implies two sequentially consistent fences, one before and one after it (as
>> well as a preceding release for a store, and a succeeding acquire for a
>> load).
>>
>>
>>
> Here is some code that illustrates some points of confusion I have with
> the memory model:
>
> (...)
>
>         while (*Twins_number < equals_mine)
>           ;
>

I'm not an expert, or even experienced, in atomics, but I think that is a
race condition, because you access this variable without synchronization.
You would probably need a fence or an acquire load in the `while` loop;
otherwise the compiler can assume that the value will not change there.
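
A minimal sketch of what I mean, assuming the shared counter is a
`std::atomic<unsigned>`. The names `twins_number`, `wait_for_twin` and `demo`
are made up for illustration and are not taken from the quoted code:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the counter the quoted loop spins on.
std::atomic<unsigned> twins_number{0};

// Spin until the other thread's counter reaches `target`.
// Every iteration performs an atomic acquire load, so the compiler
// cannot treat the value as unchanging and read it only once.
unsigned wait_for_twin(unsigned target)
{
    unsigned seen;
    while ((seen = twins_number.load(std::memory_order_acquire)) < target)
        std::this_thread::yield();
    return seen;
}

// Starts a writer thread that publishes `target`, waits for it,
// and returns the value the waiting thread observed.
unsigned demo(unsigned target)
{
    twins_number.store(0, std::memory_order_relaxed);
    std::thread writer([target] {
        twins_number.store(target, std::memory_order_release);
    });
    unsigned seen = wait_for_twin(target);
    writer.join();
    assert(seen >= target);
    return seen;
}
```

With a plain `unsigned` in place of the atomic, the same loop would be a data
race, and the optimizer would be free to read the value once and spin forever.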


------=_Part_471_353028752.1487369174789--

------=_Part_470_342774786.1487369174789--

.


Author: Nicol Bolas <jmckesson@gmail.com>
Date: Fri, 17 Feb 2017 14:14:11 -0800 (PST)
Raw View
------=_Part_466_1605222556.1487369651738
Content-Type: multipart/alternative;
 boundary="----=_Part_467_855914295.1487369651739"

------=_Part_467_855914295.1487369651739
Content-Type: text/plain; charset=UTF-8

On Friday, February 17, 2017 at 2:54:24 PM UTC-5, Walt Karas wrote:
>
> (I tested this with gcc (6.3.1), x86, Linux.)
>
> When I don't enable optimization, this code works fine, whether access to
> the thread-shared variables is atomic or non-atomic.  When I enable
> optimization, the threads get into infinite loops if the access to the
> thread-shared variables is non-atomic.  But atomic access, even with
> relaxed order, causes the code to work fine with optimization.
>

I think you're too focused on what compilers are doing and not focused
enough on what the memory model says. If you invoke UB, then you invoke UB,
whether your code appears to work or not.

> I don't see how atomicity of access should matter to whether or not the
> threads here get into infinite loops.
>

So your question is why all of this synchronization and visibility machinery
is specific to atomic operations, and why you can't get the same effect with
non-atomic accesses (short of locking and unlocking a mutex).

There are many aspects to threaded memory operations. There is
synchronization/ordering: what order do operations on different threads
happen in. There is visibility/coherency: which threads can access the
results of which operations. But there is one more you discount:

*Integrity*: what happens when two threads poke at the same piece of memory
at the same time.

Atomic operations guarantee the *integrity* of two threaded operations. If
two threads perform a `memory_order_relaxed` increment by 1 to the same
atomic integer, and if that atomic variable is 0 before the two threads
increment it, then that variable is required by the standard to have the
value 0, 1, or 2, depending on when you look at it. The standard doesn't
say which value it will have when, or who will be able to see which
particular value at what time. But the standard makes it abundantly clear
that it will have only one of those values.

By contrast, if you do the above case with an arbitrary piece of memory,
you get UB due to a data race. It could have any value. It could crash. It
could do *anything*; memory integrity is not maintained.
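
As a hedged sketch of that integrity guarantee (the function name
`count_with_two_threads` and the lambda `bump` are invented for this example):
two threads do relaxed increments on one atomic counter, and no increment is
ever lost, even though relaxed order promises nothing about when each thread
sees the other's updates.

```cpp
#include <atomic>
#include <thread>

// Two threads each add 1 to the same atomic counter `per_thread` times,
// using only memory_order_relaxed. Returns the final counter value.
unsigned long count_with_two_threads(unsigned long per_thread)
{
    std::atomic<unsigned long> counter{0};

    auto bump = [&counter, per_thread] {
        for (unsigned long i = 0; i < per_thread; ++i)
            counter.fetch_add(1, std::memory_order_relaxed);
    };

    std::thread a(bump);
    std::thread b(bump);
    a.join();   // join() synchronizes with the end of each thread,
    b.join();   // so the load below sees every increment.

    return counter.load(std::memory_order_relaxed);
}
```

Replace the atomic with a plain `unsigned long` and the same program has a
data race, so nothing can be said about the result.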

Atomic operations guarantee atomicity: the whole thing happens completely,
without seeing partial data and without being interfered with by any other
thread during the operation. Without a guarantee of the integrity of a
piece of memory, it makes no sense to talk about visibility. After all, if
you can't guarantee what the memory actually stores, why does it matter who
can see it?

Now, you might wonder why we don't guarantee atomicity of all operations.
Well, atomicity guarantees are *not cheap* to implement. And since C++
prefers not to pay for something you don't use, you must explicitly *ask*
for such guarantees.  Since most code is not doing inter-thread
communication, it's better for all involved if compilers start with the
assumption that memory accesses need not be atomic. And if you want
atomicity of access, you have to pay for it.

So the C++ memory model basically says that if two threads are accessing
the same object, then you *need* atomicity of access, whether via
`std::atomic` or locking of a `mutex` or somesuch.

Also, there are two other important points to understand.

First, your `operator++` *fails* at memory integrity. It does an increment
by doing a load, an add, and a store, and those three steps together are not
one atomic operation. Granted, that's fine in your case, because each counter
is only ever incremented by one thread. But you really should use
`fetch_add` for a proper atomic increment.
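
To make the difference concrete, here is a small sketch (the function names
are made up). It replays, on a single thread, one unlucky interleaving of two
load-then-store increments, so the outcome is deterministic rather than
dependent on timing:

```cpp
#include <atomic>
#include <cassert>

// Replays the interleaving "A loads, B loads, A stores, B stores".
// Each access is atomic, but the increment as a whole is not.
unsigned lost_update()
{
    std::atomic<unsigned> i{0};
    unsigned a = i.load(std::memory_order_relaxed);  // "thread A" reads 0
    unsigned b = i.load(std::memory_order_relaxed);  // "thread B" reads 0 too
    assert(a == b);                                  // both saw the old value
    i.store(a + 1, std::memory_order_relaxed);       // A writes 1
    i.store(b + 1, std::memory_order_relaxed);       // B overwrites with 1
    return i.load(std::memory_order_relaxed);
}

// The same two increments done as atomic read-modify-write operations.
unsigned with_fetch_add()
{
    std::atomic<unsigned> i{0};
    i.fetch_add(1, std::memory_order_relaxed);
    i.fetch_add(1, std::memory_order_relaxed);
    return i.load(std::memory_order_relaxed);
}
```

Here `lost_update()` ends at 1 while `with_fetch_add()` ends at 2; with real
threads, `fetch_add` rules out the first outcome no matter how they interleave.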

Second, atomicity is indeed necessary for your code to work. But it is not
*sufficient*.

Your code is using explicit ordering guarantees that *also* guarantee
visibility and ordering (to some degree). What makes your code work is the
fact that you release after you've done the writes, then acquire when you
read them. Using relaxed order beforehand effectively lets you
"queue up" a bunch of non-visible operations, then make them all visible
with a single fence.

Take those guarantees away, and your code has undefined behavior. Merely
using atomic variables does not guarantee visibility.
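
For reference, a minimal sketch of the usual fence-based release/acquire
pairing. The names `payload`, `ready`, `producer`, `consumer` and `run_once`
are hypothetical, and this is not a rewrite of the code under discussion. Note
that in this pattern the release fence sits *before* the relaxed store to the
flag, and the acquire fence sits *after* the relaxed load that observes it:

```cpp
#include <atomic>
#include <thread>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag used to publish the data

void producer()
{
    payload = 42;                                         // ordinary write
    std::atomic_thread_fence(std::memory_order_release);  // order it before...
    ready.store(true, std::memory_order_relaxed);         // ...the flag store
}

int consumer()
{
    while (!ready.load(std::memory_order_relaxed))        // wait for the flag
        std::this_thread::yield();
    std::atomic_thread_fence(std::memory_order_acquire);  // pairs with release
    return payload;                                       // guaranteed to be 42
}

// Resets the shared state, runs one producer and one consumer,
// and returns what the consumer read.
int run_once()
{
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = 0;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

Without the two fences, the flag accesses would still be atomic, but the read
of `payload` would race with the write.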


> I suspect it only matters in practice here because atomic FUD has caused
> gcc to take a "abandon all optimization ye who use atomic" approach.  The
> problem with this code is not atomicity of memory access, nor order of
> visibility of accesses in different threads.  The problem is *lack* of
> visibility.  Due to the reality that threads have (partial) private copies
> of memory.  This is why I suspect that the memory model would be better
> with explicit thread-private copies of memory.
>

But that would basically be making the cache into a real construct. It
would require arbitrarily deciding what "thread-private" memory is and
separating it out from "global memory".

It's better to just let the writer of the code decide how it works.

> If there's a valid reason that relaxed-order atomic access should cause the
> threads to avoid infinite looping, then I'll have to cop to being an
> atomic-naive wanker who's cluttering the proposal forum.
>

First, as previously stated, "relaxed-order atomic access" is not what
makes your code work. Second, define "valid reason".

"Because that's how people who know a lot about CPU pipelining, caches, and
the like decided it should work" is a valid enough reason for me. If experts
on such subjects make a decision, I'm going to assume it's the right one
unless I have actual evidence to the contrary. And suppositions about what
a good threaded memory model might look like don't count as evidence.


------=_Part_467_855914295.1487369651739
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">On Friday, February 17, 2017 at 2:54:24 PM UTC-5, Walt Kar=
as wrote:<blockquote class=3D"gmail_quote" style=3D"margin: 0;margin-left: =
0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;"><div dir=3D"ltr">(I t=
ested this with gcc (6.3.1), x86, Linux.)<div><br></div><div>When I don&#39=
;t enable optimization, this code works fine, whether access to the thread-=
shared variables is atomic or non-atomic. =C2=A0When I enable optimization,=
 the threads get into infinite loops if the access to the thread-shared var=
iables is non-atomic. =C2=A0But atomic access, even with relaxed order, cau=
ses the code to work fine with optimization.</div></div></blockquote><div><=
br>I think you&#39;re too focused on what compilers are doing and not focus=
ed enough on what the memory model says. If you invoke UB, then you invoke =
UB, whether your code appears to work or not.<br><br></div><blockquote clas=
s=3D"gmail_quote" style=3D"margin: 0;margin-left: 0.8ex;border-left: 1px #c=
cc solid;padding-left: 1ex;"><div dir=3D"ltr"><div></div><div>I don&#39;t s=
ee how atomicity of access should matter to whether or not the threads here=
 get into infinite loops.</div></div></blockquote><div><br>So your question=
 is why all of this synchronization and visibility stuff is specific to ato=
mic operations. Why you can&#39;t do it for non-atomic accesses (without lo=
cking/releasing a mutex).<br><br>There are many aspects to threaded memory =
operations. There is synchronization/ordering: what order do operations on =
different threads happen in. There is visibility/coherency: which threads c=
an access the results of which operations. But there is one more you discou=
nt:<br><br><i>Integrity</i>: what happens when two threads poke at the same=
 piece of memory at the same time.<br><br>Atomic operations guarantee the <=
i>integrity</i> of two threaded operations. If two threads perform a `memor=
y_order_relaxed` increment by 1 to the same atomic integer, and if that ato=
mic variable is 0 before the two threads increment it, then that variable i=
s required by the standard to have the value 0, 1, or 2, depending on when =
you look at it. The standard doesn&#39;t say which value it will have when,=
 or who will be able to see which particular value at what time. But the st=
andard makes it abundantly clear that it will have only one of those values=
.<br><br>By contrast, if you do the above case with an arbitrary piece of m=
emory, you get UB due to a data race. It could have any value. It could cra=
sh. It could do <i>anything</i>; memory integrity is not maintained.<br><br=
>Atomic operations guarantee atomicity: the whole thing happens completely,=
 without seeing partial data and without being interfered with by any other=
 thread during the operation. Without a guarantee of the integrity of a pie=
ce of memory, it makes no sense to talk about visibility. After all, if you=
 can&#39;t guarantee what the memory actually stores, why does it matter wh=
o can see it?<br><br>Now, you might wonder why we don&#39;t guarantee atomi=
city of all operations. Well, atomicity guarantees are <i>not cheap</i> to =
implement. And since C++ prefers not to pay for something you don&#39;t use=
, you must explicitly <i>ask</i> for such guarantees.=C2=A0 Since most code=
 is not doing inter-thread communication, it&#39;s better for all involved =
if compilers start with the assumption that memory accesses need not be ato=
mic. And if you want atomicity of access, you have to pay for it.<br><br>So=
 the C++ memory model basically says that if two threads are accessing the =
same object, then you <i>need</i> atomicity of access, whether via `std::at=
omic` or locking of a `mutex` or somesuch.<br><br>Also, there are two other=
 important points to understand.<br><br>First, your `operator++` <i>fails</=
i> at memory integrity. It does an increment by doing a load, add, and a st=
ore. But it is not an atomic operation. Granted, that&#39;s fine in your ca=
se because you&#39;re never trying to increment the value more than once fr=
om different threads. But you really should use `fetch_add` for a proper at=
omic increment.<br><br>Second, atomicity is indeed necessary for your code =
to work. But it is not <i>sufficient</i>.<br><br>Your code is using explici=
t ordering guarantees that <i>also</i> guarantee visibility and ordering (t=
o some degree). What makes your code work is the fact that you release afte=
r you&#39;ve done
 the writes, then acquire when you read them. By using relaxed-order=20
beforehand, it effectively allows you to &quot;queue-up&quot; a bunch of no=
n-visible operations, then make them all visible with a single fence.<br><b=
r>Take those guarantees away, and your code has undefined behavior. Merely =
using atomic variables does not guarantee visibility.<br>=C2=A0</div><block=
quote class=3D"gmail_quote" style=3D"margin: 0;margin-left: 0.8ex;border-le=
ft: 1px #ccc solid;padding-left: 1ex;"><div dir=3D"ltr"><div>I suspect it o=
nly matters in practice here because atomic FUD has caused gcc to take a &q=
uot;abandon all optimization ye who use atomic&quot; approach. =C2=A0The pr=
oblem with this code is not atomicity of memory access, nor order of visibi=
lity of accesses in different threads. =C2=A0The problem is <i>lack</i>=C2=
=A0of visibility. =C2=A0Due to the reality that threads have (partial) priv=
ate copies of memory. =C2=A0This is why I suspect that the memory model wou=
ld be better with explicit thread-private copies of memory.</div></div></bl=
ockquote><div><br>But that would basically be making the cache into a real =
construct. It would require arbitrarily deciding what &quot;thread-private&=
quot; memory is and separating it out from &quot;global memory&quot;.<br><b=
r>It&#39;s better to just let the writer of the code decide how it works.<b=
r><br></div><blockquote class=3D"gmail_quote" style=3D"margin: 0;margin-lef=
t: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;"><div dir=3D"ltr"><=
div></div><div>If there&#39;s a valid reason that relaxed-order atomic acce=
ss should cause the threads to avoid infinite looping, then I&#39;ll have t=
o cop to being an atomic-naive wanker who&#39;s cluttering the proposal for=
um.</div></div></blockquote><div><br>First, as previously stated, &quot;rel=
axed-order atomic access&quot; is not what makes your code work. Second, de=
fine &quot;valid reason&quot;.<br><br>&quot;Because that&#39;s how people w=
ho know a lot about CPU pipelining, caches, and the like decided it should&=
quot; is a valid enough reason for me. If experts on such subjects make a d=
ecision, I&#39;m going to assume it&#39;s the right one unless I have actua=
l evidence to the contrary. And suppositions about what a good threaded mem=
ory model might look like don&#39;t count as evidence.<br></div></div>

<p></p>

-- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/a/isocpp.org/d/msgid/std-proposals/f795b0f9-69be-4413-8a26-fc7ee4d31ba3%=
40isocpp.org?utm_medium=3Demail&utm_source=3Dfooter">https://groups.google.=
com/a/isocpp.org/d/msgid/std-proposals/f795b0f9-69be-4413-8a26-fc7ee4d31ba3=
%40isocpp.org</a>.<br />

------=_Part_467_855914295.1487369651739--

------=_Part_466_1605222556.1487369651738--

.


Author: Nicol Bolas <jmckesson@gmail.com>
Date: Fri, 17 Feb 2017 14:14:55 -0800 (PST)
Raw View
------=_Part_434_218215407.1487369695462
Content-Type: multipart/alternative;
 boundary="----=_Part_435_1761947262.1487369695463"

------=_Part_435_1761947262.1487369695463
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable



On Friday, February 17, 2017 at 5:06:14 PM UTC-5, inkwizyt...@gmail.com
wrote:
>
>
>
> On Friday, February 17, 2017 at 8:54:24 PM UTC+1, Walt Karas wrote:
>>
>> On Wednesday, February 15, 2017 at 7:41:40 PM UTC-5, Tony V E wrote:
>>>
>>> Leaving the questions about actually changing the standard aside, and
>>> focusing on understanding (which may just mean this could be a
>>> std-discussion question instead of std-proposal),
>>>
>>> In your model, if X and Y are relaxed atomic operations in thread T1,
>>> can thread T2 see them as Y before X while thread T3 sees them as X before
>>> Y?
>>>
>>> Sent from my BlackBerry portable Babbage Device
>>> *From: *'Walt Karas' via ISO C++ Standard - Future Proposals
>>> *Sent: *Wednesday, February 15, 2017 2:17 PM
>>> *To: *ISO C++ Standard - Future Proposals
>>> *Reply To: *std-pr...@isocpp.org
>>> *Subject: *[std-proposals] Proposed alternative approach to specifying
>>> required memory operation ordering
>>>
>>> - In a program execution, each thread defines a nominal order of (thread
>>> local) memory and fence operations.
>>> - For any operations X and Y in thread T, either X before Y, or Y before
>>> X.
>>> - A program execution defines a nominal global order of global memory
>>> operations.
>>> - If X and Y are atomic global operations, then either X before Y or Y
>>> before X in the global order.
>>> - If X is a global operation, and there exists a global store S where
>>> neither X before S nor S before X in the global order, then the result of X
>>> is undefined.
>>> - A program execution defines a partial function f(T, LO) -> GO where LO
>>> is a local memory operation in thread T, and GO is a global operation.  If
>>> LO is atomic, then GO must be atomic. The result of LO is the result of
>>> GO.  If the result of GO is undefined, the result of LO is undefined.
>>> (Even if f(T, LO) is not required to exist, it none the less _may_ exist.)
>>> - If LO1 and LO2 are operations in thread T, and LO1 before LO2 in T,
>>> and f(T, LO1) and f(T, LO2) both exist, then f(T, LO2) before f(T, LO1) in
>>> the global order is not allowed.
>>> - If:
>>> 1.  In a thread T, X and Y are memory operations, and F is a fence
>>> operation.
>>> 2.  X before F and F before Y.
>>> 3.  F is sequentially consistent and both X and Y are atomic, or
>>> 4.  F is acquire and both X and Y are loads, or
>>> 5.  F is release and both X and Y are stores.
>>> then F is activated for X and Y.
>>> - In a thread T, if X and Y are memory operations (where X before Y)
>>> with an activated fence F, then f(T, X) must exist.
>>> - For every global operation GO, there must exist a local operation LO
>>> in some thread T where f(T, LO) -> GO.  (Assuming no intense gamma
>>> radiation.)
>>> - A sequentially consistent (thread local) atomic memory operation
>>> implies two sequentially consistent fences, one before and one after it (as
>>> well as a preceding release for a store, and a succeeding acquire for a
>>> load).
>>>
>>>
>>>
>> Here is some code that illustrates some points of confusion I have with
>> the memory model:
>>
>> (...)
>>
>>         while (*Twins_number < equals_mine)
>>           ;
>>
>
> I'm not an expert or even experienced in atomics, but I think that is a race
> condition because you access this variable without synchronization. You
> would probably need a fence or an acquire in the `while`; otherwise the
> compiler can assume that the value will not change there.
>
>

Actually, it does use synchronization. `operator*` executes a `load()` on
the atomic object. That forces synchronization.

------=_Part_435_1761947262.1487369695463--

------=_Part_434_218215407.1487369695462--

.


Author: Nicol Bolas <jmckesson@gmail.com>
Date: Fri, 17 Feb 2017 14:26:47 -0800 (PST)
Raw View
------=_Part_1423_1958939299.1487370407937
Content-Type: multipart/alternative;
 boundary="----=_Part_1424_724303430.1487370407937"

------=_Part_1424_724303430.1487370407937
Content-Type: text/plain; charset=UTF-8

On Friday, February 17, 2017 at 5:14:55 PM UTC-5, Nicol Bolas wrote:
>
> Actually, it does use synchronization. `operator*` executes a `load()` on
> the atomic object. That forces synchronization.
>

Sorry, I said that wrong. `operator unsigned()` performs a `load()`. The
default `load()` uses `std::memory_order_seq_cst`, which performs a full sync
on the variable. So once the other thread has passed its `release` fence, this
thread will be able to see the new value.
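
To make the point about the implicit conversion concrete, here is a minimal sketch, assuming the shared counter is a `std::atomic<unsigned>`. The names `twins_number`, `wait_until_at_least` and `run_demo` are made up for illustration and are not the code from the thread. Reading a `std::atomic<unsigned>` through its conversion operator is equivalent to calling `load()` with its default argument, `std::memory_order_seq_cst`:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the shared counter discussed in the thread.
std::atomic<unsigned> twins_number{0};

// Spin until the counter reaches at least `target`.
// The comparison goes through std::atomic<unsigned>::operator unsigned(),
// which is equivalent to twins_number.load(std::memory_order_seq_cst).
unsigned wait_until_at_least(unsigned target) {
    while (twins_number < target) {
        std::this_thread::yield();  // be polite while spinning
    }
    return twins_number.load();  // same default ordering, spelled out
}

// Starts a writer thread, waits for its store, and returns the value seen.
unsigned run_demo() {
    twins_number.store(0);
    std::thread writer([] {
        twins_number.store(5, std::memory_order_release);
    });
    unsigned seen = wait_until_at_least(5);
    writer.join();
    return seen;
}
```

Calling `run_demo()` returns 5 once the writer's store has become visible to the waiting thread. Most toolchains need thread support enabled when building this (for example `-pthread` with GCC or Clang).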

------=_Part_1424_724303430.1487370407937--

------=_Part_1423_1958939299.1487370407937--

.


Author: inkwizytoryankes@gmail.com
Date: Fri, 17 Feb 2017 14:34:29 -0800 (PST)
Raw View
------=_Part_470_2099594734.1487370869976
Content-Type: multipart/alternative;
 boundary="----=_Part_471_1945726399.1487370869976"

------=_Part_471_1945726399.1487370869976
Content-Type: text/plain; charset=UTF-8



On Friday, February 17, 2017 at 11:26:48 PM UTC+1, Nicol Bolas wrote:
>
> On Friday, February 17, 2017 at 5:14:55 PM UTC-5, Nicol Bolas wrote:
>>
>> Actually, it does use synchronization. `operator*` executes a `load()` on
>> the atomic object. That forces synchronization.
>>
>
> Sorry, I said that wrong. `operator unsigned()` performs a `load()`. The
> default load mechanism performs a full sync on the variable. So once the
> other thread is passed its `release` fence, then this thread will be able
> to see the new value.
>

I was answering:
"When I don't enable optimization, this code works fine, whether access to
the thread-shared variables is atomic or non-atomic.  When I enable
optimization. the threads get into infinite loops if the access to the
thread-shared variables is non-atomic.  But atomic access, even with
relaxed order, causes the code to work fine with optimization."
This means the case where `using Cardinal = unsigned;` is in effect and we
don't have any load.
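
The distinction being drawn here can be sketched as follows, assuming the shared variable is a `std::atomic<unsigned>` accessed with relaxed order. All names (`flag_atomic`, `spin_relaxed`, `run_relaxed_demo`) are hypothetical. The non-atomic variant is shown only in a comment, because running it would be a data race:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared flag; relaxed order is enough for the loop to end,
// although it orders nothing else around it.
std::atomic<unsigned> flag_atomic{0};

// With a plain `unsigned flag;` instead, `while (flag == 0) ;` would be a
// data race (undefined behavior) once another thread writes `flag`, and an
// optimizer may read `flag` once and then spin forever.
unsigned spin_relaxed() {
    unsigned v;
    // Every iteration performs an atomic load, so the value is re-read
    // rather than hoisted out of the loop.
    while ((v = flag_atomic.load(std::memory_order_relaxed)) == 0) {
        std::this_thread::yield();
    }
    return v;
}

// Starts a setter thread and returns the value the spinning thread observed.
unsigned run_relaxed_demo() {
    flag_atomic.store(0, std::memory_order_relaxed);
    std::thread setter([] {
        flag_atomic.store(7, std::memory_order_relaxed);
    });
    unsigned v = spin_relaxed();
    setter.join();
    return v;
}
```

Here `run_relaxed_demo()` returns 7. The relaxed order only makes the flag itself safe to poll; it does not publish any other data written by the setter thread.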

------=_Part_471_1945726399.1487370869976--

------=_Part_470_2099594734.1487370869976--

.


Author: Nicol Bolas <jmckesson@gmail.com>
Date: Fri, 17 Feb 2017 14:45:30 -0800 (PST)
Raw View
------=_Part_1474_903571175.1487371530887
Content-Type: multipart/alternative;
 boundary="----=_Part_1475_762578859.1487371530887"

------=_Part_1475_762578859.1487371530887
Content-Type: text/plain; charset=UTF-8

On Friday, February 17, 2017 at 5:34:30 PM UTC-5, inkwizyt...@gmail.com
wrote:
>
>
>
> On Friday, February 17, 2017 at 11:26:48 PM UTC+1, Nicol Bolas wrote:
>>
>> On Friday, February 17, 2017 at 5:14:55 PM UTC-5, Nicol Bolas wrote:
>>>
>>> Actually, it does use synchronization. `operator*` executes a `load()`
>>> on the atomic object. That forces synchronization.
>>>
>>
>> Sorry, I said that wrong. `operator unsigned()` performs a `load()`. The
>> default load mechanism performs a full sync on the variable. So once the
>> other thread is passed its `release` fence, then this thread will be able
>> to see the new value.
>>
>
> I was answering:
> "When I don't enable optimization, this code works fine, whether access to
> the thread-shared variables is atomic or non-atomic.  When I enable
> optimization. the threads get into infinite loops if the access to the
> thread-shared variables is non-atomic.  But atomic access, even with
> relaxed order, causes the code to work fine with optimization."
> This mean when `using Cardinal = unsigned;` and we don't have any load.
>

Hmm. I assumed that "but atomic access" was referring to using the `Cardinal`
class, not `using Cardinal = unsigned`. But it's possible that he was
referring to the use of `std::atomic_thread_fence`.

If that's the case, then you're right; that particular fence does not
guarantee that a non-atomic `Cardinal` works.
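
As a sketch of why the fence alone is not enough, the usual fence-based publication pattern still needs the flag itself to be atomic. The example below assumes a plain `int` payload and a `std::atomic<bool>` flag; `payload`, `ready`, `producer`, `consumer` and `run_fence_demo` are illustrative names, not the code under discussion:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain, non-atomic data being published
std::atomic<bool> ready{false};  // the flag the fences pair up through

void producer() {
    payload = 42;                                         // ordinary write
    std::atomic_thread_fence(std::memory_order_release);  // release fence
    ready.store(true, std::memory_order_relaxed);         // atomic store after it
}

int consumer() {
    // The relaxed atomic load is what lets the two fences synchronize;
    // polling a non-atomic flag here would be a data race.
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // acquire fence
    return payload;  // safe: ordered after the producer's write
}

// Runs one producer and one consumer; returns what the consumer read.
int run_fence_demo() {
    payload = 0;
    ready.store(false);
    int result = 0;
    std::thread c([&] { result = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return result;
}
```

`run_fence_demo()` returns 42. The release fence pairs with the acquire fence only because the consumer's atomic load reads the value written by the producer's atomic store.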

------=_Part_1475_762578859.1487371530887--

------=_Part_1474_903571175.1487371530887--

.


Author: "'Walt Karas' via ISO C++ Standard - Future Proposals" <std-proposals@isocpp.org>
Date: Fri, 17 Feb 2017 20:27:57 -0800 (PST)
Raw View
------=_Part_526_957337272.1487392077514
Content-Type: multipart/alternative;
 boundary="----=_Part_527_1539005797.1487392077514"

------=_Part_527_1539005797.1487392077514
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Friday, February 17, 2017 at 5:06:14 PM UTC-5, inkwizyt...@gmail.com
wrote:
>
>
>
> On Friday, February 17, 2017 at 8:54:24 PM UTC+1, Walt Karas wrote:
>>
>> On Wednesday, February 15, 2017 at 7:41:40 PM UTC-5, Tony V E wrote:
>>>
>>> Leaving the questions about actually changing the standard aside, and
>>> focusing on understanding (which may just mean this could be a
>>> std-discussion question instead of std-proposal),
>>>
>>> In your model, if X and Y are relaxed atomic operations in thread T1,
>>> can thread T2 see them as Y before X while thread T3 sees them as X before
>>> Y?
>>>
>>> Sent from my BlackBerry portable Babbage Device
>>> *From: *'Walt Karas' via ISO C++ Standard - Future Proposals
>>> *Sent: *Wednesday, February 15, 2017 2:17 PM
>>> *To: *ISO C++ Standard - Future Proposals
>>> *Reply To: *std-pr...@isocpp.org
>>> *Subject: *[std-proposals] Proposed alternative approach to specifying=
=20
>>> required memory operation ordering
>>>
>>> - In a program execution, each thread defines a nominal order of (threa=
d=20
>>> local) memory and fence operations.
>>> - For any operations X and Y in thread T, either X before Y, or Y befor=
e=20
>>> X.
>>> - A program execution defines a nominal global order of global memory=
=20
>>> operations.
>>> - If X and Y are atomic global operations, then either X before Y or Y=
=20
>>> before X in the global order.
>>> - If X is a global operation, and there exists a global store S where=
=20
>>> neither X before S nor S before X in the global order, then the result =
of X=20
>>> is undefined.
>>> - A program execution defines a partial function f(T, LO) -> GO where L=
O=20
>>> is a local memory operation in thread T, and GO is a global operation. =
 If=20
>>> LO is atomic, then GO must be atomic. The result of LO is the result of=
=20
>>> GO.  If the result of GO is undefined, the result of LO is undefined. =
=20
>>> (Even if f(T, LO) is not required to exist, it none the less _may_ exis=
t.)
>>> - If LO1 and LO2 are operations in thread T, and LO1 before LO2 in T,=
=20
>>> and f(T, LO1) and f(T, LO2) both exist, then f(T, LO2) before f(T, LO1)=
 in=20
>>> the global order is not allowed.
>>> - If:
>>> 1.  In a thread T, X and Y are memory operations, and F is a fence=20
>>> operation.
>>> 2.  X before F and F before Y.
>>> 3.  F is sequentially consistent and both X and Y are atomic, or
>>> 4.  F is acquire and both X and Y are loads, or
>>> 5.  F is release and both X and Y are stores.
>>> then F is activated for X and Y.
>>> - In a thread T, if X and Y are memory operations (where X before Y)=20
>>> with an activated fence F, then f(T, X) must exist.
>>> - For every global operation GO, there must exist a local operation LO=
=20
>>> in some thread T where f(T, LO) -> GO.  (Assuming no intense gamma=20
>>> radiation.)
>>> - A sequentially consistent (thread local) atomic memory operation=20
>>> implies two sequentially consistent fences, one before and one after it=
 (as=20
>>> well as a preceding release for a store, and a succeeding acquire for a=
=20
>>> load).
>>>
>> Here is some code that illustrates some points of confusion I have with
>> the memory model:
>>
>> (...)
>>
>>         while (*Twins_number < equals_mine)
>>           ;
>>
>
> I'm not an expert or even experienced in atomics, but I think that is a race
> condition because you access this variable without synchronization. You
> would probably need a fence or an acquire in the `while`; otherwise the
> compiler can assume that the value will not change there.
>

I don't really understand that.  To me, the best way to describe the
problem is that *Twins_number is cached in a (thread-specific) register.
By some mechanism, a cache-invalidate or equivalent has to be performed on
*Twins_number before reading it.

The other issue is that, using gcc, this works when the read of
*Twins_number is atomic with relaxed order.  I think that the Standard
allows for this to be just as broken as when using an (actual or nominal)
non-atomic read, and it's just a quirky implementation in gcc.  Maybe using
Cardinal = unsigned would work with optimization enabled if I put an
acquire fence in the loop.  But that seems like a cheat, unless the memory
model considers that all the loads of *Twins_number can happen
simultaneously, and the acquire fence is the correct way to cause them to
happen sequentially.


------=_Part_527_1539005797.1487392077514--

------=_Part_526_957337272.1487392077514--

.


Author: Nicol Bolas <jmckesson@gmail.com>
Date: Fri, 17 Feb 2017 21:40:23 -0800 (PST)
Raw View
------=_Part_436_2030082738.1487396423706
Content-Type: multipart/alternative;
 boundary="----=_Part_437_796159868.1487396423706"

------=_Part_437_796159868.1487396423706
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Friday, February 17, 2017 at 11:27:57 PM UTC-5, Walt Karas wrote:
>
> I don't really understand that.  To me, the best way to describe the
> problem is that *Twins_number is cached in a (thread-specific) register.
> By some mechanism, a cache-invalidate or equivalent has to be performed on
> *Twins_number before reading it.
>

That's because you're thinking in terms of how it works on the actual
hardware, not in terms of the abstract C++ memory model. The *whole point*
of an abstract memory model is that the C++ coder doesn't have to know
or care about which "actual hardware" their code runs on. If you follow the
rules of the memory model, the compiler for the "actual hardware" will
generate whatever code is needed to make your code work.

> The other issue is that, using gcc, this works when the read of
> *Twins_number is atomic with relaxed order.  I think that the Standard
> allows for this to be just as broken as when using an (actual or nominal)
> non-atomic read, and it's just a quirky implementation in gcc.
>

Actually... no, it isn't.

I was mistaken earlier about something. The memory orders
<http://en.cppreference.com/w/cpp/atomic/memory_order> are not for ensuring
visibility of atomic variables across threads. Atomic variables are
*always* visible across threads. Their memory orders are for ensuring
ordering and visibility of *non-atomic* operations performed around some
atomic operation.

For example, let's say you write some data on thread A, then set an atomic
boolean to true. You have a thread B that wants to read that data. It can
spin on that boolean and read the data once the boolean is true.

However, the standard will *only* guarantee this if you use proper ordering
constraints. Thread A *must* use a `store` operation on that boolean
with `release` ordering or stronger, and thread B *must* use a `load`
with `acquire` ordering or stronger in its spin loop. Doing anything less
yields a data race on the data and therefore UB.
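
A minimal sketch of that thread A / thread B pattern, for illustration only.
The names `payload`, `ready`, `producer`, `consumer` and
`run_message_passing` are hypothetical and do not come from the code under
discussion.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                  // plain, non-atomic data (hypothetical name)
std::atomic<bool> ready{false};   // the atomic boolean used as a flag

// Thread A: write the data, then publish it with a release store.
void producer() {
    payload = 42;
    ready.store(true, std::memory_order_release);
}

// Thread B: spin with acquire loads. Once `true` is observed, the write to
// `payload` happens-before this read, so reading it is not a data race.
int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs both threads once and returns the value thread B read.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread b([&] { seen = consumer(); });
    std::thread a(producer);
    a.join();
    b.join();
    assert(seen == 42);  // guaranteed by the release/acquire pairing
    return seen;
}
```

If both operations on `ready` were `relaxed` instead, the flag itself would
still be fine, but the read of `payload` would race with the write.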

But if all you're using an atomic for is the value itself (note: this does
not apply to the value of the object it points to, for atomic pointers),
then `relaxed` memory ordering is just fine, so long as you don't care
about the ordering of independent atomic operations on different variables.
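
As a rough sketch of the "value only" case: the loop below waits on a
counter using nothing but `relaxed` operations. `Cardinal` is assumed here
to be an atomic type, and `wait_for` and `run_relaxed_spin` are hypothetical
names; the elided original code may look quite different.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

using Cardinal = std::atomic<unsigned>;  // assumption: an atomic counter type

// Spin until the other thread's counter reaches `target`. Only the counter's
// own value is consumed, so relaxed ordering is enough.
unsigned wait_for(const Cardinal& twins_number, unsigned target) {
    unsigned seen;
    while ((seen = twins_number.load(std::memory_order_relaxed)) < target) {
        std::this_thread::yield();
    }
    return seen;
}

// One thread bumps the counter 1000 times; the caller waits for it.
unsigned run_relaxed_spin() {
    Cardinal twins_number{0};
    std::thread writer([&] {
        for (unsigned i = 0; i < 1000; ++i) {
            twins_number.fetch_add(1, std::memory_order_relaxed);
        }
    });
    unsigned seen = wait_for(twins_number, 1000);
    writer.join();
    assert(seen == 1000);  // the counter never exceeds 1000
    return seen;
}
```

Because every access is an atomic operation, there is no data race, and the
compiler may not collapse the repeated loads into a single one.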

> Maybe using Cardinal = unsigned would work with optimization enabled if I
> put an acquire fence in the loop.  But that seems a cheat.
>

Why is that a cheat? You need to tell the compiler that operations from
another thread may affect the execution of this one. That's *exactly*
what a fence does.

And since you don't know exactly when the writing thread executes its
`release` fence, you must execute your `acquire` fence in the loop.
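
A hedged sketch of the fence-based variant. It assumes the polled variable
is still an atomic accessed with `relaxed` order, because in the standard's
model fences synchronize through atomic operations; a fence next to a plain
`unsigned` read would not remove the race. All names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int shared_data = 0;             // plain data published by the writer
std::atomic<unsigned> turn{0};   // polled with relaxed loads only

// Writer: plain write, release fence, then a relaxed store to the atomic.
void writer_side() {
    shared_data = 7;
    std::atomic_thread_fence(std::memory_order_release);
    turn.store(1, std::memory_order_relaxed);
}

// Reader: the acquire fence sits in the loop, after each relaxed load, so it
// is sequenced after whichever load finally observes the writer's store.
int reader_side() {
    for (;;) {
        unsigned v = turn.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);
        if (v >= 1) {
            break;
        }
        std::this_thread::yield();
    }
    return shared_data;  // ordered after the writer's write by the fences
}

int run_fence_pair() {
    shared_data = 0;
    turn.store(0, std::memory_order_relaxed);
    int seen = -1;
    std::thread r([&] { seen = reader_side(); });
    std::thread w(writer_side);
    w.join();
    r.join();
    assert(seen == 7);
    return seen;
}
```

A single acquire fence placed after the loop would give the same guarantee,
since only the fence following the successful load matters.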

> Unless the memory model considers that all the loads of *Twins_number can
> happen simultaneously, and the acquire fence is the correct way to cause
> them to happen sequentially.
>

Remember: a data race is undefined behavior in C++. Therefore, if I have a
`while(*some_ptr);` infinite loop, the compiler is *completely free* to
convert this into `if(*some_ptr) while(true);`. Why?

Because a data race is undefined behavior. There is nothing in your
loop that would ensure visibility between this thread and another
thread. And therefore, if some other thread modified the memory at
`some_ptr`, your thread reading it without proper synchronization would be
a data race and therefore UB.

So, if some other thread modifies the memory of `*some_ptr`, then you get
UB. The compiler is therefore free to assume that your code actually
works, and hence that no other thread modifies that memory. And if no other
thread modifies it, then `if(*some_ptr) while(true);` is
a perfectly justifiable interpretation of your code.

And if the compiler can prove that the condition will *never* be false
(since the only way it could become false would be for another thread to
modify the memory, which would be UB), then it can just reduce the code to
a pure infinite loop.
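
Spelled out as two functions, the transformation looks roughly like this.
This is only an illustration of the reasoning; the function names are
hypothetical, and whether a given compiler actually performs the rewrite
depends on the optimizer and its settings.

```cpp
#include <cassert>

// The loop as written: a plain, unsynchronized read on every iteration.
void spin_as_written(const unsigned* some_ptr) {
    while (*some_ptr) {
    }
}

// A form the optimizer may reduce it to: read once, then loop forever.
// With no synchronization in the loop, nothing obliges it to re-read.
void spin_as_hoisted(const unsigned* some_ptr) {
    if (*some_ptr) {
        while (true) {
        }
    }
}

// The two forms agree in the only case that is safe to run here: the value
// is already zero, so neither function spins.
bool both_return_when_clear() {
    const unsigned clear = 0;
    assert(clear == 0);  // precondition for calling the spinning functions
    spin_as_written(&clear);
    spin_as_hoisted(&clear);
    return true;
}
```

The two only differ when another thread changes `*some_ptr` during the loop,
which is exactly the case the compiler is allowed to ignore.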


------=_Part_437_796159868.1487396423706
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">On Friday, February 17, 2017 at 11:27:57 PM UTC-5, Walt Ka=
ras wrote:<blockquote class=3D"gmail_quote" style=3D"margin: 0;margin-left:=
 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;"><div dir=3D"ltr">On =
Friday, February 17, 2017 at 5:06:14 PM UTC-5, <a>inkwizyt...@gmail.com</a>=
 wrote:<blockquote class=3D"gmail_quote" style=3D"margin:0;margin-left:0.8e=
x;border-left:1px #ccc solid;padding-left:1ex"><div dir=3D"ltr">On Friday, =
February 17, 2017 at 8:54:24 PM UTC+1, Walt Karas wrote:<blockquote class=
=3D"gmail_quote" style=3D"margin:0;margin-left:0.8ex;border-left:1px #ccc s=
olid;padding-left:1ex"><div dir=3D"ltr">On Wednesday, February 15, 2017 at =
7:41:40 PM UTC-5, Tony V E wrote:<blockquote class=3D"gmail_quote" style=3D=
"margin:0;margin-left:0.8ex;border-left:1px #ccc solid;padding-left:1ex"><d=
iv style=3D"background-color:rgb(255,255,255);line-height:initial" lang=3D"=
en-US">                                                                    =
                  <div style=3D"width:100%;font-size:initial;font-family:Ca=
libri,&#39;Slate Pro&#39;,sans-serif,sans-serif;color:rgb(31,73,125);text-a=
lign:initial;background-color:rgb(255,255,255)">Leaving the questions about=
 =E2=80=8Eactually changing the standard aside, and focusing on understandi=
ng (which may just mean this could be a std-discussion question instead of =
std-proposal),</div><div style=3D"width:100%;font-size:initial;font-family:=
Calibri,&#39;Slate Pro&#39;,sans-serif,sans-serif;color:rgb(31,73,125);text=
-align:initial;background-color:rgb(255,255,255)"><br></div><div style=3D"w=
idth:100%;font-size:initial;font-family:Calibri,&#39;Slate Pro&#39;,sans-se=
rif,sans-serif;color:rgb(31,73,125);text-align:initial;background-color:rgb=
(255,255,255)">In your model, if X and Y are relaxed atomic operations in t=
hread T1, can thread T2 see them as Y before X while thread T3 sees them as=
 =E2=80=8EX before Y?</div>                                                =
                                                                           =
          <div style=3D"width:100%;font-size:initial;font-family:Calibri,&#=
39;Slate Pro&#39;,sans-serif,sans-serif;color:rgb(31,73,125);text-align:ini=
tial;background-color:rgb(255,255,255)"><br style=3D"display:initial"></div=
>                                                                          =
                                                                           =
                                              <div style=3D"font-size:initi=
al;font-family:Calibri,&#39;Slate Pro&#39;,sans-serif,sans-serif;color:rgb(=
31,73,125);text-align:initial;background-color:rgb(255,255,255)">Sent=C2=A0=
from=C2=A0my=C2=A0BlackBerry=C2=A0<wbr>portable=C2=A0Babbage=C2=A0Device</d=
iv>                                                                        =
                                                                           =
                               <table style=3D"background-color:white;borde=
r-spacing:0px" width=3D"100%"> <tbody><tr><td style=3D"font-size:initial;te=
xt-align:initial;background-color:rgb(255,255,255)" colspan=3D"2">         =
                  <div style=3D"border-style:solid none none;border-top-col=
or:rgb(181,196,223);border-top-width:1pt;padding:3pt 0in 0in;font-family:Ta=
homa,&#39;BB Alpha Sans&#39;,&#39;Slate Pro&#39;;font-size:10pt">  <div><b>=
From: </b>&#39;Walt Karas&#39; via ISO C++ Standard - Future Proposals</div=
><div><b>Sent: </b>Wednesday, February 15, 2017 2:17 PM</div><div><b>To: </=
b>ISO C++ Standard - Future Proposals</div><div><b>Reply To: </b><a rel=3D"=
nofollow">std-pr...@isocpp.org</a></div><div><b>Subject: </b>[std-proposals=
] Proposed alternative approach to specifying required memory operation ord=
ering</div></div></td></tr></tbody></table><div style=3D"border-style:solid=
 none none;border-top-color:rgb(186,188,209);border-top-width:1pt;font-size=
:initial;text-align:initial;background-color:rgb(255,255,255)"></div><br><d=
iv><div dir=3D"ltr"><div>- In a program execution,=C2=A0each thread defines=
 a nominal order of (thread local) memory and fence operations.</div><div>-=
 For any operations X and Y in thread T, either X before Y, or Y before X.<=
/div><div>- A program execution defines a nominal global order of global me=
mory operations.</div><div>- If X and Y are atomic global operations, then =
either X before Y or Y before X in the global order.</div><div>- If X is a =
global operation, and there exists a global store S where neither X before =
S nor S before X in the global order, then the result of X is undefined.</d=
iv><div>- A program execution defines a partial function f(T, LO) -&gt; GO =
where LO is a local memory operation in thread T, and GO is a global operat=
ion.=C2=A0 If LO is atomic, then GO must be atomic. The result of LO is the=
 result of GO.=C2=A0 If the result of GO is undefined, the result of LO is =
undefined.=C2=A0 (Even if f(T, LO) is not required to exist, it none the le=
ss _may_ exist.)</div><div>- If LO1 and LO2 are operations in thread T, and=
 LO1 before LO2 in T, and f(T, LO1) and f(T, LO2) both exist, then f(T, LO2=
) before f(T, LO1) in the global order is not allowed.</div><div>- If:</div=
><div>1.=C2=A0 In a thread T, X and Y are memory operations, and F is a fen=
ce operation.</div><div>2.=C2=A0 X before=C2=A0F and=C2=A0F before Y.</div>=
<div>3.=C2=A0 F is sequentially consistent and both X and Y are atomic, or<=
/div><div>4.=C2=A0 F is acquire and both X and Y are loads, or</div><div>5.=
=C2=A0 F is release and both X and Y are stores.</div><div>then F is activa=
ted for X and Y.</div><div>-=C2=A0In a thread T, if X and Y are memory oper=
ations (where X before Y) with an activated fence F, then f(T, X) must exis=
t.</div><div>- For every global operation GO, there must exist a local oper=
ation LO in some thread T where f(T, LO) -&gt; GO.=C2=A0 (Assuming no inten=
se gamma radiation.)</div><div>- A sequentially consistent (thread local) a=
tomic memory operation implies two sequentially consistent fences, one befo=
re and one after it (as well as a preceding release for a store, and a succ=
eeding acquire for a load).</div></div>

<p></p>

-- <br>
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals&quot; group.<br>
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a rel=3D"nofollow">std-proposal...@isocpp.org</a>.<br>
To post to this group, send email to <a rel=3D"nofollow">std-pr...@isocpp.o=
rg</a>.<br>
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/a/isocpp.org/d/msgid/std-proposals/d0fdabcd-f04a-46e6-b93a-0b9ea85cb5bd%=
40isocpp.org?utm_medium=3Demail&amp;utm_source=3Dfooter" rel=3D"nofollow" t=
arget=3D"_blank" onmousedown=3D"this.href=3D&#39;https://groups.google.com/=
a/isocpp.org/d/msgid/std-proposals/d0fdabcd-f04a-46e6-b93a-0b9ea85cb5bd%40i=
socpp.org?utm_medium\x3demail\x26utm_source\x3dfooter&#39;;return true;" on=
click=3D"this.href=3D&#39;https://groups.google.com/a/isocpp.org/d/msgid/st=
d-proposals/d0fdabcd-f04a-46e6-b93a-0b9ea85cb5bd%40isocpp.org?utm_medium\x3=
demail\x26utm_source\x3dfooter&#39;;return true;">https://groups.google.com=
/a/<wbr>isocpp.org/d/msgid/std-<wbr>proposals/d0fdabcd-f04a-46e6-<wbr>b93a-=
0b9ea85cb5bd%40isocpp.org</a><wbr>.<br>
<br></div></div></blockquote><div><br></div><div>Here is some code that ill=
ustrates some points of confusion I have with the memory model:</div><div><=
br></div><div><font color=3D"#666600">(...)</font><br><div><font face=3D"mo=
nospace" color=3D"#666600"><br></font></div><div><font face=3D"monospace" c=
olor=3D"#666600">=C2=A0 =C2=A0 =C2=A0 =C2=A0 while (*Twins_number &lt; equa=
ls_mine)</font></div><div><font face=3D"monospace" color=3D"#666600">=C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;</font></div></div></div></blockquote><div>=
=C2=A0<br>I&#39;m not expert or even experienced in atomics but I think tha=
t is race condition because you access this variable without synchronizatio=
n. You would need probably fence or acquire in `while` otherwise compiler c=
an amuse that value will not change there.<br></div></div></blockquote><div=
><br></div><div>I don&#39;t=C2=A0really understand that.=C2=A0 To me, the b=
est way to describe the problem is that *Twins_number is cached in a (threa=
d-specific) register.=C2=A0 By some mechanism, a cache-invalidate or equiva=
lent has=C2=A0to be performed on *Twins_number before reading it.</div></di=
v></blockquote><div><br>That&#39;s because you&#39;re thinking in terms of =
how it works on the actual hardware, not in terms of the abstract C++ memor=
y model. The <i>whole point</i> of an abstract memory model is so that the =
C++ coder doesn&#39;t have to know or care about which &quot;actual hardwar=
e&quot; their code runs on. If you follow the rules of the memory model, th=
e compiler for the &quot;actual hardware&quot; will generate whatever code =
is needed to make your stuff work.<br>=C2=A0</div><blockquote class=3D"gmai=
l_quote" style=3D"margin: 0;margin-left: 0.8ex;border-left: 1px #ccc solid;=
padding-left: 1ex;"><div dir=3D"ltr"><div></div><div>The other issue is tha=
t, using gcc, this works when the read of *Twins_number is=C2=A0atomic=C2=
=A0with relaxed order.=C2=A0 I think that the Standard allows for this to b=
e just as broken as when using a=C2=A0(actual or nominal) non-atomic read, a=
nd it&#39;s just a quirky implementation in gcc.</div></div></blockquote><d=
iv><br>Actually... no, it isn&#39;t.<br><br>I was mistaken earlier about so=
mething. The <a href=3D"http://en.cppreference.com/w/cpp/atomic/memory_orde=
r">memory orders</a> are not for ensuring visibility of atomic variables ac=
ross threads. Atomic variables are *always* visible across threads. Their m=
emory orders are for ensuring ordering and visibility for <i>non-atomic</i>=
 stuff which is done around some atomic operation.<br><br>For example, let&=
#39;s say you write some data on thread A, then set an atomic boolean to tr=
ue. You have a thread B that wants to read that data. It can spin-lock on t=
hat boolean and read the data once the boolean is true.<br><br>However, the=
 standard will <i>only</i> guarantee this if you use proper ordering constr=
aints. Thread A <i>must</i> have used a `store` operation on that boolean w=
ith `release` ordering or stronger. And thread B <i>must</i> have used a `l=
oad` with `acquire` ordering or stronger in its spin-lock. Doing anything l=
ess yields a data race and therefore UB.<br><br>But if all you&#39;re using=
 an atomic for is the value itself (note: this does not apply to the value =
of the object it points to, for atomic pointers), then `relaxed` memory ord=
ering is just fine. So long as you don&#39;t care about the ordering of ind=
ependent atomic operations on different variables.<br>=C2=A0</div><blockquo=
te class=3D"gmail_quote" style=3D"margin: 0;margin-left: 0.8ex;border-left:=
 1px #ccc solid;padding-left: 1ex;"><div dir=3D"ltr"><div>Maybe using Cardi=
nal =3D unsigned would work with optimization enabled if I put an acquire f=
ence in the loop.=C2=A0 But that seems a cheat.</div></div></blockquote><di=
v><br>Why is that a cheat? You need to tell the compiler that operations fr=
om another thread may affect the execution with this one. That&#39;s <i>exa=
ctly</i> what a fence does.<br><br>And since you don&#39;t know exactly whe=
n the writing thread executes its `release` fence, you must execute your `a=
cquire` fence in the loop.<br><br></div><blockquote class=3D"gmail_quote" s=
tyle=3D"margin: 0;margin-left: 0.8ex;border-left: 1px #ccc solid;padding-le=
ft: 1ex;"><div dir=3D"ltr"><div>Unless the memory model considers that all =
the loads of *Twins_number can happen simultaneously, and the acquire fence=
 is the correct way to cause them to happen sequentially.</div></div></bloc=
kquote><div><br>Remember: a data race is undefined behavior in C++. Therefo=
re, if I have a `while(*some_ptr);` infinite loop, the compiler is <i>compl=
etely free</i> to convert this into `if(*some_ptr) while(true);`. Why?<br><=
br>Because a data race is undefined behavior. There is nothing in your infi=
nite loop that would ensure visibility between this thread and another thre=
ad. And therefore, if some other thread modified the memory at `some_ptr`, =
your thread reading it without proper synchronization would be a data race =
and therefore UB.<br><br>So, if some other thread modifies the memory of `*=
some_ptr`, then you get UB. And therefore, the compiler is free to assume t=
hat your code actually works, and therefore no other thread modifies that p=
ointer. So if no other thread modifies that pointer&#39;s value, then `if(*=
some_ptr) while(true);` is a perfectly justifiable interpretation of your c=
ode.<br><br>And if the compiler can prove that the condition will <i>never<=
/i> be false (since the only way it could be false would be for another thr=
ead to modify the memory, which would be UB), then it can just reduce the c=
ode to a pure infinite loop.<br></div></div>

<p></p>


------=_Part_437_796159868.1487396423706--

------=_Part_436_2030082738.1487396423706--

.


Author: "'Walt Karas' via ISO C++ Standard - Future Proposals" <std-proposals@isocpp.org>
Date: Sat, 18 Feb 2017 12:32:44 -0800 (PST)
------=_Part_624_1596492728.1487449964943
Content-Type: multipart/alternative;
 boundary="----=_Part_625_621961316.1487449964943"

------=_Part_625_621961316.1487449964943
Content-Type: text/plain; charset=UTF-8



>
>>>> Here is some code that illustrates some points of confusion I have with
>>>> the memory model:
>>>>
>>>> (...)
>>>>
>>>>         while (*Twins_number < equals_mine)
>>>>           ;
>>>>
>>>
>>> I'm not an expert or even experienced in atomics, but I think that is a race
>>> condition because you access this variable without synchronization. You
>>> would probably need a fence or acquire in the `while`; otherwise the compiler
>>> can assume that the value will not change there.
>>>
>>
...

>
>> Maybe using Cardinal = unsigned would work with optimization enabled if I
>> put an acquire fence in the loop.  But that seems a cheat.
>>
>
> Why is that a cheat? You need to tell the compiler that operations from
> another thread may affect the execution with this one. That's *exactly*
> what a fence does.
>
>
Consider this sequence:

spam, spam, spam, spam, spam, spam, spam

If I claimed that it was important to maintain the order of this sequence,
I would be rightly considered an idiot.  Likewise, it makes no sense to put
a fence in the while loop to maintain the order of *identical* memory
loads.  It would be some side effect of the fence that would be of any
benefit.  That is the sense in which I think it would be a cheat.

Maybe it's correct that atomic operations are about more than atomicity,
and fences are about more than creating rules about the partial ordering of
actual memory operations (on a single nominal memory for all threads).  But
if so, I don't think the Standard memory model is very clear about this.

...


------=_Part_625_621961316.1487449964943--

------=_Part_624_1596492728.1487449964943--

.


Author: Nicol Bolas <jmckesson@gmail.com>
Date: Sat, 18 Feb 2017 15:17:09 -0800 (PST)
------=_Part_3017_1178098549.1487459829364
Content-Type: multipart/alternative;
 boundary="----=_Part_3018_370307726.1487459829364"

------=_Part_3018_370307726.1487459829364
Content-Type: text/plain; charset=UTF-8

On Saturday, February 18, 2017 at 3:32:45 PM UTC-5, Walt Karas wrote:
>
>
>
>>
>>>>> Here is some code that illustrates some points of confusion I have
>>>>> with the memory model:
>>>>>
>>>>> (...)
>>>>>
>>>>>         while (*Twins_number < equals_mine)
>>>>>           ;
>>>>>
>>>>
>>>> I'm not an expert or even experienced in atomics, but I think that is a race
>>>> condition because you access this variable without synchronization. You
>>>> would probably need a fence or acquire in the `while`; otherwise the compiler
>>>> can assume that the value will not change there.
>>>>
>>>
> ...
>
>>
>>> Maybe using Cardinal = unsigned would work with optimization enabled if I
>>> put an acquire fence in the loop.  But that seems a cheat.
>>>
>>
>> Why is that a cheat? You need to tell the compiler that operations from
>> another thread may affect the execution with this one. That's *exactly*
>> what a fence does.
>>
>>
> Consider this sequence:
>
> spam, spam, spam, spam, spam, spam, spam
>
> If I claimed that it was important to maintain the order of this sequence,
> I would be rightly considered an idiot.  Likewise, it makes no sense to put
> a fence in the while loop to maintain the order of *identical* memory
> loads.
>

It does make sense, for two reasons. First, they're *not identical*. Or more
specifically, you don't want them to be identical. Identical reads would
result in identical values. You explicitly want them to result in
potentially different values. Therefore, you need non-identical reads.

Second, memory ordering is not about the ordering of your reads relative to
one another. It's about the ordering of your reads relative to *someone
else*. The fences permit operations in one thread to be ordered before or
after operations in another. Without those fences, there is no ordering
between your read and their write.

And without ordering, you have a data race. And therefore undefined
behavior.
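
A minimal sketch of that ordering, assuming a hypothetical atomic flag `ready` and a plain variable `payload` (neither name comes from the code in this thread). The writer publishes with a release store and the reader spins on an acquire load, so the reader's later read of `payload` is ordered after the writer's plain write and there is no data race:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: all names here are made up for illustration.
int wait_for_payload()
{
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag the reader spins on

    std::thread writer([&] {
        payload = 42;                                  // plain write
        ready.store(true, std::memory_order_release);  // publish it
    });

    // Every iteration is a fresh atomic load, so the compiler may not
    // hoist the read out of the loop.
    while (!ready.load(std::memory_order_acquire))
        ;

    int seen = payload;  // ordered after the writer's plain write
    writer.join();
    assert(seen == 42);
    return seen;
}
```

With a plain `bool` in place of `std::atomic<bool>`, the same loop would race with the writer and the behavior would be undefined.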

> It would be some side effect of the fence that would be of any benefit.
> That is the sense in which I think it would be a cheat.
>

No, the primary effect of the fence is that it allows you to read data
written in other threads. That's its purpose. Using something for its
intended purpose is not cheating.
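
One way this could look with explicit fences, as a sketch rather than the code from this thread: `ready`, `payload`, and `wait_with_fences` are made-up names. Note that in the Standard a fence only synchronizes through atomic operations, so this sketch reads the flag with a relaxed atomic load and places the acquire fence after the loop has seen the flag set:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: all names here are made up for illustration.
int wait_with_fences()
{
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // only ever accessed with relaxed order

    std::thread writer([&] {
        payload = 7;                                          // plain write
        std::atomic_thread_fence(std::memory_order_release);  // release fence
        ready.store(true, std::memory_order_relaxed);         // atomic store after it
    });

    // The relaxed load only guarantees that we eventually see the flag.
    while (!ready.load(std::memory_order_relaxed))
        ;
    // The acquire fence pairs with the writer's release fence through the
    // atomic flag, which orders the plain write before the read below.
    std::atomic_thread_fence(std::memory_order_acquire);

    int seen = payload;
    writer.join();
    assert(seen == 7);
    return seen;
}
```

The fences carry the ordering for `payload`; the relaxed operations on `ready` only carry the flag's own value.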


> Maybe it's correct that atomic operations are about more than atomicity,
> and fences are about more than creating rules about the partial ordering of
> actual memory operations (on a single nominal memory for all threads).
>

But creating ordering for memory operations is exactly what you want. If
you cannot guarantee ordering then you cannot guarantee visibility.

> But if so, I don't think the Standard memory model is very clear about this.
>


------=_Part_3018_370307726.1487459829364--

------=_Part_3017_1178098549.1487459829364--

.


Author: inkwizytoryankes@gmail.com
Date: Sat, 18 Feb 2017 16:44:57 -0800 (PST)
------=_Part_629_955441759.1487465097670
Content-Type: multipart/alternative;
 boundary="----=_Part_630_1689355805.1487465097671"

------=_Part_630_1689355805.1487465097671
Content-Type: text/plain; charset=UTF-8



On Saturday, February 18, 2017 at 9:32:45 PM UTC+1, Walt Karas wrote:
>
>
>
>>
>>>>> Here is some code that illustrates some points of confusion I have
>>>>> with the memory model:
>>>>>
>>>>> (...)
>>>>>
>>>>>         while (*Twins_number < equals_mine)
>>>>>           ;
>>>>>
>>>>
>>>> I'm not an expert or even experienced in atomics, but I think that is a race
>>>> condition because you access this variable without synchronization. You
>>>> would probably need a fence or acquire in the `while`; otherwise the compiler
>>>> can assume that the value will not change there.
>>>>
>>>
> ...
>
>>
>>> Maybe using Cardinal = unsigned would work with optimization enabled if I
>>> put an acquire fence in the loop.  But that seems a cheat.
>>>
>>
>> Why is that a cheat? You need to tell the compiler that operations from
>> another thread may affect the execution with this one. That's *exactly*
>> what a fence does.
>>
>>
> Consider this sequence:
>
> spam, spam, spam, spam, spam, spam, spam
>
> If I claimed that it was important to maintain the order of this sequence,
> I would be rightly considered an idiot.  Likewise, it makes no sense to put
> a fence in the while loop to maintain the order of *identical* memory
> loads.  It would be some side effect of the fence that would be of any
> benefit.  That is the sense in which I think it would be a cheat.
>
> Maybe it's correct that atomic operations are about more than atomicity,
> and fences are about more than creating rules about the partial ordering of
> actual memory operations (on a single nominal memory for all threads).  But
> if so, I don't think the Standard memory model is very clear about this.
>
> ...
>

Consider this trivial program:


int i = 0;
int main()
{
    i = 1;
    while (i)
        ;
}

What behavior do you expect from this code? GCC will make an infinite loop
there. Why? Because this is a single-threaded program and nobody can change
the value of `i` after the assignment.
Your loop is exactly the same: without atomics, the compiler assumes your
code is locally "single-threaded" and that nobody can change these values
from outside this thread.
Remember that atomic operations can be costly. If every line used atomics
implicitly, you would pay this cost even where you don't need or want it.
Consider the fact that some people do not want to use `shared_ptr` because
it uses atomic operations internally and this costs them too much.
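
For comparison, a sketch of the same loop with `i` made atomic. The second thread and the function name `spin_until_cleared` are assumptions added for illustration. Because every iteration is an atomic load, the compiler can no longer assume the value stays 1, and relaxed ordering is enough when only the value of `i` itself matters:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> i{0};

// Hypothetical helper: spins until another thread clears `i`.
int spin_until_cleared()
{
    i.store(1, std::memory_order_relaxed);

    // Another thread is now allowed to change `i` without a data race.
    std::thread other([] { i.store(0, std::memory_order_relaxed); });

    // Each pass re-reads `i`; the load cannot be hoisted out of the loop.
    while (i.load(std::memory_order_relaxed))
        ;

    other.join();
    int final_value = i.load(std::memory_order_relaxed);
    assert(final_value == 0);
    return final_value;
}
```

The cost mentioned above is visible here: the load is explicit, so you pay for it only where you ask for it.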


------=_Part_630_1689355805.1487465097671--

------=_Part_629_955441759.1487465097670--

.


Author: "'Walt Karas' via ISO C++ Standard - Future Proposals" <std-proposals@isocpp.org>
Date: Sat, 18 Feb 2017 18:01:23 -0800 (PST)
Raw View
------=_Part_555_246537066.1487469683153
Content-Type: multipart/alternative;
 boundary="----=_Part_556_1921074933.1487469683154"

------=_Part_556_1921074933.1487469683154
Content-Type: text/plain; charset=UTF-8



On Saturday, February 18, 2017 at 7:44:57 PM UTC-5, inkwizyt...@gmail.com
wrote:
>
>
>
> On Saturday, February 18, 2017 at 9:32:45 PM UTC+1, Walt Karas wrote:
>>
>>
>>
>>>
>>>>>> Here is some code that illustrates some points of confusion I have
>>>>>> with the memory model:
>>>>>>
>>>>>> (...)
>>>>>>
>>>>>>         while (*Twins_number < equals_mine)
>>>>>>           ;
>>>>>>
>>>>>
>>>>> I'm not an expert or even experienced in atomics, but I think that is a
>>>>> race condition because you access this variable without synchronization.
>>>>> You would probably need a fence or an acquire in the `while`; otherwise the
>>>>> compiler can assume that the value will not change there.
>>>>>
>>>>
>> ...
>>
>>>
>>>>
>>>> Maybe using Cardinal = unsigned would work with optimization enabled if
>>>> I put an acquire fence in the loop.  But that seems a cheat.
>>>>
>>>
>>> Why is that a cheat? You need to tell the compiler that operations from
>>> another thread may affect the execution with this one. That's *exactly*
>>> what a fence does.
>>>
>>>
>> Consider this sequence:
>>
>> spam, spam, spam, spam, spam, spam, spam
>>
>> If I claimed that it was important to maintain the order of this
>> sequence, I would rightly be considered an idiot.  Likewise, it makes no
>> sense to put a fence in the while loop to maintain the order of
>> *identical* memory loads.  Any benefit would come from some side effect of
>> the fence.  That is the sense in which I think it would be a cheat.
>>
>> Maybe it's correct that atomic operations are about more than atomicity,
>> and fences are about more than creating rules about the partial ordering of
>> actual memory operations (on a single nominal memory for all threads).  But
>> if so, I don't think the Standard memory model is very clear about this.
>>
>> ...
>>
>
> Consider this trivial program:
>
>
> int i = 0;
> int main()
> {
>     i = 1;
>     while (i)
>         ;
> }
>
> What behavior do you expect from this code? GCC will make an infinite loop
> there. Why? Because this is a single-threaded program and nobody can change
> the value of `i` after the assignment.
> Your loop is exactly the same: without atomics, the compiler assumes your
> code is locally "single-threaded" and that nobody can change these values
> from outside this thread.
> Remember that atomic operations are very costly; if every line used atomics
> implicitly, then you would pay this cost even if you did not need or want
> it.
> Consider the fact that some people do not want to use `shared_ptr` because it
> uses atomic operations internally and this costs them too much.
>

Yes, but the issue is, what is a good memory model for representing that
and other issues with multi-threading.  The reality is that threads do
"unsniffed" partial caching of the nominal memory that is seen
consistently by all threads.  And compiler and instruction pipeline
optimizations cause the actual order of memory accesses by threads to be
different from the nominal order (that the programmer determines with the
code).  The Standard's approach is to model all of this as fairly unlimited
reordering of actual memory accesses versus nominal ones, and then to provide
mechanisms to limit the reordering.  It's not clear to me whether the
Standard specifies a partial or full ordering of accesses.  If it's
partial, then all the loads caused by while (i); can be in the same
equivalence class, or, if it's full, all the loads caused by while (i); can
be successive on the location of i with no intervening stores to i.  I
think a better model is probably one where each thread nominally
caches all of memory.  The order of accesses on the cache can be
considered the nominal order for the thread.  But then there are
per-memory-location read/write-throughs of the thread cache, which can
happen unpredictably and in an unpredictable order from the global
perspective.  So mechanisms are needed to limit the unpredictability in
order to avoid race conditions.  In the case of while (i); you'd want to
cache-invalidate i for each iteration of the loop.  But, from the global
perspective, all the resulting read-throughs of i to common memory would be
identical loads, so any idea of imposing order on them would make no sense.


------=_Part_556_1921074933.1487469683154--

------=_Part_555_246537066.1487469683153--

.


Author: Nicol Bolas <jmckesson@gmail.com>
Date: Sun, 19 Feb 2017 11:26:08 -0800 (PST)
Raw View
------=_Part_4563_929129897.1487532368402
Content-Type: multipart/alternative;
 boundary="----=_Part_4564_902743531.1487532368403"

------=_Part_4564_902743531.1487532368403
Content-Type: text/plain; charset=UTF-8

On Saturday, February 18, 2017 at 9:01:23 PM UTC-5, Walt Karas wrote:
>
> On Saturday, February 18, 2017 at 7:44:57 PM UTC-5, inkwizyt...@gmail.com
> wrote:
>>
>> Consider this trivial program:
>>
>>
>> int i = 0;
>> int main()
>> {
>>     i = 1;
>>     while (i)
>>         ;
>> }
>>
>> What behavior do you expect from this code? GCC will make an infinite loop
>> there. Why? Because this is a single-threaded program and nobody can change
>> the value of `i` after the assignment.
>> Your loop is exactly the same: without atomics, the compiler assumes your
>> code is locally "single-threaded" and that nobody can change these values
>> from outside this thread.
>> Remember that atomic operations are very costly; if every line used atomics
>> implicitly, then you would pay this cost even if you did not need or want
>> it.
>> Consider the fact that some people do not want to use `shared_ptr` because
>> it uses atomic operations internally and this costs them too much.
>>
>
> Yes, but the issue is, what is a good memory model for representing that
> and other issues with multi-threading.
>

No, the issue is "what is wrong with the current memory model?" That's the
question I don't see that you have answered. If you want changes to it,
then you need to explain where it is deficient. And saying that it's
complex or that you don't understand it isn't a deficiency.

> The reality is that threads do "unsniffed" partial caching of the nominal
> memory that is seen consistently by all threads.  And compiler and
> instruction pipeline optimizations cause the actual order of memory accesses
> by threads to be different from the nominal order (that the programmer
> determines with the code).  The Standard's approach is to model all of this
> as fairly unlimited reordering of actual memory accesses versus nominal
> ones, and then to provide mechanisms to limit the reordering.  It's not clear
> to me whether the Standard specifies a partial or full ordering of
> accesses.  If it's partial, then all the loads caused by while (i); can be
> in the same equivalence class, or, if it's full, all the loads caused by
> while (i); can be successive on the location of i with no intervening stores
> to i.
>

The standard is quite easy to understand in this regard. There are no
ordering or synchronization primitives in play with this code. Because
of that, if another thread wrote to `i`, it would invoke undefined
behavior via a data race. So the compiler is free to disregard the
possibility of `i` being written by other threads.

That's all you really need to understand.
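
For contrast, here is a minimal sketch of the same loop with `i` declared as
`std::atomic<int>`. The helper thread and the name `spin_until_cleared` are
assumptions made for illustration; they are not code from this thread. Because
the accesses are atomic, a concurrent store is no longer a data race, so the
compiler cannot treat the value as unchanging. Relaxed ordering is used on
purpose, to show that no fence is involved in this particular loop.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical variant of the quoted program: `i` is atomic, so a write
// from another thread is not a data race.
std::atomic<int> i{0};

// Spins until another thread clears `i`; returns the last value observed.
int spin_until_cleared()
{
    i.store(1, std::memory_order_relaxed);

    // Helper thread (an assumption for this sketch) that clears the flag.
    // Thread creation orders the store of 1 before the store of 0.
    std::thread writer([] { i.store(0, std::memory_order_relaxed); });

    // Each iteration performs an atomic load, so the value is not assumed
    // to be loop-invariant the way a plain int would be.
    while (i.load(std::memory_order_relaxed))
        ;

    writer.join();
    return i.load(std::memory_order_relaxed);
}
```

Calling `spin_until_cleared()` from `main` should return 0 once the writer's
store becomes visible. With GCC or Clang this typically needs `-pthread`.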

> I think a better model is probably one where each thread nominally
> caches all of memory.  The order of accesses on the cache can be
> considered the nominal order for the thread.  But then there are
> per-memory-location read/write-throughs of the thread cache, which can
> happen unpredictably and in an unpredictable order from the global
> perspective.  So mechanisms are needed to limit the unpredictability in
> order to avoid race conditions.  In the case of while (i); you'd want to
> cache-invalidate i for each iteration of the loop.  But, from the global
> perspective, all the resulting read-throughs of i to common memory would be
> identical loads, so any idea of imposing order on them would make no sense.
>

That's not a model; that's *describing what happens*. It's like the
difference between pointing to a globe of the Earth and pointing to the
Earth itself.

The point of a model is to have an abstraction, one that is distinct from
the practical implementation of that model.

You keep trying to figure out which threading operations mean "invalidate
the cache" or somesuch. That's the wrong way to think about it. C++ is not
written about caches and CPU architecture; it's written against a memory
model. The only way to understand the C++ memory model is to understand *the
C++ memory model*, not how it gets implemented.

If an operation is ordered before another one, then the second one can see
the results of the first. That is part of the C++ memory model. How a compiler
implements this on a particular platform, whether caches have to be
invalidated or flushed, and so on are *irrelevant*. What matters is that
there are certain things you have to do which will ensure ordering between
operations across threads. If you do them, then you are guaranteed by the
memory model to see the results. If you don't do them, UB results.
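
As a rough illustration of "certain things you have to do", here is a sketch
of release/acquire message passing. The names `payload`, `ready` and
`publish_and_consume` are hypothetical and chosen only for this example. The
release store and the acquire load that reads it establish the cross-thread
ordering, so the plain write to `payload` is visible to the consumer without
any reasoning about caches.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag used for synchronization

// Returns the payload value the consumer reads after it observes `ready`.
int publish_and_consume()
{
    // Reset state so the function can be called more than once.
    payload = 0;
    ready.store(false, std::memory_order_relaxed);

    int seen = -1;

    std::thread producer([] {
        payload = 42;                                   // ordered before the release store
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&seen] {
        while (!ready.load(std::memory_order_acquire))  // pairs with the release store
            ;
        seen = payload;                                 // ordered after the producer's write
    });

    producer.join();
    consumer.join();
    return seen;  // the joins order the consumer's write to `seen` before this read
}
```

If the release store and acquire load were replaced with plain accesses to a
non-atomic `bool`, the same program would contain a data race and its behavior
would be undefined.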

Whether "each thread caches nominally" or some such is ultimately
irrelevant.


------=_Part_4564_902743531.1487532368403--

------=_Part_4563_929129897.1487532368402--

.