Thread

Topic: Possible defect? common operations undefined for built-in types

Author: Christopher Eltschka <celtschk@web.de>
Date: Wed, 26 Sep 2001 14:43:42 GMT

Francis Glassborow <francis.glassborow@ntlworld.com> writes:

> In article <3B72B5AB.73989F1@sensor.com>, Ron Natalie <ron@sensor.com>
> writes
> >       x = y = z;
> >
> >which is a common idiom but technically undefined behavior based on the
> >discussion in question (since the resultant y value is used for purposes
> >other than computation of the new value between sequence points).
>
> I am not convinced. The program already knows the rvalue that it
> assigned to y and has no need to access y to determine what it is.

Maybe it helps to invent a certain concrete implementation of the
abstract machine. This implementation works as follows:

* lvalues are always represented by memory addresses in registers
* rvalues are always represented by values in registers
  (the machine has enough register space)
* side effects are only done directly at the sequence points, by
  storing the value from the register into memory.
* No optimization is done

Now let's translate the statement "j = ++i;" to this machine using the
C++ definitions:

// i
load reg1 with address of i // get lvalue i
// pre-increment
load reg1 with mem[reg1]    // lvalue to rvalue conversion
increment reg1              // gives rvalue to be stored in i
load reg2 with address of i // the result of ++i is an lvalue
                            // referring to i
// j
load reg3 with address of j // get lvalue j
// assignment
load reg2 with mem[reg2]    // lvalue to rvalue conversion
                            // now reg2 contains the value to be
                            // stored to j
load reg4 with address of j // result of assignment expression
                            // (not used, but we are not optimizing)
// sequence point
store reg1 to i             // side effect of ++i
store reg2 to j             // side effect of assignment

Now, after this code, i will be incremented, and j will contain the
_old_ value of i.

Now there are two possibilities:

1. The above implementation of the abstract machine is not conforming.
   Then there must be at least one clause which is not fulfilled.

2. The above implementation _is_ conforming.
   Then the standard allows behaviour which surely wasn't intended.

Now the question is: Which one is true?

Note that the C definition does not have this problem, since the
rvalues pending to be stored as side effect are also the resulting
values of the expressions:

// i
load reg1 with address of i // get lvalue i
// pre-increment
load reg1 with mem[reg1]    // lvalue to rvalue conversion
increment reg1              // gives rvalue to be stored in i
load reg2 with reg1         // which is also the return value of ++i
// j
load reg3 with address of j // get lvalue j
// assignment
load reg4 with reg2         // result of assignment expression
                            // (not used, but we are not optimizing)
// sequence point
store reg1 to i             // side effect of ++i
store reg2 to j             // side effect of assignment

As you see, here i and j get the same, incremented value. Therefore
the implementation seems to be valid at least for C.
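
The two register traces can be mimicked in ordinary C++. This is only a
minimal sketch of the invented machine: the names Result,
run_lvalue_model and run_rvalue_model are hypothetical, plain ints
stand in for registers and memory, and it says nothing about what a
real compiler generates.

```cpp
#include <cassert>

// Hypothetical model: locals named mem_* play the role of memory,
// locals named reg* play the role of registers.
struct Result {
    int i;
    int j;
};

// "j = ++i;" on the deferred-store machine under the C++ rule:
// ++i yields an lvalue, so the assignment re-reads i from memory.
Result run_lvalue_model(int initial_i) {
    int mem_i = initial_i;
    int mem_j = 0;

    int reg1 = mem_i;         // lvalue-to-rvalue conversion of i
    reg1 += 1;                // value pending to be stored in i
    int* reg2_addr = &mem_i;  // result of ++i: an lvalue referring to i
    int reg2 = *reg2_addr;    // lvalue-to-rvalue conversion; memory is still old

    // sequence point: the pending side effects are stored only now
    mem_i = reg1;
    mem_j = reg2;
    return Result{mem_i, mem_j};
}

// The same statement under the C rule: ++i yields the new value itself.
Result run_rvalue_model(int initial_i) {
    int mem_i = initial_i;
    int mem_j = 0;

    int reg1 = mem_i;  // lvalue-to-rvalue conversion of i
    reg1 += 1;         // value pending to be stored in i
    int reg2 = reg1;   // which is also the result of ++i

    // sequence point
    mem_i = reg1;
    mem_j = reg2;
    return Result{mem_i, mem_j};
}
```

Starting from i == 5, run_lvalue_model leaves i == 6 and j == 5, while
run_rvalue_model leaves both at 6, which is the difference between the
two traces.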

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]

Author: Jack Klein <jackklein@spamcop.net>
Date: Tue, 14 Aug 2001 03:00:34 GMT

On Mon, 13 Aug 2001 03:29:58 GMT, jpotter@falcon.lhup.edu (John
Potter) wrote in comp.std.c++:

> On Sat, 11 Aug 2001 23:07:10 GMT, Jack Klein <jackklein@spamcop.net>
> wrote:
>
> > extern bool horrible_bug;
> > extern volatile char uart_tx_reg;
> >
> > void uart_tx(char ch)
> > {
> >    static char last_char;
> >
> >    // conditional stuff based on value
> >    // of last_char
> >
> >    if (horrible_bug)
> >    {
> >       // read and discard top character in
> >       // uart's receive fifo buffer,
> >       // read access to volatile lvalue can't
> >       // be optimized away
> >       last_char = uart_tx_reg = ch;
>
> I think the top character is stored in last_char, not discarded.
> Maybe worse since it was expected to hold ch.  Assuming your
> interpretation.

Yes, again I slipped up on the little details of the illustration.
The intended purpose was a driver that kept some state information,
for such purposes as end of line translation, adding page breaks, tab
to space conversions, whatever.  So not only would last_char get the
wrong character, the receive driver in another function would not know
where to look for it.

> >    }
> >    else
> >    {
> >       // access uart hardware once only, to
> >       // write the character to be output
> >       uart_tx_reg = last_char = ch;
> >    }
> > }
>
> On Thu,  9 Aug 2001 18:24:46 GMT, Jack Klein <jackklein@spamcop.net>
> wrote:
>
> > First the wording.  The sentence "The value is the new value of the
> > operand; it is an lvalue." is just plain contradictory.  It appears
> > that the wording of the C standard (ISO C90 and C99) was quickly
> > edited, as it said "An assignment expression has the value of the left
> > operand after the assignment, but is not an lvalue."
>
> That is the C++ wording for increment and, I assume, the C wording for
> assignment.  Note that C requires the value of the left operand after
> the assignment, not the value assigned.  In your example above, that
> would still require the read of the volatile after the assignment.  It

I think you're drifting a bit here.  With the possible exception of
volatile objects, the value assigned to an object in C (after type
conversion, if the type is different from the source) is the same as
the rvalue assigned to the operand.

> is even worse than C++ because it also requires a read in the else
> branch to produce the unused value of the complete expression.

It is easy to get tangled up in rvalues and lvalues.  The C++
semantics specify that the result is an lvalue.  An lvalue is not
automatically accessed.  lvalue to rvalue conversion only takes place
when an lvalue is used in an expression where an rvalue is required.
This is not one of these cases, any more than:

int i, j = 10;
int *ip;

   ip = &(i = j);

In this case the result of the assignment to i, that is the address of
i after it is assigned to (even though that is the same as the address
of i before it is assigned to) is stored in ip.  i is not accessed
after the assignment because its rvalue is not needed.
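
The fragment above can be wrapped in a function so that it can be
compiled and checked. This is a minimal sketch; the function name is
hypothetical, and the code is C++ only, since C rejects taking the
address of an assignment expression.

```cpp
#include <cassert>

// Hypothetical wrapper around the fragment above: the built-in
// assignment yields an lvalue, so its address can be taken.
bool assignment_result_is_target() {
    int i, j = 10;
    int* ip;

    ip = &(i = j);  // address of the lvalue result, i.e. the address of i

    // Checked in a later full expression, after the assignment is complete.
    return ip == &i && i == 10;
}
```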

By the same process the rvalue of uart_tx_reg in the else branch is
not needed, lvalue to rvalue conversion is not performed, and the
volatile object is not read after being written.  That is why the
lvalue result is so dastardly.  It silently breaks some code, but not
all.
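
The asymmetry between the two orders can be made visible with a
class-type stand-in for the register. This is only a sketch under
loud assumptions: CountingReg, AccessCounts and the two functions are
hypothetical, and a user-defined operator= returning a reference is
used to model the lvalue result, so it illustrates the reading argued
here rather than the behavior of built-in volatile objects.

```cpp
#include <cassert>

// Hypothetical stand-in for a memory-mapped register that counts accesses.
struct CountingReg {
    char value = 0;
    int reads = 0;
    int writes = 0;

    // Models an assignment whose result is an lvalue (a reference).
    CountingReg& operator=(char c) {
        value = c;
        ++writes;
        return *this;
    }

    // Models lvalue-to-rvalue conversion: only runs when the value is needed.
    operator char() {
        ++reads;
        return value;
    }
};

struct AccessCounts {
    int reads;
    int writes;
};

// Register in the center: its value is needed to assign to last_char.
AccessCounts reg_in_center(char ch) {
    CountingReg reg;
    char last_char;
    last_char = reg = ch;
    (void)last_char;
    return AccessCounts{reg.reads, reg.writes};
}

// Register on the left: the result of the assignment to it is never used.
AccessCounts reg_on_left(char ch) {
    CountingReg reg;
    char last_char;
    reg = last_char = ch;
    return AccessCounts{reg.reads, reg.writes};
}
```

In this model reg_in_center records one write and one read, and
reg_on_left records one write and no read.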

> Interesting that it is not possible to assign to a volatile in C without
> getting a read access.  How did anything ever get done with that wording
> in the C standard? :)

It is possible to assign to a volatile without the value being read
back in C, although I think you have just pointed out an ambiguity in
the C standard.

Here is the entire paragraph from the C99 standard:

       [#3] An assignment operator stores a  value  in  the  object
       designated  by  the  left operand.  An assignment expression
       has the value of the left operand after the assignment,  but
       is  not  an lvalue.  The type of an assignment expression is
       the type of the left operand unless  the  left  operand  has
       qualified  type, in which case it is the unqualified version
       of the type  of  the  left  operand.   The  side  effect  of
       updating  the  stored  value of the left operand shall occur
       between the previous and the next sequence point.

I think the point of the "...value of the left operand after the
assignment..." portion is intended to mean that given:

double d1, d2 = 3.14159;
char ch;

  d1 = ch = d2;

The value assigned to d1 is not the value of d2, but is
(double)((char)d2).  d1 will be set to 3.0 and not 3.14159, because it
receives not the original value specified to be assigned to ch, but
the rvalue that will actually be written to ch sometime before the
next sequence point.
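
The example can be checked directly. A minimal sketch, with a
hypothetical function name wrapped around the same three objects:

```cpp
#include <cassert>

// Hypothetical wrapper: returns what d1 receives from the chained assignment.
double chained_narrowing(double d2) {
    double d1;
    char ch;

    d1 = ch = d2;  // d1 receives the value after conversion to char

    return d1;
}
```

Called with 3.14159 it returns 3.0, since the fractional part is
discarded in the conversion to char before the value reaches d1.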

As it is actually worded it could be interpreted to imply that the
value must be read back from the object, at least if the object is
volatile.

> Making it an rvalue in C++ would not solve the problem and would add
> a new exception of the only rvalue which may be bound to a non-const
> reference or have its address taken.

I was proposing changing the result of these operators (and remember,
these definitions only apply to operators for built-in types) back to
rvalues, but adding explicit wording that an expression requiring an
lvalue is applied to the assignment, pre-increment, or pre-decrement
is allowed and it results in the address of the modified object.

> I think the intent is to produce an lvalue with an lvalue to rvalue
> conversion which is the value assigned without requiring an access.

I think not.  Although the behavior of one or many compilers does not
define the language, I did a few tests, summarized at the bottom of
the post.

> That is a mess with volatile, but volatile is a mess anyway and
> anyone using it must be very careful.  Functions with side effects
> are bad enough without combining them with self modifying data.
>
> We know what we want, how to put it in standardese is the question.
>
> John

The sad thing is that all of this is really unnecessary.  We want user
defined operators on user defined types to be able to take advantage
of the efficiency of passing and returning references rather than
copying potentially large objects or creating temporaries.  But the
definitions we are discussing have nothing to do with user defined
operators, they are not bound to these definitions.  Even if we change
these rules in the standard back to yielding rvalues, the operator=(),
operator++(), and operator--() user defined functions for a class can
be defined to return references, and they do not need to be
dereferenced unless the rvalue is explicitly taken.
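
To illustrate the point about user defined operators, here is a
minimal sketch of a class whose operator=() and operator++() return
references. The class Counter and the function name are hypothetical;
nothing here depends on what the built-in operators yield.

```cpp
#include <cassert>

// Hypothetical user defined type with reference-returning operators.
struct Counter {
    int value;

    Counter() : value(0) {}

    // Returns a reference to the object assigned to; no copy is made.
    Counter& operator=(const Counter& other) {
        value = other.value;
        return *this;
    }

    // Pre-increment, also returning a reference to the modified object.
    Counter& operator++() {
        ++value;
        return *this;
    }
};

bool user_defined_results_refer_to_operand() {
    Counter a, b;

    Counter* pb = &++b;       // refers to b; no Counter is copied
    Counter* pa = &(a = b);   // refers to a; no Counter is copied

    return pa == &a && pb == &b && a.value == 1 && b.value == 1;
}
```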

There are two things that bother me here.  The first is that the only
formal definition of a method placing an order on when modified values
must be stored into the modified object is the sequence point.  The
C++ standard is quite specific about that, in paragraph 7 of 1.9:

       Accessing an object designated by a volatile lvalue (3.10),
       modifying an object, calling a library I/O function, or
       calling a function that does any of those operations are all
       side effects, which are changes in the state of the execution
       environment. Evaluation of an expression might produce side
       effects. At certain specified points in the execution sequence
       called sequence points, all side effects of previous
       evaluations shall be complete and no side effects of subsequent
       evaluations shall have taken place.

Yet here is the wording of certain operators stating that operators
have "sequence point-ish" behavior, that is they require that their
side effects take place before other subexpressions of a full
expression may be evaluated.  Are they sequence points or not?

1.9 seems fairly conclusive that they are not, defining sequence
points in exactly the same manner as C, at the end of a full
expression and as imposed by the built-in &&, ||, ternary, and comma
operators, and in function calls and returns.

So even if the currently specified behavior is exactly what is wanted
(warts and all) for those operators, the standard does not really
define their behavior.  1.9 both prohibits them from being sequence
points, by leaving them out of the completely defined list of sequence
points, and prohibits anything other than a sequence point from
forcing an order of storing results to variables.

If the standard wants to properly retain the current behavior it needs
to properly define it.  Either these operators must be defined as
sequence points, or another mechanism must be introduced to
specifically allow operators which are not sequence points to impose
an order of assigning a new value to a modified object.

I am in favor of scrapping the current definitions, as I do not think
that it would break any conforming code except perhaps a very few
cases involving volatiles like this, and untangle the mess.

================
Do you know what your C++ compiler is doing?

I built a test program and tried it with two relatively recent
compilers I had handy, Microsoft Visual C++ 6.0, and MetroWerks
CodeWarrior 6.1.  Here is the program:

#include <iostream>

extern volatile int access_type;
extern volatile char uart_vol;
extern          char uart_nv;

static int send_uart(char ch)
{
 static char last_char;
 switch (access_type)
 {
  case 0:
   last_char = uart_vol = ch;
   break;
  case 1:
   uart_vol = last_char = ch;
   break;
  case 2:
   last_char = uart_nv = ch;
   break;
  default:
   uart_nv = last_char = ch;
   break;
 }
 return ch + 2;
}

int main(void)
{
 std::cout << send_uart('a') << std::endl;
 return 0;
}

The point was to see what the compilers generated for the chained
assignments, with and without a volatile object as the center or
left hand destination.  I built the program twice with each compiler,
once at the default debug setting (no or minimal optimization which
might rearrange object code and make debugging difficult), and with
maximum optimization for speed.

I had thought that compilers would optimize away the lvalue to rvalue
conversion and generate the same code they would when compiling as C,
except possibly when a volatile was involved, but I was quite
incorrect.  The compiler vendors are quite good at implementing the
wording of the standard, even if it does not quite match the intent.

In the debug build, both compilers stored to the center variable, then
read back the center variable and stored the result into the left hand
destination.  Even when the center variable was not volatile.

When compiling for maximum speed optimization, CodeWarrior still
performed the read back of the center assigned value to store in the
left hand destination in all four cases.  Visual C++ optimized away
the read backs except in the case of the volatile object in the
center, when it did read it back.  In the other three cases it
assigned the value read from the right hand source to both the center
and left objects with no read backs.

Neither of the compilers read back the left hand destination in any
case, since lvalue to rvalue conversion was not required.

This can also turn optimization into a time bomb, leading an optimizer
to mistakenly change the meaning of a program.  With full
optimization, Visual C++ optimized away all the read backs except for
this one:

 last_char = uart_vol = ch;

But suppose a similar situation where last_char were not static but
automatic, and there was a return immediately after this statement, or
data flow analysis could detect that on this particular branch the
value of last_char was never used.

In this case the compiler can quite properly optimize away the
assignment, but it would matter a great deal exactly how it did so.
If it naively eliminated the assignment too early in its processing,
code generation for the statement might be reduced to that for:

   uart_vol = ch;

Which does not require, and does not generate, a read back of
uart_vol.  This would indeed change the operation of the program a
great deal, if uart_vol is volatile.  The non-optimized/debugging
build of the code would read back the value, the optimized version
would not.  Under the current wording of the standard, the read back
can't be optimized away, only the assignment after the read back, but
I wonder if that sort of mistake might slip into an optimizer.

Finally, I find the fact that changing the order of assignment
involving volatiles can totally change the behavior of a program
without obvious cause disturbing.

 last_char = uart_vol = ch;

....requires that the volatile object uart_vol be written and read
(even though it doesn't seem to clearly state the order of the
accesses!).

 uart_vol = last_char = ch;

....does not require that uart_vol be read.  But I do not believe that
the standard prohibits it from being read, either.

Since C++ is intended to be every bit as suitable as a system
programming language as C, the uncertainty and hidden traps in
operations on volatile objects is really intolerable.

--
Jack Klein
Home: http://JK-Technology.Com


Author: Jack Klein <jackklein@spamcop.net>
Date: Fri, 10 Aug 2001 09:39:15 GMT

On Thu,  9 Aug 2001 22:28:19 GMT, Francis Glassborow
<francis.glassborow@ntlworld.com> wrote in comp.std.c++:

> In article <3B72B5AB.73989F1@sensor.com>, Ron Natalie <ron@sensor.com>
> writes
> >       x = y = z;
> >
> >which is a common idiom but technically undefined behavior based on the
> >discussion in question (since the resultant y value is used for purposes
> >other than computation of the new value between sequence points).
>
> I am not convinced. The program already knows the rvalue that it
> assigned to y and has no need to access y to determine what it is.
>
>
> Francis Glassborow      ACCU

That's exactly the problem that I am trying to point out in the
wording of the standard, and in fact the chained assignment case is
better than the cases I tried to use initially.

An expression in C or C++ can produce exactly one value or zero values
(in the case of an expression such as calling a function with a void
return type).  Nowhere in either language is there such a thing as an
expression that produces two values.

If the assignment to y produces the rvalue assigned to y, the program
"knows" it and can therefore assign it to x once it has resolved it.
It does not need to read y nor does it need to have performed the
assignment to y prior to reading y.

But the value of the assignment operation to y is not an rvalue in
C++, it is the lvalue &y.  In the real world we can expect the program
to keep this value around, perhaps in a register, and not really need
to access y.  But we are talking about the abstract machine here, and
in no case does the standard ever require that a value created in the
evaluation of an expression that is not the result of that expression
be retained.

The statement above can be broken down into primitives:

1.   access lvalue &z to get rvalue (call it rv1)
2.   convert rv1 to type of y yielding rv2 (may be same type)
3.   assign rv2 to lvalue &y.  rv2 can now cease to exist.
4.   yield result, &y.
5.   access lvalue &y to get rvalue (call it rv3)
6.   convert rv3 to type of x yielding rv4 (may be same type)
7.   assign rv4 to lvalue &x.  rv4 can now cease to exist.
8.   yield result, &x.

Since there are no sequence points involved these primitives can be
executed in any order that provides each primitive with the necessary
preconditions.  Since the lvalue &y that is returned by the assignment
to y is unaffected and unchanged by the assignment to y (the rvalue in
the lvalue changes, the lvalue does not), all of the preconditions for
performing step 5 are met at the very beginning of evaluating the
statement.

The only thing that the standard requires of a conforming
implementation executing this statement is that by the sequence point
at the end of the statement both x and y will have been updated with a
new value and the value of the entire statement, &x, will be produced.

In the absence of sequence points, I do not see anything in the
standard that forces the compiler to store the eventual final new
value to y before &y is accessed to convert it to an rvalue.  We know
that is what we want, it is just that the wording does not actually
specify what it is we want, and what compilers do because the compiler
writers know what it is we want as well.

The problem goes away if we state in C++, like in C, that the
assignment operator yields the rvalue assigned to the destination.
The primitives are:

1.  access lvalue &z to get rvalue rv1
2.  convert rv1 to the type of y yielding rv2 (may be same type)
3.  assign rv2 to lvalue &y yield result rv2
4.  convert rv2 to the type of x yielding rv3 (may be same type)
5.  assign rv3 to lvalue &x yield result rv3

The order of the actual modification of the destination rvalues has no
effect on the computation.  A partial ordering is imposed by the fact
that rv2 is a necessary input to the calculation of rv3.
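
The rvalue-yielding primitives can be sketched as a helper that
returns the converted value instead of a reference to the destination.
This is only an illustration: assign_yielding_rvalue, Chain and
chain_rvalue_model are hypothetical names, and the types chosen for x,
y and z are assumptions, since the statement under discussion does not
give any.

```cpp
#include <cassert>

// Hypothetical helper modelling the C rule: the assignment yields the
// converted value (an rvalue), never a reference to the destination.
template <typename Dst, typename Src>
Dst assign_yielding_rvalue(Dst& dst, Src src) {
    Dst converted = static_cast<Dst>(src);  // convert to the destination type
    dst = converted;                        // store into the destination
    return converted;                       // result: the value, not dst
}

struct Chain {
    short y;
    long x;
};

// Models "x = y = z;" without ever reading y back.
Chain chain_rvalue_model(int z) {
    Chain c = {0, 0};
    // The inner result (rv2) feeds the outer assignment directly.
    assign_yielding_rvalue(c.x, assign_yielding_rvalue(c.y, z));
    return c;
}
```

The only ordering this model needs is the data dependency: the inner
call must produce its result before the outer call can use it.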

The only sensible way out of this that I can see is to change the
definitions of operators I mentioned (and any others I missed) back to
yielding an rvalue, with added text to indicate that the address of
such an expression can be taken and it yields the address of the
modified object.

The compiler can optimize away the temporaries when user defined
operators with reference return types are evaluated and the result is
not used as an rvalue.

--
Jack Klein
Home: http://JK-Technology.Com


Author: jpotter@falcon.lhup.edu (John Potter)
Date: Fri, 10 Aug 2001 21:27:28 GMT

On Thu,  9 Aug 2001 18:24:46 GMT, Jack Klein <jackklein@spamcop.net>
wrote:

There was a long discussion on this before.  There may even be an issue.

> or to specify that
> they yield an rvalue sometimes and an lvalue sometimes depending on
> what you do with the result (much, much nastier).

No matter how nasty, that seems to be the only way to make any sense
of it.  However,

int i = 5, j;
j = *&++i;

can only make sense if the *& is considered a no-op and removed.  We
could confuse things a bit more by adding an int* p.

j = *(p = &++i);

Since there are no sequence points and no undefined or unspecified
behavior, I expect j == 6 and do not care when things are stored.
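
Both statements can be tried as written. This is a minimal sketch with
hypothetical function names; it shows only what a given compiler
produces and cannot settle what the wording of the standard
guarantees. Under the sequencing rules adopted later in C++11 both
functions return 6.

```cpp
#include <cassert>

// Hypothetical wrapper around "j = *&++i;".
int deref_preincrement() {
    int i = 5, j;
    j = *&++i;
    return j;
}

// Hypothetical wrapper around "j = *(p = &++i);".
int deref_via_pointer() {
    int i = 5, j;
    int* p;
    j = *(p = &++i);
    return j;
}
```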

> I hope I have made myself clearer this time.  If someone can point out
> something I am missing in the standard that does impose an ordering
> that I fail to see, I would appreciate it.

You are quite clear.  The one thing I can question is your statement
that the lvalue of the expression ++x may be used before the expression
is evaluated.  Just because it is known without evaluating the
expression does not mean that it is available before the calculation.
I think it is just as clear here as in other cases that the value of
a subexpression may not be used prior to evaluation of it.

John


Author: jpotter@falcon.lhup.edu (John Potter)
Date: Fri, 10 Aug 2001 21:29:33 GMT

On Thu,  9 Aug 2001 18:27:13 GMT, Ron Natalie <ron@sensor.com> wrote:

> I think ++x + ++x is less of an issue than:
>
>  x = y = z;
>
> which is a common idiom but technically undefined behavior based on the
> discussion in question (since the resultant y value is used for purposes
> other than computation of the new value between sequence points).

Not a problem.  It is using the OLD value for purposes other than
computation of the new value that gives undefined behavior.  There are
no restrictions on use of the new value.

John


Author: Jack Klein <jackklein@spamcop.net>
Date: Sat, 11 Aug 2001 12:14:26 GMT

On Fri, 10 Aug 2001 21:27:28 GMT, jpotter@falcon.lhup.edu (John
Potter) wrote in comp.std.c++:

> On Thu,  9 Aug 2001 18:24:46 GMT, Jack Klein <jackklein@spamcop.net>
> wrote:
>
> There was a long discussion on this before.  There may even be an issue.
>
> > or to specify that
> > they yield an rvalue sometimes and an lvalue sometimes depending on
> > what you do with the result (much, much nastier).
>
> No matter how nasty, that seems to be the only way to make any sense
> of it.  However,
>
> int i = 5, j;
> j = *&++i;
>
> can only make sense if the *& is considered a no-op and removed.  We
> could confuse things a bit more by adding an int* p.

The "as-if" rule indeed covers this, and I am sure that just about
every existing C++ compiler actually compiles a statement like this
with built-in types exactly as C would, and uses the lvalue.  But what
happens if you mix in the volatile type qualifier?  See my post down
thread for a reasonable example based on the assignment operator.

> j = *(p = &++i);
>
> Since there are no sequence points and no undefined or unspecified
> behavior, I expect j == 6 and do not care when things are stored.

Since there are no sequence points there is nothing requiring that the
new value be assigned to i before p is dereferenced.

> > I hope I have made myself clearer this time.  If someone can point out
> > something I am missing in the standard that does impose an ordering
> > that I fail to see, I would appreciate it.
>
> You are quite clear.  The one thing I can question is your statement
> that the lvalue of the expression ++x may be used before the expression
> is evaluated.  Just because it is known without evaluating the
> expression does not mean that it is available before the calculation.
> I think it is just as clear here as in other cases that the value of
> a subexpression may not be used prior to evaluation of it.
>
> John

I don't see your final question as really being relevant to the issue,
even though it sounds like it might be.  In terms of the abstract
machine &i must be dereferenced to obtain a value to be assigned to j.
Even if the compiler generates code to access the rvalue of i (in a
register, for example) and increments the rvalue, it most certainly
has now had to perform an operation that caused it to generate &i.  It
can now proceed to access that lvalue again to retrieve the rvalue to
assign to j.

There is nothing I have ever seen in the standard at all that forces
the storage of a value to an lvalue prior to a sequence point.  If I
have missed such a requirement, please point it out.

--
Jack Klein
Home: http://JK-Technology.Com


Author: Jack Klein <jackklein@spamcop.net>
Date: Sat, 11 Aug 2001 23:07:10 GMT

Replying to my own post to supply a horrible, but not terribly
contrived, example.  Based on two assumptions:

Assumption 1:  The implementation defines access to a volatile object
to include reading and writing the object.

Assumption 2:  Uses one of the many popular UARTs like the 16C550 or
Zilog SCC variants where the read only receive register and write only
transmit register are mapped to the same address.

extern bool horrible_bug;
extern volatile char uart_tx_reg;

void uart_tx(char ch)
{
   static char last_char;

   // conditional stuff based on value
   // of last_char

   if (horrible_bug)
   {
      // read and discard top character in
      // uart's receive fifo buffer,
      // read access to volatile lvalue can't
      // be optimized away
      last_char = uart_tx_reg = ch;
   }
   else
   {
      // access uart hardware once only, to
      // write the character to be output
      uart_tx_reg = last_char = ch;
   }
}

--
Jack Klein
Home: http://JK-Technology.Com


Author: jpotter@falcon.lhup.edu (John Potter)
Date: Mon, 13 Aug 2001 03:29:58 GMT

On Sat, 11 Aug 2001 23:07:10 GMT, Jack Klein <jackklein@spamcop.net>
wrote:

> extern bool horrible_bug;
> extern volatile char uart_tx_reg;
>
> void uart_tx(char ch)
> {
>    static char last_char;
>
>    // conditional stuff based on value
>    // of last_char
>
>    if (horrible_bug)
>    {
>       // read and discard top character in
>       // uart's receive fifo buffer,
>       // read access to volatile lvalue can't
>       // be optimized away
>       last_char = uart_tx_reg = ch;

I think the top character is stored in last_char, not discarded.
Maybe worse since it was expected to hold ch.  Assuming your
interpretation.

>    }
>    else
>    {
>       // access uart hardware once only, to
>       // write the character to be output
>       uart_tx_reg = last_char = ch;
>    }
> }

On Thu,  9 Aug 2001 18:24:46 GMT, Jack Klein <jackklein@spamcop.net>
wrote:

> First the wording.  The sentence "The value is the new value of the
> operand; it is an lvalue." is just plain contradictory.  It appears
> that the wording of the C standard (ISO C90 and C99) was quickly
> edited, as it said "An assignment expression has the value of the left
> operand after the assignment, but is not an lvalue."

That is the C++ wording for increment and, I assume, the C wording for
assignment.  Note that C requires the value of the left operand after
the assignment, not the value assigned.  In your example above, that
would still require the read of the volatile after the assignment.  It
is even worse than C++ because it also requires a read in the else
branch to produce the unused value of the complete expression.
Interesting that it is not possible to assign to a volatile in C without
getting a read access.  How did anything ever get done with that wording
in the C standard? :)

Making it an rvalue in C++ would not solve the problem and would add
a new exception of the only rvalue which may be bound to a non-const
reference or have its address taken.

I think the intent is to produce an lvalue with an lvalue to rvalue
conversion which is the value assigned without requiring an access.
That is a mess with volatile, but volatile is a mess anyway and
anyone using it must be very careful.  Functions with side effects
are bad enough without combining them with self modifying data.

We know what we want, how to put it in standardese is the question.

John


Author: Jack Klein <jackklein@spamcop.net>
Date: Tue, 7 Aug 2001 03:04:15 GMT

The pre-increment operator and the assignment operators in C++, unlike
C, yield an lvalue, not an rvalue.  It is possible to use these
operators in an expression where lvalue to rvalue conversion is
required:

void func()
{
   int i = 10;
   int j = ++i;
   int k = j += 2;
}

5.3.2 paragraph 1 states in part:

"The value is the new value of the operand; it is an lvalue. If x is
not of type bool, the expression ++x is equivalent to x+=1."

5.17 paragraph 1 states in part:

"All require a modifiable lvalue as their left operand, and the type
of an assignment expression is that of its left operand. The result of
the assignment operation is the value stored in the left operand after
the assignment has taken place; the result is an lvalue."

There might be other clauses in the standard

This wording implies that there must be a sequence point guaranteeing
that the increment or assignment operator on the rhs of the full
assignment statement must be completed before the lvalue is accessed
to yield its rvalue contents for assigning to the lvalue on the lhs.
When used purely as an lvalue:

   int i = 10;
   int* pi = &++i;

....no ordering is needed between use of the lvalue (taking its
address) and the calculation of the new rvalue it contains, since the
rvalue is not accessed.

Such a sequence point is implicit in overloaded operators for user
defined data types, since these are functions and create a minimum of
three sequence points (after argument evaluation prior to calling, at
the end of function execution prior to returning, and in the calling
function prior to resuming execution).  But no such sequence point is
mandated for the built-in pre-increment and assignment operators for
the built-in types.

Although I may be missing something, I do not find any wording in the
standard other than those two paragraphs that implies a sequence point
in these operations, and none is needed in C since they yield rvalues
that cannot exist until after the increment or assignment is
performed.

Would it clear things up if the standard contained something along
either of these lines:

1.  These operators can yield either an rvalue or an lvalue, depending
on context (ugh!).

2.  Explicitly impose a sequence point after the rvalue result of the
pre-increment or assignment operator is stored in its immediate
destination, and before any use of the lvalue is allowed in a full
expression.

Or is there something already there I am missing?

--
Jack Klein
Home: http://JK-Technology.Com


Author: Ron Natalie <ron@sensor.com>
Date: Tue, 7 Aug 2001 16:10:06 GMT Raw View

> This wording implies that there must be a sequence point guaranteeing
> that the increment or assignment operator on the rhs of the full
> assignment statement must be completed before the lvalue is accessed
> to yield its rvalue contents for assigning to the lvalue on the lhs.

I don't agree with this.  Why must there be a sequence point?
The standard already precludes you from doing much with the
lvalue that has been modified until you hit a sequence point
anyhow.


Author: "Heinz Ozwirk" <wansor42@gmx.de>
Date: Wed, 8 Aug 2001 22:57:40 GMT Raw View

> "The value is the new value of the operand; it is an lvalue. If x is
> not of type bool, the expression ++x is equivalent to x+=1."
[...]
> Although I may be missing something, I do not find any wording in the
> standard other than those two paragraphs that implies a sequence point
> in these operations, and none is needed in C since they yield rvalues
> that cannot exist until after the increment or assignment is
> performed.

I don't think that the standard implies a sequence point here. (Otherwise
something like ++x + ++x wouldn't be implementation dependent.) And I don't
think that sequence points are required either. On the other hand, the
standard does define a sequence of actions that are performed by ++x (as
long as no user defined types are involved):
-    first ++x increments the content of x
-    then it returns the result (no matter whether it is an l- or an
r-value).
This defines a sequence of steps to calculate the result of the ++x
(sub-)expression. A sequence point, however, would not only affect how ++x
is evaluated, but would also influence how an expression containing ++x is
evaluated.
Basically this is the same as something like

    a() + (b() * c())

As far as I remember, the standard doesn't say anything about the order in
which a(), b() and c() should be evaluated. But it is obvious that b() and
c() must be evaluated (in any order) before * can be applied to their
results. And a() and * must have been evaluated before + can be applied.
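
A minimal sketch of that point, assuming three hypothetical functions
that record when they are called. The log may come out in any order, but
the value of the whole expression is fixed, because each operator needs
its operands before it can be applied.

```cpp
#include <cassert>
#include <string>

// Hypothetical log of the order in which the operands were evaluated.
static std::string call_log;

int a() { call_log += 'a'; return 2; }
int b() { call_log += 'b'; return 3; }
int c() { call_log += 'c'; return 4; }

// The three calls may happen in any order; the result is always 2 + 3 * 4.
int evaluate() {
    call_log.clear();
    return a() + (b() * c());
}
```

After a call to evaluate(), call_log holds the letters a, b and c exactly
once each, in whatever order the implementation chose.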

So the only difference I see between ++x being an r-value in C and an
l-value in C++ is that something like

    ++x = 1 or &++x

is legal in C++ but not in C. On the other hand

    (x += 1) = 1 and &(x += 1)

are legal in both languages. (Even though the first one looks quite stupid
to me.) So, if ++x means x+=1, and x+=1 is an l-value, why shouldn't ++x be
one, too?
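
For the C++ side of this comparison, a small sketch that checks the value
category at compile time and the identity of the object at run time. It
assumes a C++11 compiler for decltype and static_assert; the function
name is made up for the illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// decltype of an lvalue expression (other than a plain name) is T&.
// The operands here are never evaluated.
static_assert(std::is_lvalue_reference<decltype(++std::declval<int&>())>::value,
              "++x is an lvalue");
static_assert(std::is_lvalue_reference<decltype(std::declval<int&>() += 1)>::value,
              "x += 1 is an lvalue");
static_assert(!std::is_reference<decltype(std::declval<int&>()++)>::value,
              "x++ is not an lvalue");

// Each use of the lvalue result sits in its own statement.
bool preincrement_names_same_object() {
    int x = 0;
    int* p = &++x;      // address of the lvalue result of ++x
    int& r = (x += 1);  // reference bound to the lvalue result of x += 1
    return p == &x && &r == &x && x == 2;
}
```

The function returns true: both results refer to x itself, not to a copy.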

> Would it clear things up if the standard contained something along
> either of these lines:
>
> 1.  These operators can yield either an rvalue or an lvalue, depending
> on context (ugh!).

Would someone describing the way to his home by saying "Once you get
there, take the first or second road to the left" be less clear than
someone saying "Once you get there, take the first road to the left if you
have a blue car and the second if it has any other color"?

> 2.  Explicitly impose a sequence point after the rvalue result of the
> pre-increment or assignment operator is stored in its immediate
> destination, and before any use of the lvalue is allowed in a full
> expression.

Probably that would help the reader, but it would prevent parallelizing all
expressions containing operator++(). It would also help the reader if
the standard explicitly said "There is no sequence point". But then
almost every other sentence would be "Here is no sequence point".

Regards
    Heinz



Author: markw65@my-deja.com (Mark Williams)
Date: Wed, 8 Aug 2001 22:57:40 GMT Raw View

Ron Natalie <ron@sensor.com> wrote in message news:<3B6FECB0.108A4304@sensor.com>...
> > This wording implies that there must be a sequence point guaranteeing
> > that the increment or assignment operator on the rhs of the full
> > assignment statement must be completed before the lvalue is accessed
> > to yield its rvalue contents for assigning to the lvalue on the lhs.
>
> I don't agree with this.  Why must there be a sequence point?
> The standard already precludes you from doing much with the
> lvalue that has been modified until you hit a sequence point
> anyhow.

I think the question is about whether you can do anything useful with
the rvalue. The claim appears to be that if there is no sequence
point, then performing the lvalue-to-rvalue conversion may get the old
value, not the new one. I think, however, that this is covered by the
wording "after the assignment has taken place".

I think the real problem is that the wording is terribly confused:
"The result of the assignment operation is the value stored in the
left operand after the assignment has taken place; the result is an
lvalue."

This really doesn't make sense, since the value stored is an rvalue.
And is the lvalue required to be that of the lhs, or can the compiler
insert a temporary and use that?

I think better wording would be "The result of the assignment
operation is the value of the left operand after the assignment has
taken place; the result is an lvalue". Or maybe just "The result of
the assignment operation is the lvalue of the left operand after the
assignment has taken place" in order to avoid the question of "which
lvalue?".

Of course, that has implications when the lhs is volatile, which the
actual wording may have been trying to avoid (of course, it fails
miserably).

Mark Williams


Author: Jack Klein <jackklein@spamcop.net>
Date: Thu, 9 Aug 2001 18:24:46 GMT Raw View

On Tue,  7 Aug 2001 16:10:06 GMT, Ron Natalie <ron@sensor.com> wrote
in comp.std.c++:

>
> > This wording implies that there must be a sequence point guaranteeing
> > that the increment or assignment operator on the rhs of the full
> > assignment statement must be completed before the lvalue is accessed
> > to yield its rvalue contents for assigning to the lvalue on the lhs.
>
> I don't agree with this.  Why must there be a sequence point?
> The standard already precludes you from doing much with the
> lvalue that has been modified until you hit a sequence point
> anyhow.

I know I didn't explain this well.  The standard specifically allows
you to do the one thing that specifically causes undefined behavior
with the lvalue, and that is dereference it to access its value in a
manner unrelated to calculating the new value.

I really want to try this again.  First, I have no problem at all with
the concept that we want C++ to be able to take the address of
expressions in places where C can't and doesn't need to.  The problem
I see is that the wording of the standard literally does not make
sense in some cases and does not, as written, allow some things to be
done without generating undefined behavior.

First the wording.  The sentence "The value is the new value of the
operand; it is an lvalue." is just plain contradictory.  It appears
that the wording of the C standard (ISO C90 and C99) was quickly
edited, as it said "An assignment expression has the value of the left
operand after the assignment, but is not an lvalue."

Now you can take the address of anything that is an lvalue in C++.
Example, the following:

#include <iostream>

int main(void)
{
   int a = 0, *p1, *p2, *p3;
   p1 = &a;
   p2 = &++a;
   p3 = &a;

   if (p1 == p2 && p2 == p3)
      std::cout << "all the same";
   else
      std::cout << "not the same";
   std::cout << std::endl;
}

This program must output "all the same".  Specifically, the lvalue
yielded by the preincrement, predecrement, or assignment operators has
nothing to do with whether the object contains the new or the old
value at the time; the lvalue is the same.

The C++ abstract machine, based on and extended from the C virtual
machine, is quantized.  The quanta are sequence points.  At each
sequence point in a program, the state of the actual machine must
match that defined by the abstract machine (at least in so far as a
program checks).  In between sequence points an uncertainty principle
applies, in the sense that anything can be in any state between
sequence points and attempting to peek at the uncertainty invokes
undefined behavior.

In theory an expression which has the side effect of modifying an
object can perform the actual modification at any point between the
preceding and following sequence point, in any order with regard to
any other evaluations and actions taking place between the same two
sequence points.

In practice there is another mechanism that imposes an order on the
evaluation of some expressions, even though I have never noticed it
explicitly stated in the standard for either language:  an expression
with the side effect of modifying the value of an object can't perform
the final modification of the object until the new value has been
computed.

Keeping the two facts above in mind, and assuming definition of ints i
and j and an initial value for i, let us look at the same expression
in C and C++:

   j = ++i;

In C this causes the following events:

1.  Access the value of i (lvalue to rvalue conversion).
2.  Add 1 to this value.
3.  Store this value into i.
4.  Store this value into j.

Steps 3 and 4 can occur in either order, or both simultaneously if the
hardware permits.  There is no defined ordering between them.  But
step 1 must be performed before step 2, and step 2 must be performed
before either 3 or 4.  So although there is no sequence point until
the end of the full expression, there is a partial ordering imposed by
the simple fact that a value cannot be used until it is computed.

Now look at how the C++ abstract machine treats the same expression.
Of course the compiler can optimize much away, but we are examining
what the abstract machine describes:

1.  Access the value of i (lvalue to rvalue conversion).
2.  Add 1 to this value.
3.  Store this value into i.
4.  Access the value of i (lvalue to rvalue conversion).
5.  Store this value into j.

Note that step 1 must still be performed before step 2, which in turn
must be performed before step 3.  We want steps 1 through 3 to be
performed before steps 4 and 5.  In fact we expect C++ compilers to
evaluate this expression and generate their side effects in exactly
this order.

However the lvalue of i has not changed at all during the evaluation
of the statement.  Saying that the result of the preincrement expression is
the lvalue of the object after it is modified is meaningless, because
incrementing the value does not change the lvalue, only the result of
lvalue to rvalue conversion.

Essentially, the equivalent of the C++ statement could be expressed in
C as:

   j = (++i, *(&i));

Except that the comma operator imposes a sequence point but no such
sequence point exists in the C++ statement.
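
The comma form can be wrapped in a small function to see what it
guarantees. This is only a sketch; the function name and the choice to
return both values as a pair are assumptions made for the illustration.

```cpp
#include <cassert>
#include <utility>

// Returns {i, j} after the statement has run.
std::pair<int, int> assign_via_comma(int start) {
    int i = start;
    int j = 0;
    j = (++i, *(&i));  // the comma sequences the increment before the read
    return std::make_pair(i, j);
}
```

Starting from 10, both members of the returned pair are 11, because the
sequence point at the comma forces the store into i to complete before i
is read again.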

So the wording of the standard does not prevent this order of
execution:

4.  Access the value of i (lvalue to rvalue conversion).
5.  Store this value into j.
1.  Access the value of i (lvalue to rvalue conversion).
2.  Add 1 to this value.
3.  Store this value into i.

Since the lvalue is the same in step 1 and step 4, completion of steps
1 through 3 are not required prior to steps 4 and 5, because steps 1
through 3 are not required to determine the lvalue to be accessed in
step 4.

Since there is no sequence point between any of these steps, and step
4 does not depend on any of the other steps to know what lvalue it
needs, there is nothing at all in the wording of the standard that
prevents the last order of execution above, except that we all know
what we want it to mean and how we want it to work.
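
To show what is at stake, the two orders can be written out with every
numbered step as its own statement, so that each version is fully
sequenced and well defined. The function names and the temporaries are
made up for this sketch; it simulates the orders by hand and says nothing
about what a real compiler does with j = ++i.

```cpp
#include <cassert>
#include <utility>

// Both functions return {i, j}.

// Steps 1, 2, 3, 4, 5: the order everyone expects.
std::pair<int, int> expected_order(int start) {
    int i = start, j = 0;
    int tmp = i;     // 1. access the value of i
    tmp = tmp + 1;   // 2. add 1 to this value
    i = tmp;         // 3. store this value into i
    int seen = i;    // 4. access the value of i through the lvalue
    j = seen;        // 5. store this value into j
    return std::make_pair(i, j);
}

// Steps 4, 5, 1, 2, 3: the order the wording does not obviously rule out.
std::pair<int, int> feared_order(int start) {
    int i = start, j = 0;
    int seen = i;    // 4. access the value of i through the lvalue
    j = seen;        // 5. store this value into j
    int tmp = i;     // 1. access the value of i
    tmp = tmp + 1;   // 2. add 1 to this value
    i = tmp;         // 3. store this value into i
    return std::make_pair(i, j);
}
```

Starting from 10, expected_order gives i == 11 and j == 11, while
feared_order gives i == 11 and j == 10: j receives the old value.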

None of this applies to user defined operators for user defined data
types, typically defined as returning references, because the function
must be called to get the reference:

   T i, j;

   j = ++i;

....is equivalent to:

   j = i.operator++();

The function must be called and the sequence point prior to its return
means that j can access *(&i) and be sure of getting the final result
of operator++.
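
A minimal sketch of the user-defined case, with a hypothetical Counter
type standing in for T. The pre-increment is an ordinary member function
that returns a reference, so the increment is finished by the time the
reference is used.

```cpp
#include <cassert>

// Hypothetical user-defined type standing in for T.
struct Counter {
    int value;
    explicit Counter(int v = 0) : value(v) {}

    // Pre-increment returning a reference to the incremented object.
    Counter& operator++() {
        ++value;
        return *this;
    }
};

// Returns j.value after j = ++i.
int assign_from_preincrement(int start) {
    Counter i(start), j;
    j = ++i;  // same as j = i.operator++(); the call returns before the copy
    assert(i.value == start + 1);
    return j.value;
}
```

Calling assign_from_preincrement(10) yields 11: the copy into j always
sees the incremented object.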

I think the basic problem is merely changing the definitions of
certain operators to yield lvalues instead of rvalues.  We really want
them to yield rvalues most of the time when used with built-in types,
as in j = ++i.

I think it would be much cleaner to change the definitions of those
operators back to yielding an rvalue but add wording to the effect
that it is permissible to take the address of these expressions and
doing so results in the address of the destination operand.

Then given a type T with operators defined:

  T& T::operator++();

....and an assignment operator that accepts a reference to const T, and
the code:

  T i, j;
  // initialize i to some value
  j = ++i;

The compiler can merely optimize away the temporary rvalue (object of
type T) by generating code exactly the way it does now.

The other choice is specifying that these operators yield an lvalue
but also impose a sequence point before any other expression or
subexpression that uses the lvalue (very nasty), or to specify that
they yield an rvalue sometimes and an lvalue sometimes depending on
what you do with the result (much, much nastier).

I hope I have made myself clearer this time.  If someone can point out
something I am missing in the standard that does impose an ordering
that I fail to see, I would appreciate it.

--
Jack Klein
Home: http://JK-Technology.Com


Author: Ron Natalie <ron@sensor.com>
Date: Thu, 9 Aug 2001 18:27:13 GMT Raw View


> I don't think, that the standard implies a sequence point here. (Otherwise
> something like ++x + ++x wouldn't be implementation dependent.)

I think ++x + ++x is less of an issue than:

 x = y = z;

which is a common idiom but technically undefined behavior based on the
discussion in question (since the resultant y value is used for purposes
other than computation of the new value between sequence points).
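
For comparison, a sketch of the idiom next to a version split into two
statements. The split version has a sequence point between the two
assignments under any reading of the wording. The chained version is the
one in question here; later revisions of the standard (C++11 onward)
order the store into y before the result of y = z is used, so the sketch
assumes such a compiler. The function names are made up.

```cpp
#include <cassert>

// Chained form, as in the idiom above. Returns x + y.
int chained(int z) {
    int x = 0, y = 0;
    x = y = z;
    return x + y;
}

// Split form: the end of the first statement separates the assignments.
int split(int z) {
    int x = 0, y = 0;
    y = z;
    x = y;
    return x + y;
}
```

Both functions return 2 * z for the same argument.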


Author: "Igor A. Goussarov" <igusarov@akella.com>
Date: Thu, 9 Aug 2001 18:27:50 GMT Raw View

Heinz Ozwirk wrote:
>
> [...] On the other hand
>
>     (x += 1) = 1 and &(x += 1)
>
> are legal in both languages. (Even though the first one looks quite stupid
> to me.)

   The first expression is modifying 'x' twice without a sequence point
between modifications. It is not legal, see 5/4.

   As for the OP's question:
For any given X, which is an l-value and has no overloaded operator &,
I'm pretty sure that '*&X' is effectively the same as 'X'. But the OP's
thoughts raise the question of whether it would be the same if 'X' is the
l-value result of an assignment or a preincrement expression. I.e.

1)
int      x = 13;
int      y = ++x;   // 'y' is 14, 'x' is 14

2)
int      x = 13;
int      y = *&++x; // 'x' is 14, but what is 'y'?

   And more importantly: is it unspecified according to the Standard?
This expression doesn't seem to fit in 5/4, because it is not accessing
the _prior_ value of 'x', so it is not explicitly classified as
undefined.
   But technically, the result of the '++x' subexpression can be stored
back to the memory where 'x' resides at any moment before the nearest
sequence point. If the compiler has decided not to write the result of
the preincrement back immediately, then the program may fetch an old
value of 'x' via taking its address and dereferencing that pointer. This
makes a nice example of '*&' operator combination not being an identity
operator (of course with none of them overloaded)...
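
One way to sidestep the question is to put the dereference in a separate
statement, so that a sequence point falls between the increment and the
read. A minimal sketch, with a made-up function name:

```cpp
#include <cassert>

// Returns the value read through the pointer obtained from &++x.
int read_through_pointer() {
    int x = 13;
    int* p = &++x;  // the increment is complete at the end of this statement
    int y = *p;     // reads the stored value of x through the pointer
    assert(p == &x);
    return y;
}
```

Here the returned value is 14, since the side effect of ++x must be
complete before the next statement begins.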

Igor


Author: Francis Glassborow <francis.glassborow@ntlworld.com>
Date: Thu, 9 Aug 2001 22:28:19 GMT Raw View

In article <3B72B5AB.73989F1@sensor.com>, Ron Natalie <ron@sensor.com>
writes
>       x = y = z;
>
>which is a common idiom but technically undefined behavior based on the
>discussion in question (since the resultant y value is used for purposes
>other than computation of the new value between sequence points).

I am not convinced. The program already knows the rvalue that it
assigned to y and has no need to access y to determine what it is.


Francis Glassborow      ACCU
64 Southfield Rd
Oxford OX4 1PA          +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
