Topic: Inconsistencies in the standard


Author: nmm1@cus.cam.ac.uk (Nick Maclaren)
Date: Wed, 1 Feb 2006 23:36:59 CST
Since some people seem to have difficulty in accepting that even C++
is seriously ambiguous, here is a new introductory section to my
Objects diatribe.  It may clarify the depth of the problem.


This is a new version of a document that was circulated to the UK C
panel and to interested people.  I am not sure what it is proposing,
except that effective types need an improved specification, but it
attempts to describe a very serious problem.  The big change is that I
start with a simple, concrete and undeniable example, because it seems
that many people have difficulty in working through the document and,
consequently, think that the problem is in the detail and not the
principle.


WHAT DO C/C++ MEAN HERE?
------------------------

The following definitions are quoted because they clearly indicate that
two pointer values refer to the same object if they are of the same type
and compare equal (i.e. contain the same address).  They are not the
only places in the standards where such a statement is made or implied.

C++: 3.9.2 "If an object of type T is located at an address A, a pointer
of type cv T* whose value is the address A is said to point to that
object, regardless of how the value was obtained."

C99: 6.5.9 "Two pointers compare equal if and only if ..., both are
pointers to the same object, ...."  [ The equivalent wording to C++ is
scattered over the C99 standard, and has to be derived. ]

The following definitions are quoted because they clearly indicate that
access to an object in one member via a pointer to another is undefined
behaviour.

C++: 9.5 "In a union, at most one of the data members can be active at
any one time, that is the value of at most one of the data members can
be stored in a union at any time."

C99: 6.7.2.1 "The value of at most one of the members can be stored in a
union object at any time."

Now consider:

typedef struct {int c; int d;} A;
typedef union {int a[2]; A b;} B;
B e;
int *p = (int *)&e, *q = &e.a[0], *r = &e.b.c;
int *x = (int *)&e+1, *y = &e.a[1], *z = &e.b.d;
assert (p == q && q == r && x == y && y == z);

So far, so good.  Now consider:

(*p = 0)+(*x = 0);          /* Legal */
(*q = 0)+(*y = 0);          /* Legal */
(*r = 0)+(*z = 0);          /* Legal */
(*q = 0)+(*z = 0);          /* Illegal */
(*r = 0)+(*y = 0);          /* Illegal */
(*p = 0)+(*y = 0);          /* Well, eagle? */
(*p = 0)+(*z = 0);          /* Well, eagle? */
(*q = 0)+(*x = 0);          /* Well, eagle? */
(*r = 0)+(*x = 0);          /* Well, eagle? */

Most people are agreed so far, and most people agree that the last four
examples are illegal (even though the wording of the standards does not
clearly forbid them).  The latter loophole could easily be closed, so
let's not worry about it, but we shall show that it doesn't help.  Now
consider:

void fred (int *i, int *j) { (*i = 0)+(*j = 0);}

fred(p,x); fred(q,y); fred(r,z);               /* Legal */
fred(q,z); fred(r,y);                          /* Illegal? */
fred(p,y); fred(p,z); fred(q,x); fred(r,x);    /* Well, eagle? */

Preserving the same legality/illegality would imply that pointers have a
history, and that two pointers that contain the same address do NOT
point to the same object.  C++ states that is not the case, and it
conflicts with a great many other sections in C99.  So we have a
situation where two pointers refer to the same object, but accesses
via those pointers are not equivalent.
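
The pointer equalities that this argument rests on can be checked mechanically. The following is a minimal sketch, assuming a typical implementation with no padding between the two int members of A; the function name second_elements_coincide is made up for illustration and is not part of the original example. It only compares addresses and does not perform any of the contested stores.

```cpp
#include <cassert>
#include <cstddef>

typedef struct { int c; int d; } A;
typedef union { int a[2]; A b; } B;

// Hypothetical helper: forms the six pointers from the example and
// reports whether the three "second element" pointers compare equal.
bool second_elements_coincide(B& e) {
    int* p = reinterpret_cast<int*>(&e);
    int* q = &e.a[0];
    int* r = &e.b.c;
    // All members of a union start at the address of the union itself.
    assert(p == q && q == r);

    int* x = reinterpret_cast<int*>(&e) + 1;
    int* y = &e.a[1];
    int* z = &e.b.d;
    // x and y are both one int past the start of the union; y == z
    // additionally needs offsetof(A, d) == sizeof(int), which is an
    // assumption about layout, not something this sketch proves.
    return x == y && y == z;
}
```

Once the function returns true, nothing in the values of p, q, r (or x, y, z) records which member they were derived from, which is the situation fred finds itself in.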

Now consider:

int n = 1;
assert (offsetof(A,d) == sizeof(int));
e.a[0] = e.a[1] = 0;
memcpy((char *)&e.a+sizeof(int),&n,sizeof(int));
e.a[1];
e.a[0] = e.a[1] = 0;
memcpy((char *)&e.b+sizeof(int),&n,sizeof(int));
e.a[1];
e.a[0] = e.a[1] = 0;
memcpy((char *)&e.a+offsetof(A,d),&n,sizeof(int));
e.a[1];
e.a[0] = e.a[1] = 0;
memcpy((char *)&e.b+offsetof(A,d),&n,sizeof(int));
e.a[1];

Well, which of the above is illegal, if any?  And we can extend these
nasty examples to pointer differences, I/O, conversion to and from
integers and more.  The only sane approach is to say that they are
all legal.
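
For readers who want to try the memcpy cases, here is a compilable sketch of the first and last of the four variants. The function names write_via_a and write_via_b are invented for illustration. The first variant stays entirely within member a; whether the standard guarantees anything about the second is exactly the question being asked, so the sketch makes no claim about it beyond what a given compiler happens to do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef struct { int c; int d; } A;
typedef union { int a[2]; A b; } B;

// Byte address computed from member a with sizeof(int).
int write_via_a(B& e, int n) {
    e.a[0] = e.a[1] = 0;
    std::memcpy(reinterpret_cast<char*>(&e.a) + sizeof(int), &n, sizeof(int));
    return e.a[1];
}

// Byte address computed from member b with offsetof(A, d).
// The assert states the layout assumption made in the original example.
int write_via_b(B& e, int n) {
    assert(offsetof(A, d) == sizeof(int));
    e.a[0] = e.a[1] = 0;
    std::memcpy(reinterpret_cast<char*>(&e.b) + offsetof(A, d), &n, sizeof(int));
    return e.a[1];
}
```

Both functions hand memcpy the same destination address whenever the layout assumption holds, so any rule that treats them differently has to look at how the address was computed.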

We deduce that access via two union members at once is illegal only if
it is obvious, which implies that the eleventh commandment applies to
the C++ and C99 standards.  But most people deny this is the case.

I assert that the C++ and C99 standards are not self-consistent in
this area.


Regards,
Nick Maclaren.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Niklas Matthies <usenet-nospam@nmhq.net>
Date: Thu, 2 Feb 2006 10:56:40 CST
On 2006-02-02 05:36, Nick Maclaren wrote:
:
> Preserving the same legality/illegality would imply that pointers have a
> history, and that two pointers that contain the same address do NOT
> point to the same object.  C++ states that is not the case, and it
> conflicts with a great many other sections in C99.  So we have a
> situation where two pointers refer to the same object, but accesses
> via those pointers are not equivalent.

Doesn't it follow from C++98:3.8p7 that whenever the active member of
the union changes, pointers (in)to the previously active member
automatically refer to the new active member (or subobjects thereof)
when the types match, hence losing their history?
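
The storage-reuse rule referred to here can be illustrated outside a union. This is a minimal sketch, assuming a C++11 compiler (for alignas); the function name is made up, and the example deliberately uses two complete int objects rather than union members, so it does not settle whether the rule extends to the subobjects in the original example.

```cpp
#include <cassert>
#include <new>

// Hypothetical helper: creates an int in raw storage, replaces it with
// a second int in the same storage, and reads through the first pointer.
int read_through_old_pointer() {
    alignas(int) unsigned char storage[sizeof(int)];

    int* old_ptr = new (storage) int(1);   // first object
    int* fresh   = new (storage) int(2);   // reuses the storage; the first int's lifetime ends

    // The new object occupies exactly the storage of the old one.
    assert(static_cast<void*>(old_ptr) == static_cast<void*>(fresh));

    // Same type, same storage, not const: the old pointer refers to
    // the new object, so this reads the new value.
    return *old_ptr;
}
```

Here the old pointer keeps no trace of the object it originally pointed to, which is the "losing their history" behaviour the question asks about.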

-- Niklas Matthies






Author: AlbertoBarbati@libero.it (Alberto Ganesh Barbati)
Date: Thu, 2 Feb 2006 20:43:38 GMT
Nick Maclaren wrote:
> typedef struct {int c; int d;} A;
> typedef union {int a[2]; A b;} B;
> B e;
> int *p = (int *)&e, *q = &e.a[0], *r = &e.b.c;
> int *x = (int *)&e+1, *y = &e.a[1], *z = &e.b.d;
> assert (p == q && q == r && x == y && y == z);
>
> So far, so good.  Now consider:

Hold your horses. The expression (int*)&e has implementation-defined
value. According to 9.2/17 you can portably cast &e to int (*)[2] but not
to int*. Moreover, I'm not so sure that y is required to be equal to z.
Could you state the clauses that support your assertion?

> (*p = 0)+(*x = 0);          /* Legal */
> (*q = 0)+(*y = 0);          /* Legal */
> (*r = 0)+(*z = 0);          /* Legal */
> (*q = 0)+(*z = 0);          /* Illegal */
> (*r = 0)+(*y = 0);          /* Illegal */
> (*p = 0)+(*y = 0);          /* Well, eagle? */
> (*p = 0)+(*z = 0);          /* Well, eagle? */
> (*q = 0)+(*x = 0);          /* Well, eagle? */
> (*r = 0)+(*x = 0);          /* Well, eagle? */

I have completely lost you here. Please state exactly what you mean
by "illegal". Is it ill-formed? Is it undefined behaviour? It makes no
sense to discuss the rest of the post if that is not clear.

Regards,

Ganesh






Author: wade@stoner.com
Date: Fri, 3 Feb 2006 10:42:37 CST
Nick Maclaren wrote:
> Now consider:
>
> typedef struct {int c; int d;} A;
> typedef union {int a[2]; A b;} B;
B b;

1) I don't believe the standard requires &b.b.d == &b.a[1].  However, I
think the standard should require that, and I think that it is true for
all reasonable implementations, so we'll pretend that requirement is
already in the standard.

2) Your examples fall into the category of things that are technically
illegal, but compilers behave as-if they were well defined (unless the
compiler is trying to be extra-picky).

IMO, the reason for the last-access rule is so that given

struct C{ int a; int b; };
struct D{ float a; int b; };
union U{C g; D h; };
int foo(const C& x, D&y)
{
  register int xv = x.b;
  ++y.b;
  return x.b+xv;
}

The compiler may optimize the return statement to
  return xv + xv;

In other words, when two structs don't have common initial sequences,
you are promising that their members (even members of the same type)
are not aliased (à la Fortran), and the compiler can perform some
'restrict' optimizations.

I think the people writing the standard had difficulty coming up with
the wording that would allow this optimization, but would also make
your examples legal.
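
For contrast, here is a sketch of the case where the structs do share a common initial sequence, which is the one situation where reading through the other union member is explicitly permitted (9.2/16 in C++98/03). The types P, Q and V and the function name are hypothetical and chosen so as not to clash with C, D and U above.

```cpp
#include <cassert>

struct P { int a; int b; };
struct Q { int a; float f; };   // shares the initial member `int a` with P
union V { P g; Q h; };

// Hypothetical helper: stores through member g, then inspects the
// common initial part through member h.
int read_shared_prefix(int value) {
    V u;
    u.g.a = value;
    u.g.b = 0;
    // Both members start at the address of the union.
    assert(static_cast<void*>(&u.g) == static_cast<void*>(&u.h));
    return u.h.a;   // reads only the common initial sequence
}
```

Because u.g.a and u.h.a may legitimately alias here, a compiler cannot apply the kind of caching shown in foo to them; with C and D above, whose first members differ in type, there is no common initial sequence and that restriction does not apply.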






Author: wkaras@yahoo.com
Date: Fri, 3 Feb 2006 22:23:57 CST
Nick Maclaren wrote:
> Since some people seem to have difficulty in accepting that even C++
> is seriously ambiguous, here is a new introductory section to my
> Objects diatribe.  It may clarify the depth of the problem.
>
>
> This is a new version of a document that was circulated to the UK C
> panel and to interested people.  I am not sure what it is proposing,
> except that effective types need an improved specification, but it
> attempts to describe a very serious problem.  The big change is that I
> start with a simple, concrete and undeniable example, because it seems
> that many people have difficulty in working through the document and,
> consequently, think that the problem is in the detail and not the
> principle.
>
>
> WHAT DO C/C++ MEAN HERE?
> ------------------------
>
> The following definitions are quoted because they clearly indicate that
> two pointer values refer to the same object if they are of the same type
> and compare equal (i.e. contain the same address).  They are not the
> only places in the standards where such a statement is made or implied.
>
> C++: 3.9.2 "If an object of type T is located at an address A, a pointer
> of type cv T* whose value is the address A is said to point to that
> object, regardless of how the value was obtained."
>
> C99: 6.5.9 "Two pointers compare equal if and only if ..., both are
> pointers to the same object, ...."  [ The equivalent wording to C++ is
> scattered over the C99 standard, and has to be derived. ]
>
> The following definitions are quoted because they clearly indicate that
> access to an object in one member via a pointer to another is undefined
> behaviour.
>
> C++: 9.5 "In a union, at most one of the data members can be active at
> any one time, that is the value of at most one of the data members can
> be stored in a union at any time."
>
> C99: 6.7.2.1 "The value of at most one of the members can be stored in a
> union object at any time."

It would be interesting if the Committee tried to add a comprehensive
glossary to the Standard.  I think they would end up rewriting a lot of
stuff to avoid throwing around so many ambiguous terms.

I think the above is trying to say that if you perform a valid
operation on one union member, you can't in general assume
that the value of another member is unchanged or valid.

>
> Now consider:
>
> typedef struct {int c; int d;} A;
> typedef union {int a[2]; A b;} B;
> B e;
> int *p = (int *)&e, *q = &e.a[0], *r = &e.b.c;
> int *x = (int *)&e+1, *y = &e.a[1], *z = &e.b.d;
> assert (p == q && q == r && x == y && y == z);
>
> So far, so good.  Now consider:
>
> (*p = 0)+(*x = 0);          /* Legal */
> (*q = 0)+(*y = 0);          /* Legal */
> (*r = 0)+(*z = 0);          /* Legal */
> (*q = 0)+(*z = 0);          /* Illegal */
> (*r = 0)+(*y = 0);          /* Illegal */
> (*p = 0)+(*y = 0);          /* Well, eagle? */
> (*p = 0)+(*z = 0);          /* Well, eagle? */
> (*q = 0)+(*x = 0);          /* Well, eagle? */
> (*r = 0)+(*x = 0);          /* Well, eagle? */
>
> Most people are agreed so far, and most people agree that the last four
> examples are illegal (even though the wording of the standards does not
> clearly forbid them).  The latter loophole could easily be closed, so
> let's not worry about it, but we shall show that it doesn't help.  Now
> consider:
>
> void fred (int *i, int *j) { (*i = 0)+(*j = 0);}
>
> fred(p,x); fred(q,y); fred(r,z);               /* Legal */
> fred(q,z); fred(r,y);                          /* Illegal? */
> fred(p,y); fred(p,z); fred(q,x); fred(r,x);    /* Well, eagle? */
>
> Preserving the same legality/illegality would imply that pointers have a
> history, and that two pointers that contain the same address do NOT
> point to the same object.  C++ states that is not the case, and it
> conflicts with a great many other sections in C99.  So we have a
> situation where two pointers refer to the same object, but accesses
> via those pointers are not equivalent.
>
> Now consider:
>
> int n = 1;
> assert (offsetof(A,d) == sizeof(int));
> e.a[0] = e.a[1] = 0;
> memcpy((char *)&e.a+sizeof(int),&n,sizeof(int));
> e.a[1];
> e.a[0] = e.a[1] = 0;
> memcpy((char *)&e.b+sizeof(int),&n,sizeof(int));
> e.a[1];
> e.a[0] = e.a[1] = 0;
> memcpy((char *)&e.a+offsetof(A,d),&n,sizeof(int));
> e.a[1];
> e.a[0] = e.a[1] = 0;
> memcpy((char *)&e.b+offsetof(A,d),&n,sizeof(int));
> e.a[1];
>
> Well, which of the above is illegal, if any?  And we can extend these
> nasty examples to pointer differences, I/O, conversion to and from
> integers and more.  The only sane approach is to say that they are
> all legal.
>
> We deduce that access via two union members at once is illegal only if
> it is obvious, which implies that the eleventh commandment applies to
> the C++ and C99 standards.  But most people deny this is the case.
>
> I assert that the C++ and C99 standards are not self-consistent in
> this area.

Generally, standards are useful because they tell us when things
are required to be valid.  Standards rarely if ever *require* things to
be invalid.  It seems like you're saying the rules of unions are
requiring certain objects to be invalid under certain scenarios.  Your
code above is not required to be valid or invalid by the rules governing
pointers or unions.  But I believe it is required to be valid by the
rules governing Plain Old Data.






Author: johnchx2@yahoo.com
Date: Sat, 4 Feb 2006 15:36:23 CST
Nick Maclaren wrote:

> The following definitions are quoted because they clearly indicate that
> access to an object in one member via a pointer to another is undefined
> behaviour.
>
> C++: 9.5 "In a union, at most one of the data members can be active at
> any one time, that is the value of at most one of the data members can
> be stored in a union at any time."

The implication you draw from this sentence is anything but clear.

The sentence, read alone, is sufficiently ambiguous that it could give
rise to the implication you draw.  When confronted with an ambiguity,
we look at the rest of the standard.  Which says:

> C++: 3.9.2 "If an object of type T is located at an address A, a pointer
> of type cv T* whose value is the address A is said to point to that
> object, regardless of how the value was obtained."

Aha: all pointers are equal, none is more equal than others.  Pointers
have no "memory" of how they were obtained.

So the ambiguity is eliminated, since the possible interpretation you
posit is inconsistent with the rest of the standard.

So what does 9.5 mean?  More or less what it says: storing a value to
one member of a union ends the lifetime of any other data member.
(Another poster has already pointed out that 3.8/7 defines what happens
when the lifetime of a pointer's "pointee" ends, and another object's
lifetime begins at the same address.)
