Thread

Topic: Aliasing rules question.

Author: osv@javad.ru (Sergei Organov)
Date: Wed, 7 May 2003 17:07:58 +0000 (UTC)

allan_w@my-dejanews.com (Allan W) writes:
> >  osv@javad.ru (Sergei Organov) wrote
> > > > Am I misreading/misunderstanding the clause?
>
> > allan_w@my-dejanews.com (Allan W) writes:
> > > I think you're missing something, anyway.
> > >
> > > Your function foo takes the address of a float and casts it to a
> > > pointer to struct A. Already you have UB.
> >
> osv@javad.ru (Sergei Organov) wrote
> > Is it UB or not doesn't matter as I'm asking only about UB with
> > respect to the aliasing rules (please note the subject). I've got
> > a lot of responses why this or those function invokes UB due to
> > other reasons, but the aliasing rules and, in particular, the
> > quote above, still aren't clear to me.
>
> If there was a smiley, I missed it.

No, there wasn't one. I just didn't want to discuss all the issues
simultaneously, as my intent is to understand the particular clause I've
quoted.

> Not sure what you're trying to get at here. It's undefined with
> respect to this, but not that...? Your program is only PARTLY
> broken, so you want the rest to behave correctly?

I believe it isn't broken. At least it doesn't invoke *undefined behavior*, I
think.

> Sorry, but once you've got UB, _everything_ is undefined.

I know, but conversion of pointer to one type to pointer to another type
doesn't invoke UB, the behavior is *unspecified* (see reinterpret_cast). So
in my understanding my example function has *unspecified behavior* as opposed
to *undefined behavior*.
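
The distinction drawn here, between converting a pointer and accessing an
object through the converted pointer, can be illustrated with a minimal
sketch. The helper name round_trips is hypothetical and not from the thread.
It goes through unsigned char* because character types have the weakest
alignment requirement, so the round trip is expected to give back the
original pointer. Nothing is ever read through the converted pointer.

```cpp
#include <cassert>

// Hypothetical helper (an assumption of this sketch): converts a float*
// to unsigned char* and back without dereferencing the intermediate result.
bool round_trips(float* pf) {
    assert(pf != nullptr);
    unsigned char* raw = reinterpret_cast<unsigned char*>(pf);
    float* back = reinterpret_cast<float*>(raw);
    return back == pf;  // only pointer values are compared, no object is accessed
}
```

Whether a later access through a converted pointer is allowed is a separate
question, and that is the one the aliasing rules of 3.10/15 address.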

>
> > Could you please quote the standard where it says something about "common
> > prefix" when defines the aliasing rules (clause 3.10/15)?
>
> Alas, not today -- I've just changed work locations, and don't have
> access to a copy of the standard from here. My personal copy is on
> my computer at home.

Not a problem, please take a look if/when you find time.

--
Sergei.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: allan_w@my-dejanews.com (Allan W)
Date: Wed, 7 May 2003 04:26:11 +0000 (UTC)

>  osv@javad.ru (Sergei Organov) wrote
> > > Am I misreading/misunderstanding the clause?

> allan_w@my-dejanews.com (Allan W) writes:
> > I think you're missing something, anyway.
> >
> > Your function foo takes the address of a float and casts it to a
> > pointer to struct A. Already you have UB.
>
osv@javad.ru (Sergei Organov) wrote
> Is it UB or not doesn't matter as I'm asking only about UB with
> respect to the aliasing rules (please note the subject). I've got
> a lot of responses why this or those function invokes UB due to
> other reasons, but the aliasing rules and, in particular, the
> quote above, still aren't clear to me.

If there was a smiley, I missed it.

Not sure what you're trying to get at here. It's undefined with
respect to this, but not that...? Your program is only PARTLY
broken, so you want the rest to behave correctly?

Sorry, but once you've got UB, _everything_ is undefined.

> Could you please quote the standard where it says something about "common
> prefix" when defines the aliasing rules (clause 3.10/15)?

Alas, not today -- I've just changed work locations, and don't have
access to a copy of the standard from here. My personal copy is on
my computer at home.

---

Author: kanze@gabi-soft.de (James Kanze)
Date: Fri, 2 May 2003 14:37:09 +0000 (UTC)

osv@javad.ru (Sergei Organov) wrote in message
news:<87fzo3ggoq.fsf@osv.javad.ru>...
> kanze@gabi-soft.de (James Kanze) writes:
> > osv@javad.ru (Sergei Organov) wrote in message
>  [...]
> > > Sorry, I don't see such a clause in 3.10/15. I see it for unions,
> > > but not in aliasing rules :( The only apparently relevant clause
> > > is:

> > >   --an aggregate or union type that includes one of the
> > >     aforementioned types among its members (including,
> > >     recursively, a member of a sub-aggregate or contained union),

> > > but I must say that I don't understand this clause at all :( Could
> > > somebody give an example, please?

> > The example is easy:

> [... the easy example skipped ...]

> But it seems that the quoted clause allows, for example, this ugly
> trick:

> struct A {
>   unsigned int u;
>   float f;
> };
> unsigned int foo(float arg) { return ((A*)&arg)->u; }

> as a replacement for UB:

> unsigned int ub(float arg) { return *(unsigned int*)&arg; }

> as 'arg' in the foo() is accessed through aggregate type A that
> includes 'float' type (that is the dynamic type of the object) among
> its members.

> Am I misreading/misunderstanding the clause?

I think so.  (I'm not too sure where we are in the discussion.)  The
rule only allows for identical prefixes in struct's.  Since float is not
a struct, the rule doesn't apply.  And if you try to get away with
struct B { float f ; }, then the prefix isn't identical, so the rule
doesn't apply either.

I'm no longer sure about the exact reading of the rule, but I do know
why it is there, and the intent.  The intent is to support C code like
that I showed (and nothing else).  And in C++, the main intent is to be
C compatible in this respect, since IMHO, C++ has better ways of
handling the problem.

--
James Kanze             GABI Software             mailto:kanze@gabi-soft.fr
Conseils en informatique orientée objet/
                           Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, Tél. : +33 (0)1 30 23 45 16

---

Author: osv@javad.ru (Sergei Organov)
Date: Tue, 29 Apr 2003 05:54:26 +0000 (UTC)

kanze@gabi-soft.de (James Kanze) writes:
> osv@javad.ru (Sergei Organov) wrote in message
[...]
> > Sorry, I don't see such a clause in 3.10/15. I see it for unions, but
> > not in aliasing rules :( The only apparently relevant clause is:
>
> >   --an  aggregate  or union type that includes one of the aforementioned
> >     types among its members (including, recursively, a member of a  sub-
> >     aggregate or contained union),
>
> > but I must say that I don't understand this clause at all :( Could
> > somebody give an example, please?
>
> The example is easy:

[... the easy example skipped ...]

But it seems that the quoted clause allows, for example, this ugly trick:

struct A {
  unsigned int u;
  float f;
};
unsigned int foo(float arg) { return ((A*)&arg)->u; }

as a replacement for UB:

unsigned int ub(float arg) { return *(unsigned int*)&arg; }

as 'arg' in the foo() is accessed through aggregate type A that includes
'float' type (that is the dynamic type of the object) among its members.

Am I misreading/misunderstanding the clause?

--
Sergei.

---

Author: allan_w@my-dejanews.com (Allan W)
Date: Tue, 29 Apr 2003 18:49:11 +0000 (UTC)

> > osv@javad.ru (Sergei Organov) wrote in message
> > >   --an  aggregate  or union type that includes one of the aforementioned
> > >     types among its members (including, recursively, a member of a  sub-
> > >     aggregate or contained union),
> > > but I must say that I don't understand this clause at all :( Could
> > > somebody give an example, please?

osv@javad.ru (Sergei Organov) wrote
> But it seems that the quoted clause allows, for example, this ugly trick:
>
> struct A {
>   unsigned int u;
>   float f;
> };
> unsigned int foo(float arg) { return ((A*)&arg)->u; }
>
> as a replacement for UB:
>
> unsigned int ub(float arg) { return *(unsigned int*)&arg; }
>
> as 'arg' in the foo() is accessed through aggregate type A that includes
> 'float' type (that is the dynamic type of the object) among its members.
>
> Am I misreading/misunderstanding the clause?

I think you're missing something, anyway.

Your function foo takes the address of a float and casts it to a
pointer to struct A. Already you have UB. There is no "common prefix"
between a float, and a structure containing an unsigned int and float.

---

Author: osv@javad.ru (Sergei Organov)
Date: Wed, 30 Apr 2003 18:42:27 +0000 (UTC)

allan_w@my-dejanews.com (Allan W) writes:
> > > osv@javad.ru (Sergei Organov) wrote in message
> > > >
> > > > --an  aggregate  or union type that includes one of the aforementioned
> > > >   types among its members (including, recursively, a member of a  sub-
> > > >   aggregate or contained union),
> > > >
> osv@javad.ru (Sergei Organov) wrote
> > But it seems that the quoted clause allows, for example, this ugly trick:
> >
> > struct A {
> >   unsigned int u;
> >   float f;
> > };
> > unsigned int foo(float arg) { return ((A*)&arg)->u; }
> >
> > as a replacement for UB:
> >
> > unsigned int ub(float arg) { return *(unsigned int*)&arg; }
> >
> > as 'arg' in the foo() is accessed through aggregate type A that includes
> > 'float' type (that is the dynamic type of the object) among its members.
> >
> > Am I misreading/misunderstanding the clause?
>
> I think you're missing something, anyway.
>
> Your function foo takes the address of a float and casts it to a
> pointer to struct A. Already you have UB.

Is it UB or not doesn't matter as I'm asking only about UB with respect to the
aliasing rules (please note the subject). I've got a lot of responses why this
or those function invokes UB due to other reasons, but the aliasing rules and,
in particular, the quote above, still aren't clear to me.

> There is no "common prefix" between a float, and a structure containing an
> unsigned int and float.

Could you please quote the standard where it says something about "common
prefix" when defines the aliasing rules (clause 3.10/15)?

--
Sergei.

---

Author: kanze@gabi-soft.de (James Kanze)
Date: Fri, 25 Apr 2003 18:03:38 +0000 (UTC)

osv@javad.ru (Sergei Organov) wrote in message
news:<878yu08d7i.fsf@osv.javad.ru>...
> kanze@gabi-soft.de (James Kanze) writes:
> > osv@javad.ru (Sergei Organov) wrote in message
> > > If I'm right about f1, does it mean that the following also is UB
> > > with respect to 3.10/15:

> > > struct B;
> > > extern "C" void foo(B* b);

> > > struct A { ...something... };

> > > void boo() {
> > >   A a;
> > >   foo((B*)&a);
> > > }

> > There is a special clause that any common prefix of the two structs
> > can be accessed, which would cover this specific case.

> Sorry, I don't see such a clause in 3.10/15. I see it for unions, but
> not in aliasing rules :( The only apparently relevant clause is:

>   --an  aggregate  or union type that includes one of the aforementioned
>     types among its members (including, recursively, a member of a  sub-
>     aggregate or contained union),

> but I must say that I don't understand this clause at all :( Could
> somebody give an example, please?

The example is easy:

    struct Node
    {
        enum NodeType type ;
    } ;

    struct IConstNode
    {
        enum NodeType type ;
        int value ;
    } ;

    struct OpNode
    {
        enum NodeType type ;
        int childCount ;
        Node* children[ maxChildCount ] ;
    } ;

    //  ...

    void
    dumpTree( Node* root )
    {
        static int  indent ;
        int         i ;
        switch ( root->type )
        {
        case iConst :
            printf( "%*sICONST: %d\n",
                      indent, "",
                                ((IConstNode*)root)->value ) ;
            break ;

        case add :
        case sub :
            //  ...
            printf( "%*sOP %s:\n", indent, "", opname[ root->type ] ) ;
            indent += 2 ;
            for ( i = 0 ; i < ((OpNode*)root)->childCount ; ++ i ) {
                dumpTree( ((OpNode*)root)->children[ i ] ) ;
            }
            indent -= 2 ;
            break ;
        //  ...
        }
    }

If this looks more like C, guess what? It is.  This was the more or less
standard way of implementing "polymorphism" in C.  Note that it involves
accessing something that was almost certainly created as an IConstNode
or an OpNode as if it were a Node.

The above is legal and well defined in C.  While there are more
idiomatic ways of doing it in C++, I believe that the intent is that
this frequent C idiom also be legal and well defined in C++.
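
The fragment above leaves NodeType, maxChildCount and opname undeclared. A
self-contained variant might look like the sketch below. It is not the code
from the post: instead of repeating the type field in every struct, it
embeds Node as the first member, a swapped-in technique that relies on a
pointer to a struct being convertible to a pointer to its first member. The
enumerator list, the value of maxChildCount, the contents of opname and the
member name header are all assumptions, and dumpTree here returns the text
rather than printing it so the result can be checked.

```cpp
#include <cassert>
#include <string>

enum NodeType { iConst, add, sub };  // assumed enumerators

// Common header; every concrete node starts with it.
struct Node {
    NodeType type;
};

struct IConstNode {
    Node header;  // first member, so it shares its address with the whole node
    int value;
};

const int maxChildCount = 2;  // assumed limit for this sketch

struct OpNode {
    Node header;
    int childCount;
    Node* children[maxChildCount];
};

// Renders the tree as indented text, two spaces per nesting level.
std::string dumpTree(const Node* root, int indent = 0) {
    assert(root != nullptr);
    static const char* const opname[] = { "ICONST", "ADD", "SUB" };
    const std::string pad(indent, ' ');
    switch (root->type) {
    case iConst: {
        // The node was created as an IConstNode, so this cast recovers it.
        const IConstNode* n = reinterpret_cast<const IConstNode*>(root);
        return pad + "ICONST: " + std::to_string(n->value) + "\n";
    }
    case add:
    case sub: {
        const OpNode* n = reinterpret_cast<const OpNode*>(root);
        std::string out = pad + "OP " + opname[root->type] + ":\n";
        for (int i = 0; i < n->childCount; ++i) {
            out += dumpTree(n->children[i], indent + 2);
        }
        return out;
    }
    }
    return std::string();
}
```

A tree is built by passing around the address of each node's header member,
for example &leaf.header for an IConstNode named leaf.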

--
James Kanze             GABI Software             mailto:kanze@gabi-soft.fr
Conseils en informatique orientée objet/
                           Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, Tél. : +33 (0)1 30 23 45 16

---

Author: johnchx2@yahoo.com (johnchx)
Date: Mon, 21 Apr 2003 21:52:15 +0000 (UTC)

a9804814@unet.univie.ac.at (Thomas Mang) wrote in message news:<3EA0826C.6D2BB5B3@unet.univie.ac.at>...
> johnchx schrieb:


> >  (1) A member becomes active when it is initialized or assigned to.
>
> Initialization of the members of a union is another problem,
> because 8.5 / 1 tells us built-in and types with trivial
> constructors are initialized when their memory is
> allocated.

I'm not sure I understand where you're seeing this.  Non-static
objects of built-in type which are declared without initializers have
indeterminate value (i.e. they are not initialized).  8.5/9

> How does that fit with a union?  Memory is
> allocated to hold the largest possible type. Are now
> any members initialized when the union itself is default-
> initialized?  If yes, which one? Or simply in order
> as they are declared, and the last one is the active?
> Or does a formal initialization of its members not take
> place at all?

There ARE a couple of (possibly) serious holes in the rules for
initialization of unions.  But I'll start with the easy cases and work
up to the nasty ones.

It turns out to be rather tricky to get a POD-union default
initialized at all.  If an object of POD-union type has static storage
duration (and has no initializer), its storage will be zero
initialized.  If an object of POD-union type has automatic storage
duration (and no initializer), it will have indeterminate state (i.e.
it will be uninitialized).

A POD-union allocated via operator new can be default initialized:

  union MyUnion {
    int i;
    char* pc;
  };

  MyUnion* pmu = new MyUnion();  // note the ()

Default initialization of an object whose type is neither a non-POD class
nor an array is defined as zero-initializing its storage.  Zero-initializing
the storage for a union is defined as zero-initializing the storage
for the first member. (8.5/5)
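
The rule just described can be checked with a small sketch. The two function
names are hypothetical; MyUnion is the union from the post. The sketch only
reads the first member, which is the one the quoted rule says is
zero-initialized.

```cpp
#include <cassert>

union MyUnion {
    int i;
    char* pc;
};

// Hypothetical helper: creates a MyUnion with "new MyUnion()" and
// returns the value of its first member.
int first_member_after_value_init() {
    MyUnion* pmu = new MyUnion();  // the () requests initialization
    int result = pmu->i;           // first member only
    delete pmu;
    return result;
}

// Self-check for the sketch: expected to hold on a conforming implementation.
void check_first_member_is_zero() {
    assert(first_member_after_value_init() == 0);
}
```

Writing "new MyUnion" without the parentheses would leave the object
uninitialized, which is why the post stresses them.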

So far, so good.  I don't see any ambiguity or difficulty.

But...there is trouble ahead.  Consider:

 void Foo() {
    MyUnion mu = {5};
 }

8.5.1/15 says "When a union is initialized with a brace-enclosed
initializer, the braces shall only contain an initializer for the
first member of the union."

However, 8.5.1/7 says "If there are fewer initializers in the list
than there are members in the aggregate, then each member not
explicitly initialized shall be default initialized (8.5)."  Ka-boom!

That means:

  void Foo() {
    MyUnion mu = {5};
    int j = mu.i;      // undefined!  j probably == 0!
    char* p = mu.pc;   // OK!
  }

I.e. if a union has more than one member, initializing it with a
braced initializer will almost certainly NOT do what the programmer
expects.  In particular, the value used in the initialization is lost
AND the last member is active, not the first.

Moreover:

  MyUnion cmu = MyUnion();

cmu is copy-initialized from a default-initialized temporary.  But the
implicitly defined copy-ctor will almost certainly exhibit undefined
behavior:

  // implicitly defined:
  MyUnion::MyUnion(const MyUnion& rhs) {
    i = rhs.i;     // ok
    pc = rhs.pc;   // undefined!  rhs.pc not active
  }


>  However, taking into account the sentence about how the values
>  are stored, together with some further implicit assumptions mixed
>  with a bit of imagination could also lead to a (still
>  doubtful) assumption it's guaranteed to work.

Well...see James Kanze's post earlier in this thread, in which he
pointed out (my paraphrase):

(a) Once upon a time there was interest in checked implementations,
    which would actually keep track of the "active" member and
    do bad things to you if you tried to read an "inactive" member.
    This is perfectly legal in C++.  So, no matter what the
    physical layout of the union is, the implementation is allowed
    to notice what you're doing and crash your program (or whatever).
    Hardly what I'd call "guaranteed to work."

(b) The compiler isn't required to notice when you write a value to
    one member of a union, then read it from another, when it's
    implementing the "keep the value in a register" optimization.

Type punning via a union might work.  It may even be likely to work.
But I don't think there's an implicit guarantee there.

---

Author: osv@javad.ru (Sergei Organov)
Date: Wed, 16 Apr 2003 15:41:02 +0000 (UTC)

kanze@gabi-soft.de (James Kanze) writes:

[...]

> There are legitimate uses for "type punning".  The results are never
> portable, of course, but the C and C++ standards provide a standard way
> of doing it: by casting a pointer (reinterpret_cast in C++).

But now it seems that the aliasing rules I initially asked about make pointer
casts UB under some conditions, and the union rules make unions UB under the
same conditions.

Probably I should explain where the initial question came from to make things
more clear:

Suppose I have to write code that is portable between architectures where
sizeof(float) == sizeof(unsigned int) == 4 and ints and floats have the same
(but unknown) byte order and alignment requirements. The code should serialize
both floats and ints in big endian without invoking undefined behavior.

Now, for ints the solution is trivial:

void out_uint(unsigned int u) {
  out_byte(u >> 24);
  out_byte(u >> 16);
  out_byte(u >>  8);
  out_byte(u);
}
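
The post does not show out_byte(). A self-contained sketch of the integer
case, assuming a hypothetical out_byte() that appends to an in-memory buffer
named sink (both are assumptions made only so the result can be inspected),
might look like this:

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical byte sink standing in for the real output channel.
std::vector<unsigned char> sink;

// Hypothetical out_byte(): keeps only the low 8 bits of its argument.
void out_byte(unsigned int b) {
    sink.push_back(static_cast<unsigned char>(b & 0xFFu));
}

static_assert(sizeof(unsigned int) * CHAR_BIT >= 32,
              "the shifts below assume at least 32 value bits");

// Emits the low 32 bits of u, most significant byte first.
void out_uint(unsigned int u) {
    out_byte(u >> 24);
    out_byte(u >> 16);
    out_byte(u >> 8);
    out_byte(u);
}
```

Because only shifts on the value are used, the byte order in memory never
enters the picture, which is what makes the integer case trivial.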

For floats I see three ways:

void out_float1(float f) {
  out_uint(*reinterpret_cast<unsigned int*>(&f));
}

void out_float2(float f) {
  union { float f; unsigned int u; } u;
  u.f = f;
  out_uint(u.u);
}

void out_float3(float f) {
  unsigned int u;
  unsigned char const* pf = reinterpret_cast<unsigned char*>(&f);
  unsigned char* pu = reinterpret_cast<unsigned char*>(&u);
  for(int i = 0; i < sizeof(u); ++i) *pu++ = *pf++; // or memcpy() instead
  out_uint(u);
}

from which only the most ugly (and probably inefficient) out_float3() doesn't
invoke undefined behavior (or does it?) as out_float1() seems to invoke UB due
to aliasing rules (3.10/15) and out_float2() seems to invoke UB due to union
rules (9.5/1) :(
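
The "or memcpy() instead" alternative mentioned in the out_float3() comment
can be sketched as follows. The names sink and out_byte are hypothetical
stand-ins, as is the function name out_float. The sketch assumes, as the
post does, that float and unsigned int have the same size and byte order.
Whether every bit pattern copied this way is a valid unsigned int is a
separate concern raised elsewhere in the thread.

```cpp
#include <cassert>
#include <cstring>
#include <limits>
#include <vector>

std::vector<unsigned char> sink;  // hypothetical byte sink

void out_byte(unsigned int b) {   // hypothetical, keeps the low 8 bits
    sink.push_back(static_cast<unsigned char>(b & 0xFFu));
}

void out_uint(unsigned int u) {   // big-endian output, as in the post
    out_byte(u >> 24);
    out_byte(u >> 16);
    out_byte(u >> 8);
    out_byte(u);
}

static_assert(sizeof(float) == sizeof(unsigned int),
              "this sketch assumes float and unsigned int have equal size");

// Copies the object representation of f into an unsigned int byte by byte,
// so no float object is ever read through an unsigned int lvalue.
void out_float(float f) {
    unsigned int u;
    std::memcpy(&u, &f, sizeof u);
    out_uint(u);
}
```

Compilers commonly turn a fixed-size memcpy like this into a plain register
move, so the inefficiency feared in the post may not materialize, although
that is an implementation detail rather than a guarantee.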

Is this an intention or just an oversight in the standard?

IMHO, the comment to the aliasing rules 3.10/15:

     "The intent of this list is to specify those circumstances in which
      an object may or may not be aliased."

suggests that out_float1() could be made valid as compiler has been
explicitly told that 'f' is indeed aliased by another type of pointer.

As for unions, why not change 9.5/1 so that the behavior is 'unspecified'
instead of the implied 'undefined'? That would at least make out_float2()
free of UB.

Anyway, the 9.5/1 probably needs to be formulated in a more clean and strict
way. For me, the quoted

   "In a union, at most one of the data members can be active at any
    time, that is, the value of at most one of the data members can
    be stored in a union at any time."

doesn't make it obvious that storing one member and fetching another is UB.
This is because the term "active" used in the first part is not well defined,
and the second part doesn't tell about fetching at all. For me this quote
sounds more like "you can't store to a field and expect that (previously
stored) value of another field doesn't change."

--
Sergei.

---

Author: osv@javad.ru (Sergei Organov)
Date: Wed, 16 Apr 2003 17:54:24 +0000 (UTC)

spambo_steffan_lankinensucks@hotmail.com ("Bo-Staffan Lankinen") writes:
>
> Sergei Organov writes:
> >
> > Could somebody please explain which of the three functions below invoke
> > "undefined behavior" due to the aliasing rules defined in the Standard
> > (those that could be found in CD2 in [basic.lval] 15)?
>
> It's mentioned in 3.10/15.
>
> > union U { long l; int i; };
> >
> > int f1(long l) { return *(int*)&l;  }
> > int f2(long l) { return ((U*)&l)->i; }
> > int f3(long l) { U u; u.l=l; return u.i; }
> >
> > My own understanding is that f3() is OK, f1() invokes "undefined
> > behavior", and I'm unsure about f2().
>
> You're right about f1 and f3. f2 also invokes undefined behavior with
> respect to aliasing.

If I'm right about f1, does it mean that the following also is UB with respect
to 3.10/15:

struct B;
extern "C" void foo(B* b);

struct A { ...something... };

void boo() {
  A a;
  foo((B*)&a);
}

If so, it means calling some useful C functions in a usual way from C++ is UB,
for example, UNIX socket routines 'bind' and 'getsockname' fall into this
category as they use fake 'struct sockaddr *' as one of their arguments :(

--
Sergei.

---

Author: a9804814@unet.univie.ac.at (Thomas Mang)
Date: Wed, 16 Apr 2003 19:28:52 +0000 (UTC)

Allan W schrieb:

> a9804814@unet.univie.ac.at (Thomas Mang) wrote
> > > > union U { long l; int i; };
> > > > int f3(long l) { U u; u.l=l; return u.i; }
>
> > Allan W schrieb:
> > > The standard does say in 9.5/1:
> > >     In a union, at most one of the data members can be active at any
> > >     time, that is, the value of at most one of the data members can
> > >     be stored in a union at any time. [Note: one special guarantee is
> > >     made in order to simplify the use of unions: if a POD-union
> > >     contains several POD-structs that share a common initial sequence
> > >     (9.2), and if an object of this POD-union type contains one of
> > >     the POD-structs, it is permitted to inspect the common initial
> > >     sequence of any of POD-struct members; see 9.2. ]
> > >
> > > When you store L into the local union, that's the active member. When
> > > you access I in the union, you evoke UB.
> >
> > I don't think the standard says that:

Okay, although I have the true feeling your comment has a flavor of personal
attack, I'll answer your questions, at least the ones on-topic.

>
>
> Why would I lie?

I never accused you of lying.
[But it seems you are now accusing me of something I never wrote]

>  I gave a direct quote from the standard. I also
> listed the exact section number so that you could look it up for
> yourself if you don't believe me. Please do; I'll wait. It's chapter
> 9.5, paragraph 1. Search for "[class.union]". Or go to printed page
> 158 (in the PDF file, it's page 184).

Just read again the chapter. No word of undefined behavior. No different
interpretation than at the time I wrote my reply.

>
>
> Did you read it yet? The standard _DOES_ say that!

No it doesn't.
[At least I don't read it this way. - You are welcome to answer now that you
read it this way, but that won't change MY interpretation of the paragraph in
question.]

>
>
> > What if on the OP machine int and long are represented the same way?
>
> Then it will probably work correctly on that one machine (although the
> standard doesn't guarantee that either!). Undefined Behavior doesn't
> mean that you won't happen to get exactly what you expect -- it means
> that you can't say for sure what you will get. What if you port that
> same program to a machine where int and long are NOT represented the
> same way?

Where did I mention something like porting?
I said specifically "on the OP machine".
Before accusing me of not being able to read the standard [or better say
interpret it the way you interpret it], please read first carefully what I
really wrote and reply to this.

>
>
> A program that conforms to the standard, will work correctly with any
> compiler that conforms to the standard. That's the very definition of
> "standard."
>
> > Yes, only one member can be active. But does this imply reading
> > other members (with exactly the same bit representation as the
> > active one) is ill - formed?
>
> Yes. See 9.5/1 (quoted above). I'm not lying! -- But you don't
> have to take my word for it: buy your own copy for $18.

Again, where did I say you lied?

>
>
> > My reading of the standard doesn't make this clear - neither
> > that such an action [assuming identical bit-representation] is
> > defined, nor that such an action is ill-formed.
>
> Read it again. Pay particular attention to 9.5/1 (quoted above).
> According to the standard, even obvious cases like this:
>     union U4 { int i; int j; };
>     int f4(int x) { U4 u; u.i=x; return u.j; }
> are illegal.

Read again what I wrote:
I said "My reading of the standard doesn't make this clear".

Maybe you can point me directly to the section in 9.5/1 that makes it clear
in your eyes?
In my eyes, the standard uses terms not defined elsewhere: "active",
"stored", .....

After all, in my very personal interpretation, it is simply not clear to me
which actions with unions are allowed and which are not.

>
>
> On the other hand, local implementations work differently. I suspect
> that my f4() example would work "correctly" on any platform that I
> regularly use.
>
> As for the original example -- despite what the standard says,
> I think this is likely to work the way you expect on most (or even
> all) platforms, if (and only if) your assumption is correct. But
> is it a safe assumption? Can you know for a fact that int and long
> have exactly the same representation on every computer you use
> either now or in the future?

I *think* the standard backs my expectation. I also made clear that the types
in questions would need to share the same representation [size, bit pattern,
alignment]. And I have never said it to be portable.

>
>
> In general you cannot know that. There's plenty of programs written
> for Windows 3.0 that no longer exist, because they would need a
> near-complete rewrite to work on Windows 95. (The 16-bit to 32-bit
> migration was NOT painless.)
>
> In particular -- if long is larger than int, then f3() is very
> likely to return the least significant bits of the long, which was
> apparently what was intended. But it could also just as easily
> return the MOST significant bits of the long!
>

Again, read one more time what I wrote. Or maybe we are communicating in a
different language?

>
> Furthermore, if you COULD know that EVERY computer you will EVER
> use, had the same representation for int and long -- then why would
> you need two members of the union?
>
>     union U5 { long l; };
>     long f5(long l) { U5 u; u.l=l; return u.l; }
>
> If int has the same representation as long, this is identical to
> f3() except that the standard guarantees it will work correctly.
>
> > A Defect?
>
> Not in my opinion. I'd say it was quite clear, but some people
> choose not to pay attention because it's not as convenient to write
> code that way.

In my opinion (which was even marked with an '?'), yes.

First, the standard uses words like "active" and "storing", but I couldn't
find exact definitions for it. Especially the "active" interests me. Last
write-access????

Second, it says "The size of a union is sufficient to contain the largest of
its data members. Each data member is allocated as if it were the sole member
of a struct".
For me, this reads that each object of the union is represented the same way
[bit pattern, size, alignment] as if it were member of a struct, but with
[potentially] sharing memory. So reading such a bit pattern should be fine
for all types matching the exact properties [bit pattern, size,
alignment...], or that is which are represented the same way as sole members
of a struct.

My conclusion is that on a given machine (note, I never told it to be 100%
portable, but rather implementation dependent - but not undefined behavior)
that represents 2 types exactly the same way as sole members of a struct -
including even bit patterns - then reading of any of these values should be
defined as long as one of them is the "active" one, whatever the standard
means by this word.

best regards,

Thomas

---

Author: spambo_steffan_lankinensucks@hotmail.com ("Bo-Staffan Lankinen")
Date: Thu, 17 Apr 2003 17:44:34 +0000 (UTC)

> If I'm right about f1, does it mean that the following also is UB with
> respect to 3.10/15:
>
> struct B;
> extern "C" void foo(B* b);
>
> struct A { ...something... };
>
> void boo() {
>   A a;
>   foo((B*)&a);
> }
>
> If so, it means calling some useful C functions in a usual way from C++ is
> UB, for example, UNIX socket routines 'bind' and 'getsockname' fall into this
> category as they use fake 'struct sockaddr *' as one of their arguments :(

I'm not familiar with the UNIX socket routines but I think it's possible the
code-snippet you submitted invokes UB with respect to aliasing. IMO, it's
impossible to determine whether it invokes UB without knowing the definition
of foo, even though the type of B invalidates the aliasing rules with
respect to the type of A. For instance, I think aliasing would be enabled if
the pointer to B, in the body of foo, is converted to a pointer to char or
unsigned char that is dereferenced.
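
A minimal sketch of the case described above, assuming a hypothetical body for a foo-like function. The function never looks at its argument as a B, only as a sequence of unsigned char, which is one of the access types 3.10/15 lists. The names byte_sum and sum_of, and the contents of struct A, are invented for illustration and do not come from the original post.

```cpp
#include <cstddef>

struct B;                       // incomplete type, as in the quoted example

struct A { int x; int y; };     // hypothetical contents for "...something..."

// Hypothetical foo-like function: it receives the address as a B*, but only
// reads the storage through unsigned char, never through an lvalue of type B.
// `size` must be the size of the object that is really there.
unsigned long byte_sum(const B* b, std::size_t size) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(b);
    unsigned long sum = 0;
    for (std::size_t i = 0; i < size; ++i) {
        sum += p[i];            // each access is through an unsigned char lvalue
    }
    return sum;
}

// Caller side, mirroring boo() from the quoted example.
unsigned long sum_of(const A& a) {
    return byte_sum(reinterpret_cast<const B*>(&a), sizeof a);
}
```

Whether the pointer conversions themselves round-trip as expected is a separate question from the aliasing rule; the sketch only illustrates that the read goes through unsigned char.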

Bo-Staffan


---

Author: brangdon@cix.co.uk (Dave Harris)
Date: Thu, 17 Apr 2003 17:46:00 +0000 (UTC) Raw View

osv@javad.ru (Sergei Organov) wrote (abridged):
> void out_float3(float f) {
>   unsigned int u;
>   unsigned char const* pf = reinterpret_cast<unsigned char*>(&f);
>   unsigned char* pu = reinterpret_cast<unsigned char*>(&u);
>   for(int i = 0; i < sizeof(u); ++i) *pu++ = *pf++; // or memcpy() instead
>   out_uint(u);
> }
>
> from which only the most ugly (and probably inefficient) out_float3()
> doesn't invoke undefined behavior (or does it?)

It looks like undefined behaviour to me although I can't quite quote
chapter and verse to prove it. Here's my reasoning.

You may know about "signalling floats" - these are special floating point
bit-patterns which yield undefined behaviour when accessed. Eg on IEEE
architectures they can throw a floating point exception. As I understand
it, the standard also allows for "signalling ints". That is the intent of
the draft $3.9.1/1:
   For  character types, all bits of the object representation
   participate in the value representation. For unsigned character
   types, all possible bit patterns of the value representation
   represent numbers. These requirements do not hold for other types.

Thus the byte representation of an unsigned int may include bit patterns
which do not correspond to any actual unsigned int. Accessing one of these
leads to undefined behaviour (that is the part I can't find a citation
for). It might be a signalling int.

And further, it is possible that a single bit pattern will represent both
a non-signalling float and a signalling int. That is part of the reasoning
behind 3.10/15. It means you cannot take an arbitrary float bit pattern
and access it as an unsigned int without undefined behaviour, by any
route. There may be no valid unsigned int which corresponds to the float
for you to convert to.

You can convert to unsigned chars (which are guaranteed never to trap) and
output them directly. Alternatively you can find something corresponding
to your out_uint() routine, which uses portable maths functions to get a
unique byte representation. Something like:

    void out_float1(float f) {
        int exp;
        int mantissa = frexp( f, &exp ) * 0x1000000;
        out_byte( exp );
        out_byte( mantissa >> 16 );
        out_byte( mantissa >> 8 );
        out_byte( mantissa );
    }

might do. This assumes an 8-bit exponent and 24 bits of mantissa, which
corresponds to 32-bit IEEE. This code is untested and probably wrong, as
I've not thought through how the implied leading 1 in the mantissa should
be handled, but might serve as a starting point.
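
The first alternative mentioned above, going through unsigned chars only, might be sketched as follows. This is an illustration under assumptions: the helper names float_bytes and float_from_bytes are hypothetical, and the bytes produced are only meaningful to the same implementation (same float format and byte order) that wrote them.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Copy the object representation of a float into unsigned chars.
// No unsigned int is ever formed, so no "signalling int" can arise.
std::vector<unsigned char> float_bytes(float f) {
    std::vector<unsigned char> bytes(sizeof f);
    std::memcpy(&bytes[0], &f, sizeof f);    // byte-wise copy
    return bytes;
}

// Rebuild a float from bytes produced by float_bytes on the same
// implementation. Returns 0.0f if the byte count does not match.
float float_from_bytes(const std::vector<unsigned char>& bytes) {
    float f = 0.0f;
    if (bytes.size() == sizeof f) {
        std::memcpy(&f, &bytes[0], sizeof f);
    }
    return f;
}
```

Each element of the returned vector could then be handed to a byte-output routine one at a time.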

A third alternative is to stop caring about the undefined behaviour.
An implementation is allowed to define its own behaviour where the
standard says undefined, and your best bet may be to rely on what your
local implementations do. You are already assuming 32-bit floats etc.

  Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
      brangdon@cix.co.uk      |   And close your eyes with holy dread,
                              |  For he on honey dew hath fed
 http://www.bhresearch.co.uk/ |   And drunk the milk of Paradise."

---

Author: Simon@sfbone.fsnet.co.uk ("Simon F Bone")
Date: Thu, 17 Apr 2003 17:46:21 +0000 (UTC) Raw View

Sergei Organov <osv@javad.ru> wrote ...
> If I'm right about f1, does it mean that the following also is UB with
> respect to 3.10/15:
>
> struct B;
> extern "C" void foo(B* b);
>
> struct A { ...something... };
>
> void boo() {
>   A a;
>   foo((B*)&a);
> }
>
> If so, it means calling some useful C functions in a usual way from C++ is
> UB, for example, UNIX socket routines 'bind' and 'getsockname' fall into this
> category as they use fake 'struct sockaddr *' as one of their arguments :(
>

As long as struct A is a POD type this should be OK, because layout
compatibility with C is guaranteed. If you add something to struct A (such
as a virtual destructor) that can't be in a POD you will be in trouble. This
means old APIs like those you referred to are OK.

Simon Bone



---

Author: allan_w@my-dejanews.com (Allan W)
Date: Thu, 17 Apr 2003 21:47:48 +0000 (UTC) Raw View

>  Allan W schrieb:
> > > > The standard does say in 9.5/1:

a9804814@unet.univie.ac.at (Thomas Mang) wrote
> > > I don't think the standard says that:

> > Why would I lie?

> I never accused you of lying.
> [But it seems you are now accusing me of something I never wrote]

You didn't use the word "lie."

I directly quoted the standard. You didn't say (at that point) "I
understood it to mean something different." You said "I don't think
the standard says that." But I wasn't expressing an opinion, I was
stating a fact -- which was either true or not true. When you
expressed doubt, I took this to mean that you doubted my veracity,
for reasons I could not fathom -- after all, you could easily have
verified this fact for yourself.

> Where did I mention something like porting?
> I said specifically "on the OP machine".

There is no such thing as a program that is standard compliant "on
the OP machine." Except for the passages marked
"implementation-defined," a standard-compliant program will run
identically on all machines, by definition.

Please remember that this is not comp.lang.c++ -- this is a standards
newsgroup. It's okay with me if you aren't always concerned about the
standard -- as far as I know, all production-quality programs need at
least a few exceptions from purely-standard code. But messages about
such practices rarely belong here, unless you're proposing that the
next standard make such practices legal.

In the case of unions of int and long, I think you can see that it's
very unlikely to ever have standard-specified results. (We would have
to mandate that int and long are the same size, or at least mandate
that the machine use big-endian or little-endian format -- neither of
which is practical.)

---

Author: allan_w@my-dejanews.com (Allan W)
Date: Thu, 17 Apr 2003 22:00:50 +0000 (UTC) Raw View

osv@javad.ru (Sergei Organov) wrote
> Does it mean that the following also is UB with respect
> to 3.10/15:
>
> struct B;
> extern "C" void foo(B* b);
>
> struct A { ...something... };
>
> void boo() {
>   A a;
>   foo((B*)&a);
> }

Yes, this is UB with respect to the C++ standard.

> If so, it means calling some useful C functions in a usual way from C++ is UB,
> for example, UNIX socket routines 'bind' and 'getsockname' fall into this
> category as they use fake 'struct sockaddr *' as one of their arguments :(

C++ leaves this behavior undefined. That doesn't mean that the UNIX
API can't go ahead and define this behavior. Please note that there are
platforms that do not support sockets; that does not mean that they
violate the C++ standard.

Similarly, the examples in your OP will work the way you expect on a
great many platforms. That doesn't mean that the C++ standard
guarantees it.

There is nothing wrong with writing platform-specific code! Just
because the C++ standard doesn't sanction it doesn't mean it isn't
useful. Just because the C++ standard doesn't guarantee any
particular result, doesn't mean it's against the law!

Here's an analogy: the management of your local grocery store won't
guarantee that ++a does anything useful in any program. That doesn't
mean you shouldn't do it!

You should segregate platform-specific code from the rest of
your program, in case you ever have to port it -- if you don't
segregate it, you should at least note it with a comment. And
yes, this includes code that directly calls Unix sockets. I
recommend designing your own "TCP/IP" (or whatever) classes, and
using Unix sockets only in the implementation file -- that way,
if you ever port to a machine that doesn't have Unix sockets,
you only need to rewrite that one file.
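
The segregation suggested above could be sketched roughly as follows. This is an assumption-laden outline, not a working network class: TcpEndpoint and its members are hypothetical names, C++11 is assumed for std::unique_ptr, and the platform-specific part is reduced to a placeholder so that the example stays self-contained.

```cpp
#include <memory>
#include <string>
#include <utility>

// ---- Portable interface: this part would live in a header. ----
// Nothing here mentions sockets or any platform type.
class TcpEndpoint {
public:
    TcpEndpoint(std::string host, unsigned short port);
    ~TcpEndpoint();
    const std::string& host() const;
    unsigned short port() const;
private:
    struct Impl;                  // defined only in the implementation file
    std::unique_ptr<Impl> impl_;
};

// ---- Platform-specific part: this would live in one .cpp file. ----
// A Unix version would keep its socket descriptor and address structure
// in Impl; this placeholder stores only the portable data.
struct TcpEndpoint::Impl {
    std::string host;
    unsigned short port;
};

TcpEndpoint::TcpEndpoint(std::string host, unsigned short port)
    : impl_(new Impl{std::move(host), port}) {}
TcpEndpoint::~TcpEndpoint() = default;
const std::string& TcpEndpoint::host() const { return impl_->host; }
unsigned short TcpEndpoint::port() const { return impl_->port; }
```

Porting would then mean rewriting only the second half, since callers see nothing but the class declaration.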

---

Author: a9804814@unet.univie.ac.at (Thomas Mang)
Date: Fri, 18 Apr 2003 00:01:15 +0000 (UTC) Raw View


Allan W schrieb:

> >  Allan W schrieb:
> > > > > The standard does say in 9.5/1:
>
> a9804814@unet.univie.ac.at (Thomas Mang) wrote
> > > > I don't think the standard says that:
>
> > > Why would I lie?
>
> > I never accused you of lying.
> > [But it seems you are now accusing me of something I never wrote]
>
> You didn't use the word "lie."
>
> I directly quoted the standard. You didn't say (at that point) "I
> understood it to mean something different." You said "I don't think
> the standard says that." But I wasn't expressing an opinion, I was
> stating a fact -- which was either true or not true. When you
> expressed doubt, I took this to mean that you doubted my veracity,
> for reasons I could not fathom -- after all, you could easily have
> verified this fact for yourself.

Ah!
Now it's clear. I was referring to your 2 sentences following directly after
your quote, and you thought I was referring to the quote. That's the problem
when one (me, precisely) is expressing something not as clearly as it should
be done to avoid misunderstandings in the first place [I never intended to
cause confusion]



>
>
> > Where did I mention something like porting?
> > I said specifically "on the OP machine".
>
> There is no such thing as a program that is standard compliant "on
> the OP machine." Except for the passages marked
> "implementation-defined," a standard-compliant program will run
> identically on all machines, by definition.
>
> Please remember that this is not comp.lang.c++ -- this is a standards
> newsgroup. It's okay with me if you aren't always concerned about the
> standard -- as far as I know, all production-quality programs need at
> least a few exceptions from purely-standard code. But messages about
> such practices rarely belong here, unless you're proposing that the
> next standard make such practices legal.

Took note of it.

Well, I am concerned a lot about the standard, much more than you might
suggest here. But it seems I interpret some sections of the standard
differently - or, as is the case with unions, I am also partially confused as
to how to interpret it, see below.


>
>
> In the case of unions of int and long, I think you can see that it's
> very unlikely to ever have standard-specified results. (We would have
> to mandate that int and long are the same size, or at least mandate
> that the machine use big-endian or little-endian format -- neither of
> which is practical.)

Correct.
Still, I don't know whether this is defined or not:


union someUnion{
int a;
int b;
};

someUnion aUnion;
aUnion.a = 12;
int c = aUnion.b;


From what you posted, it is clearly undefined behavior for you. Still, there
is this very sentence in 9.5/1 which raises doubts in my eyes: "Each
data member is allocated as if it were the sole member of a struct".
Does this tell us that the variables a and b have to be laid out in a way
that they exactly overlap?
I know (better say believe, couldn't find the section now) there is no
guarantee in C++ that the layout of two POD-structs each containing a single
int member has to be the same.

But I interpret (with doubts, please note) the quoted sentence in the
standard in a way that in the union given above, a and b share exactly the
same memory. Why? Because we are not concerned about the layout of the
overall struct, but rather of just a single member, and the types of the 2
members are identical. And AFAIK, the C standard gives some useful guarantees
about preserved type representations.
Then there is also this wonderful first sentence of the paragraph:
"In a union, at most one of the data members can be active at any time, ...".
What does 'active' here mean? Which member is active after
default-construction? None? When does one become active? When is one not
active any more? Is it possible that after one was active, none is active
later?

It seems for me there is (too) much room for personal interpretation. And
obviously, what you interpret as undefined behavior did I (at least parts of
it) not read this way.


best regards


Thomas

---

Author: johnchx2@yahoo.com (johnchx)
Date: Fri, 18 Apr 2003 19:33:36 +0000 (UTC) Raw View

a9804814@unet.univie.ac.at (Thomas Mang) wrote

> Then there is also this wonderful first sentence of the paragraph:
> "In a union, at most one of the data members can be active at any time, ...".
> What does 'active' here mean?

Well, you elided the part of the sentence that tells us.  Picking up
where you left off, it says, "...that is, the value of at most one of
the data members can be stored in a union at any time."

This seems entirely clear: if a union has two data members, a and b,
the union stores EITHER the value of a OR the value of b (or neither
one).

> Which member is active after
> default-construction? None?

Possibly none (note the phrase "at most" above) (if, for example, the
union has a user-defined ctor which is a no-op).  Otherwise, if the
union has a user-defined ctor, it would be the member last
initialized/assigned to by the ctor.  If there is no user-defined
ctor, 8.5.1 applies.

> When does one become active?

Well...let's make a list of the possible reasonable answers.

 (1) A member becomes active when it is initialized or assigned to.

 (2) ... I'm out.  Is there a reasonable alternative?

So, I'm stuck with (1).  (I don't include "A member becomes active
when its value is read," because this would render the "at most one
active member" rule entirely meaningless, and I don't believe that it
is reasonable to conclude that it is meaningless.)

> When is one not active any more?

Again, let's list the possible reasonable answers:

  (1) A member becomes inactive when another member is
      initialized or assigned to.

  (2) ...I'm out again.

So, I'm stuck with (1).

> Is it possible that after one was active, none is active
> later?

I don't see how.  Do you?

> It seems for me there is (too) much room for personal interpretation.

I just don't see that much freedom here.  You are correct to point
out that the standard doesn't explicitly answer the questions you
raise.  But is there really a large set of reasonable alternatives?  I
suspect that these things weren't made explicit simply because they
fell into what the standards committee felt was obvious by
implication.

To get back to your example:

union someUnion{
  int a;
  int b;
};

someUnion aUnion;
aUnion.a = 12;
int c = aUnion.b;  // (A)

The value of b is not stored in aUnion.  So there's no telling what,
if anything, gets stored in c.  This conclusion does not depend on the
physical layout of someUnion.  (What gets stored in c MAY depend on
the physical layout of someUnion...that's how undefined behavior
works!)
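
The reading of "active" given above (a member becomes active when written, and stops being active when another member is written) can be modelled with a small wrapper. This is only a sketch of that interpretation: the class IntPair, its enum and its accessors are hypothetical names, and the explicit tag is something the program adds, not something the standard requires a union to carry.

```cpp
#include <stdexcept>

// Wraps a union of two ints and records which member was last written.
class IntPair {
public:
    enum Member { None, MemberA, MemberB };

    IntPair() : active_(None) {}          // no member holds a value yet

    void set_a(int v) { u_.a = v; active_ = MemberA; }  // a becomes active
    void set_b(int v) { u_.b = v; active_ = MemberB; }  // b becomes active

    // Reads are allowed only for the member whose value is stored.
    int get_a() const {
        if (active_ != MemberA) throw std::logic_error("a is not active");
        return u_.a;
    }
    int get_b() const {
        if (active_ != MemberB) throw std::logic_error("b is not active");
        return u_.b;
    }

    Member active() const { return active_; }

private:
    union { int a; int b; } u_;
    Member active_;
};
```

With this wrapper, the read marked (A) would correspond to calling get_b() after set_a(12), which throws instead of reading a value that was never stored.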

Maybe I'm just experiencing a failure of the imagination...perhaps
there are reasonable alternative answers to your questions that simply
haven't occurred to me....

---

Author: a9804814@unet.univie.ac.at (Thomas Mang)
Date: Fri, 18 Apr 2003 23:06:35 +0000 (UTC) Raw View


johnchx schrieb:

> a9804814@unet.univie.ac.at (Thomas Mang) wrote
>
> > Then there is also this wonderful first sentence of the paragraph:
> > "In a union, at most one of the data members can be active at any time, ...".
> > What does 'active' here mean?
>
> Well, you elided the part of the sentence that tells us.  Picking up
> where you left off, it says, "...that is, the value of at most one of
> the data members can be stored in a union at any time."

Hm, I thought the part of the sentence you quoted was referring to the beginning of
the sentence I quoted, not to the 'active'.
Reading it the way you suggest sheds some light on 'active', although an explicit
definition of what exactly 'active' means, in its own sentence, would be useful in
order to avoid misunderstandings.

>
>
> This seems entirely clear: if a union has two data members, a and b,
> the union stores EITHER the value of a OR the value of b (or neither
> one).
>
> > Which member is active after
> > default-construction? None?
>
> Possibly none (note the phrase "at most" above) (if, for example, the
> union has a user-defined ctor which is a no-op).  Otherwise, if the
> union has a user-defined ctor, it would be the member last
> initialized/assigned to by the ctor.  If there is no user-defined
> ctor, 8.5.1 applies.
>
> > When does one become active?
>
> Well...let's make a list of the possible reasonable answers.
>
>  (1) A member becomes active when it is initialized or assigned to.

Initialization of the members of a union is another problem, because 8.5 / 1 tells
us built-in types and types with trivial constructors are initialized when their
memory is allocated. How does that fit with a union? Memory is allocated to hold the
largest possible type. Are any members initialized when the union itself is
default-initialized? If yes, which one? Or simply in the order in which they are
declared, so that the last one is the active one? Or does a formal initialization of
its members not take place at all?


>
>
>  (2) ... I'm out.  Is there a reasonable alternative?
>
> So, I'm stuck with (1).  (I don't include "A member becomes active
> when its value is read," because this would render the "at most one
> active member" rule entirely meaningless, and I don't believe that it
> is reasonable to conclude that it is meaningless.)
>
> > When is one not active any more?
>
> Again, let's list the possible reasonable answers:
>
>   (1) A member becomes inactive when another member is
>       initialized or assigned to.

>

>
>
>   (2) ...I'm out again.
>
> So, I'm stuck with (1).
>
> > Is it possible that after one was active, none is active
> > later?
>
> I don't see how.  Do you?

>
>
> > It seems for me there is (too) much room for personal interpretation.
>
> I just don't see that much freedom here.  You are correct to point
> out that the standard doesn't explicitly answer the questions you
> raise.  But is there really a large set of reasonable alternatives?  I
> suspect that these things weren't made explicit simply because they
> fell into what the standards committee felt was obvious by
> implication.

Yes and no. It probably depends on what one considers to be reasonable by implication.
You are certainly very reasonable with your answers to my questions, and personally
I had come to the same conclusions. However, this required a fairly high number
of implicit conclusions. It would be nice if the standard were more explicit.


>
>
> To get back to your example:
>
> union someUnion{
>   int a;
>   int b;
> };
>
> someUnion aUnion;
> aUnion.a = 12;
> int c = aUnion.b;  // (A)
>
> The value of b is not stored in aUnion.  So there's no telling what,
> if anything, gets stored in c.  This conclusion does not depend on the
> physical layout of someUnion.  (What gets stored in c MAY depend on
> the physical layout of someUnion...that's how undefined behavior
> works!)

Taking your interpretation of active and all implicit assumptions into account, this
makes sense.
However, taking into account the sentence about how the values are stored, together
with some further implicit assumptions mixed with a bit of imagination, one could
also arrive at a (still doubtful) assumption that it's guaranteed to work.


best regards,

Thomas



---

Author: a9804814@unet.univie.ac.at (Thomas Mang)
Date: Fri, 11 Apr 2003 18:50:30 +0000 (UTC) Raw View


Allan W schrieb:

>
> > int f3(long l) { U u; u.l=l; return u.i; }
>
> The standard does say in 9.5/1:
>     In a union, at most one of the data members can be active at any
>     time, that is, the value of at most one of the data members can
>     be stored in a union at any time. [Note: one special guarantee is
>     made in order to simplify the use of unions: if a POD-union
>     contains several POD-structs that share a common initial sequence
>     (9.2), and if an object of this POD-union type contains one of
>     the POD-structs, it is permitted to inspect the common initial
>     sequence of any of POD-struct members; see 9.2.
>
> When you store L into the local union, that's the active member. When
> you access I in the union, you evoke UB.

I don't think the standard says that:
What if on the OP machine int and long are represented the same way? Yes, only
one member can be active. But does this imply reading other members (with exactly
the same bit representation as the active one) is ill-formed?

My reading of the standard doesn't make this clear - neither that such an action
[assuming identical bit-representation] is defined, nor that such an action is
ill-formed.
A Defect?


regards,

Thomas

---

Author: allan_w@my-dejanews.com (Allan W)
Date: Wed, 16 Apr 2003 00:01:16 +0000 (UTC) Raw View

a9804814@unet.univie.ac.at (Thomas Mang) wrote
> > > union U { long l; int i; };
> > > int f3(long l) { U u; u.l=l; return u.i; }

> Allan W schrieb:
> > The standard does say in 9.5/1:
> >     In a union, at most one of the data members can be active at any
> >     time, that is, the value of at most one of the data members can
> >     be stored in a union at any time. [Note: one special guarantee is
> >     made in order to simplify the use of unions: if a POD-union
> >     contains several POD-structs that share a common initial sequence
> >     (9.2), and if an object of this POD-union type contains one of
> >     the POD-structs, it is permitted to inspect the common initial
> >     sequence of any of POD-struct members; see 9.2. ]
> >
> > When you store L into the local union, that's the active member. When
> > you access I in the union, you evoke UB.
>
> I don't think the standard says that:

Why would I lie? I gave a direct quote from the standard. I also
listed the exact section number so that you could look it up for
yourself if you don't believe me. Please do; I'll wait. It's chapter
9.5, paragraph 1. Search for "[class.union]". Or go to printed page
158 (in the PDF file, it's page 184).

No, don't keep reading until you've looked it up for yourself.
Buy a copy if you need to -- it's only $18.

Did you read it yet? The standard _DOES_ say that!

> What if on the OP machine int and long are represented the same way?

Then it will probably work correctly on that one machine (although the
standard doesn't guarantee that either!). Undefined Behavior doesn't
mean that you won't happen to get exactly what you expect -- it means
that you can't say for sure what you will get. What if you port that
same program to a machine where int and long are NOT represented the
same way?

A program that conforms to the standard, will work correctly with any
compiler that conforms to the standard. That's the very definition of
"standard."

> Yes, only one member can be active. But does this imply reading
> other members (with exactly the same bit representation as the
> active one) is ill-formed?

Yes. See 9.5/1 (quoted above). I'm not lying! -- But you don't
have to take my word for it: buy your own copy for $18.

> My reading of the standard doesn't make this clear - neither
> that such an action [assuming identical bit-representation] is
> defined, nor that such an action is ill-formed.

Read it again. Pay particular attention to 9.5/1 (quoted above).
According to the standard, even obvious cases like this:
    union U4 { int i; int j; };
    int f4(int x) { U4 u; u.i=x; return u.j; }
are illegal.

On the other hand, local implementations work differently. I suspect
that my f4() example would work "correctly" on any platform that I
regularly use.

As for the original example -- despite what the standard says,
I think this is likely to work the way you expect on most (or even
all) platforms, if (and only if) your assumption is correct. But
is it a safe assumption? Can you know for a fact that int and long
have exactly the same representation on every computer you use
either now or in the future?

In general you cannot know that. There's plenty of programs written
for Windows 3.0 that no longer exist, because they would need a
near-complete rewrite to work on Windows 95. (The 16-bit to 32-bit
migration was NOT painless.)

In particular -- if long is larger than int, then f3() is very
likely to return the least significant bits of the long, which was
apparently what was intended. But it could also just as easily
return the MOST significant bits of the long!
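
If the least significant bits really are what is wanted, they can be obtained without a union. The following is a minimal sketch, assuming that an unsigned result is acceptable; low_bits is a hypothetical name. It relies only on the conversions to unsigned types, which are defined as reduction modulo 2^N, so the result does not depend on byte order.

```cpp
// Returns the low-order bits of l that fit in an unsigned int.
// Both conversions are to unsigned types and therefore wrap modulo 2^N;
// neither depends on how long is laid out in memory.
unsigned int low_bits(long l) {
    return static_cast<unsigned int>(static_cast<unsigned long>(l));
}
```

Where int and long have the same width, this simply yields the same bit pattern interpreted as unsigned.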

Furthermore, if you COULD know that EVERY computer you will EVER
use, had the same representation for int and long -- then why would
you need two members of the union?

    union U5 { long l; };
    long f5(long l) { U5 u; u.l=l; return u.l; }

If int has the same representation as long, this is identical to
f3() except that the standard guarantees it will work correctly.

> A Defect?

Not in my opinion. I'd say it was quite clear, but some people
choose not to pay attention because it's not as convenient to write
code that way.

But perhaps you don't believe that I just wrote that?

---

Author: kanze@gabi-soft.de (James Kanze)
Date: Wed, 16 Apr 2003 00:01:31 +0000 (UTC) Raw View

a9804814@unet.univie.ac.at (Thomas Mang) wrote in message
news:<3E96EE62.E8531DAF@unet.univie.ac.at>...
> Allan W schrieb:

> > > int f3(long l) { U u; u.l=l; return u.i; }

> > The standard does say in 9.5/1:
> >     In a union, at most one of the data members can be active at any
> >     time, that is, the value of at most one of the data members can
> >     be stored in a union at any time. [Note: one special guarantee
> >     is made in order to simplify the use of unions: if a POD-union
> >     contains several POD-structs that share a common initial
> >     sequence (9.2), and if an object of this POD-union type contains
> >     one of the POD-structs, it is permitted to inspect the common
> >     initial sequence of any of POD-struct members; see 9.2.

> > When you store L into the local union, that's the active
> > member. When you access I in the union, you evoke UB.

> I don't think the standard says that:

He just quoted where it does.

> What if on the OP machine int and long are represented the same way?
> Yes, only one member can be active. But does this imply reading other
> members (with exactly the same bit representation as the active one)
> is ill-formed?  My reading of the standard doesn't make this clear -
> neither that such an action [assuming identical bit-representation] is
> defined, nor that such an action is ill-formed.

One possible effect of undefined behavior is that it works, or seems to
work.  If your compiler gives you such a guarantee, the behavior is
defined for that compiler.  It is undefined behavior according to the
standard.

This is intentional, and I have used compilers where writing to one
element of a union, then reading from another, did NOT result in the
same bit pattern.

There are some special rules inherited from C concerning "compatible"
structure prefixes, and I believe that if one of the elements is an
array of char or unsigned char, there are special rules for that as
well.  But in general, no.  (In practice, reading different integral
types with the same size will probably work just about everywhere.)
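
The special rule for compatible structure prefixes is the common-initial-sequence guarantee from the note in 9.5/1 quoted earlier. A minimal sketch, with hypothetical type and member names (Circle, Square, Shape, kind_of):

```cpp
// Two POD-structs that share a common initial sequence: the leading int.
struct Circle { int kind; double radius; };
struct Square { int kind; double side; };

// A POD-union containing both POD-structs.
union Shape {
    Circle circle;
    Square square;
};

// Inspects the shared leading member through `circle`, whichever of the
// two structs the union currently contains. Only the common initial
// sequence may be inspected this way; radius and side may not.
int kind_of(const Shape& s) {
    return s.circle.kind;
}
```

For example, after assigning a Square whose kind is 2 to s.square, kind_of(s) returns 2 even though it reads via s.circle.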

> A Defect?

No.

--
James Kanze             GABI Software             mailto:kanze@gabi-soft.fr
Conseils en informatique orientée objet/
                           Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, Tél. : +33 (0)1 30 23 45 16

---

Author: kanze@gabi-soft.de (James Kanze)
Date: Wed, 16 Apr 2003 14:11:57 +0000 (UTC) Raw View

allan_w@my-dejanews.com (Allan W) wrote in message
news:<7f2735a5.0304141146.3c1e19da@posting.google.com>...

> > My reading of the standard doesn't make this clear - neither that
> > such an action [assuming identical bit-representation] is defined,
> > nor that such an action is ill-formed.

> Read it again. Pay particular attention to 9.5/1 (quoted above).
> According to the standard, even obvious cases like this:
>     union U4 { int i; int j; };
>     int f4(int x) { U4 u; u.i=x; return u.j; }
> are illegal.

It's been a long time, so my memory may be faulty, but I seem to
remember that one of the goals in the C committee (in the mid 80's) was
to make "checking" implementations legal.  In other words, an
implementation which maintained the type of the last assigned value in
some sort of look-aside memory (say, a hash-table indexed by the address
of the union), updated this memory on each write, and verified the type
on each read should still be conforming.

In practice, I've actually used compilers which deferred assignments
until necessary, tracking which values were in registers and using the
register value instead of reading from memory.  And which treated each
field of a union independently.  In your example, for example, the
compiler would probably never store x into u.i, and would definitely have
read u.j before u.i was assigned.  Such an implementation is legal
(although IMHO poorly advised -- standard or no, people do use unions
for type punning, the compiler knows that a union is involved, and
there's no point breaking code, correct or not, unless you have to).

There are legitimate uses for "type punning".  The results are never
portable, of course, but the C and C++ standards provide a standard way
of doing it: by casting a pointer (reinterpret_cast in C++).

Historically, both casting a pointer and the union trick worked in K&R
C.  And the union trick was widely used.  But it was banned in C90, and
thus in C++.  Personally, I think that the standards committees could
have endorsed both, but they didn't.  And I don't think it matters too
much; in practice, the only times I've ever used it was to look at the
individual bytes of a representation, and the committees arguably did
make an exception when one of the elements involved was unsigned char or
char, i.e.:

    union { long l ; unsigned char b[ sizeof( long ) ] ; }
                    u ;
    u.l = someValue ;
    for ( size_t i = 0 ; i < sizeof( long ) ; ++ i ) {
        std::cout << ' ' << static_cast< int >( u.b[ i ] ) ;
    }

What will be displayed is totally undefined, but the code is guaranteed to
run and to display something -- no core dump or other undefined behavior
allowed.
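
The same inspection can also be done without a union, by copying the object representation into an array of unsigned char. The following is a sketch under that assumption; the function name dump_bytes is hypothetical, and the values it returns depend on the platform's representation of long.

```cpp
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>

// Returns the bytes of a long's object representation as
// space-separated decimal numbers, in memory order.
std::string dump_bytes(long value) {
    unsigned char bytes[sizeof(long)];
    std::memcpy(bytes, &value, sizeof value);   // byte-wise copy, no punning

    std::ostringstream out;
    for (std::size_t i = 0; i < sizeof bytes; ++i) {
        if (i != 0) {
            out << ' ';
        }
        out << static_cast<int>(bytes[i]);
    }
    return out.str();
}
```

The number of values returned is sizeof(long); which byte comes first depends on the byte order of the machine.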

--
James Kanze             GABI Software             mailto:kanze@gabi-soft.fr
Conseils en informatique orientée objet/
                           Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, Tél. : +33 (0)1 30 23 45 16

---

Author: do-not-spam-ben.hutchings@businesswebsoftware.com (Ben Hutchings)
Date: Tue, 22 Apr 2003 19:33:46 +0000 (UTC) Raw View

In article <4fb4137d.0304211333.734d27f3@posting.google.com>,
johnchx wrote:
<snip>
> But...there is trouble ahead.  Consider:
>
>  void Foo() {
>     MyUnion mu = {5};
>  }
>
> 8.5.1/15 says "When a union is initialized with a brace-enclosed
> initializer, the braces shall only contain an initializer for the
> first member of the union."
>
> However, 8.5.1/7 says "If there are fewer initializers in the list
> than there are members in the aggregate, then each member not
> explicitly initialized shall be default initialized (8.5)."  Ka-boom!
<snip>

This is clearly nonsensical behaviour for unions, and can't have been
intended (can it?).  Some defects in 8.5.1 have already been resolved
(<http://std.dkuug.dk/jtc1/sc22/wg21/docs/cwg_defects.html#35>,
<http://std.dkuug.dk/jtc1/sc22/wg21/docs/cwg_defects.html#178>), but
it appears that no-one reported this yet.

---

Author: kanze@gabi-soft.de (James Kanze)
Date: Wed, 23 Apr 2003 13:53:09 +0000 (UTC) Raw View

osv@javad.ru (Sergei Organov) wrote in message
news:<877k9ucv76.fsf@osv.javad.ru>...
> kanze@gabi-soft.de (James Kanze) writes:

> [...]

> > There are legitimate uses for "type punning".  The results are never
> > portable, of course, but the C and C++ standards provide a standard
> > way of doing it: by casting a pointer (reinterpret_cast in C++).

> But now it seems that aliasing rules I've initially asked about make
> using of pointers casting to be UB at some conditions, and union rules
> make using of unions to be UB at the same conditions.

Type punning almost always involves undefined behavior, which may
nonetheless be defined by the specific platform the code is written for.

> Probably I should explain where the initial question came from to make
> things more clear:

> Suppose I have to write code that is portable between architectures
> where sizeof(float) and sizeof(unsigned int) are both 4 and ints and floats
> have the same (but unknown) bytes order and alignment requirement. The
> code should serialize both floats and ints in big endian without
> invoking undefined behavior.

What does big endian mean for a float?  What is the point of outputting
in big-endian (rather than something else), if not to conform to an
externally defined format, for data portability?  And if you are
conforming to an externally defined format, then you might have to do
extra work for int's, and will definitely have to do extra work for
floats.

The reason is simple: the actual internal representations of these types
are not defined by the standard.  In practice, if you are only concerned
with modern, 32 bit machines, then you can pretty much assume that int's
are 2's complement, with no padding or reserved bits (although I believe
that Unisys still makes a machine with 36 bit 1's complement ints, and
until a few years ago, it made one with 48 bit signed magnitude ints).
For floats, the issue isn't as clear, and even within the machines of
one manufacturer (IBM), the floating point format changes.  Just
adjusting the order you output the underlying bytes won't fix this.

> Now, for ints the solution is trivial:

> void out_uint(unsigned int u) {
>   out_byte(u >> 24);
>   out_byte(u >> 16);
>   out_byte(u >>  8);
>   out_byte(u);
> }

This is true ONLY for unsigned int's, and only if you accept that there
are no padding bits.  It is only true for signed int's if you accept the
same representation for negative numbers everywhere.

As I said above, although not guaranteed, this is a pretty good
assumption in practice.
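
A sketch of the unsigned case with the assumptions written out. The names Buffer, out_u32 and in_u32 are hypothetical stand-ins for the out_byte interface in the quoted code; unsigned long is used because it is guaranteed to have at least 32 bits, and each byte is masked so that the result does not depend on how the byte sink truncates its argument.

```cpp
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Buffer;   // hypothetical byte sink

// Append the low 32 bits of u to buf, most significant byte first.
void out_u32(Buffer& buf, unsigned long u) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        buf.push_back(static_cast<unsigned char>((u >> shift) & 0xFFu));
    }
}

// Read back four big-endian bytes starting at position pos.
// The caller must ensure that pos + 4 <= buf.size().
unsigned long in_u32(const Buffer& buf, std::size_t pos) {
    unsigned long u = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        u = (u << 8) | buf[pos + i];
    }
    return u;
}
```

Because only shifts and masks on an unsigned value are involved, the byte order of the host machine never enters into it.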

> For floats I see three ways:

> void out_float1(float f) {
>   out_uint(*reinterpret_cast<unsigned int*>(&f));
> }

> void out_float2(float f) {
>   union { float f; unsigned int u; } u;
>   u.f = f;
>   out_uint(u.u);
> }

> void out_float3(float f) {
>   unsigned int u;
>   unsigned char const* pf = reinterpret_cast<unsigned char*>(&f);
>   unsigned char* pu = reinterpret_cast<unsigned char*>(&u);
>   for(int i = 0; i < sizeof(u); ++i) *pu++ = *pf++; // or memcpy() instead
>   out_uint(u);
> }

> from which only the most ugly (and probably inefficient) out_float3()
> doesn't invoke undefined behavior (or does it?) as out_float1() seems
> to invoke UB due to aliasing rules (3.10/15) and out_float2() seems to
> invoke UB due to union rules (9.5/1) :(

All three invoke undefined behavior.  All three will fail, for example,
on a Unisys series A (where int's and float's are both 48 bits -- with
an 8 bit field which must be 0 for an int, and cannot be zero for a
float).

If you are willing to restrict portability (and you have, the moment you
say that ints and floats are four 8 bit bytes), you may be able to
restrict it further.  Say to machines which use 2's complement.  (I
don't know of any 32 bit machines which don't.)  Or even to machines
which use IEEE float (IBM mainframes don't).  Given that, the question
is: is this behavior still undefined in the set of possible target
machines?  All three of your solutions are defined by Posix, for example
(which also requires 8 bit bytes, and I think, 2's complement), and I
believe that some of the Windows interfaces also depend on the first two
working.  (If the first two work, the third will work.)  If you can
restrict your targets to Unix and Windows machines, IMHO, all three are
OK.  And if you can't, probably the next most important non-Unix
non-Windows system would be IBM mainframes, which have a completely
different floating point format anyway, so you'll need something *far*
more complicated.  (IBM floating point isn't even base 2.)
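
One way to write down such a restriction of the target set is to state it in the code, so that a platform outside the set fails to compile. The following is a sketch only, in the spirit of out_float3 from the quoted post; it assumes a C++11 compiler for static_assert, the names float_bits and bits_to_float are hypothetical, and it still relies on ints and floats sharing the same byte order, as the original question stipulates.

```cpp
#include <cstring>
#include <limits>

// The assumptions about the target machines, checked at compile time.
static_assert(sizeof(float) == sizeof(unsigned int),
              "float and unsigned int must have the same size");
static_assert(std::numeric_limits<float>::is_iec559,
              "float must use an IEEE 754 (IEC 559) format");

// Copy the object representation of f into an unsigned int.
unsigned int float_bits(float f) {
    unsigned int u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Inverse operation: rebuild a float from its bit pattern.
float bits_to_float(unsigned int u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

The resulting unsigned int can then be written out with the shift-based integer routine; a machine with a different floating point format is rejected at compile time rather than producing silently wrong data.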

> Is this an intention or just an oversight in the standard?

It's intentional.  It is also the intent of the standard that specific
implementations define certain undefined behavior, even if it isn't
required.  The goal is to allow implementations even in cases where the
specific behavior in question can't be defined, not to encourage systems
to make it hard for programmers.  (There's also a large body of existing
code which depends on this behavior.  So you can pretty much be sure
that no compiler implementor is going to make it undefined unless he
absolutely has to.)

> IMHO, the comment to the aliasing rules 3.10/15:

>      "The intent of this list is to specify those circumstances in
>      which an object may or may not be aliased."

> suggests that out_float1() could be made valid as compiler has been
> explicitly told that 'f' is indeed aliased by another type of pointer.

> As for unions, why not change 9.5/1 so that it would be 'unspecified
> behavior' instead of implicitly implied 'undefined behavior'? It would
> make at least out_float2() to be a non-UB.

> Anyway, the 9.5/1 probably needs to be formulated in a more clean and
> strict way. For me, the quoted

>    "In a union, at most one of the data members can be active at any
>     time, that is, the value of at most one of the data members can
>     be stored in a union at any time."

> doesn't make it obvious that storing one member and fetching another
> is UB.  This is because the term "active" used in the first part is
> not well defined, and the second part doesn't tell about fetching at
> all. For me this quote sounds more like "you can't store to a field
> and expect that (previously stored) value of another field doesn't
> change."

I agree that the formulation could be improved, but I'm pretty sure that
the intent was that the "inactive" members be invalid, and accessing
invalid members is undefined behavior.  Generally, this IS the case:

    union { float f ; int i ; } u ;
    u.i = -1 ;
    function( u.f ) ;

is likely to core dump on some implementations (supposing that
0xffffffff is the representation of a signaling NaN).

Much of the standard involves compromises in the wording.  You don't
want to, and indeed you cannot specify all cases in detail.  You
certainly want to allow a core dump in cases like the one I just
presented, because that is what the hardware will give you.  And trying
to define in detail what will work, and what might not, is simply not
feasible -- the standard's authors are only human, and they don't know
how future hardware might behave any more than you or I do.  The policy
adopted in such cases is to simply declare the behavior undefined, and
let the individual implementations define as much as they can or want
to.  This does have a negative impact on portability, but then, even
when it works, the semantics will be different from one machine to the
next.

--
James Kanze             GABI Software             mailto:kanze@gabi-soft.fr
Conseils en informatique orientée objet/
                           Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, Tél. : +33 (0)1 30 23 45 16

---

Author: kanze@gabi-soft.de (James Kanze)
Date: Wed, 23 Apr 2003 16:37:45 +0000 (UTC) Raw View

osv@javad.ru (Sergei Organov) wrote in message
news:<873ckict4b.fsf@osv.javad.ru>...
> spambo_steffan_lankinensucks@hotmail.com ("Bo-Staffan Lankinen")
> writes:
> > Sergei Organov writes:

> > > Could somebody please explain which of the three functions below
> > > invoke "undefined behavior" due to the aliasing rules defined in
> > > the Standard (those that could be found in CD2 in [basic.lval]
> > > 15)?

> > It's mentioned in 3.10/15.

> > > union U { long l; int i; };

> > > int f1(long l) { return *(int*)&l;  }
> > > int f2(long l) { return ((U*)&l)->i; }
> > > int f3(long l) { U u; u.l=l; return u.i; }

> > > My own understanding is that f3() is OK, f1() invokes "undefined
> > > behavior", and I'm unsure about f2().

> > You're right about f1 and f3. f2 also invokes undefined behavior
> > with respect to aliasing.

> If I'm right about f1, does it mean that the following also is UB with
> respect to 3.10/15:

> struct B;
> extern "C" void foo(B* b);

> struct A { ...something... };

> void boo() {
>   A a;
>   foo((B*)&a);
> }

There is a special clause that any common prefix of the two structs can
be accessed, which would cover this specific case.

> If so, it means calling some useful C functions in a usual way from
> C++ is UB, for example, UNIX socket routines 'bind' and 'getsockname'
> fall into this category as they use fake 'struct sockaddr *' as one of
> their arguments :(

I believe that there is more than one Posix function which requires what
the C (or C++) standard considers undefined behavior.  I believe that the
same thing is true for Windows.  All this means is that any C/C++
implementations on these platforms had better define this behavior -- an
implementation is free to define undefined behavior.

--
James Kanze             GABI Software             mailto:kanze@gabi-soft.fr
Conseils en informatique orientée objet/
                           Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, Tél. : +33 (0)1 30 23 45 16

---

Author: osv@javad.ru (Sergei Organov)
Date: Thu, 24 Apr 2003 18:44:05 +0000 (UTC) Raw View

kanze@gabi-soft.de (James Kanze) writes:
> osv@javad.ru (Sergei Organov) wrote in message
> > If I'm right about f1, does it mean that the following also is UB with
> > respect to 3.10/15:
>
> > struct B;
> > extern "C" void foo(B* b);
>
> > struct A { ...something... };
>
> > void boo() {
> >   A a;
> >   foo((B*)&a);
> > }
>
> There is a special clause that any common prefix of the two structs can
> be accessed, which would cover this specific case.

Sorry, I don't see such a clause in 3.10/15. I see it for unions, but not in
aliasing rules :( The only apparently relevant clause is:

  --an  aggregate  or union type that includes one of the aforementioned
    types among its members (including, recursively, a member of a  sub-
    aggregate or contained union),

but I must say that I don't understand this clause at all :( Could somebody
give an example, please?
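
One common reading of that clause, offered as a sketch and not as an authoritative interpretation: an int object may be accessed through an lvalue of a struct type that has an int member, for example when the whole struct is assigned. The names S and store_then_copy below are hypothetical.

```cpp
struct S { int i; };

// pi may legitimately point at ps->i, so the compiler must assume that
// assigning through the aggregate lvalue *ps can change *pi.
int store_then_copy(int* pi, S* ps, const S& src) {
    *pi = 1;
    *ps = src;      // writes ps->i through an lvalue of type S
    return *pi;     // has to be reloaded from memory, not assumed to be 1
}
```

Called as store_then_copy(&s.i, &s, src), the function returns src.i; under this reading the clause is what obliges the compiler to allow for that overlap.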


--
Sergei.

---

Author: osv@javad.ru (Sergei Organov)
Date: Tue, 8 Apr 2003 08:36:18 +0000 (UTC) Raw View

Could somebody please explain which of the three functions below invoke
"undefined behavior" due to the aliasing rules defined in the Standard (those
that could be found in CD2 in [basic.lval] 15)?

union U { long l; int i; };

int f1(long l) { return *(int*)&l;  }
int f2(long l) { return ((U*)&l)->i; }
int f3(long l) { U u; u.l=l; return u.i; }

My own understanding is that f3() is OK, f1() invokes "undefined behavior",
and I'm unsure about f2().

Thanks in advance.

---

Author: v.Abazarov@attAbi.com ("Victor Bazarov")
Date: Tue, 8 Apr 2003 18:11:08 +0000 (UTC) Raw View

"Sergei Organov" <osv@javad.ru> wrote...
> Could somebody please explain which of the three functions below invoke
> "undefined behavior" due to the aliasing rules defined in the Standard
> (those that could be found in CD2 in [basic.lval] 15)?
>
> union U { long l; int i; };
>
> int f1(long l) { return *(int*)&l;  }
> int f2(long l) { return ((U*)&l)->i; }
> int f3(long l) { U u; u.l=l; return u.i; }
>
> My own understanding is that f3() is OK, f1() invokes "undefined
> behavior", and I'm unsure about f2().


I believe that to use the result of a cast of something into
something which it isn't is to invoke undefined behaviour.  In
your example in f1 '&l' is a pointer to long, and you are trying
to dereference the result of converting it to a pointer to int,
which it isn't (two pointers are not compatible, although the
pointed to types are, sort of).  In f2 you do it again.  'l' is
not an object of type 'U', so &l cannot be converted to U* and
then used.  It can only be converted back (see reinterpret_cast)
which is basically useless, although legal.

What would be legal, I think, is to cast 'l' to a 'U', instead
of doing the pointer dance.  According to the Standard, your U
is an aggregate and it can be initialised with a single value,
which is used to initialise the first member of the union.  It,
along with conversion rules, makes it legal to write

    int f4(long l) { return U(l).i; }

which is very similar to 'f3', it constructs a temporary U,
initialises it from 'l' and then extracts the 'i' member from
the temporary.

So, to sum up, both expressions in 'return' statements of f1
and f2 cause undefined behaviour, IMHO.

Victor
--
Please remove capital A's from my address when replying by mail


---

Author: allan_w@my-dejanews.com (Allan W)
Date: Tue, 8 Apr 2003 21:15:32 +0000 (UTC) Raw View

osv@javad.ru (Sergei Organov) wrote
> Could somebody please explain which of the three functions below
> invoke "undefined behavior" due to the aliasing rules defined in
> the Standard (those that could be found in CD2 in [basic.lval] 15)?

CD2 is not "The standard".

> union U { long l; int i; };
>
> int f1(long l) { return *(int*)&l;  }

I think this does invoke UB. It takes the address of a long, casts it to
the address of an int, and then dereferences it. You can't dereference
something that's cast to a pointer until you cast it back to its original
type. This may happen to work on most machines (especially little-endian
machines), but I don't think it's strictly legal.

> int f2(long l) { return ((U*)&l)->i; }

Here you're taking the address of a long, casting it to the address of
the union, and then dereferencing it. Again, UB.

> int f3(long l) { U u; u.l=l; return u.i; }

The standard does say in 9.5/1:
    In a union, at most one of the data members can be active at any
    time, that is, the value of at most one of the data members can
    be stored in a union at any time. [Note: one special guarantee is
    made in order to simplify the use of unions: if a POD-union
    contains several POD-structs that share a common initial sequence
    (9.2), and if an object of this POD-union type contains one of
    the POD-structs, it is permitted to inspect the common initial
    sequence of any of POD-struct members; see 9.2. ]

When you store l into the local union, that's the active member. When
you access i in the union, you invoke UB.
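
For contrast, a sketch of the one case the note in 9.5/1 does permit: inspecting the common initial sequence of POD-structs held in a POD-union. The types Circle, Rect and Shape and the function kind_of are hypothetical, chosen only to illustrate the quoted wording.

```cpp
// Two POD-structs whose first member has the same type, so that
// member forms their common initial sequence.
struct Circle { int kind; double radius; };
struct Rect   { int kind; double width; double height; };

union Shape {
    Circle c;
    Rect   r;
};

// Reads 'kind' through member c whichever struct was stored last;
// this is the inspection the special guarantee allows.
int kind_of(const Shape& s) {
    return s.c.kind;
}
```

Reading s.c.radius after storing s.r would fall outside the guarantee, since radius is not part of the common initial sequence.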

> My own understanding is that f3() is OK, f1() invokes "undefined behavior",
> and I'm unsure about f2().

  #include <climits>
  int f4(long l) { return static_cast<int>(l % INT_MAX); }

or

  #include <limits>
  int f5(long l) {
    return static_cast<int>(l % std::numeric_limits<int>::max());
  }

---

Author: spambo_steffan_lankinensucks@hotmail.com ("Bo-Staffan Lankinen")
Date: Wed, 9 Apr 2003 02:53:31 +0000 (UTC) Raw View

> Could somebody please explain which of the three functions below invoke
> "undefined behavior" due to the aliasing rules defined in the Standard
> (those that could be found in CD2 in [basic.lval] 15)?

It's mentioned in 3.10/15.

> union U { long l; int i; };
>
> int f1(long l) { return *(int*)&l;  }
> int f2(long l) { return ((U*)&l)->i; }
> int f3(long l) { U u; u.l=l; return u.i; }
>
> My own understanding is that f3() is OK, f1() invokes "undefined
> behavior", and I'm unsure about f2().

You're right about f1 and f3. f2 also invokes undefined behavior with
respect to aliasing.

Bo-Staffan


---

Author: v.Abazarov@attAbi.com ("Victor Bazarov")
Date: Wed, 9 Apr 2003 02:53:45 +0000 (UTC) Raw View

"Allan W" <allan_w@my-dejanews.com> wrote...
> [...]
>   #include <climits>
>   int f4(long l) { return static_cast<int>(l % INT_MAX); }
>
> or
>
>   #include <limits>
>   int f5(long l) {
>     return static_cast<int>(l % std::numeric_limits<int>::max());
>   }


So, if 'l' is negative, what will it result in?  Is that the
same as the OP wants (provided you know what he wants)?  Didn't
you actually mean to do (l % (INT_MAX + 1L))?

Victor
--
Please remove capital A's from my address when replying by mail

---