Topic: offsetof - apparent contradiction in standard


Author: James Kanze <kanze@gabi-soft.de>
Date: Mon, 15 Apr 2002 07:42:46 GMT
Raw View
Mathew Hendry <mathewhendry@hotmail.com> writes:

|>  On Sat, 13 Apr 2002 14:38:15 GMT, Ian McCulloch
|>  <ian.mcculloch@wanadoo.nl> wrote:

|>  I was looking for a way to find the alignment requirements for a
|>  given class, for use in an allocator, and tried

|>    #include <cstddef>
|>    ...
|>    template<class T> struct alignment
|>    {
|>      struct lump
|>      {
|>        unsigned char pad;
|>        T t;
|>      };
|>      static const std::size_t value = offsetof(lump, t);
|>    };

|>  However, replies to that thread told me that this can only work for
|>  POD types. Why the limitation?

Simply because the operation was inherited from C.  It was maintained
for reasons of C compatibility, but it was clear that typical
implementations wouldn't work for at least some cases in C++ (consider
the implementation posted earlier in this thread, where the member
element is in fact a reference).  Since the only reason it was present
was C compatibility, and the raison d'etre for PODs is C compatibility,
this restriction seemed the simplest and the safest.

|>  In the thread you mention, Pete Becker says

|>  > The offset of a member of a virtual base is not fixed. It depends
|>  > on the most-derived type of the actual object.

|>  Are there any other issues?

References.
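
A minimal sketch of why a reference member defeats any address-based offset computation. It is written in C++11, which postdates this thread, and the names HasRef, target, holder and Pod are illustrative only, not taken from the posts. Taking the address of a reference member yields the address of the object it refers to, so no arithmetic on that address can describe the layout of the enclosing class.

```cpp
#include <cstddef>

// Hypothetical class with a reference member; it is not a POD.
struct HasRef {
    char pad;
    int& ref;
    constexpr explicit HasRef(int& r) : pad(0), ref(r) {}
};

int target = 0;                    // the referent, with static storage duration
constexpr HasRef holder(target);   // holder.ref is bound to target

// &holder.ref is the address of target, not of any storage inside holder.
constexpr bool ref_address_is_referent = (&holder.ref == &target);

// By contrast, a POD member has a fixed offset that offsetof can report.
struct Pod {
    char pad;
    int value;
};
constexpr std::size_t pod_value_offset = offsetof(Pod, value);
```

Here ref_address_is_referent is true, which is why a macro built on "address of member minus address of object" has nothing meaningful to return for HasRef.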

-- 
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Ian McCulloch <ian.mcculloch@wanadoo.nl>
Date: Sat, 13 Apr 2002 14:38:15 GMT
Raw View
Hi,

This post is an offshoot of a query in comp.lang.c++.moderated,
http://groups.google.com/groups?hl=en&frame=right&th=7f4136d00016774b&seekm=86elhmtqpl.fsf%40alex.gabi-soft.de#link1
about offsetof not being an integral constant expression in strict mode of
Compaq cxx and also KAI KCC (I got this wrong in the original
comp.lang.c++.moderated post, I had mistakenly compiled the example in
strict mode with cxx but non-strict with KCC.  In fact, both compilers
accept the code in non-strict mode and both compilers reject it in strict
mode).

This is the first time I have read the actual C++ standard document, so
please forgive any first-time stupidities.

offsetof is defined in terms of the C standard, section 7.1.6, common
definitions <stddef.h>, which states:

<quote>
offsetof(type, _member-designator_)

expands to an integer constant expression that has type size_t, the value
of which is the offset in bytes, to the structure member designator
(designated by _member-designator_), from the beginning of its structure
(designated by _type_).  The _member-designator_ shall be such that given

static type t;

then the expression &(t.member-designator) evaluates to an
address-constant.  (If the specified member is a bit-field then the
behavior is undefined.)
<end quote>

Note that, although &(t.member-designator) is required to be an
address-constant, the result of an offsetof expression is an integer
constant expression.

C++ defines an integer constant expression in [expr.const] 5.19.2, as

<quote>
An integral constant-expression can involve only literals (2.13),
enumerators, const variables or static data members of integral or
enumeration types initialized with constant expressions (8.5), non-type
template parameters of integral or enumeration types, and sizeof
expressions. Floating literals (2.13.3) can appear only if they are cast to
integral or enumeration types. Only type conversions to integral or
enumeration types can be used. In particular, except in sizeof expressions,
functions, class objects, pointers, or references shall not be used, and
assignment, increment, decrement, function-call, or comma operators shall
not be used.
<end quote>

I am not sure how to interpret the first sentence.  I assume this list is
meant to be exclusive, in which case there is a problem because offsetof is
not listed here, nor AFAICT is it covered by any of the other cases.  (If
the 'can' really does mean 'can', rather than 'must', then the whole
sentence is essentially meaningless, so I guess that is not the correct
interpretation?)

The usage of offsetof is covered by the clause in the C standard which
states that &(t.member-designator) is required to be an address-constant.
In C++, an address-constant is defined in [expr.const] 5.19.4 as

<quote>
4 An address constant expression is a pointer to an lvalue designating an
object of static storage duration, a string literal (2.13.4), or a
function. The pointer shall be created explicitly, using the unary &
operator, or implicitly using a non-type template parameter of pointer
type, or using an expression of array (4.2) or function (4.3) type. The
subscripting operator [] and the class member access . and -> operators,
the & and * unary operators, and pointer casts (except dynamic_casts,
5.2.7) can be used in the creation of an address constant expression, but
the value of an object shall not be accessed by the use of these operators.
If the subscripting operator is used, one of its operands shall be an
integral constant expression. An expression that designates the address of
a member or base class of a non-POD class object (clause 9) is not an
address constant expression (12.7). Function calls shall not be used in an
address constant expression, even if the function is inline and has a
reference return type.
<end quote>

This prohibits applying offsetof to a non-POD class type.  It also
prohibits the common hack of implementing offsetof as something like

#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

because the result must be an integral constant expression, however this
form requires evaluating an address-constant expression, and therefore
cannot be used in some places that an integral constant expression can.
Since it is presumably legal to compile an already pre-processed C++
program, an implementation using the above macro definition cannot even
make a special exemption for offsetof, as that name wouldn't appear in the
pre-processed program.
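
To make "places that an integral constant expression can be used" concrete, here is a small sketch of three such contexts. The struct Header and the names payload_offset, scratch, Tagged and LengthOffset are hypothetical. It assumes a compiler whose <cstddef> offsetof really does yield an integral constant expression, which is what the standard text quoted above requires.

```cpp
#include <cstddef>

// Hypothetical POD used only for illustration.
struct Header {
    unsigned char  tag;
    unsigned char  flags;
    unsigned short length;
    unsigned int   payload;
};

// 1. An enumerator initializer must be an integral constant expression.
enum { payload_offset = offsetof(Header, payload) };

// 2. So must an array bound.
char scratch[offsetof(Header, payload) + 1];

// 3. So must a non-type template argument of integral type.
template <std::size_t N>
struct Tagged {
    static const std::size_t value = N;
};
typedef Tagged<offsetof(Header, length)> LengthOffset;
```

If offsetof expanded to an expression the compiler treated only as an address computation, each of the three declarations would be rejected in a strictly conforming mode.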

Incidentally, the definition of an address constant expression makes
[lib.support.types] 18.1.5 redundant.  All this section says is that
offsetof accepts a restricted set of type arguments, namely a POD structure
or a POD union (ie. a POD class, as in 5.19.4).

Finally, one last (perhaps trivial) query; the C standard defines offsetof
in terms of _bytes_, but unfortunately the copy of the C standard that I
have is not searchable, so I don't know how it defines a byte.  What
relationship does byte have with char, and sizeof ?

Cheers,
Ian McCulloch






Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Sat, 13 Apr 2002 23:43:24 GMT
Raw View
Ian McCulloch wrote:
....
> offsetof is defined in terms of the C standard, section 7.1.6, common
> definitions <stddef.h>, which states:
>
> <quote>
> offsetof(type, _member-designator_)
>
> expands to an integer constant expression that has type size_t, the value
> of which is the offset in bytes, to the structure member designator
> (designated by _member-designator_), from the beginning of its structure
> (designated by _type_).  The _member-designator_ shall be such that given
>
> static type t;
>
> then the expression &(t.member-designator) evaluates to an
> address-constant.  (If the specified member is a bit-field then the
> behavior is undefined.)
> <end quote>
....
> C++ defines an integer constant expression in [expr.const] 5.19.2, as
>
> <quote>
> An integral constant-expression can involve only literals (2.13),
> enumerators, const variables or static data members of integral or
> enumeration types initialized with constant expressions (8.5), non-type
> template parameters of integral or enumeration types, and sizeof
> expressions. Floating literals (2.13.3) can appear only if they are cast to
> integral or enumeration types. Only type conversions to integral or
> enumeration types can be used. In particular, except in sizeof expressions,
> functions, class objects, pointers, or references shall not be used, and
> assignment, increment, decrement, function-call, or comma operators shall
> not be used.
> <end quote>
....
> This prohibits applying offsetof to a non-POD class type.  It also
> prohibits the common hack of implementing offsetof as something like
>
> #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

You're correct. As far as I can tell, there's no way to use ordinary
C/C++ code to implement offsetof() correctly, even allowing for
implementation-defined behavior such as is used above. As has been
recently brought up in the comp.std.c newsgroup, this problem also comes
up in the C standard itself, which has a comparably restricted
definition of an integer constant expression. The proper solution to this
problem cannot be to relax the requirement that offsetof() must expand
to an integer constant expression; offsetof() is supposed to be useable
wherever an integer constant expression is required.

Some people have argued that "expands to an integer constant expression"
should be interpreted not as a requirement, but as a definition: that
the expansion of offsetof() automatically qualifies as an integer
constant expression, regardless of what string of tokens it actually
expands to. I don't think this is a correct interpretation of the actual
text of the standard. However, I do think this would be the best way to
resolve the problem.

> Finally, one last (perhaps trivial) query; the C standard defines offsetof
> in terms of _bytes_, but unfortunately the copy of the C standard that I
> have is not searchable, so I don't know how it defines a byte.  What
> relationship does byte have with char, and sizeof ?

sizeof(object) or sizeof(type) returns the size of the object or type,
measured in bytes. Objects (other than bit-fields) are required to use up
an integral number of bytes. sizeof(char) is required to be 1. CHAR_BIT
is the number of bits in a byte, which is required to be an integer, and
to be at least 8, but it can be larger. Popular values for CHAR_BIT
other than 8 have included 9, 16, and 32. (9 was popular on some
machines which used a 36-bit word).
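
The relationship between byte, char and sizeof can be checked at compile time. This is a sketch in C++11 (static_assert and constexpr postdate this thread); the macro in <climits> is spelled CHAR_BIT, and the helper bits_in is a hypothetical name introduced here.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t

// sizeof counts in bytes, and a char is exactly one byte by definition.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

// A byte has at least 8 bits, but may have more.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: number of bits in the object representation of T.
template <class T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}
```

On a machine with 8-bit bytes, bits_in<char>() is 8; on one of the 36-bit-word machines mentioned above it could be 9.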






Author: Mathew Hendry <mathewhendry@hotmail.com>
Date: Sun, 14 Apr 2002 15:32:47 GMT
Raw View
On Sat, 13 Apr 2002 14:38:15 GMT, Ian McCulloch
<ian.mcculloch@wanadoo.nl> wrote:

>This post is an offshoot of a query in comp.lang.c++.moderated,
>http://groups.google.com/groups?hl=en&frame=right&th=7f4136d00016774b&seekm=86elhmtqpl.fsf%40alex.gabi-soft.de#link1
>about offsetof not being an integral constant expression in strict mode of
>Compaq cxx and also KAI KCC

I asked a related question here:

http://groups.google.com/groups?threadm=u6o6aughfngkljbt3j601132scfidlbbts%404ax.com

I was looking for a way to find the alignment requirements for a given
class, for use in an allocator, and tried

  #include <cstddef>
  ...
  template<class T> struct alignment
  {
    struct lump
    {
      unsigned char pad;
      T t;
    };
    static const std::size_t value = offsetof(lump, t);
  };

However, replies to that thread told me that this can only work for
POD types. Why the limitation?

In the thread you mention, Pete Becker says

> The offset of a member of a virtual base is not fixed. It depends on the
> most-derived type of the actual object.

Are there any other issues?
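
For comparison, a sketch of the same trick restricted to POD types, checked against the alignof operator that C++11 later added (not available when this thread was written). The struct Pod is a hypothetical example type. The offset of t is guaranteed to be a multiple of T's alignment. On common ABIs it is exactly equal to it, but that equality is an assumption about the implementation's padding, not something the standard promises.

```cpp
#include <cstddef>

// The template from the post, unchanged apart from layout.
template <class T>
struct alignment {
    struct lump {
        unsigned char pad;
        T t;
    };
    static const std::size_t value = offsetof(lump, t);
};

// Hypothetical POD to instantiate it with.
struct Pod {
    char   c;
    double d;
};
```

With C++11 or later, alignof(T) answers the original question directly and works for non-POD types as well, so the lump trick is mainly of historical interest.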

-- Mat.






Author: James Kanze <kanze@gabi-soft.de>
Date: Sun, 14 Apr 2002 15:34:20 GMT
Raw View
Ian McCulloch <ian.mcculloch@wanadoo.nl> writes:

|>  This post is an offshoot of a query in comp.lang.c++.moderated,
|>  http://groups.google.com/groups?hl=en&frame=right&th=7f4136d00016774b&seekm=86elhmtqpl.fsf%40alex.gabi-soft.de#link1
|>  about offsetof not being an integral constant expression in strict mode of
|>  Compaq cxx and also KAI KCC (I got this wrong in the original
|>  comp.lang.c++.moderated post, I had mistakenly compiled the example in
|>  strict mode with cxx but non-strict with KCC.  In fact, both compilers
|>  accept the code in non-strict mode and both compilers reject it in strict
|>  mode).

|>  offsetof is defined in terms of the C standard, section 7.1.6, common
|>  definitions <stddef.h>, which states:

|>  <quote>
|>  offsetof(type, _member-designator_)

|>  expands to an integer constant expression that has type size_t, the
|>  value of which is the offset in bytes, to the structure member
|>  designator (designated by _member-designator_), from the beginning
|>  of its structure (designated by _type_).  The _member-designator_
|>  shall be such that given

|>  static type t;

|>  then the expression &(t.member-designator) evaluates to an
|>  address-constant.  (If the specified member is a bit-field then the
|>  behavior is undefined.)
|>  <end quote>

|>  Note that, although &(t.member-designator) is required to be an
|>  address-constant, the result of an offsetof expression is an integer
|>  constant expression.

Right.  How the compiler gets from here to there is its problem, but it
must do it.

|>  C++ defines an integer constant expression in [expr.const] 5.19.2,
|>  as

|>  <quote>
|>  An integral constant-expression can involve only literals (2.13),
|>  enumerators, const variables or static data members of integral or
|>  enumeration types initialized with constant expressions (8.5),
|>  non-type template parameters of integral or enumeration types, and
|>  sizeof expressions. Floating literals (2.13.3) can appear only if
|>  they are cast to integral or enumeration types. Only type
|>  conversions to integral or enumeration types can be used. In
|>  particular, except in sizeof expressions, functions, class objects,
|>  pointers, or references shall not be used, and assignment,
|>  increment, decrement, function-call, or comma operators shall not be
|>  used.
|>  <end quote>
|>
|>  I am not sure how to interpret the first sentence, I assume this
|>  list is meant to be exclusive, in which case there is a problem
|>  because offsetof is not listed here, nor AFAICT is it covered by any
|>  of the other cases.  (If the 'can' really does mean 'can', rather
|>  than 'must', then the whole sentence is essentially meaningless, so
|>  I guess that is not the correct interpretation?)

The list is meant to be exclusive, but such rules really only apply to
user code.  Nothing prevents an implementation from defining offsetof
as:

    #define offsetof( type, member ) __builtin_offsetof( type, member )

and declaring that __builtin_offsetof is an integral constant expression
(although there is no way that you can do such in user code).
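
A sketch of that approach as it can be observed from user code today. GCC and Clang do provide an intrinsic spelled __builtin_offsetof; the wrapper macro MY_OFFSETOF, the struct Record and the typedef WeightOffsetProbe are hypothetical names, and the fallback branch simply assumes the library's own offsetof on other compilers.

```cpp
#include <cstddef>

// Map to the compiler intrinsic where it is known to exist,
// otherwise fall back to the standard library macro.
#if defined(__GNUC__) || defined(__clang__)
#  define MY_OFFSETOF(type, member) __builtin_offsetof(type, member)
#else
#  define MY_OFFSETOF(type, member) offsetof(type, member)
#endif

// Hypothetical POD used only for illustration.
struct Record {
    char   kind;
    double weight;
};

// An array bound must be an integral constant expression, so this
// typedef compiles only if the compiler treats the intrinsic as one.
typedef char WeightOffsetProbe[MY_OFFSETOF(Record, weight)];
```

Because the intrinsic survives preprocessing as a distinct token, the compiler can still recognize it in an already pre-processed program, which the null-pointer macro cannot offer.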

|>  The usage of offsetof is covered by the clause in the C standard
|>  which states that &(t.member-designator) is required to be an
|>  address-constant.

And since this expression is an address constant, and not an integral
constant expression, it cannot be the results of offsetof.

|>  In C++, an address-constant is defined in
|>  [expr.const] 5.19.4 as

|>  <quote>
|>  An address constant expression is a pointer to an lvalue designating
|>  an object of static storage duration, a string literal (2.13.4), or
|>  a function. The pointer shall be created explicitly, using the unary
|>  & operator, or implicitly using a non-type template parameter of
|>  pointer type, or using an expression of array (4.2) or function
|>  (4.3) type. The subscripting operator [] and the class member access
|>  . and -> operators, the & and * unary operators, and pointer casts
|>  (except dynamic_casts, 5.2.7) can be used in the creation of an
|>  address constant expression, but the value of an object shall not be
|>  accessed by the use of these operators.  If the subscripting
|>  operator is used, one of its operands shall be an integral constant
|>  expression. An expression that designates the address of a member or
|>  base class of a non-POD class object (clause 9) is not an address
|>  constant expression (12.7). Function calls shall not be used in an
|>  address constant expression, even if the function is inline and has
|>  a reference return type.
|>  <end quote>
|>
|>  This prohibits applying offsetof to a non-POD class type.

The prohibition is explicit in 18.1/5.

|>  It also prohibits the common hack of implementing offsetof as
|>  something like

|>  #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

|>  because the result must be an integral constant expression, however
|>  this form requires evaluating an address-constant expression, and
|>  therefore cannot be used in some places that an integral constant
|>  expression can.  Since it is presumably legal to compile an already
|>  pre-processed C++ program, an implementation using the above macro
|>  definition cannot even make a special exemption for offsetof, as
|>  that name wouldn't appear in the pre-processed program.

Well, there is a rule which lets the implementation off the hook.  The
expression in question dereferences a null pointer.  This is undefined
behavior, so anything the implementation does with it is legal.  An
implementation could, for example, define that the result of
dereferencing a null pointer constant with the -> operator, and then
immediately taking the address, is an integral constant expression.  A
conforming program could not tell, since no conforming program could
create such an expression.  And if you did write such an expression, the
compiler is not required to emit a diagnostic.

So this definition, with or without the final cast, may be legal,
provided the compiler has the necessary extensions to handle it.
(Personally, I find the __builtin_offsetof a far cleaner solution.)

Historically, I think that the list was not considered exclusive.  In
ISO 9899 (both 1990 and 1999), there is a sentence "An implementation
may accept other forms of constant expressions."  This doesn't
explicitly include integral constant expressions (which are more
restrictive), but is suggestive with regards to the intent.  Thus, an
implementation might consider the above definition a constant
expression in C; I suspect that the intent was also to allow it as an
integral constant expression, since the type is integral.

|>  Incidentally, the definition of an address constant expression makes
|>  [lib.support.types] 18.1.5 redundant.  All this section says is that
|>  offsetof accepts a restricted set of type arguments, namely a POD
|>  structure or a POD union (ie. a POD class, as in 5.19.4).

|>  Finally, one last (perhaps trivial) query; the C standard defines
|>  offsetof in terms of _bytes_, but unfortunately the copy of the C
|>  standard that I have is not searchable, so I don't know how it
|>  defines a byte.  What relationship does byte have with char, and
|>  sizeof ?

You have to look at the sizeof operator.  It is defined to return the
size of an object in bytes, and sizeof( char ) is guaranteed to be 1.

As far as I know, there is no other definition of byte in the standard
(except with regards to multi-byte characters, which is probably not
relevant to what you are looking for).

-- 
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
