Thread

Topic: Data structures with negative offsets

Author: Geoffrey KEATING <geoffk@discus.anu.edu.au>
Date: 1999/09/13

Max TenEyck Woodbury <mtew@cds.duke.edu> writes:

> OK, I'll bite. How could the alignment requirement of an aggregate
> be stricter than the strictest alignment requirement of its components?

Oh, that's easy.  From a (hypothetical but reasonable) ABI document:

Structures that would have had size between 5 and 8 bytes, containing
only integer components, are packed into an 8-byte integer type in
little-endian order.
....
Integer and floating-point types are aligned in memory so that their
address is a multiple of their size.

--
Geoff Keating <Geoff.Keating@anu.edu.au>

[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: Paul Jarc <prj@po.cwru.edu>
Date: 1999/09/14

Max TenEyck Woodbury <mtew@cds.duke.edu> writes:
> Paul Jarc wrote:
> > The language in the standard regarding pointer arithmetic (C9X
> > 6.5.6p8) does not differentiate between signed and unsigned integer
> > operands, nor does it mention coercion of the integer operands to a
> > signed or unsigned type.
>
> So the result is undefined?

Before you are ready to participate in discussions here, you will need
to rid yourself of whatever kind of thought led you to that
conclusion.  It's totally unwarranted.

> The only reasonable alternative to signed arithmetic for pointers is
> to allow BOTH signed and unsigned versions depending on context.

Which is exactly what the standard does, by requiring the result to
work, but not requiring a particular signedness to be used.

> >> (I checked two of several books on 'C' on my bookshelf and they said
> >> that pointer arithmetic uses 'int's, but they may not be definitive
> >> or I could have misunderstood something.)
> >
> > They're not definitive, because they're not the standard.  The
> > standard is definitive; all else is not and may be of arbitrarily low
> > quality as description.  Exception: if all you care about is making
> > your program work on a particular implementation which may or may not
> > be conforming, then that implementation's source code is definitive;
> > all else is not and may be of arbitrarily low quality as description.
> > (Of course, we don't concern ourselves too much with that case while
> > here in the std groups.)
>
> (Au contraire, that kind of argument has been raised innumerable times.
> The standard has to deal with the 'common practice' at its base...)

Yes, the standard draws from common practice.  That doesn't make
common practice definitive.  It makes a difference only when the
standard disagrees with another source; in these cases, we say the
standard is right because we accept the standard as the *definition*
of the language.  That's one major reason for having a standard: to
resolve descriptive disagreements.

"Right", for the purposes of discussing something defined by a
standard, is defined as "in agreement with the standard".  The
standard cannot be wrong, though it can be inconsistent.

> As for my possible misunderstanding, further reading indicates that
> the standard may be silent on this. At least no one has quoted anything
> definitive. That makes the result 'undefined'?

The standard is silent on signedness of the integer operand in pointer
arithmetic.  That doesn't make anything undefined.  It allows signed
or unsigned, and requires them both to work.  I didn't quote the text,
but I posted its location in the draft standard, which you can get
from the URL that's been posted recently.

> So, what would be the impact of changing the return type of 'offsetof'
> from an unsigned to a signed type? Clive says that this would be a 'Quiet
> Change', but I believe it would not be.

That's because you don't know what a Quiet Change is.  From the
Rationale:

# Avoid ``quiet changes.''  Any change to widespread practice altering
# the meaning of existing code causes problems.  Changes that cause
# code to be so ill-formed as to require diagnostic messages are at
# least easy to detect.  As much as seemed possible consistent with
# its other goals, the Committee has avoided changes that quietly
# alter one valid program to another with different semantics, that
# cause a working program to work differently without notice.  In
# important places where this principle is violated, the Rationale
# points out a QUIET CHANGE.

Your proposal would change the meaning of existing code, without
requiring a diagnostic.  That's what a Quiet Change is.

paul

Author: david thompson <david.thompson@trintech.com>
Date: 1999/09/11

9 Sep 1999 20:15:39 GMT,
Max TenEyck Woodbury <mtew@cds.duke.edu> wrote:
> "Clive D.W. Feather" wrote:
[alignment of aggregate is _at least_ as strict as any component]
> OK, I'll bite. How could the alignment requirement of an aggregate
> be stricter than the strictest alignment requirement of its components?
>
You want a real example?

The original Tandem "T16" architecture, now basically obsolete,
reportedly based on the HP3000 (which I never saw myself), and
AIUI the same in this respect as the DG Nova/Eclipse series,
uses a native 16-bit word containing 2 8-bit bytes, and
pointers to character types require a special format using
15 + 1 bits -- similar to, but simpler than, the (in?)famous
PDP-10 byte pointers (within 36-bit word) often referred to here.
To make struct pointers consistent with most other pointers,
and since all struct pointers must have a common representation,
all structs are aligned to 2 bytes even if they contain only
character members.  I believe the same is true of union but
I never bothered to check, as a union of only character types
accomplishes no more than a cast pointer to character.
OTOH char arrays, and scalars, are aligned only to 1 byte
when a member; *all* toplevel vars are aligned to 2 bytes.

- david.thompson at but not for trintech.com

Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 1999/09/11

In article <37D91306.E5B036F@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> writes
>>> I could be mistaken, but I believe pointer arithmetic is signed.
>> The language in the standard regarding pointer arithmetic (C9X
>> 6.5.6p8) does not differentiate between signed and unsigned integer
>> operands, nor does it mention coercion of the integer operands to a
>> signed or unsigned type.
>So the result is undefined?

No. That's not what he said.

Pointer addition takes two arguments (in either order):
(1) a pointer
(2) an integer value.

There is no requirement that (2) be restricted to any particular set of
types. Rather, provided that the result is within the same object that
the original pointer pointed to, the integer argument can be signed int,
unsigned int, signed long, uintmax_t, unsigned long long, int87_t, or
any other integer type. They are *all* defined because there is a
definition of what the addition does, the definition makes sense, and
there's no explicit restriction.

>The only reasonable alternative to signed
>arithmetic for pointers is to allow BOTH signed and unsigned versions
>depending on context.

Exactly. Both signed and unsigned versions are correct.

E.g. suppose ints are 16 bits. Then:

    char v [120000];   // Assume the implementer allows this.
    signed int s;      // Range -32768 to 32767
    unsigned int u;    // Range 0 to 65535
    char *p = v + 50000, *p1, *p2;

    /* Set s and u to some value */
    p1 = p + s;  // Legal, p1 now in [v + 17232, v +  82767]
    p2 = p + u;  // Legal, p2 now in [v + 50000, v + 115535]

Note that both these additions are legal. There is no question of
converting s to unsigned or u to signed.

>I said that the references I could check were not definitive. As for
>the quality of the documents, that is why I consulted two widely
>different sources. One was an implementation document, but the other
>was a much broader reference (K & R).

Implementers define what they do, not the language. Ritchie will freely tell
you (possibly even in this thread) that K & R is not a substitute for
the Standard.

>As for my possible misunderstanding, further reading indicates that
>the standard may be silent on this. At least no one has quoted anything
>definitive. That makes the result 'undefined'?

No, it doesn't. It's not silent, it's implicit.

Okay, here's the relevant words:

    6.5.6  Additive operators

    Constraints

    [#2]  For  addition,  either  both   operands   shall   have
    arithmetic  type,  or  one  operand shall be a pointer to an
    object  type  and  the  other  shall  have   integer   type.

    Semantics

    [#8] When an expression that has integer type is added to or
    subtracted  from  a  pointer, the result has the type of the
    pointer operand.   If  the  pointer  operand  points  to  an
    element  of  an array object, and the array is large enough,
    the result points to an element  offset  from  the  original
    element  such  that  the difference of the subscripts of the
    resulting and original array  elements  equals  the  integer
    expression. [...]

Look at my example code above - this wording gives perfectly good
meaning for both signed and unsigned integers. Nothing undefined at all.

>At worst, the
>results of pointer arithmetic are implementation defined. There should
>be no problem with sufficiently small offsets, but offsets in the upper
>half of the 'size_t' range may cause trouble.

Wrong. Show me the source of any trouble in the above words.

>So, what would be the impact of changing the return type of 'offsetof'
>from an unsigned to a signed type? Clive says that this would be a 'Quiet
>Change', but I believe it would not be.

I've shown you various places where it would be. Once you understand the
above points, it should be clear.

>For those programs that would be
>affected, their operation is implicitly implementation defined at best
>and not subject to required diagnostics. With the change in place, the
>behavior would still be implementation defined, but some diagnostics would
>be required because of constraint violations unless the implementation
>could assure the correct result. The net would be an incremental
>improvement in portability.

This is, as I have said before, wrong.

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd.  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 |                      | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address


Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 1999/09/12

In article <37D8FB99.F5E06CBF@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> writes
>>> OK, I'll bite. How could the alignment requirement of an aggregate
>>> be stricter than the strictest alignment requirement of its components?
[...]
>Case 1 is an implementer's decision above and beyond the actual requirement.

No, the implementer's decision *is* the requirement. There's no other
sensible meaning of "requirement" (for example, if the hardware has an
alignment requirement, you can always copy values byte by byte into an
aligned object for calculations, while keeping all C variables
unaligned).

>Case 2 is the one that proves the point. Conceded. But note that it does not
>make much sense if a binary addressing scheme is used. I believe that binary
>addressing has been standard since the IBM 1600 series went out of use and
>I never heard of a 'C' implementation for those machines.

One counterexample: the Unisys 60 bit machines.

>>>>> It is a characterization of
>>>>> alignment requirements. I think it is an important characteristic.
>>>> Why ?
>>> Why not?
>> One good reason would be that it's incorrect.
>OK, but you had to go way out in left field to get that one...

Only if "left field" is "half a metre from the pitcher".

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd.  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 |                      | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address

Author: kaih=7OdVDqS1w-B@khms.westfalen.de (Kai Henningsen)
Date: 1999/09/12

mtew@cds.duke.edu (Max TenEyck Woodbury)  wrote on 11.09.99 in <37D91306.E5B036F@cds.duke.edu>:

> Paul Jarc wrote:
> >
> > Max TenEyck Woodbury <mtew@cds.duke.edu> writes:
> >> I could be mistaken, but I believe pointer arithmetic is signed.
> >
> > The language in the standard regarding pointer arithmetic (C9X
> > 6.5.6p8) does not differentiate between signed and unsigned integer
> > operands, nor does it mention coercion of the integer operands to a
> > signed or unsigned type.
>
> So the result is undefined?

Of course not.

Array indexing also doesn't talk about signed vs. unsigned arithmetic.

Surprise! Pointer arithmetic and array indexing are supposed to be
equivalent.

Pointer arithmetic, just like array indexing, only works as long as your
pointer points to the same originally allocated object. This object has a
finite number of allowable pointers, and those pointers have finite
differences (lesser or equal in absolute value to sizeof(object)).

As long as your operands are in the allowed range, the result is defined
by the values of those operands, and does not depend on the representation
of those values (that is, on the signedness of any integral type used to
hold such a value).

If it so happens that you can have objects large enough to get overflows -
say, when calculating pointer differences (actually, that's the only case
I can think of) - well, in that case, the standard explicitly says what
happens:

"When two pointers to elements of the same array object are subtracted,
the result is the difference of the subscripts of the two array elements.
The size of the result is implementation-defined, and its type (a signed
integral type) is ptrdiff_t defined in the <stddef.h> header. As with any
other arithmetic overflow, if the result does not fit in the space
provided, the behaviour is undefined." (ANSI 3.3.6)

> any practical purpose. The only reasonable alternative to signed
> arithmetic for pointers is to allow BOTH signed and unsigned versions

That's what the standard does, obviously.

> depending on context.

Why that?

> As for my possible misunderstanding, further reading indicates that
> the standard may be silent on this. At least no one has quoted anything
> definitive. That makes the result 'undefined'?

No, it doesn't. Why on earth should it?

Kai
--
http://www.westfalen.de/private/khms/
"... by God I *KNOW* what this network is for, and you can't have it."
  - Russ Allbery (rra@stanford.edu)

Author: "Douglas A. Gwyn" <DAGwyn@null.net>
Date: 1999/09/13

david thompson wrote:
> reportedly based on the HP3000 (which I never saw myself),

The HP 3000 was a good machine.

There have been several machines where character pointers were
not VAX-like due to architectural constraints.  There have even
been some where longer integer representations require internal
padding bits.  The reason we allow these things in the C standard
is because they occur in the real world, and are not patently
design errors to be "discouraged" by making sure that C will be
inefficient on those platforms.

If you want examples of interesting architectures that deserved
to fare better than they did, there are plenty in the not-too-
distant history of computing.  For example, the System/38 and
the iAPX-432, basically "tagged" architectures (hardware
enforcement of type safety) inspired by Myers' SWARD.

Personally, I want to see efficient individual-bit addressing,
because a lot of my applications work at that level.  But
decisions made in language standardization can have a big impact
on what actually gets implemented.  I know of at least one case
where management didn't support the architect's wish for bit-level
addressing, on the grounds that "programming languages don't
support it".

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/09/09

"Clive D.W. Feather" wrote:
>
> In article <37D57457.F9286CB6@cds.duke.edu>, Max TenEyck Woodbury
> <mtew@cds.duke.edu> writes
>
>>>> I was defining an important characteristic of alignment requirements. It was
>>>> Clive that tried to turn this into a rule/requirement. I did NOT claim that it
>>>> was a rule, only that 'C' made provisions for it and it COULD be an
>>>> implementation requirement.
>>> No you didn't - go back and read the words you *wrote*. The first quoted
>>> sentence is *not* an important characteristic of alignment requirements.
>> Huh? I suggest YOU read what I wrote.
>
> I have. Your original words were:
>
> || The alignment requirement of an aggregate is the same as the most
> || severe alignment requirement of any of its components.
>
> See that "is the same as" ? That's what I'm objecting to.
>
> Had you written "is at least as strict as", or even "is often the same
> as", I wouldn't have commented. But you said "is the same as".

OK, I'll bite. How could the alignment requirement of an aggregate
be stricter than the strictest alignment requirement of its components?

>> It is a characterization of
>> alignment requirements. I think it is an important characteristic.
>
> Why ?

Why not?

>> It is also a characteristic of aggregates that an implementation
>> has to consider.
>
> The implementation *defines* these things; it doesn't have to "consider"
> them.

An implementation may or may not define alignment requirements as it
sees fit, but it has to consider them in any case.

mtew@cds.duke.edu

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/09/09

Paul Jarc wrote:
>
> Max TenEyck Woodbury <mtew@cds.duke.edu> writes:
>>
>>     assert( (ptrdiff_t)offsetof(a,b) >= 0 );
>>
>> will fail if the expression
>>
>>     ((c *)((char *)d + offsetof(a,b)))
>>
>> could cause a portability problem for appropriate value of a, b,
>> c and d.
>
> If that offsetof() would overflow ptrdiff_t, then the result of the
> cast is implementation-defined.

Yep. And that is precisely when you need to check for portability
problems.

> The assert() is no help there.

If the 'assert' fails, then the expression may or may not produce
the correct result. If the 'assert' passes, then the expression will
produce the correct result on a conforming implementation.

> Nor
> do you need any help, AFAICT: if d points to a sufficiently large
> object (such as one of type a), I don't see anything in the standard
> that would allow that addition to cause problems.

I could be mistaken, but I believe pointer arithmetic is signed.
(I checked two of several books on 'C' on my bookshelf and they said
that pointer arithmetic uses 'int's, but they may not be definitive
or I could have misunderstood something.) If it is signed and
overflows cause exceptions, there should be an overflow and thus
an exception when the large 'offsetof' value, after being converted
to a signed type, is added to the pointer. If the converted 'offsetof'
value is negative and does not overflow, then the expression points
to a location before the object designated by 'd', and that is a
problem of another sort. If the overflow does not cause an exception
or the conversion is to a 'larger' signed type and thus stays positive,
then the case you are thinking of applies, and there is no problem.

So, the 'assert' may have to be conditioned out for a number of
specific implementations where those implementations' details
compensate for the problem, but you have to investigate those
details if the 'assert' fails. Having the 'assert' fail indicates
that an important assumption no longer holds and you need to
check it out.

Note that a smart programmer would put all the implementation
version checking in a header and define a special macro to test
'offsetof' values rather than include all the exceptions and the
assert each time 'offsetof' was used. However, the 'assert' above
is the starting point needed to detect the problem.

mtew@cds.duke.edu

Author: Paul Jarc <prj@po.cwru.edu>
Date: 1999/09/09

Max TenEyck Woodbury <mtew@cds.duke.edu> writes:
> I could be mistaken, but I believe pointer arithmetic is signed.

The language in the standard regarding pointer arithmetic (C9X
6.5.6p8) does not differentiate between signed and unsigned integer
operands, nor does it mention coercion of the integer operands to a
signed or unsigned type.

> (I checked two of several books on 'C' on my bookshelf and they said
> that pointer arithmetic uses 'int's, but they may not be definitive
> or I could have misunderstood something.)

They're not definitive, because they're not the standard.  The
standard is definitive; all else is not and may be of arbitrarily low
quality as description.  Exception: if all you care about is making
your program work on a particular implementation which may or may not
be conforming, then that implementation's source code is definitive;
all else is not and may be of arbitrarily low quality as description.
(Of course, we don't concern ourselves too much with that case while
here in the std groups.)

paul

Author: James Kuyper <kuyper@wizard.net>
Date: 1999/09/09

Max TenEyck Woodbury wrote:
>
> "Clive D.W. Feather" wrote:
> >
> > In article <37D57457.F9286CB6@cds.duke.edu>, Max TenEyck Woodbury
> > <mtew@cds.duke.edu> writes:
...
> OK, I'll bite. How could the alignment requirement of an aggregate
> be stricter than the strictest alignment requirement of its components?

It's possible to prove that the alignment requirement of a struct or
union must be a positive integral multiple of the least common multiple
of all the alignment requirements of the members. Beyond that,
alignment is completely implementation-defined.

Two plausible examples:
1. The implementation chooses to require all structs (or all unions),
regardless of their contents, to be aligned with the same restrictions
as the most strictly aligned basic type.

2. There happen to be two basic types with alignment requirements such
that the larger alignment requirement is not an integer multiple of the
smaller one. In that case, the alignment requirement of a structure
containing both types must be a multiple of the least common multiple of
the two alignment requirements, which will be larger than either one.

The second case is pretty obscure, though I've heard of machines with
5-byte words where it might come up. However, the first case isn't
particularly odd.

> >> It is a characterization of
> >> alignment requirements. I think it is an important characteristic.
> >
> > Why ?
>
> Why not?

One good reason would be that it's incorrect.

Author: Paul Jarc <prj@po.cwru.edu>
Date: 1999/09/10

Max TenEyck Woodbury <mtew@cds.duke.edu> writes:
> OK, I'll bite. How could the alignment requirement of an aggregate
> be stricter than the strictest alignment requirement of its components?

Easy.  Assume an implementation that aligns all structs on 2-byte (or
greater) boundaries, just to be extra safe, or for any other reason.
Then assume the declaration `struct foo { char ch; };'.  The standard
doesn't require the alignment of the struct to be greater than the
strictest alignment of a member, but it allows it.  An implementation
is thus free to define the struct's alignment requirement this way.

paul

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/09/10

"Clive D.W. Feather" wrote:
>
> <mtew@cds.duke.edu> writes
> ...
>> Q. So there would be a problem on small machines?
>> A. Theoretically, yes. In practice no. If a program had such a
>>    large offset on such a small machine, it could not, in theory,
>>    combine it with a pointer without getting an overflow.
>
> Why not ? Consider the following code:
>
>     struct fred { char c [40000]; int x; char cc; } s;
>     char *pp;
>     unsigned int N = // some value determined elsewhere
>
>     if (N > sizeof s.c)
>         pp = (char *) &s + offsetof (struct fred, cc);

Assuming a 16 bit 'size_t', there are three possibilities here:

1. The 'offsetof' value gets converted to a signed 16 bit type
   and the result will be negative. Adding the negative value
   to &s overflows and you get an appropriate pointer.
   Technically Undefined but correct behavior.
2. The 'offsetof' value gets converted to a signed 16 bit type
   and the result will be negative. Adding the negative value
   to &s does not overflow. The result is a pointer before &s
   and you get the wrong result. Incorrect behavior.
3. The 'offsetof' value gets converted to a signed type with
   more than 16 bits. This works, but is an extension of the
   language. (Well maybe not in all cases, but in some...)

>     else
>         pp = (char *) &s + N;
>
> Strictly conforming code in C9X, undefined with your change.

Undefined in either case. More easily identified as undefined
with the signed type.

> [Yes, this particular code could be written differently, but the offset
> calculation might be decoupled from the addition.]
>
>>   In
>>   practice, address arithmetic overflows are ignored on such
>>   small machines.
>
> Possibly, but not necessarily.
>
>>   In other words, any implementation that could
>>   generate problematic offsets would have to extend the language
>>   in order to be able to use them.
>
> Not true.

It either has to change the conversion rules or ignore overflows.
Permitted, but extensions.

>> Any problems with the change would occur with objects in the
>> upper half of the size_t range.
>
> That's the case for this particular issue, yes.
>
>> From this discussion, I believe
>> it is clear that either choice of 'offsetof' type introduces
>> unpleasant surprises.
>
> You still fail to demonstrate this.
>
> Here's another situation: on any system where SIZE_MAX > INT_MAX,
> the expression:
>
>     offsetof (struct jim, field) - 1
>
> can never be less than zero.

HUH? offsetof(struct jim, field) could be 0. Subtract 1 and it would be
less than 0.

> With your proposal, it suddenly could be.
> That is a Quiet Change. I have no idea what it could affect.
>
>> So, I see your point. For programs with large objects, changing
>> the return type of 'offsetof' would introduce undefined behavior.
>
> And for other programs the results of calculations could change.
>
>> The fact that any of the 'offsetof' values that would become
>> undefined with the change have significant restrictions on what
>> can be done with them was the missing point in the discussion.
>
> Very missing, since it isn't true.

Arrrg. Read what you yourself wrote half a page down!

>> Which ties back to another thread. I was under the apparently
>> mistaken impression that the portability problems associated
>> with combining 'offsetof' values with pointers had to do with
>> some of the more off-the-wall properties of old 36 bit
>> main-frames.
>
> Huh ? I may have missed this, but I am unaware of any such portability
> problems.
>
>> This makes it clear that the problem is with
>> 'offsetof' values in the upper half of the size_t range. With
>> that, I can now include 'assert's that check for this kind of
>> problem. Specifically --
>>
>>    assert( (ptrdiff_t)offsetof(a,b) >= 0 );
>>
>> will fail if the expression
>>
>>    ((c *)((char *)d + offsetof(a,b)))
>>
>> could cause a portability problem for appropriate value of a, b,
>> c and d.
>
> Wrong. For all the reasons given above, *plus* a new one: if the offset
> is greater than INT_MAX, the value produced by the cast is
> implementation-defined *or* the implementation is allowed to raise a
> signal.

Yes, and that is NOT a portability problem? It IS a portability problem
unless your definition of portability is much more restrictive than mine.

>> The introduction of data structures with negative offsets is
>> not consistent with some of the more obscure aspects of 'C'
>> and 'C++'.
>
> Correct (for some value of "obscure").

I agree with you and you get snide... The handling of arithmetic
overflow is hardly one of 'C's more lucid aspects.

>> If such a capability is to be introduced, it should
>> be as a pointer attribute rather than as a structure attribute.
>
> No comment until I see a proposal.

It's been posted in another thread. In fact, someone else came up with
almost the same suggestion. It's far from complete, but it would not
be a 'Quiet Change'.

mtew@cds.duke.edu

Author: Francis Glassborow <francis@robinton.demon.co.uk>
Date: 1999/09/10

In article <37D8131D.B715732A@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> writes
>OK, I'll bite. How could the alignment requirement of an aggregate
>be stricter than the strictest alignment requirement of its components?

It would indeed be very unusual, but I think that if a system had
alignment requirements for different types where the stricter requirement
was not an exact multiple of the less strict ones, you might find
that a stricter alignment was required for the aggregate.


Francis Glassborow      Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA          +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
---

Author: "Paul D. DeRocco" <pderocco@ix.netcom.com>
Date: 1999/09/10

Max TenEyck Woodbury wrote:
>
> "Clive D.W. Feather" wrote:
> >
> > Your original words were:
> >
> > || The alignment requirement of an aggregate is the same as the most
> > || severe alignment requirement of any of its components.
> >
> > See that "is the same as" ? That's what I'm objecting to.
> >
> > Had you written "is at least as strict as", or even "is often the same
> > as", I wouldn't have commented. But you said "is the same as".
>
> OK, I'll bite. How could the alignment requirement of an aggregate
> be stricter than the strictest alignment requirement of its components?

In one sense they can't. The alignment may be stricter, but the alignment
_requirement_ isn't. That is, a compiler may choose to align any struct on
a four-byte boundary, even if it only has chars in it, but it isn't
_required_ to. Therefore, the alignment _requirement_ is only one byte,
even if the compiler unnecessarily enforces a stronger alignment.

However, in another sense (the sense it seems to be used in the Standard),
the term "alignment requirement" is whatever the compiler decides it to be,
not what it physically needs to be, in which case "is at least as strict
as" would be more accurate.

--

Ciao,                       Paul D. DeRocco
Paul                        mailto:pderocco@ix.netcom.com
---

Author: Paul Jarc <prj@po.cwru.edu>
Date: 1999/09/10

Max TenEyck Woodbury <mtew@cds.duke.edu> writes:
> "Clive D.W. Feather" wrote:
> >     struct fred { char c [40000]; int x; char cc; } s;
> >     char *pp;
> >     unsigned int N = // some value determined elsewhere
> >
> >     if (N > sizeof s.c)
> >         pp = (char *) &s + offsetof (struct fred, cc);
>
> Assuming a 16 bit 'size_t', there are three possibilities here:
>
> 1. The 'offsetof' value gets converted to a signed 16 bit type
...
> 2. The 'offsetof' value gets converted to a signed 16 bit type
...
> 3. The 'offsetof' value gets converted to a signed type with
...

No such conversion occurs.  Either you're misinformed about pointer
arithmetic, or you're inventing problems to create the need for your
proposal.

> > Here's another situation: on any system where SIZE_MAX > INT_MAX,
> > the expression:
> >
> >     offsetof (struct jim, field) - 1
> >
> > can never be less than zero.
>
> HUH? offsetof(struct jim, field) could be 0. Subtract 1 and it would be
> less than 0.

`offsetof(struct jim, field)' has type size_t.  `1' has type int.  The
subtraction operation converts them to the same type.  Assuming size_t
has a conversion rank greater than or equal to the rank of int, the
operands will be converted to size_t.  size_t is unsigned, so
arithmetic is done modulo SIZE_MAX+1.  So if the offsetof() is zero,
the subtraction yields SIZE_MAX.
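A minimal C sketch of the conversion described above, assuming (as Paul does) that size_t has a conversion rank at least that of int. The struct layout and the function name are hypothetical stand-ins for the thread's `struct jim`; `field` is made the first member so its offset is guaranteed to be 0.

```c
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* SIZE_MAX */

/* Hypothetical stand-in for the thread's struct jim. The first member
   of a struct always has offset 0. */
struct jim {
    int field;
    int other;
};

/* offsetof yields a size_t. Assuming size_t's rank is at least that of
   int, the int 1 is converted to size_t, so the subtraction wraps
   modulo SIZE_MAX + 1 instead of producing -1. */
size_t jim_offset_minus_one(void)
{
    return offsetof(struct jim, field) - 1;
}
```

The result is SIZE_MAX, a large positive value, which is why the expression can never compare less than zero under these assumptions.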

> >>    assert( (ptrdiff_t)offsetof(a,b) >= 0 );
> >>
> >> will fail if the expression
> >>
> >>    ((c *)((char *)d + offsetof(a,b)))
> >>
> >> could cause a portability problem for appropriate value of a, b,
> >> c and d.
> >
> > Wrong. For all the reasons given above, *plus* a new one: if the offset
> > is greater than INT_MAX, the value produced by the cast is
> > implementation-defined *or* the implementation is allowed to raise a
> > signal.
>
> Yes, and that is NOT a portability problem? It IS a portability problem
> unless your definition of portability is much more restrictive than mine.

This problem occurs only in the assert(), not in the pointer calculation.

paul


Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/09/10

Paul Jarc wrote:
>
> Max TenEyck Woodbury <mtew@cds.duke.edu> writes:
> > OK, I'll bite. How could the alignment requirement of an aggregate
> > be stricter than the strictest alignment requirement of its components?
>
> Easy.  Assume an implementation that aligns all structs on 2-byte (or
> greater) boundaries, just to be extra safe, or for any other reason.
> Then assume the declaration `struct foo { char ch; };'.  The standard
> doesn't require the alignment of the struct to be greater than the
> strictest alignment of a member, but it allows it.  An implementation
> is thus free to define the struct's alignment requirement this way.

But the extra alignment is not required. It is something added by choice
and is somewhat arbitrary. However, James Kuyper does make the point
properly...

mtew@cds.duke.edu


Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 1999/09/10

In article <37D8FD9C.4CA5AB5@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> writes
>> > OK, I'll bite. How could the alignment requirement of an aggregate
>> > be stricter than the strictest alignment requirement of its components?
>>
>> Easy.
[...]

>But the extra alignment is not required. It is something added by choice
>and is somewhat arbitrary.

(1) Who are you to say what is "required" and what is not ? WG14 decided
that padding was required if the implementer said so.

(2) You asked "How could [it] be stricter", not "How could [it] be
required to be stricter". If you're going to participate in comp.std.c,
you need to realize that exact wording matters.

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd.  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 |                      | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address



Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/09/11

James Kuyper wrote:
>
> Max TenEyck Woodbury wrote:
> >
> > "Clive D.W. Feather" wrote:
> > >
> > > In article <37D57457.F9286CB6@cds.duke.edu>, Max TenEyck Woodbury
> > > <mtew@cds.duke.edu> writes:
> ...
> > OK, I'll bite. How could the alignment requirement of an aggregate
> > be stricter than the strictest alignment requirement of its components?
>
> It's possible to prove that the alignment requirement of a struct or
> union must be a positive integral multiple of the least common multiple
> of all the alignment requirements of the members. Beyond that,
> alignment is completely implementation-defined.
>
> Two plausible examples:
> 1. The implementation chooses to require all structs (or all unions),
> regardless of their contents, to be aligned with the same restrictions
> as the most strictly aligned basic type.
>
> 2. There happen to be two basic types with alignment requirements such
> that the larger alignment requirement is not an integer multiple of the
> smaller one. In that case, the alignment requirement of a structure
> containing both types must be a multiple of the least common multiple of
> the two alignment requirements, which will be larger than either one.
>
> The second case is pretty obscure, though I've heard of machines with
> 5-byte words where it might come up. However, the first case isn't
> particularly odd.
>
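A minimal sketch of what the argument above does and does not guarantee, assuming a C11 compiler (`alignof` from `<stdalign.h>` postdates this 1999 thread; C11 also requires every alignment to be a power of two, so there the least common multiple reduces to the largest member alignment). The struct and function names are hypothetical. The code checks only the guaranteed property, that a struct's alignment is a multiple of each member's; whether a char-only struct is aligned more strictly than 1 is the implementation's choice (case 1), so it is reported rather than asserted.

```c
#include <stdalign.h>  /* alignof (C11) */
#include <stddef.h>    /* size_t */

/* Hypothetical structs for illustration. */
struct only_chars { char a; char b; };
struct mixed      { char c; double d; short s; };

/* Nonzero if 'outer' is an exact multiple of 'inner'. */
int is_multiple(size_t outer, size_t inner)
{
    return inner != 0 && outer % inner == 0;
}

/* Every member of every element of an array of struct mixed must be
   correctly aligned, so the struct's alignment must be a multiple of
   each member's alignment. */
int mixed_alignment_ok(void)
{
    return is_multiple(alignof(struct mixed), alignof(char))
        && is_multiple(alignof(struct mixed), alignof(double))
        && is_multiple(alignof(struct mixed), alignof(short));
}

/* Implementation's choice: may be 1, or something stricter. */
size_t only_chars_alignment(void)
{
    return alignof(struct only_chars);
}
```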

Case 1 is an implementer's decision above and beyond the actual requirement.

Case 2 is the one that proves the point. Conceded. But note that it does not
make much sense if a binary addressing scheme is used. I believe that binary
addressing has been standard since the IBM 1600 series went out of use and
I never heard of a 'C' implementation for those machines. You might be able
to find something like this at the bit level on the 36-bit machines, but even
that is a stretch. (Weak pun: actually, the STRETCH was a 64-bit machine.)

>>>> It is a characterization of
>>>> alignment requirements. I think it is an important characteristic.
>>>
>>> Why ?
>>
>> Why not?
>
> One good reason would be that it's incorrect.

OK, but you had to go way out in left field to get that one...

mtew@cds.duke.edu
---

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/09/11

Paul Jarc wrote:
>
> Max TenEyck Woodbury <mtew@cds.duke.edu> writes:
>> I could be mistaken, but I believe pointer arithmetic is signed.
>
> The language in the standard regarding pointer arithmetic (C9X
> 6.5.6p8) does not differentiate between signed and unsigned integer
> operands, nor does it mention coercion of the integer operands to a
> signed or unsigned type.

So the result is undefined? Come on, pointer arithmetic is done so
often that 'undefining' it would make the language useless for almost
any practical purpose. The only reasonable alternative to signed
arithmetic for pointers is to allow BOTH signed and unsigned versions
depending on context.

>> (I checked two of several books on 'C' on my bookshelf and they said
>> that pointer arithmetic uses 'int's, but they may not be definitive
>> or I could have misunderstood something.)
>
> They're not definitive, because they're not the standard.  The
> standard is definitive; all else is not and may be of arbitrarily low
> quality as description.  Exception: if all you care about is making
> your program work on a particular implementation which may or may not
> be conforming, then that implementation's source code is definitive;
> all else is not and may be of arbitrarily low quality as description.
> (Of course, we don't concern ourselves too much with that case while
> here in the std groups.)

(Au contraire, that kind of argument has been raised innumerable times.
The standard has to deal with the 'common practice' at its base...)

I said that the references I could check were not definitive. As for
the quality of the documents, that is why I consulted two widely
different sources. One was an implementation document, but the other
was a much broader reference (K & R). I checked two more just now; one
didn't address the issue at all and the other was more ambiguous than
the first two I consulted.

As for my possible misunderstanding, further reading indicates that
the standard may be silent on this. At least no one has quoted anything
definitive. That makes the result 'undefined'? Are you really going to
tell me that you can't use subscripts in a 'C' program? At worst, the
results of pointer arithmetic are implementation defined. There should
be no problem with sufficiently small offsets, but offsets in the upper
half of the 'size_t' range may cause trouble.

So, what would be the impact of changing the return type of 'offsetof'
from an unsigned to a signed type? Clive says that this would be a 'Quiet
Change', but I believe it would not be. For those programs that would be
affected, their operation is implicitly implementation-defined at best
and not subject to required diagnostics. With the change in place, the
behavior would still be implementation-defined, but some diagnostics would
be required because of constraint violations unless the implementation
could assure the correct result. The net effect would be an incremental
improvement in portability.

mtew@cds.duke.edu
---

Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 1999/09/11

In article <37D8131D.B715732A@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> writes
>> I have. Your original words were:
>>
>> || The alignment requirement of an aggregate is the same as the most
>> || severe alignment requirement of any of its components.
>>
>> See that "is the same as" ? That's what I'm objecting to.

>OK, I'll bite. How could the alignment requirement of an aggregate
>be stricter than the strictest alignment requirement of its components?

Easy. The compiler writer decides that all scalar types have no
alignment requirements whatever, but all structure and union types are
aligned to 16 bytes.

Others have given examples where it *has* to happen: e.g. if the
alignment of long is 8 bytes but the alignment of float is 6 bytes, then
there exists a structure with an alignment requirement of 24.
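Clive's arithmetic can be checked with a short sketch. The alignments 8 (long) and 6 (float) are his hypothetical values, not those of any particular implementation (C11 later required alignments to be powers of two, which rules this case out). The helper names are illustrative only.

```c
#include <stddef.h>  /* size_t */

/* Greatest common divisor by Euclid's algorithm. */
size_t gcd_size(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple; dividing first limits overflow. */
size_t lcm_size(size_t a, size_t b)
{
    return a / gcd_size(a, b) * b;
}

/* Smallest alignment satisfying every member alignment in 'aligns'
   (n must be at least 1): the LCM of all of them. */
size_t min_aggregate_alignment(const size_t *aligns, size_t n)
{
    size_t result = aligns[0];
    for (size_t i = 1; i < n; i++)
        result = lcm_size(result, aligns[i]);
    return result;
}
```

With power-of-two alignments such as {1, 4, 8} the result is simply the largest, 8, which is why the "same as the most severe" rule of thumb usually holds in practice.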

>>> It is a characterization of
>>> alignment requirements. I think it is an important characteristic.
>> Why ?
>Why not?

Because it's not true, for a start.

>> The implementation *defines* these things; it doesn't have to "consider"
>> them.
>An implementation may or may not define alignment requirements as it
>sees fit, but it has to consider them in any case.

Only in the sense that it has to consider anything involved in writing a
compiler. The implementation *defines* what the alignment is and
generates code appropriately; this differs from (e.g.) the requirement
that INT_MAX <= LONG_MAX, where the Standard constrains the implementer.

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd.  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 |                      | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address
---

Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 1999/09/11

In article <37D808B7.D558638@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> writes
>> Max TenEyck Woodbury <mtew@cds.duke.edu> writes:
>>>
>>>     assert( (ptrdiff_t)offsetof(a,b) >= 0 );
>>>
>>> will fail if the expression
>>>
>>>     ((c *)((char *)d + offsetof(a,b)))
>>>
>>> could cause a portability problem for appropriate value of a, b,
>>> c and d.
>>
>> If that offsetof() would overflow ptrdiff_t, then the result of the
>> cast is implementation-defined.
>
>Yep. And that is precisely when you need to check for portability
>problems.

Only if you're casting the result of offsetof to ptrdiff_t. It's a
circular argument.

>> The assert() is no help there.
>If the 'assert' fails, then the expression may or may not produce
>the correct result. If the 'assert' passes, then the expression will
>produce the correct result on a conforming implementation.

Wrong.

Case 1: The assertion can pass even if the conversion within it has
a portability problem. Proof: converting an unsigned value to a signed
type that cannot represent it gives an implementation-defined result.
There is *no* requirement that the result be negative.

Case 2: The assertion can fail even if the second expression you give
has no portability problem. Proof: the case where the offsetof has a
result greater than PTRDIFF_MAX and the conversion in the assert *does*
yield a negative number. An example was given in another of my postings.
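Both cases go away if the range is tested before any conversion, instead of casting and inspecting the sign. A minimal sketch, assuming C99's `PTRDIFF_MAX` and `SIZE_MAX` from `<stdint.h>`; the function name is hypothetical. It never performs the out-of-range conversion, so it cannot pass by accident or raise a signal.

```c
#include <stddef.h>  /* size_t, ptrdiff_t */
#include <stdint.h>  /* PTRDIFF_MAX, SIZE_MAX */

/* Nonzero if 'off' can be converted to ptrdiff_t without an
   implementation-defined result. */
int offset_fits_ptrdiff(size_t off)
{
#if PTRDIFF_MAX >= SIZE_MAX
    (void)off;
    return 1;   /* every size_t value is representable */
#else
    /* PTRDIFF_MAX is positive and smaller than SIZE_MAX here, so the
       cast preserves its value and the comparison is purely unsigned. */
    return off <= (size_t)PTRDIFF_MAX;
#endif
}
```

This only matters when the value is actually needed as a ptrdiff_t; adding a size_t offset to a `char *` needs no such check.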

>I could be mistaken, but I believe pointer arithmetic is signed.

You are (mistaken, that is). Proof: the *lack* of any relevant wording
about the signedness of types or the application of the usual (or
indeed, any) arithmetic conversions in 6.3.6 (C89) or 6.5.6 (C9X).

>So, the 'assert' may have to be conditioned out for a number of
>specific implementations where those implementations' details
>compensate for the problem, but you have to investigate those
>details if the 'assert' fails. Having the 'assert' fail indicates
>that an important assumption no longer holds and you need to
>check it out.

The assertion is useless for testing portability. And it's only an
"important assumption" if you're doing a specific thing that makes that
assumption. It's not needed in much general programming.

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd.  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 |                      | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address
---

Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 1999/09/11

In article <37D82144.7D57071A@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> writes
>>     struct fred { char c [40000]; int x; char cc; } s;
>>     char *pp;
>>     unsigned int N = // some value determined elsewhere
>>
>>     if (N > sizeof s.c)
>>         pp = (char *) &s + offsetof (struct fred, cc);
>
>Assuming a 16 bit 'size_t', there are three possibilities here:
>
>1. The 'offsetof' value gets converted to a signed 16 bit type

Not on a conforming implementation, it had better not be.

>> Strictly conforming code in C9X, undefined with your change.

I stand by this.

>>>   In other words, any implementation that could
>>>   generate problematic offsets would have to extend the language
>>>   in order to be able to use them.
>> Not true.
>It either has to change the conversion rules or ignore overflows.

There *are* no relevant conversion rules. If you disagree, please quote
from any of C89, C9X FCD1, or C9X FDIS (at least the second of these was
widely distributed electronically, and I believe there are no relevant
differences between the three).

>Permitted, but extensions.

Changing the conversion rules is not permitted at all. Ignoring
overflows is a permitted extension.

>> Here's another situation: on any system where SIZE_MAX > INT_MAX,
>> the expression:
>>
>>     offsetof (struct jim, field) - 1
>>
>> can never be less than zero.
>
>HUH? offsetof(struct jim, field) could be 0. Subtract 1 and it would be
>less than 0.

Learn the conversion rules that you are claiming support you: if
SIZE_MAX > INT_MAX, then the 1 is converted to the unsigned type size_t
and the result of the subtraction has that type.

>>> The fact that any of the 'offsetof' values that would become
>>> undefined with the change have significant restrictions on what
>>> can be done with them was the missing point in the discussion.
>> Very missing, since it isn't true.
>Arrrg. Read what you yourself wrote half a page down!

If you're talking about conversion to ptrdiff_t, then I don't see that
as a "significant restriction". Certainly *not* one you can claim
supports your suggestion to change the type to ptrdiff_t. Again, you're
being circular.

>>>    assert( (ptrdiff_t)offsetof(a,b) >= 0 );

>> Wrong. For all the reasons given above, *plus* a new one: if the offset
>> is greater than INT_MAX, the value produced by the cast is
>> implementation-defined *or* the implementation is allowed to raise a
>> signal.
>
>Yes, and that is NOT a portability problem? It IS a portability problem
>unless your definition of portability is much more restrictive than mine.

It doesn't stop me writing portable code or using such offsets in
strictly conforming code. I *do* have to be careful how I use them, but
that applies any time I convert a value from one type to a potentially
smaller one, particularly when the latter is signed.

In other words, it's precisely the same problem as I get with either of:

    (int) sizeof x
    (int) (ptr2 - ptr1)

both of which potentially cause problems. It's not a new problem, and
it's certainly not solved by making offsetof return ptrdiff_t.
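The care Clive describes amounts to checking the range before narrowing. A minimal sketch with hypothetical helper names; each returns 0 and leaves `*out` untouched if the value would not fit in an `int`, and 1 after storing it otherwise.

```c
#include <limits.h>  /* INT_MAX, INT_MIN */
#include <stddef.h>  /* size_t, ptrdiff_t */
#include <stdint.h>  /* uintmax_t, SIZE_MAX */

/* Checked replacement for (int) sizeof x and similar size_t values.
   uintmax_t can represent both operands, so no value is lost. */
int size_to_int(size_t n, int *out)
{
    if ((uintmax_t)n > (uintmax_t)INT_MAX)
        return 0;
    *out = (int)n;
    return 1;
}

/* Checked replacement for (int) (ptr2 - ptr1). Both operands are
   signed, so the comparisons are ordinary signed comparisons. */
int diff_to_int(ptrdiff_t d, int *out)
{
    if (d < INT_MIN || d > INT_MAX)
        return 0;
    *out = (int)d;
    return 1;
}
```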

>>> The introduction of data structures with negative offsets is
>>> not consistent with some of the more obscure aspects of 'C'
>>> and 'C++'.
>> Correct (for some value of "obscure").
>I agree with you and you get snide... The handling of arithmetic
>overflow is hardly one of 'C's more lucid aspects.

But neither is the presence of overflow issues "obscure" in my book.
Hence the parenthetic comment.

>>> If such a capability is to be introduced, it should
>>> be as a pointer attribute rather than as a structure attribute.
>> No comment until I see a proposal.
>It's been posted in another thread. In fact, someone else came up with
>almost the same suggestion.

I may have overlooked it or may not have realized it was a proposal.
Would you care to provide a message-ID, quote it here, or start a new
thread ?

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd.  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 |                      | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address
---

Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: 1999/09/11

Max TenEyck Woodbury wrote:
>
> Paul Jarc wrote:
> >
> > Max TenEyck Woodbury <mtew@cds.duke.edu> writes:
> >> I could be mistaken, but I believe pointer arithmetic is signed.
> >
> > The language in the standard regarding pointer arithmetic (C9X
> > 6.5.6p8) does not differentiate between signed and unsigned integer
> > operands, nor does it mention coercion of the integer operands to a
> > signed or unsigned type.
>
> So the result is undefined? Come on, pointer arithmetic is done so

No - it's very precisely defined, in a way that doesn't differentiate
between them.

> often that 'undefining' it would make the language useless for almost
> any practical purpose. The only reasonable alternative to signed
> arithmetic for pointers is to allow BOTH signed and unsigned versions
> depending on context.

Bingo! Re-read 6.5.6p8. The description is entirely in terms of the
value of the integer, and offsets from the position the pointer points at.
There's no description of any conversion of signed to unsigned, or
vice-versa, nor need of any.
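To illustrate that 6.5.6p8 depends only on the integer's value, a minimal sketch (the buffer and function names are hypothetical): adding a `size_t` offset and adding an `int` with the same value yield the same pointer, provided the result stays within the array or one past its end.

```c
#include <stddef.h>  /* size_t */

/* Hypothetical buffer for illustration. */
static char buf[100];

/* Pointer + unsigned operand: defined by the integer's value alone. */
char *at_unsigned(size_t off) { return buf + off; }

/* Pointer + signed operand with the same value: same result. */
char *at_signed(int off) { return buf + off; }

/* Subtracting an unsigned value is equally well defined, as long as
   the result stays within buf (or one past its end). */
char *back_from_end(size_t n) { return buf + sizeof buf - n; }
```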


Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 1999/09/09

In article <37D57457.F9286CB6@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> writes

>>> I was defining an important characteristic of alignment requirements. It was
>>> Clive that tried to turn this into a rule/requirement. I did NOT claim that it
>>> was a rule, only that 'C' made provisions for it and it COULD be an
>>> implementation requirement.
>> No you didn't - go back and read the words you *wrote*. The first quoted
>> sentence is *not* an important characteristic of alignment requirements.
>Huh? I suggest YOU read what I wrote.

I have. Your original words were:

|| The alignment requirement of an aggregate is the same as the most
|| severe alignment requirement of any of its components.

See that "is the same as" ? That's what I'm objecting to.

Had you written "is at least as strict as", or even "is often the same
as", I wouldn't have commented. But you said "is the same as".

>It is a characterization of
>alignment requirements. I think it is an important characteristic.

Why ?

>It is also a characteristic of aggregates that an implementation
>has to consider.

The implementation *defines* these things; it doesn't have to "consider"
them.

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd.  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 |                      | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address



Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 1999/09/09

In article <37D5A981.B76880C4@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> writes
>>> The big question is if this problem has actually come up enough times that
>>> people have put in defensive code against these overflows and if the change
>>> would break that code.
>>
>> *What* problem ?
>
>First, in the VAST majority of cases, changing the type returned by
>'offsetof' from 'size_t' to 'ptrdiff_t' would have no impact on the
>programs that use 'offsetof'. Hypothesizing a program that would be
>impacted by the change, that impact represents the problem under
>discussion.

Okay, I think.

>> *What* overflow ? I asked before and you still haven't answered it.
>> I think that this claim is a straw man.
>The change from an unsigned type to a signed type changes the point
>in the number representation cycle where arithmetic overflow is
>designated to occur. Since we are discussing programs that would be
>impacted by this change, it follows that the change in overflow
>behavior is the source of that impact. That is the overflow under
>discussion.

Okay.

>> I think that this claim is a straw man.
>And I had a similar feeling about your claim.
>However, I will address this as a serious question.

I will clarify my position: you appear to be requesting a Quiet Change
in the sense the Rationale capitalises that term. If so, I only have to
show *one* correct program whose meaning would change or that would
become undefined. Making a Quiet Change is *automatically* a negative
property of a proposal - it may not be enough to defeat it, but you have
to justify the Quiet Change *as well as* justifying the proposal.

>Q. So, how could changing the return type of 'offsetof' from 'size_t'
>   to 'ptrdiff_t' introduce undefined behavior into a program?
>A. By introducing an arithmetic overflow that causes an exception.

That's one possibility. See below.

>Q. When would that happen?
>A. When the offset was too big to fit in a 'ptrdiff_t' but not so
>   big that it would overflow a 'size_t'. In other words, when the
>   objects under consideration are large.
>Q. What is the smallest case where this might happen?
>A. 'size_t' could be as small as 16 bits.
>Q. So there would be a problem on small machines?
>A. Theoretically, yes. In practice no. If a program had such a
>   large offset on such a small machine, it could not, in theory,
>   combine it with a pointer without getting an overflow.

Why not ? Consider the following code:

    struct fred { char c [40000]; int x; char cc; } s;
    char *pp;
    unsigned int N = // some value determined elsewhere

    if (N > sizeof s.c)
        pp = (char *) &s + offsetof (struct fred, cc);
    else
        pp = (char *) &s + N;

Strictly conforming code in C9X, undefined with your change.

[Yes, this particular code could be written differently, but the offset
calculation might be decoupled from the addition.]
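A runnable version of the fragment above, as a sketch: `N` is passed in by the caller (the original leaves it "determined elsewhere"), and the function name `pick` is hypothetical. On common 32- or 64-bit platforms nothing here is remarkable; the thread's concern is an implementation whose ptrdiff_t cannot represent 40000, where the first branch still adds a well-defined size_t offset to a `char *`.

```c
#include <stddef.h>  /* offsetof */

/* Clive's example structure. */
struct fred { char c[40000]; int x; char cc; };

static struct fred s;

/* The original fragment wrapped in a function; 'N' is supplied by the
   caller instead of being determined elsewhere. */
char *pick(unsigned int N)
{
    char *pp;

    if (N > sizeof s.c)
        pp = (char *) &s + offsetof(struct fred, cc);
    else
        pp = (char *) &s + N;
    return pp;
}
```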

>In
>   practice, address arithmetic overflows are ignored on such
>   small machines.

Possibly, but not necessarily.

>In other words, any implementation that could
>   generate problematic offsets would have to extend the language
>   in order to be able to use them.

Not true.

>Any problems with the change would occur with objects in the
>upper half of the size_t range.

That's the case for this particular issue, yes.

>From this discussion, I believe
>it is clear that either choice of 'offsetof' type introduces
>unpleasant surprises.

You still fail to demonstrate this.

Here's another situation: on any system where SIZE_MAX > INT_MAX,
the expression:

    offsetof (struct jim, field) - 1

can never be less than zero. With your proposal, it suddenly could be.
That is a Quiet Change. I have no idea what it could affect.

>So, I see your point. For programs with large objects, changing
>the return type of 'offsetof' would introduce undefined behavior.

And for other programs the results of calculations could change.

>The fact that any of the 'offsetof' values that would become
>undefined with the change have significant restrictions on what
>can be done with them was the missing point in the discussion.

Very missing, since it isn't true.

>Which ties back to another thread. I was under the apparently
>mistaken impression that the portability problems associated
>with combining 'offsetof' values with pointers had to do with
>some of the more off-the-wall properties of old 36 bit
>main-frames.

Huh ? I may have missed this, but I am unaware of any such portability
problems.

>This makes it clear that the problem is with
>'offsetof' values in the upper half of the size_t range. With
>that, I can now include 'assert's that check for this kind of
>problem. Specifically --
>
>    assert( (ptrdiff_t)offsetof(a,b) >= 0 );
>
>will fail if the expression
>
>    ((c *)((char *)d + offsetof(a,b)))
>
>could cause a portability problem for appropriate value of a, b,
>c and d.

Wrong. For all the reasons given above, *plus* a new one: if the offset
is greater than INT_MAX, the value produced by the cast is
implementation-defined *or* the implementation is allowed to raise a
signal.

>The introduction of data structures with negative offsets is
>not consistent with some of the more obscure aspects of 'C'
>and 'C++'.

Correct (for some value of "obscure").

>If such a capability is to be introduced, it should
>be as a pointer attribute rather than as a structure attribute.

No comment until I see a proposal.

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd.  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 |                      | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address


Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/09/07

"Clive D.W. Feather" wrote:
>
> <mtew@cds.duke.edu> writes
>>>>>>>> The alignment requirement of an aggregate is the same as the most severe
>>>>>>>> alignment requirement of any of its components.
>>>>>>> No such rule in C.
> [...]
>> I was defining an important characteristic of alignment requirements. It was
>> Clive that tried to turn this into a rule/requirement. I did NOT claim that it
>> was a rule, only that 'C' made provisions for it and it COULD be an
>> implementation requirement.
>
> No you didn't - go back and read the words you *wrote*. The first quoted
> sentence is *not* an important characteristic of alignment requirements.

Huh? I suggest YOU read what I wrote. It is a characterization of
alignment requirements. I think it is an important characteristic.
It is also a characteristic of aggregates that an implementation
has to consider. If you disagree, please explain your reasoning.

The characterization is far from complete; it only acts as a place
holder for a much larger discussion.

mtew@cds.duke.edu

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/09/08

"Clive D.W. Feather" wrote:
>
> <mtew@cds.duke.edu> writes
>>>> What MAJOR change? There is exactly ONE minor change to the language
>>>> required, the type returned by 'offsetof'. The rest of the semantics
>>>> is already there; it is just hard to get to.
>>> 1. I (and I expect Clive, too, though I don't claim to speak for him)
>>>    think this (changing the type of offsetof()) is a larger change
>>>    than you seem to think.
>> That is possible, but I suspect that the problem would be smaller than
>> Clive implies, and that the change would remove rather than add problems.
>
> I suspect that you are wrong. You will make some currently-valid code
> become undefined.

OK. I see your point but only after having worked out what you were
NOT saying. See below.

>> The big question is if this problem has actually come up enough times that
>> people have put in defensive code against these overflows and if the change
>> would break that code.
>
> *What* problem ?

First, in the VAST majority of cases, changing the type returned by
'offsetof' from 'size_t' to 'ptrdiff_t' would have no impact on the
programs that use 'offsetof'. Hypothesizing a program that would be
impacted by the change, that impact represents the problem under
discussion.

> *What* overflow ? I asked before and you still haven't answered it.
> I think that this claim is a straw man.

The change from an unsigned type to a signed type changes the point
in the number representation cycle where arithmetic overflow is
designated to occur. Since we are discussing programs that would be
impacted by this change, it follows that the change in overflow
behavior is the source of that impact. That is the overflow under
discussion.

> I asked before and you still haven't answered it.

Actually, I have answered this before, but there was an element
to your question that was hard to understand.

> I think that this claim is a straw man.

And I had a similar feeling about your claim.
However, I will address this as a serious question.

Q. So, how could changing the return type of 'offsetof' from 'size_t'
   to 'ptrdiff_t' introduce undefined behavior into a program?
A. By introducing an arithmetic overflow that causes an exception.
Q. When would that happen?
A. When the offset was too big to fit in a 'ptrdiff_t' but not so
   big that it would overflow a 'size_t'. In other words, when the
   objects under consideration are large.
Q. What is the smallest case where this might happen?
A. 'size_t' could be as small as 16 bits.
Q. So there would be a problem on small machines?
A. Theoretically, yes. In practice no. If a program had such a
   large offset on such a small machine, it could not, in theory,
   combine it with a pointer without getting an overflow. In
   practice, address arithmetic overflows are ignored on such
   small machines. In other words, any implementation that could
   generate problematic offsets would have to extend the language
   in order to be able to use them.

Any problems with the change would occur with objects in the
upper half of the size_t range. From this discussion, I believe
it is clear that either choice of 'offsetof' type introduces
unpleasant surprises.

So, I see your point. For programs with large objects, changing
the return type of 'offsetof' would introduce undefined behavior.
The fact that any of the 'offsetof' values that would become
undefined with the change have significant restrictions on what
can be done with them was the missing point in the discussion.

Which ties back to another thread. I was under the apparently
mistaken impression that the portability problems associated
with combining 'offsetof' values with pointers had to do with
some of the more off-the-wall properties of old 36 bit
main-frames. This makes it clear that the problem is with
'offsetof' values in the upper half of the size_t range. With
that, I can now include 'assert's that check for this kind of
problem. Specifically --

    assert( (ptrdiff_t)offsetof(a,b) >= 0 );

will fail if the expression

    ((c *)((char *)d + offsetof(a,b)))

could cause a portability problem for appropriate values of a, b,
c and d.

So, thank you. I learned something important and non-obvious
from this discussion.

Conclusion:

The introduction of data structures with negative offsets is
not consistent with some of the more obscure aspects of 'C'
and 'C++'. If such a capability is to be introduced, it should
be as a pointer attribute rather than as a structure attribute.

mtew@cds.duke.edu

Author: Paul Jarc <prj@po.cwru.edu>
Date: 1999/09/08

Max TenEyck Woodbury <mtew@cds.duke.edu> writes:
>     assert( (ptrdiff_t)offsetof(a,b) >= 0 );
>
> will fail if the expression
>
>     ((c *)((char *)d + offsetof(a,b)))
>
> could cause a portability problem for appropriate values of a, b,
> c and d.

If that offsetof() would overflow ptrdiff_t, then the result of the
cast is implementation-defined.  The assert() is no help there.  Nor
do you need any help, AFAICT: if d points to a sufficiently large
object (such as one of type a), I don't see anything in the standard
that would allow that addition to cause problems.

paul
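
Here is a sketch of a check that avoids the problem pointed out above. The explicit (ptrdiff_t) conversion in the assert is itself the step whose result is implementation-defined, or which may raise a signal, for large values. The comparison therefore has to happen before any signed conversion. This assumes C99's <stdint.h> (PTRDIFF_MAX, uintmax_t). fits_in_ptrdiff and item2_of are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct xyz_tag { int item1; int item2; };

/* Hypothetical helper: nonzero if a size_t value can be converted to
   ptrdiff_t without an implementation-defined result or signal.
   Both operands are non-negative and widened to uintmax_t, so the
   comparison itself never performs an out-of-range signed conversion. */
int fits_in_ptrdiff(size_t value)
{
    return (uintmax_t)value <= (uintmax_t)PTRDIFF_MAX;
}

/* Paul's point: adding a size_t offset to a char * that points into a
   sufficiently large object needs no signed conversion at all. */
int *item2_of(struct xyz_tag *d)
{
    return (int *)((char *)d + offsetof(struct xyz_tag, item2));
}
```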

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/09/06

Paul Jarc wrote:
>
> Max TenEyck Woodbury <mtew@cds.duke.edu> writes:
>>>>>> The alignment requirement of an aggregate is the same as the most severe
>>>>>> alignment requirement of any of its components.
>>>>> No such rule in C.
>>>> OK. I'm oversimplifying.
>>>
>>> No, you're simply wrong.
>>
>> Then what's all that verbiage about padding for?
>
> To *allow* implementations to use that rule (or whatever rule is
> appropriate for their respective platforms), but not to *require* it.
> Assume a hardware platform where ints can be accessed at any address,
> but can be accessed faster on 4-byte boundaries.  An implementation on
> this platform might well force most ints onto 4-byte boundaries for
> performance, but it might also allow packed structures (with a
> command-line option, or pragma, or whatever) with no padding, which
> violate the otherwise-imposed 4-byte alignment rule for ints.

I was defining an important characteristic of alignment requirements. It was
Clive that tried to turn this into a rule/requirement. I did NOT claim that it
was a rule, only that 'C' made provisions for it and it COULD be an implementation
requirement. Further, structures with negative offsets do not have different
alignment rules; they just have to be applied in a consistent fashion. This
set of complications was one of many reasons for NOT implementing frames.

>>>> This is a minor consideration, but smaller problems have been
>>>> addressed before.
>>>
>>> Not when it requires a major change to the language.
>>
>> What MAJOR change? There is exactly ONE minor change to the language
>> required, the type returned by 'offsetof'. The rest of the semantics
>> is already there; it is just hard to get to.
>
> 1. I (and I expect Clive, too, though I don't claim to speak for him)
>    think this (changing the type of offsetof()) is a larger change
>    than you seem to think.

That is possible, but I suspect that the problem would be smaller than
Clive implies, and that the change would remove rather than add problems.

The big question is if this problem has actually come up enough times that
people have put in defensive code against these overflows and if the change
would break that code. The most obvious defense is to convert the result of
'offsetof' immediately back to the proper signed type, namely 'ptrdiff_t'.
The proposed change would make that conversion a no-op but would NOT break
anything. Other fixups that would be broken by such a change would almost
certainly not be portable.

> 2. As has been pointed out elsewhere in this thread, the results of
>    offsetof() would have to remain the same, so changing the type is
>    unrelated to negative offsets.  Your struct foo*, even if it points
>    to the middle of a struct, must be adjusted to point to the
>    beginning of the struct when cast to a void*, char*, or the like.
>    It is with these kinds of pointers that offsetof() is used, so
>    offsetof() would have to continue to give the offset from the
>    beginning.

It would depend on the implementation and would have to be consistent.

The adjustment for 'void *' is required, but adjusting 'char *' would
depend on the implementation. A broken implementation that calls for
'char *' arguments to 'memcpy' would have to convert casts to 'char *',
but one that used 'void *' arguments to 'memcpy' would not want to make
offset adjustments for casts to 'char *'.

However, I am beginning to think that the offset should be an attribute
of the pointer rather than the structure itself.

> So since offsetof() does not change, all that's left of your proposal
> is designating the zero-offset member.
> For the purpose of squeezing a large struct into the small-signed-
> offset range, this can be better done by the compiler, since it knows
> how struct members are laid out, and you don't.  So all this is just
> an optimization which is *already allowed*, provided the conversions
> between pointer types are handled correctly.

Agreed. In most cases, optimizations should be left to the compiler.
However, a mechanism for getting around compiler weaknesses is often
helpful. If negative offsets were not powerful explanatory devices, I'd
not have submitted this suggestion.

> For the purpose of recovering a struct pointer from a member pointer,
> you'd want to be able to do it for any member, so there's no point in
> making any member special.
>
>>>> Negative offsets do require some thought to the meaning of
>>>> unions. I believe the results are more consistent if all the
>>>> names refer to the same address rather than the same storage.
>
> Another branch of this thread established that negative offsets are an
> allowed optimization, but the offsets of union members would be
> independent; they would all (really) start at the same address, and
> the foo* representations would be offset from that, independent of how
> other members' types' pointers are offset internally.  The union
> type's pointer could also have an offset - for example, if all its
> members' pointers' types have the same nonzero offset, it might be
> convenient to use the same offset for the union pointer type.  But
> this is also independent of all the other offsets.
>
>> struct xyz_tag { int item1; int item2; } * block;
>>    ...
>>    return &(block->item2);
>>    ...
>>
>> now try to recover the original value of 'block' from the returned
>> value.
>
> As established in the other branch, converting that int* back to a
> struct xyz_tag* will have to apply any offset used in struct xyz_tag*'s.
> You won't be able to take advantage of the fact that the struct xyz_tag*
> happens to point straight to item2.
>
> But I wouldn't mind seeing
>     T foo;
>     &foo==(T.member*)&foo.member;
> The cast could require that its operand be of type pointer-to-T, with
> T being the type of the member specified in the cast (a bit unusual
> for a cast to require an operand of a particular type, granted), and
> could apply the offset of that member to the pointer, yielding a
> pointer to the struct, with the appropriate type.

Excellent! Now if we can just put it in formal language and get it
accepted...

>>>> The problem comes when that storage
>>>> unit contains more than one named bit field. Since they all have the
>>>> same 'address', how do you distinguish which one is the zero offset bit.
>>>
>>> Why do you need to ? You want a mapping from name to storage unit, so it
>>> doesn't matter if more than one name can do the job.
>>
>> If that were the case, why doesn't the standard let you take the
>> address of a bit field?
>
> Because a pointer designates a unique object, and bit-fields cannot be
> efficiently designated this way on most hardware.  But you don't need
> to single out any particular member; you just want an offset for the
> struct foo*, and you can attach "use the (not necessarily unique)
> offset of this member" to a bit-field as well as any other member.
> You're not using the struct pointer to get straight to the member;
> you're still using foo->member.  So there's no need for the
> struct foo* to designate a particular member, bit-field or otherwise.

Ok. But it is still a portability problem waiting to happen...

mtew@cds.duke.edu

Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 1999/09/06

In article <37CB4F64.A89F2BA9@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> writes
>> No, I'm talking about in the Standard, not specific hardware.
>You were the one saying registers could have addresses, not I.

Yes. The Standard gives addresses to registers. See elsewhere in this
thread.

>>>>> The alignment requirement of an aggregate is the same as the most severe
>>>>> alignment requirement of any of its components.
>>>> No such rule in C.
>>> OK. I'm oversimplifying.
>> No, you're simply wrong.
>Then what's all that verbiage about padding for?

For defining the rules for padding.

The alignment requirements for any type T are no more severe than
sizeof(T). They can be less. I believe the alignment requirements for a
structure containing T are at least as severe as those for T, though I
could be wrong.

The alignment requirements for:

    struct { char x; }

could be (say) 24.
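
These relationships can be observed directly with C11's _Alignof operator, which postdates this thread. What follows is a sketch with hypothetical types. The actual numbers are implementation-defined, so the functions only report them. The Standard does pin down one thing: sizeof(T) is a multiple of the alignment of T, since every element of an array must be correctly aligned. In addition, C11 requires alignments to be powers of two. Absent packing extensions, a structure therefore cannot be less strictly aligned than one of its members.

```c
#include <stddef.h>

/* Hypothetical types for illustration. */
struct one_char     { char x; };
struct holds_double { char tag; double d; };

/* Each function reports an implementation-defined value; only the
   relationships between them are guaranteed, not the numbers. */
size_t align_char_struct(void)   { return _Alignof(struct one_char); }
size_t size_char_struct(void)    { return sizeof(struct one_char); }
size_t align_double(void)        { return _Alignof(double); }
size_t align_double_struct(void) { return _Alignof(struct holds_double); }
```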

>>> Yep. It doesn't happen often. It does happen. This is a minor consideration,
>>> but smaller problems have been addressed before.
>> Not when it requires a major change to the language.
>What MAJOR change? There is exactly ONE minor change to the language
>required, the type returned by 'offsetof'. The rest of the semantics
>is already there; it is just hard to get to.

I disagree. For example, what does it do to compatibility of types ? If
types differing only in offset are compatible, you get some nasty issues
with pointers to compatible types being incompatible. If they aren't
compatible, what does that do to assignability ?

>>> Negative offsets do require some thought to the meaning of unions. I believe
>>> the results are more consistent if all the names refer to the same address
>>> rather than the same storage.
>> This is an example where the semantics are not obvious. This makes it
>> more than a minor change to the language, and you still haven't
>> justified the requirement.
>The semantics have been worked out by the implementers of C++ who put their
>virtual function pointers at negative offsets.

Even if that is both true and relevant, it means that there are other
changes required.

>At worst, the semantics
>of 'union' is a bit ambiguous. By specifying that the invariant of unions
>is that all the named components have the same address, that ambiguity is
>removed in a fashion consistent with all extant usage. This is NOT a major
>change.

If you put two such types in a union then - according to your proposal -
the common initial subsequence rule suddenly breaks.

So that's two areas of concern just off the top of my head.
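
The two union guarantees at stake can be shown in current C. This is a minimal sketch with hypothetical types circle, rect and shape. Every member of a union starts at the union's own address. Under the common initial sequence rule, a shared leading member stored through one structure may be inspected through another. The proposal would have to preserve both if pointers to different structure types could carry different internal offsets.

```c
#include <stddef.h>

/* Two hypothetical structures sharing a common initial sequence. */
struct circle { int kind; double radius; };
struct rect   { int kind; double w, h; };

union shape {
    struct circle c;
    struct rect r;
};

/* Every member of a union begins at the union's own address. */
int members_share_address(union shape *u)
{
    return (void *)u == (void *)&u->c && (void *)u == (void *)&u->r;
}

/* Common initial sequence: after storing a circle, the shared leading
   member may be inspected through the rect member, because the complete
   union type is visible here. */
int kind_via_other_member(union shape *u, int kind)
{
    u->c.kind = kind;
    u->c.radius = 1.0;
    return u->r.kind;
}
```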

>>>>> makes frames similar to some implementations of stacks,
>>>> and completely unlike others. I see this use of the term "frame" as thus
>>>> more misleading than anything else.
>>> The term is a reference to 'stack frame' which, in some implementations, puts
>>> new items at the lowest address.
>> And on other implementations doesn't. That's what I mean by "more
>> misleading than anything else".
>2. I said _SOME_ implementations work this way. You say some do not.

Agreed. Reread the core of my only complaint here: the sentence
beginning "I see".

>>> I'm not sure. I suspect that the answer should be 'yes'.
>> But you haven't considered it yet? This is another place where the
>> proposal is non-trivial.
>I have now considered it quite carefully, but I am willing to consider other
>opinions. I do not come up with complete answers instantaneously. I admit
>that there are problems that I have to work out the details of before I am
>confident in my results.

You completely miss my point. Elsewhere you say:

>There is exactly ONE minor change to the language
>required,

I am showing you that this is far from being the case. And, since it is
not the case, your claim that it's a small change for a small benefit
goes away - it's a large change for a small benefit.

[...]
>No, I said 'registers' were unaddressable. The bit fields are unaddressable.
>The storage unit that contain them, which are NOT the same as the bit field
>even if the bit field completely fills them, may be addressable

I'll accept that this may be my misunderstanding.

>>> Do you have a good syntax that would preserve these options without
>>> getting into the complete offset designation mess?
>> No, but it's not my proposal to design.
>Why not? I'm not selfish. You can have a piece of it if you can make
>a useful contribution.

If I thought it was a useful proposal, I would weigh in whether you
liked it or not ! As it is, I'm not clear that *you* understand the
semantic issues well enough yet. And those should *always* be considered
before the syntax, which is usually the easiest bit.

>>> Usage (MAJOR):
>>>
>>> Negative offsets can be used to portably reverse the effect of taking
>>> the address of a component of a structure.
>>
>> That doesn't even make sense to me.
>
>OK. So you don't understand.
>
>struct xyz_tag { int item1; int item2; } * block;
>   ...
>   return &(block->item2);
>   ...
>
>now try to recover the original value of 'block' from the returned
>value.

    int *i2ptr;
    // Set i2ptr to the pointer.

    unsigned char *p = (unsigned char *) i2ptr;
    p -= offsetof (struct xyz_tag, item2);
    return (struct xyz_tag *) p;

or even:

    return (struct xyz_tag *)
           ((unsigned char *) i2ptr - offsetof (struct xyz_tag, item2));

Clearly it can't be done if you don't know which member it returns the
address of. Now explain how it *can* be done using your negative offsets
proposal. To avoid side tracking, I suggest you simply stick
__ZERO_OFFSET__ somewhere in the relevant member declaration.
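
For reference, here is Clive's recovery code as a complete translation unit. The structure is Max's xyz_tag. item2_address and block_from_item2 are hypothetical names that wrap the two fragments.

```c
#include <stddef.h>

struct xyz_tag { int item1; int item2; };

/* Returns the address of a member, as in Max's fragment. */
int *item2_address(struct xyz_tag *block)
{
    return &(block->item2);
}

/* Clive's recovery: step back from the member by its offset, working in
   unsigned char units so that the arithmetic is in bytes. */
struct xyz_tag *block_from_item2(int *i2ptr)
{
    unsigned char *p = (unsigned char *) i2ptr;
    p -= offsetof(struct xyz_tag, item2);
    return (struct xyz_tag *) p;
}
```

The caller must know which member the pointer designates. Nothing in the int * itself records that.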

>>> Fine. Give that storage unit a designation and you can specify it as the
>>> zero offset.
>> Use the name of any of the bit-fields to designate the unit.
>No. That would be ambiguous and would imply that you could take the
>address of a bit field, which is specifically forbidden by the standard.
>You need a designation that makes it clear that the address is associated
>with all the bit fields in the storage unit.

No you don't. You want the mapping "bit-field -> address", which does
not have to be injective. You aren't looking for the different mapping
"address -> SETOF (bit-field)".

>>> The problem comes when that storage
>>> unit contains more than one named bit field. Since they all have the
>>> same 'address', how do you distinguish which one is the zero offset bit.
>> Why do you need to ? You want a mapping from name to storage unit, so it
>> doesn't matter if more than one name can do the job.
>If that were the case, why doesn't the standard let you take the
>address of a bit field?

Because nobody saw the need, and because there's the minor question of
what the type would be.

An extension that said "&(struct-or-union.bit-field) has type (void *)
and address of the first byte of the storage unit holding the bit-field"
would be simple; offhand I can't see anything else that would need
changing.

But I'm not proposing it because I don't see any pressing need.

>>>>> Also, the
>>>>> offsetof macro
>>>>> would return a ptr_diff, rather than a size_t.
>>>> Oh great, change a whole load of semantics of existing code.
[...]
>'size_t' is an unsigned type. It has to be converted to an 'int'
>before it can be combined with a pointer.

False. RTS (C9X 6.5.7p2 and p3; equivalent locations in C89).

>That means the handling
>of arithmetic overflow in the calculation of 'offsetof' is not
>symmetric with the handling of arithmetic overflow in the pointer
>calculation on some implementations.

This is true no matter what the type of offsetof is, since overflow
happens when the pointer no longer points within the object or to the
address one beyond the end.

But since you start with a false premise, you can't use it as part of
your argument.
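
Clive's point is that a size_t can be added to a pointer directly. Pointer + integer is defined for any integer type; what matters is whether the result stays within the object or one past its end. checked_advance below is a hypothetical helper and a minimal sketch. It checks the offset before the pointer is formed, because forming an out-of-range pointer is itself undefined.

```c
#include <stddef.h>

/* Hypothetical helper: returns base + off, or a null pointer if the
   result would fall outside an object of `size` bytes. One past the end
   is still a valid pointer value, so off == size is accepted. The size_t
   offset is added directly; no conversion to int is involved. */
unsigned char *checked_advance(unsigned char *base, size_t size, size_t off)
{
    if (off > size)
        return NULL;   /* forming base + off would be undefined */
    return base + off;
}
```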

> That can lead to difficult to
>identify portability problems.

Huh ? Examples, please ?

>>> Further, size_t is often the largest commonly
>>> used size integer in the machine, so that the only difference between
>>> size_t and ptr_diff is the handling of overflow conditions, and those are
>>> commonly ignored.
>> Not by those who write proper code. And you would be changing defined
>> behaviour to undefined behaviour.
>Wrong. The behavior is undefined because of the asymmetry of the overflow
>handling.

See above.

>The entire section that allows consistent overflow cancellation
>to be defined behavior gets tossed because of that one inconsistency.

Sorry, but this doesn't even make sense to me. Can you please reword it;
in particular, what is "consistent overflow cancellation" ?

>>>>> Since a zero width bit field is useless
>>>> Wrong.
>>> Cite a usage please.
>> RT[F]S
>That was an honest request. I still can't find my copy of the standard.

Then how would a citation help ? Or, put another way, why are you making
such strong assertions about what the standard says if you haven't read
the relevant bit recently ?

>You didn't have to get [F] snide.

I'm sorry if it was over the top (though it's not so much snide as
common usage), but this was the third or fourth wrong statement in your
posting.

>>>>> and bit fields can not be designated as
>>>>> the zero offset component,
>>> As pointed out above, bit fields can NOT be the zero
>>> offset component. They may be IN such a component, but
>>> not BE that component.
>> Which doesn't stop them designating it.
>Yes, it does. See above.

No it doesn't. See above.

>Further, using a bit field as the zero
>offset component introduces a whole raft of portability problems
>since the placement of bit fields into storage units is
>implementation defined.

Of course. But so is the allocation of offsets to fields in a structure.
[Note that, in both cases, there are restrictions on what the
implementation can do.]

>>>> Better syntax would be, for example, to indicate the zero offset
>>>> component by placing "&" after its identifier.
>>> Hmm. That would be confused with reference syntax in C++.
>> This is supposed to concern me?
>Not if you consider this sub-thread your own private domain. If you
>are really trying to be helpful, it is a problem.

I, personally, consider C++ "compatibility" to be a problem in
development of the C Standard. I believe there have been enough examples
in the past to support me (\u being a topical one). This may be a
personal failing, but nonetheless it is there.

But, as I've already said, syntax is the *last* thing you worry about in
designing a proposal. First you need it to work semantically.

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd.  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 |                      | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address

Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 1999/09/07

In article <37D3D63E.CB5F8626@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> writes
>>>>>>> The alignment requirement of an aggregate is the same as the most severe
>>>>>>> alignment requirement of any of its components.
>>>>>> No such rule in C.
[...]
>I was defining an important characteristic of alignment requirements. It was
>Clive that tried to turn this into a rule/requirement. I did NOT claim that it
>was a rule, only that 'C' made provisions for it and it COULD be an
>implementation requirement.

No you didn't - go back and read the words you *wrote*. The first quoted
sentence is *not* an important characteristic of alignment requirements.

>>> What MAJOR change? There is exactly ONE minor change to the language
>>> required, the type returned by 'offsetof'. The rest of the semantics
>>> is already there; it is just hard to get to.
>> 1. I (and I expect Clive, too, though I don't claim to speak for him)
>>    think this (changing the type of offsetof()) is a larger change
>>    than you seem to think.
>That is possible, but I suspect that the problem would be smaller than
>Clive implies, and that the change would remove rather than add problems.

I suspect that you are wrong. You will make some currently-valid code
become undefined.

>The big question is if this problem has actually come up enough times that
>people have put in defensive code against these overflows and if the change
>would break that code.

*What* problem ? *What* overflow ? I asked before and you still haven't
answered it. I think that this claim is a straw man.

And you haven't addressed the other issues I have with your proposal.

>The adjustment for 'void *' is required, but adjusting 'char *' would
>depend on the implementation. A broken implementation that calls for

6.2.5 para 26:
    A pointer to void shall have the same representation
    and alignment requirements as a pointer to a character
    type.39)

    39)The same representation and alignment requirements are
       meant to imply interchangeability as arguments to
       functions, return values from functions, and members of
       unions.

6.5.2.2 para 6:
    If the function is defined with a type that does not include
    a prototype, and the types of the arguments after promotion
    are not compatible with those of the parameters after
    promotion, the behavior is undefined, except for the
    following cases:
         -- one promoted type is a signed integer type, the other
            promoted type is the corresponding unsigned integer
            type, and the value is representable in both types;
         -- both types are pointers to qualified or unqualified
            versions of a character type or void.

Both of which show that converting between ([[un]signed] char *) and
(void *) must not alter the bit pattern in memory (or, more precisely,
must produce a bit pattern for a value that equals the original value).
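
Here is the consequence of those two paragraphs as a small sketch. A char * converted to void * and back compares equal to the original. char * arguments can also be passed to memcpy, whose parameters are void *, with no adjustment. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* char * -> void * -> char * must give back a pointer equal to the
   original; no offset adjustment is permitted along the way. */
int char_void_round_trip(char *p)
{
    void *v = p;
    char *back = v;
    return back == p;
}

/* memcpy takes void * parameters; the char * arguments convert
   implicitly, again without any adjustment. */
void copy_bytes(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
}
```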

>> For the purpose of squeezing a large struct into the small-signed-
>> offset range, this can be better done by the compiler,

>Agreed. In most cases, optimizations should be left to the compiler.
>However, a mechanism for getting around compiler weaknesses is often
>helpful.

But for most compilers and most structures there *is* no weakness. So
we're talking about addressing a *tiny* issue that would better be done
by either fixing the compiler or by adding 0.1% to your system clock
speed.

> If negative offsets were not powerful explanatory devices, I'd
>not have submitted this suggestion.

Explanatory of *what* ? Again, I've yet to see you use them to explain
something.

>> But I wouldn't mind seeing
>>     T foo;
>>     &foo==(T.member*)&foo.member;

>Excellent! Now if we can just put it in formal language and get it
>accepted...

Let's see: you want to define:

    (T.member *) v

as meaning:

    (T *)( (char *) (v) - offsetof (T, member) )

right ? Yes, it looks simple (unless there's a parsing issue that I've
overlooked), but equally it appears next to useless.
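
Clive's reading of the proposed cast can already be written as a macro. This is a minimal sketch. MEMBER_TO_STRUCT and struct pair are invented for illustration. The same idiom is widely used in practice, for example as container_of in the Linux kernel.

```c
#include <stddef.h>

/* Hypothetical macro spelling out (T.member *) v as
   (T *)((char *)(v) - offsetof(T, member)). */
#define MEMBER_TO_STRUCT(T, member, v) \
    ((T *)((char *)(v) - offsetof(T, member)))

/* Hypothetical structure with members at different offsets. */
struct pair { double first; int second; };
```

As with the hand-written version, the member name must be supplied by the caller. The macro cannot discover which member a pointer designates.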

>>> If that were the case, why doesn't the standard let you take the
>>> address of a bit field?
>> Because a pointer designates a unique object, and bit-fields cannot be
>> efficiently designated this way on most hardware.  But you don't need
>> to single out any particular member; you just want an offset for the
>> struct foo*, and you can attach "use the (not necessarily unique)
>> offset of this member" to a bit-field as well as any other member.

>Ok. But it is still a portability problem waiting to happen...

Why is it ? If you're using this feature, you want to point at the unit
containing a specific bit-field. So specify that one. If the order of
bit-fields matters, why are you using this facility ?

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd.  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 |                      | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address



Author: scjones@thor.sdrc.com (Larry Jones)
Date: 1999/09/01

Clive D.W. Feather (clive@on-the-train.demon.co.uk) wrote:
>
> Neither of these has any exceptions for registers.
>
> Basically, it's easier to treat everything as having an address and
> then say that certain addresses can't be obtained, than to deal with
> addressable and unaddressable storage.

And it matches the real world where registers (nearly) always have
addresses, although they're usually not in the same address space as
memory (i.e., their addresses tend to be things like r0 or eax rather
than simple numbers).

-Larry Jones

Well, it's all a question of perspective. -- Calvin


Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 1999/09/01

In article <37CB280D.5517F718@wn.net>, Andrew F. Vesper <avesper@wn.net>
writes
>Clive, I generally agree with your postings, but I don't understand
>where or how the C standard claims that registers (non-bit-field objects)
>have addresses. Would you be so kind as to elucidate, please?

    3.6
    [#1] byte
    addressable  unit  of  data storage large enough to hold any
    member  of  the  basic  character  set  of   the   execution
    environment

    [#2]  NOTE 1 It  is  possible to express the address of each
    individual byte of an object uniquely.

    6.2.4  Storage durations of objects

    [#2]  The  lifetime  of  an object is the portion of program
    execution during which storage is guaranteed to be  reserved
    for it.  An object exists, has a  constant  address,25)  and
    retains its last-stored value throughout  its  lifetime.26)

Neither of these has any exceptions for registers.

Basically, it's easier to treat everything as having an address and
then say that certain addresses can't be obtained, than to deal with
addressable and unaddressable storage.

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd.  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 |                      | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address



Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 1999/09/01

In article <37CAF64E.20079BC@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> writes
>> No, I don't agree that the change is small, and I don't think you
>> adequately addressed some of the semantic issues already asked.
>
>The only _change_ mentioned was changing the type returned by
>'offsetof' from 'size_t' to 'ptrdiff_t', which is a small change.

Firstly, that is not a small change, since it changes the behaviour of a
range of code (due, for example, to the rules for signed and unsigned
types).

>As I pointed out, many implementations actually use an expression
>that is naturally a 'ptrdiff_t' and then cast it to 'size_t'.

So what ? That doesn't mean you aren't proposing a large change.

But the main point you haven't addressed is the range of semantic
questions your proposal raises, such as how compatibility and
assignability of types is affected.

And I'd still like to see some examples that can't be solved by a simple
wrapper.

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd.  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 |                      | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address

Author: Paul Jarc <prj@po.cwru.edu>
Date: 1999/09/01

Max TenEyck Woodbury <mtew@cds.duke.edu> writes:
> >>>> The alignment requirement of an aggregate is the same as the most severe
> >>>> alignment requirement of any of its components.
> >>> No such rule in C.
> >> OK. I'm oversimplifying.
> >
> > No, you're simply wrong.
>
> Then what's all that verbiage about padding for?

To *allow* implementations to use that rule (or whatever rule is
appropriate for their respective platforms), but not to *require* it.
Assume a hardware platform where ints can be accessed at any address,
but can be accessed faster on 4-byte boundaries.  An implementation on
this platform might well force most ints onto 4-byte boundaries for
performance, but it might also allow packed structures (with a
command-line option, or pragma, or whatever) with no padding, which
violate the otherwise-imposed 4-byte alignment rule for ints.

> >> This is a minor consideration, but smaller problems have been
> >> addressed before.
> >
> > Not when it requires a major change to the language.
>
> What MAJOR change? There is exactly ONE minor change to the language
> required, the type returned by 'offsetof'. The rest of the semantics
> is already there; it is just hard to get to.

1. I (and I expect Clive, too, though I don't claim to speak for him)
   think this (changing the type of offsetof()) is a larger change
   than you seem to think.
2. As has been pointed out elsewhere in this thread, the results of
   offsetof() would have to remain the same, so changing the type is
   unrelated to negative offsets.  Your struct foo*, even if it points
   to the middle of a struct, must be adjusted to point to the
   beginning of the struct when cast to a void*, char*, or the like.
   It is with these kinds of pointers that offsetof() is used, so
   offsetof() would have to continue to give the offset from the
   beginning.

So since offsetof() does not change, all that's left of your proposal
is designating the zero-offset member.
For the purpose of squeezing a large struct into the small-signed-
offset range, this can be better done by the compiler, since it knows
how struct members are laid out, and you don't.  So all this is just
an optimization which is *already allowed*, provided the conversions
between pointer types are handled correctly.
For the purpose of recovering a struct pointer from a member pointer,
you'd want to be able to do it for any member, so there's no point in
making any member special.

> >> Negative offsets do require some thought to the meaning of
> >> unions. I believe the results are more consistent if all the
> >> names refer to the same address rather than the same storage.

Another branch of this thread established that negative offsets are an
allowed optimization, but the offsets of union members would be
independent; they would all (really) start at the same address, and
the foo* representations would be offset from that, independent of how
other members' types' pointers are offset internally.  The union
type's pointer could also have an offset - for example, if all its
members' pointers' types have the same nonzero offset, it might be
convenient to use the same offset for the union pointer type.  But
this is also independent of all the other offsets.

> struct xyz_tag { int item1; int item2; } * block;
>    ...
>    return &(block->item2);
>    ...
>
> now try to recover the original value of 'block' from the returned
> value.

As established in the other branch, converting that int* back to a
struct xyz_tag* will have to apply any offset used in struct xyz_tag*'s.
You won't be able to take advantage of the fact that the struct xyz_tag*
happens to point straight to item2.

But I wouldn't mind seeing
    T foo;
    &foo==(T.member*)&foo.member;
The cast could require that its operand be a pointer to the type of the
member specified in the cast (a bit unusual for a cast to require an
operand of a particular type, granted), and could apply the offset of
that member to the pointer, yielding a pointer to the struct, with the
appropriate type.

> >> The problem comes when that storage
> >> unit contains more than one named bit field. Since they all have the
> >> same 'address', how do you distinguish which one is the zero offset bit.
> >
> > Why do you need to ? You want a mapping from name to storage unit, so it
> > doesn't matter if more than one name can do the job.
>
> If that were the case, why doesn't the standard let you take the
> address of a bit field?

Because a pointer designates a unique object, and bit-fields cannot be
efficiently deisgnated this way on most hardware.  But you don't need
to single out any particular member; you just want an offset for the
struct foo*, and you can attach "use the (not necessarily unique)
offset of this member" to a bit-field as well as any other member.
You're not using the struct pointer to get straight to the member;
you're still using foo->member.  So there's no need for the
struct foo* to designate a particular member, bit-field or otherwise.

paul

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/30

"Clive D.W. Feather" wrote:
>
> In article <37C180C6.91AD49D4@cds.duke.edu>, Max TenEyck Woodbury
> <mtew@cds.duke.edu> writes
>>> No, I don't see.
>>>
>>> About the only time this has an explanatory power is when you are
>>> handing around pointers to the middles or ends of data structures. The
>>> only time that is likely - other than just for the hell of it - is if
>>> you are conforming to some "external" memory layout, which is likely to
>>> be internal to your machine (and the only examples that come to mind are
>>> when the pointer is to just beyond the end of the structure). These are
>>> inherently unportable, and a simple wrapper to move the pointer to the
>>> start is not going to be hard. Certainly nowhere near as hard as
>>> redefining lots of semantics of the language.
>>
>>Please, you're exaggerating for effect. The change is small and the
>>semantics is already there. It's just not conveniently accessible.
>
> No, I don't agree that the change is small, and I don't think you
> adequately addressed some of the semantic issues already asked.

The only _change_ mentioned was changing the type returned by
'offsetof' from 'size_t' to 'ptrdiff_t', which is a small change.
As I pointed out, many implementations actually use an expression
that is naturally a 'ptrdiff_t' and then cast it to 'size_t'.

There was also an extension to be made, which I hung on the ': 0'
syntax. Since ': 0' has an assigned meaning, it can not be used
the way I suggested. I withdrew that part of the suggestion. That
doesn't mean the concept is not useful.

>> The main reason the representations are unportable is because negative
>> offsets can not be expressed easily.
>
> That's not what I said. Even if they could be expressed easily, you
> haven't shown me an example where there's any real benefit to it.
>
>> And your 'for the hell of it' may well be a specification that the
>> designer has no control over, like the interface requirements of the
>> 'C' run-time heap routines.
>
> Chicken and egg: why would someone be writing such specifications if the
> facility is hard to use ?

Specifications may or may not take ease of implementation into
account. In many cases, the specification does not invoke negative
offsets directly. Often negative offsets are specified implicitly
to counter the effects of positive offsets already part of an extant
design.

mtew@cds.duke.edu


Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 1999/08/31

In article <37C35999.93A3D9D7@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> writes
>Thank you. That makes it clear that
>a) I need a more up-to-date copy of the standard, and

That usage has *always* been in the Standard.

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd.  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 |                      | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address

Author: Barry Margolin <barmar@bbnplanet.com>
Date: 1999/08/31

In article <cKDxakEBWSy3EwXz@romana.davros.org>,
Clive D.W. Feather <clive@demon.net> wrote:
>In article <37BD94ED.72775ABB@cds.duke.edu>, Max TenEyck Woodbury
><mtew@cds.duke.edu> writes
>>The standard says you
>>can't take the address of a register.
>
>Exactly: which bit of "you just can't obtain them" was unclear ? All
>non-bit-field objects have addresses.

If you can't obtain it, what does it mean to "have an address", as far as
the C spec is concerned?

--
Barry Margolin, barmar@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/31

It may be that the base designation should be a pointer attribute
rather than a structure attribute. For example:

struct xyz_tag { int item1; int item2; };

struct xyz_tag * block;      /* a normal pointer to the whole xyz_tag struct */
struct xyz_tag.item2 * item; /* a pointer to an xyz_tag struct using item2 as base */

item->item2;  /* Note that you still need the member selector to get item2 */

block = item; /* NOT a bit for bit copy. The representation will change */

block = (struct xyz_tag.item2 *)0;  /* Undefined behaviour! */

But there are some questions --

block = item = (struct xyz_tag *)0; /* OK? */
(item != NULL);                     /* TRUE? */

mtew@cds.duke.edu

Author: "Andrew F. Vesper" <avesper@wn.net>
Date: 1999/08/31

"Clive D.W. Feather" wrote:

> Clive>> Actually, registers do have addresses, you just can't obtain them.
> Max TenEyck Woodbury>Depends on the hardware and the implementation.
>
> No, I'm talking about in the Standard, not specific hardware.
>
> Max>The standard says you
> Max>can't take the address of a register.
>
> Exactly: which bit of "you just can't obtain them" was unclear ? All
> non-bit-field objects have addresses.

(I hope the attributions are clear.)

Clive, I generally agree with your postings, but I don't understand
where or how the C standard claims that registers (non-bit-field objects)
have addresses. Would you be so kind as to elucidate, please?
--
Andy V (OpenGL Alpha Geek)
"In order to make progress, one must leave the door to the unknown ajar."
Richard P. Feynman, quoted by Jagdish Mehra in _The Beat of a Different Drum_.

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/31

"Clive D.W. Feather" wrote:
>
>> Depends on the hardware and the implementation.
>
> No, I'm talking about in the Standard, not specific hardware.

You were the one saying registers could have addresses, not I.

>> The standard says you can't take the address of a register.
>
> Exactly: which bit of "you just can't obtain them" was unclear ? All
> non-bit-field objects have addresses.

What part of 'not in all cases' did you fail to understand?

>>>> The alignment requirement of an aggregate is the same as the most severe
>>>> alignment requirement of any of its components.
>>> No such rule in C.
>> OK. I'm oversimplifying.
>
> No, you're simply wrong.

Then what's all that verbiage about padding for?

>>>> this restriction can reduce program efficiency.
>>> Only in the rare case that the structure size is *slightly* larger than
>>> the maximum size of an index. This is a relatively rare situation, as
>>> other posts have pointed out.
>> Yep. It doesn't happen often. It does happen. This is a minor consideration,
>> but smaller problems have been addressed before.
>
> Not when it requires a major change to the language.

What MAJOR change? There is exactly ONE minor change to the language
required, the type returned by 'offsetof'. The rest of the semantics
is already there; it is just hard to get to.

>> Negative offsets do require some thought to the meaning of unions. I believe
>> the results are more consistent if all the names refer to the same address
>> rather than the same storage.
>
> This is an example where the semantics are not obvious. This makes it
> more than a minor change to the language, and you still haven't
> justified the requirement.

The semantics have been worked out by the implementers of C++ who put their
virtual function pointers at negative offsets. At worst, the semantics
of 'union' is a bit ambiguous. By specifying that the invariant of unions
is that all the named components have the same address, that ambiguity is
removed in a fashion consistent with all extant usage. This is NOT a major
change.

>> If you want to use frames and overlap storage
>> with a struct, wrap the frame in a struct first.
>
> Surely this is just as bad as the wrappers that I suggested and you were
> complaining about ?

1. I don't recommend implementing frames.

2. If you do need a wrapper, it indicates that you are trying to get
   around one of the design characteristics of frames. That may mean
   you should not be using a frame in that context. More importantly,
   it will communicate to someone who is familiar with frames that you
   are subverting the usual union rules.

3. The wrappers needed to implement negative offsets are needed to get
   around a structural (but not conceptual) incompleteness in the
   language and thus contribute little to a design. The wrapper of a
   frame conveys a significant exception to the expected behavior and
   conveys significant design information.

>>>> makes frames similar to some implementations of stacks,
>>> and completely unlike others. I see this use of the term "frame" as thus
>>> more misleading than anything else.
>> The term is a reference to 'stack frame' which, in some implementations, puts
>> new items at the lowest address.
>
> And on other implementations doesn't. That's what I mean by "more
> misleading than anything else".

1. (Again!) I don't recommend implementing frames.

2. I said _SOME_ implementations work this way. You say some do not.
   I believe that is implicit in the use of the word 'some'. If I were
   seriously recommending the implementation of frames, I'd ask

   A. "Does anybody who understands what I'm talking about have a better
      way of explaining it?" [but I'm NOT asking that question!]

   B. "Is there a better word that we can hang this concept on?" [And I'm
      not asking that question either!]

   but frames are more difficult to understand than specifying that the
   base address of a structure is associated with a specific member. They
   are discussed here because they are a significant alternative. Discussing
   them up front and dismissing them should make the debate go a bit faster if
   someone reads the whole thing.

>>>> and is no where near as
>>>> complex as the introduction of a new kind of structure.
>>> Is it not ? Are structures compatible if they are identical except that
>>> their zero offset components are different ? Are they assignable ?
>> I'm not sure. I suspect that the answer should be 'yes'.
>
> But you haven't considered it yet? This is another place where the
> proposal is non-trivial.

I have now considered it quite carefully, but I am willing to consider other
opinions. I do not come up with complete answers instantaneously. I admit
that there are problems that I have to work out the details of before I am
confident in my results. After due deliberation, I conclude that the answer
to both of your questions should be 'yes'.

>>>> 2.  A zero offset designation can only be applied to an addressable data
>>>> component of the structure.
>>> Why ? Why shouldn't the storage unit holding a given bit-field be the
>>> zero offset component ?
>> Because, technically, bit fields do NOT have addresses.
>
> I didn't say they did.
>
>> If you designate
>> the storage unit holding the bit field properly, you could indeed use
>> it as the zero offset component.
>
> But previously you've taken the view that these storage units are
> unaddressable. Which is it ?

No, I said 'registers' were unaddressable. The bit fields are unaddressable.
The storage units that contain them, which are NOT the same as the bit fields
even if a bit field completely fills its unit, may be addressable, and IF you
invent/construct a way to designate such an address, you might want to use it
as the base address of a structure. The antecedent of 'it' was intended to
be 'storage unit', and not 'bit field'.

>> Do you have a good syntax that would preserve these options without
>> getting into the complete offset designation mess?
>
> No, but it's not my proposal to design.

Why not? I'm not selfish. You can have a piece of it if you can make
a useful contribution.

>> I'm glad you're asking these questions. It clarifies the problem being
>> addressed. It wasn't really stated was it?
>
> No.
>
>> So, to clarify:
>>
>> Usage (MAJOR):
>>
>> Negative offsets can be used to portably reverse the effect of taking
>> the address of a component of a structure.
>
> That doesn't even make sense to me.

OK. So you don't understand.

struct xyz_tag { int item1; int item2; } * block;
   ...
   return &(block->item2);
   ...

now try to recover the original value of 'block' from the returned
value.

>>>> A bit field can not be designated as a zero offset
>>>> component since it does not have an address.
>>> But the storage unit it resides in does.
>> Fine. Give that storage unit a designation and you can specify it as the
>> zero offset.
>
> Use the name of any of the bit-fields to designate the unit.

No. That would be ambiguous and would imply that you could take the
address of a bit field, which is specifically forbidden by the standard.
You need a designation that makes it clear that the address is associated
with all the bit fields in the storage unit.

>> The problem comes when that storage
>> unit contains more than one named bit field. Since they all have the
>> same 'address', how do you distinguish which one is the zero offset bit.
>
> Why do you need to ? You want a mapping from name to storage unit, so it
> doesn't matter if more than one name can do the job.

If that were the case, why doesn't the standard let you take the
address of a bit field?

>> it will only create
>> massive and unneeded confusion in this thread.
>
> Why ?

It has. That is enough. If you want to discuss the addressability of
bit fields, please do it in another thread.

>>>> Also, the
>>>> offsetof macro
>>>> would return a ptrdiff_t, rather than a size_t.
>>>
>>> Oh great, change a whole load of semantics of existing code.
>>
>> Since many implementations already use a pointer difference to implement
>> offsetof, the change would bring more systems into conformance rather
>> than the other way around.
>
> Not true. The fact that a pointer difference is used in the
> implementation is irrelevant, provided that they correctly cast the
> result.

'size_t' is an unsigned type. It has to be converted to an 'int'
before it can be combined with a pointer. That means the handling
of arithmetic overflow in the calculation of 'offsetof' is not
symmetric with the handling of arithmetic overflow in the pointer
calculation on some implementations. That can lead to portability
problems that are difficult to identify. If 'offsetof' returned a
'ptrdiff_t', the implementation and usage would be more consistent.
That makes it relevant.

In other words, the fact that 'offsetof' returns an unsigned type
is a design defect in both 'C' and 'C++' and should be fixed even
if negative offsets never see the light.

>> Further, size_t is often the largest commonly
>> used size integer in the machine, so that the only difference between
>> size_t and ptrdiff_t is the handling of overflow conditions, and those are
>> commonly ignored.
>
> Not by those who write proper code. And you would be changing defined
> behaviour to undefined behaviour.

Wrong. The behavior is undefined because of the asymmetry of the overflow
handling. The entire section that allows consistent overflow cancellation
to be defined behavior gets tossed because of that one inconsistency.
Correcting this problem would remove some undefined behaviors rather
than add new ones.

>>>> Since a zero width bit field is useless
>>> Wrong.
>> Cite a usage please.
>
> RT[F]S

That was an honest request. I still can't find my copy of the standard.
You didn't have to get [F] snide.

Since someone else was kind enough to provide the citation you did
NOT include, and from that citation it became clear that ': 0' could
not be used as the base designator, an alternative is needed.
Constructive suggestions would be helpful.

Note that Clive's reference to a designator without a name could
be useful. Language similar to the unnamed bit field language could
be used to attach this attribute to an otherwise inaccessible
storage unit.

>>>> and bit fields can not be designated as
>>>> the zero offset component,
>> As pointed out above, bit fields can NOT be the zero
>> offset component. They may be IN such a component, but
>> not BE that component.
>
> Which doesn't stop them designating it.

Yes, it does. See above. Further, using a bit field as the zero
offset component introduces a whole raft of portability problems
since the placement of bit fields into storage units is
implementation defined. If you can get everyone to agree to an
addressing scheme for bit fields, then this restriction should
be removed, but until then allowing a bit field as the designator
of the base component of a structure is not going to be very
helpful.

>>> Better syntax would be, for example, to indicate the zero offset
>>> component by placing "&" after its identifier.
>> Hmm. That would be confused with reference syntax in C++.
>
> This is supposed to concern me?

Not if you consider this sub-thread your own private domain. If you
are really trying to be helpful, it is a problem.

mtew@cds.duke.edu

Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 1999/08/30

In article <37C180C6.91AD49D4@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> writes
>> No, I don't see.
>>
>> About the only time this has an explanatory power is when you are
>> handing around pointers to the middles or ends of data structures. The
>> only time that is likely - other than just for the hell of it - is if
>> you are conforming to some "external" memory layout, which is likely to
>> be internal to your machine (and the only examples that come to mind are
>> when the pointer is to just beyond the end of the structure). These are
>> inherently unportable, and a simple wrapper to move the pointer to the
>> start is not going to be hard. Certainly nowhere near as hard as
>> redefining lots of semantics of the language.
>
>Please, you're exaggerating for effect. The change is small and the
>semantics is already there. It's just not conveniently accessible.

No, I don't agree that the change is small, and I don't think you
adequately addressed some of the semantic issues already asked.

>The main reason the representations are unportable is because negative
>offsets can not be expressed easily.

That's not what I said. Even if they could be expressed easily, you
haven't shown me an example where there's any real benefit to it.

>And your 'for the hell of it' may well be a specification that the
>designer has no control over, like the interface requirements of the
>'C' run-time heap routines.

Chicken and egg: why would someone be writing such specifications if the
facility is hard to use ?

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd.  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 |                      | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address

Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 1999/08/30

In article <37BD94ED.72775ABB@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> writes
>> >Aggregate: A group of related data items with distinct names and meanings
>> >grouped together under a common designation.
>> Warning: not the definition used in C.
>Right. C lumps arrays, structs and unions together.

Wrong.

>I
>made the distinction for purposes of discussion.

Fine, but you should have made it clear.

>>>Not all aggregate components have addresses. In particular, bit fields (those
>>>components with bit length specifications) do not have addresses.
>> No, but C does have the concept of the address of an unnamed component
>> holding the bit field.
>Yep. but that isn't the same as the address of a bit field

It's very close, though.

>and it doesn't
>apply at all if the bit field is in a register.

Nor does anything else to do with addresses in your sense. Also see
below.

>>>The other
>>>unaddressable data item, registers,
>> Actually, registers do have addresses, you just can't obtain them.
>Depends on the hardware and the implementation.

No, I'm talking about in the Standard, not specific hardware.

>The standard says you
>can't take the address of a register.

Exactly: which bit of "you just can't obtain them" was unclear ? All
non-bit-field objects have addresses.

>>>The alignment requirement of an aggregate is the same as the most severe
>>>alignment requirement of any of its components.
>> No such rule in C.
>OK. I'm oversimplifying.

No, you're simply wrong.

>>>this restriction can reduce program efficiency.
>> Only in the rare case that the structure size is *slightly* larger than
>> the maximum size of an index. This is a relatively rare situation, as
>> other posts have pointed out.
>Yep. It doesn't happen often. It does happen. This is a minor consideration,
>but smaller problems have been addressed before.

Not when it requires a major change to the language.

>Negative offsets do require some thought to the meaning of unions. I believe
>the results are more consistent if all the names refer to the same address
>rather than the same storage.

This is an example where the semantics are not obvious. This makes it
more than a minor change to the language, and you still haven't
justified the requirement.

>If you want to use frames and overlap storage
>with a struct, wrap the frame in a struct first.

Surely this is just as bad as the wrappers that I suggested and you were
complaining about ?

>> >makes frames similar to some implementations of stacks,
>> and completely unlike others. I see this use of the term "frame" as thus
>> more misleading than anything else.
>The term is a reference to 'stack frame' which, in some implementations, puts
>new items at the lowest address.

And on other implementations doesn't. That's what I mean by "more
misleading than anything else".

>>>and is no where near as
>>>complex as the introduction of a new kind of structure.
>> Is it not ? Are structures compatible if they are identical except that
>> their zero offset components are different ? Are they assignable ?
>I'm not sure. I suspect that the answer should be 'yes'.

But you haven't considered it yet ? This is another place where the
proposal is non-trivial.

>>>2.  A zero offset designation can only be applied to an addressable data
>>>component of the structure.
>> Why ? Why shouldn't the storage unit holding a given bit-field be the
>> zero offset component ?
>Because, technically, bit fields do NOT have addresses.

I didn't say they did.

>If you designate
>the storage unit holding the bit field properly, you could indeed use
>it as the zero offset component.

But previously you've taken the view that these storage units are
unaddressable. Which is it ?

>Do you have a good syntax that would preserve these options without
>getting into the complete offset designation mess?

No, but it's not my proposal to design.

>I'm glad you're asking these questions. It clarifies the problem being
>addressed. It wasn't really stated was it?

No.

>So, to clarify:
>
>Usage (MAJOR):
>
>Negative offsets can be used to portably reverse the effect of taking
>the address of component of a structure.

That doesn't even make sense to me.

>> >A bit field can not be designated as a zero offset
>> >component since it does not have an address.
>> But the storage unit it resides in does.
>Fine. Give that storage unit a designation and you can specify it as the
>zero offset.

Use the name of any of the bit-fields to designate the unit.

>The problem comes when that storage
>unit contains more than one named bit field. Since they all have the
>same 'address', how do you distinguish which one is the zero offset bit.

Why do you need to ? You want a mapping from name to storage unit, so it
doesn't matter if more than one name can do the job.

>it will only create
>massive and unneeded confusion in this thread.

Why ?

>> >Also, the
>> >offsetof macro
>> >would return a ptrdiff_t, rather than a size_t.
>>
>> Oh great, change a whole load of semantics of existing code.
>
>Since many implementations already use a pointer difference to implement
>offsetof, the change would bring more systems into conformance rather
>than the other way around.

Not true. The fact that a pointer difference is used in the
implementation is irrelevant, provided that they correctly cast the
result.

>Further, size_t is often the largest commonly
>used size integer in the machine, so that the only difference between
>size_t and ptrdiff_t is the handling of overflow conditions, and those are
>commonly ignored.

Not by those who write proper code. And you would be changing defined
behaviour to undefined behaviour.

>> >Since a zero width bit field is useless
>> Wrong.
>Cite a usage please.

RT[F]S

>>>and bit fields can not be designated as
>>>the zero offset component,
>As pointed out above, bit fields can NOT be the zero
>offset component. They may be IN such a component, but
>not BE that component.

Which doesn't stop them designating it.

>> Better syntax would be, for example, to indicate the zero offset
>> component by placing "&" after its identifier.
>Hmm. That would be confused with reference syntax in C++.

This is supposed to concern me ?

But that was one suggestion. I'm sure people can think of many more.

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd.  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 |                      | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address

Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 1999/08/30

In article <UXew3.579$kg.75970@ptah.visi.com>, Peter Seebach
<seebs@plethora.net> writes
>>A compiler can, if it chooses, make all "struct xyz *" pointers
>>point to the middle of all "struct xyz"s.

>Now, we have, by definition,
>       (void *) &b == (void *) &(b.firstbyte);
>       (void *) &(b.firstbyte) < (void *) &(b.byte01)
>       (void *) &(b.byte01) < (void *) &(b.byte02)
>       ...
>       (void *) &(b.bytefe) < (void *) &(b.lastbyte)
>
>All of these are mandated by the spec, yes?

Ignoring the misuse of void * in the last one, yes.

>  How can the pointer be in the
>middle of anything?

You store a (struct a *) as a pointer to the middle of the structure.
Whenever it's converted to or compared with another type, you add the
offset. I don't think there's a way to tell the difference.

>Hmm.  While I see that it's possible to do this "behind the scenes", I'm
>convinced that you can't do it in any way that shows up in offsetof().

Agreed, since offsetof is defined in terms of pointers to the start of
the structure. However, if efficient use of pointer+constant addressing
is the only issue, it wouldn't matter.

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd.  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 |                      | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address


Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/25

Paul Jarc wrote:
>
> Max TenEyck Woodbury <mtew@cds.duke.edu> writes:
>> ...
>> Cite a usage please.
>
> ...

Thank you. That makes it clear that

a) I need a more up-to-date copy of the standard, and

b) That particular syntactic niche has already been filled so
   some other construction would be needed to designate the
   base member of a structure.

Author: Paul Jarc <prj@po.cwru.edu>
Date: 1999/08/24

seebs@plethora.net (Peter Seebach) writes:
> Does this break "all structures smell alike"?  Imagine a pair of structures
> in a union.

I think it can be handled correctly.

>  struct dummy { int a; };
>  union {
>   struct foo {
>    struct dummy a;
>   } foo;
>   struct bar {
>    struct dummy a;
>    int b;
>   } bar;
>  } u;
>
> Now, we know that:
>  (void *)&u == (void *)&foo == (void *)&bar == (void *)&(bar.a) ==
>  (void *)&(foo.a) == (void *)&(foo.a.a)

s/foo/u.foo/
s/bar/u.bar/
(also assuming struct dummy starts with `int a;')

The key thing is that all these involve casting the struct or union
pointer to void*.  That conversion can (and must) result in a pointer
to the beginning of the structure, but the unconverted pointer, which
is used for by-name (as opposed to by-offset) member access, can point
to the middle.
Since offsetof() is used in conjunction with a char* (or the like)
pointing to the structure, it would have to report the offset from the
beginning, not from where the struct foo* points.

>  int *ip = (int *) (void *) &u;
>  *ip = 1;
> guarantees
>  u.foo.a.a == u.bar.a.a
>  and
>  u.foo.a.a == 1
> does it not?  That same pointer has to point to all of them, and the fixup
> has to be the same, so that the leading common initial members are the same,
> doesn't it?

I'd say so.  All the pointers must compare equal after conversion to
the same type; beforehand, they can point wherever is convenient for
member access.  The union pointer might be offset, especially if all
its members were structs with the same offset for their pointer types,
but I expect unions would usually not be bothered with.

paul

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/24

Lisa Lippincott wrote:
>
> ...

Indeed, a very fine example. It can be done in C++ with quite a bit of
effort. It should be possible to do in 'C' with only an appropriate
declaration.

mtew@cds.duke.edu

Author: Paul Jarc <prj@po.cwru.edu>
Date: 1999/08/24

Max TenEyck Woodbury <mtew@cds.duke.edu> writes:
> Negative offsets are confusing enough as they stand without dragging
> in the whole bit address question. The problem comes when that storage
> unit contains more than one named bit field. Since they all have the
> same 'address', how do you distinguish which one is the zero offset bit.

Why would you want to?  For pointer conversions, all you need to know
is the offset of the (struct foo*) from the (void*)(struct foo*).  You
don't need to know which member the (struct foo*) points to.

> > >Since a zero width bit field is useless
> >
> > Wrong.
>
> Cite a usage please.

C9X 6.7.2.1p3:
# The expression that specifies the width of a bit-field shall be an
# integer constant expression that has nonnegative value that shall
# not exceed the number of bits in an object of the type that is
# specified if the colon and expression are omitted.  If the value is
# zero, the declaration shall have no declarator.

p10:
# A bit-field declaration with no declarator, but only a colon and a
# width, indicates an unnamed bit-field.94) As a special case, a
# bit-field structure member with a width of 0 indicates that no
# further bit-field is to be packed into the unit in which the
# previous bit-field, if any, was placed.

> I was under the impression that a reference to such a field
> would invoke a semantic constraint since it could not have
> any value other than zero, and you can not take its address
> since it is a bit field.

What does "invoke a semantic constraint" mean?
A zero-width bit-field has no value, not a zero value, and you cannot
take its address because it has no name (in addition to its being a
bit-field).

paul

Author: postmast.root.admi.gov@iname.com (blargg)
Date: 1999/08/23

In article <37BD731E.BF564215@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> wrote:

> blargg wrote:
> >
> > > In the second routine, you and the compiler ended up trying to second
> > > guess each other and the result was an extra instruction.
> >
> > Actually, that's the only way it could add 32768 to the pointer. At most
> > it could add a value from -32768 to 32767 to the pointer in one instruction, so it adds
> > 0x10000 (the addis - add immediate shifted 16 bits left) then "adds"
> > -32768 to it.
>
> It could have subtracted -32768.

LOL. There is no subtract immediate. It's redundant. Just add a negative
value. It ain't no CISC processor.

> > > If you had
> > > passed in the mid-point pointer, you might have ended up with one
> > > instruction less instead of one instruction more.
> >
> > I would have done that in the example, except the compiler I use doesn't
> > optimize that correctly (it keeps the pointer on the stack - uck!). The
> > optimizer isn't done correctly to realize that objects as arguments can be
> > put in registers.
>
> The reason it didn't do that is because a) you didn't tell it to, and b)
> the language does not provide a concise way to do it. I'm trying to fix (b).

Sure it does. Just pass in the mid_ptr as a parameter to the function. AS
I SAID, my compiler simply doesn't optimize that well, so I showed a case
where it *did* optimize properly.

> > > > I just don't see the general utility of this, especially if it involves
> > > > new syntax.
> > >
> > > It has only special utility, not general utility. It is a way to simplify
> > > what would otherwise be hard to explain code. It makes a concept, namely
> > > negative offsets, directly accessible where it is only indirectly
> > > accessible now.
> >
> > Have you tried alternatives like I posted? What about a macro that does
> > the pointer addition (in case your compiler's optimizer isn't smart
> > enough, or you're using C)?
[snip]
> > I'm not convinced this can't be wrapped in some current construct that
> > makes it safe and convenient enough to use.
>
> Of course this can be done. I've done similar things more times than I care
> to remember. It's the excessive convolution that I'm trying to reduce.
[snip]

Isn't it interesting how people live in different worlds? I guess I should
be thankful that I've never had such severe problems with negative
offsets...

Author: seebs@plethora.net (Peter Seebach)
Date: 1999/08/23

In article <7pnvv3$rio$1@elf.bsdi.com>, Chris Torek <torek@elf.bsdi.com> wrote:
>A compiler can, if it chooses, make all "struct xyz *" pointers
>point to the middle of all "struct xyz"s.

I'm not sure it can.

> struct a {
>  char firstbyte;
  char byte01;
  char byte02;
  ...
  char bytefe;
>  char lastbyte;
> };

Hmm.  Just for convenience,
 struct a b;

Now, we have, by definition,
 (void *) &b == (void *) &(b.firstbyte);
 (void *) &(b.firstbyte) < (void *) &(b.byte01)
 (void *) &(b.byte01) < (void *) &(b.byte02)
 ...
 (void *) &(b.bytefe) < (void *) &(b.lastbyte)

All of these are mandated by the spec, yes?  How can the pointer be in the
middle of anything?

>These compilers (gcc and the Sun compilers both) already use offset
>pointers, yet those offsets are not visible to the C programmer.
>If negative offsets are important to some machine architecture,
>the compiler may optimize them in.  Although this "actual practice"
>example is for the frame and stack pointers, the principle applies
>equally well to all other pointers.

Hmm.  While I see that it's possible to do this "behind the scenes", I'm
convinced that you can't do it in any way that shows up in offsetof().

-s
--
Copyright 1999, All rights reserved.  Peter Seebach / seebs@plethora.net
C/Unix wizard, Pro-commerce radical, Spam fighter.  Boycott Spamazon!
Will work for interesting hardware.  http://www.plethora.net/~seebs/
Visit my new ISP <URL:http://www.plethora.net/> --- More Net, Less Spam!

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/23

"Clive D.W. Feather" wrote:
>
>> Yes, but I'm also talking about conceptual efficiency. Negative offsets
>> have some explanatory power that could be used even if they are only
>> an optional part of the language.
>
> No, I don't see.
>
> About the only time this has an explanatory power is when you are
> handing around pointers to the middles or ends of data structures. The
> only time that is likely - other than just for the hell of it - is if
> you are conforming to some "external" memory layout, which is likely to
> be internal to your machine (and the only examples that come to mind are
> when the pointer is to just beyond the end of the structure). These are
> inherently unportable, and a simple wrapper to move the pointer to the
> start is not going to be hard. Certainly nowhere near as hard as
> redefining lots of semantics of the language.

Please, you're exaggerating for effect. The change is small and the
semantics is already there. It's just not conveniently accessible.

The main reason the representations are unportable is because negative
offsets can not be expressed easily. The wrapper you mentioned should
not be required.

And your 'for the hell of it' may well be a specification that the
designer has no control over, like the interface requirements of the
'C' run-time heap routines.

mtew@cds.duke.edu

Author: James Kuyper <kuyper@wizard.net>
Date: 1999/08/23

Peter Seebach wrote:
>
> In article <7pnvv3$rio$1@elf.bsdi.com>, Chris Torek <torek@elf.bsdi.com> wrote:
> >A compiler can, if it chooses, make all "struct xyz *" pointers
> >point to the middle of all "struct xyz"s.
>
> I'm not sure it can.
>
> >       struct a {
> >               char firstbyte;
>                 char byte01;
>                 char byte02;
>                 ...
>                 char bytefe;
> >               char lastbyte;
> >       };
>
> Hmm.  Just for convenience,
>         struct a b;
>
> Now, we have, by definition,
>         (void *) &b == (void *) &(b.firstbyte);
>         (void *) &(b.firstbyte) < (void *) &(b.byte01)
>         (void *) &(b.byte01) < (void *) &(b.byte02)
>         ...
>         (void *) &(b.bytefe) < (void *) &(b.lastbyte)
>
> All of these are mandated by the spec, yes?  How can the pointer be in the
> middle of anything?

No - the results of the '<' operator are not defined on 'void *'.
However, you can fix it up by converting each of those pointers to 'char
*', and the equality comparison is OK as is.
The key thing is that an implementation is free to implement the
conversions to and from a structure pointer type, such that the address
contained in the struct pointer is offset by a fixed amount from the
address of the start of the struct, yet when cast to or from other
types, it acts as though it points at the beginning of the struct.

...
> Hmm.  While I see that it's possible to do this "behind the scenes", I'm
> convinced that you can't do it in any way that shows up in offsetof().

I agree with that.

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/23

blargg wrote:
>
> In article <37BD731E.BF564215@cds.duke.edu>, Max TenEyck Woodbury
> <mtew@cds.duke.edu> wrote:
>
>> It could have subtracted -32768.
>
> LOL. There is no subtract immediate. It's redundant. Just add a negative
> value. It ain't no CISC processor.

Ah. Interesting that it is exactly at the asymmetry of 2's complement
notation that the RISC architecture runs into trouble.
Oh, well, we're getting way off topic...

> > The reason it didn't do that is because a) you didn't tell it to, and b)
> > the language does not provide a concise way to do it. I'm trying to fix (b).
>
> Sure it does. Just pass in the mid_ptr as a parameter to the function. AS
> I SAID, my compiler simply doesn't optimize that well, so I showed a case
> where it *did* optimize properly.

Hmm. That wasn't what I got from your comments.

> > Of course this can be done. I've done similar things more times than I care
> > to remember. It's the excessive convolution that I'm trying to reduce.
> [snip]
>
> Isn't it interesting how people live in different worlds? I guess I should
> be thankful that I've never had such severe problems with negative
> offsets...

Yep. Trying to explain that you had to write three lines of code and 5 lines
of comments where a single line would have sufficed if the language had the
correct feature can be a lot of work. Then try explaining that it would take 10
lines of code to do it in a completely portable way so it could run on a
machine 15 years obsolete. It can be done, but doing it so you don't embarrass
yourself or your boss gets real tricky. Repeat six months later (for the same
code). Arrrrrg!

mtew@cds.duke.edu

Author: seebs@plethora.net (Peter Seebach)
Date: 1999/08/23

In article <37C1A7C6.4864EE4@wizard.net>,
James Kuyper  <kuyper@wizard.net> wrote:
>No - the results of the '<' operator are not defined on 'void *'.
>However, you can fix it up by converting each of those pointers to 'char
>*', and the equality comparison is OK as is.

Argh.  I knew what I meant...

>The key thing is that an implementation is free to implement the
>conversions to and from a structure pointer type, such that the address
>contained in the struct pointer is offset by a fixed amount from the
>address of the start of the struct, yet when cast to or from other
>types, it acts as though it points at the beginning of the struct.

Hmm.

Does this break "all structures smell alike"?  Imagine a pair of structures
in a union.

 struct dummy { int a; };
 union {
  struct foo {
   struct dummy a;
  } foo;
  struct bar {
   struct dummy a;
   int b;
  } bar;
 } u;

Now, we know that:
 (void *)&u == (void *)&foo == (void *)&bar == (void *)&(bar.a) ==
 (void *)&(foo.a) == (void *)&(foo.a.a)

Therefore:

 int *ip = (int *) (void *) &u;
 *ip = 1;
guarantees
 u.foo.a.a == u.bar.a.a
 and
 u.foo.a.a == 1
does it not?  That same pointer has to point to all of them, and the fixup
has to be the same, so that the leading common initial members are the same,
doesn't it?

-s
--
Copyright 1999, All rights reserved.  Peter Seebach / seebs@plethora.net
C/Unix wizard, Pro-commerce radical, Spam fighter.  Boycott Spamazon!
Will work for interesting hardware.  http://www.plethora.net/~seebs/
Visit my new ISP <URL:http://www.plethora.net/> --- More Net, Less Spam!

Author: Paul Jarc <prj@po.cwru.edu>
Date: 1999/08/20

Max TenEyck Woodbury <mtew@cds.duke.edu> writes:
> #if defined(_NEGATIVE_OFFSETS_IMPLEMENTED)
...
> #else
> /* Since negative offsets are not implemented, stand on your ear... */
> ...
> #endif

That gives you twice as much code to maintain, with more opportunity
for variation between platforms.  What exactly is the problem with
offsetof()?

paul

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/21

"Clive D.W. Feather" wrote:
>
> ...
>
> Are structures compatible if they are identical except that
> their zero offset components are different ? Are they assignable ?

Sorry. I got this wrong earlier.

Yes, they should be considered compatible and a direct assignment
of a pointer to one struct should convert directly to a pointer to
the other struct with appropriate arithmetic to change the offset.

If you wanted to make it tougher for the compiler writers, you could
allow for type matches with arbitrary shifts, but I suspect that this
could be a big source of problems. If someone wanted to do this, the
reader should be warned by a slightly more convoluted structure
declaration and a &(x->y) kind of construction in the code. (No, you
didn't ask about this, but it is a logical follow-on to your question.)
It should not be done.

My earlier comment about this being dangerous was because I was thinking
that a conversion through 'void *' would have to take place. The arithmetic
still has to be done, and the compiler has to be extended to generate the
correct code, but this type of conversion would be much safer than a
conversion through 'void *'.

The conclusion that they should be assignable stands for the
reason specified.

Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 1999/08/23

In article <37BC0AA8.E84540D3@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> writes
>> However, even on such a machine, I suspect that at least 99% of structure
>> accesses would be to offsets under 128 bytes, so the impact of this
>> efficiency problem would be minor.
[...]

>Yes, but I'm also talking about conceptual efficiency. Negative offsets
>have some explanatory power that could be used even if they are only
>an optional part of the language.

No, I don't see.

About the only time this has an explanatory power is when you are
handing around pointers to the middles or ends of data structures. The
only time that is likely - other than just for the hell of it - is if
you are conforming to some "external" memory layout, which is likely to
be internal to your machine (and the only examples that come to mind are
when the pointer is to just beyond the end of the structure). These are
inherently unportable, and a simple wrapper to move the pointer to the
start is not going to be hard. Certainly nowhere near as hard as
redefining lots of semantics of the language.

In fact, I strongly suspect that almost all uses of this feature would
have significant negative explanatory power.

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd.  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 |                      | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/23

blargg wrote:
>
> > In the second routine, you and the compiler ended up trying to second
> > guess each other and the result was an extra instruction.
>
> Actually, that's the only way it could add 32768 to the pointer. At most
> it could add a value from -32768 to 32767 to the pointer in one instruction, so it adds
> 0x10000 (the addis - add immediate shifted 16 bits left) then "adds"
> -32768 to it.

It could have subtracted -32768.

> > If you had
> > passed in the mid-point pointer, you might have ended up with one
> > instruction less instead of one instruction more.
>
> I would have done that in the example, except the compiler I use doesn't
> optimize that correctly (it keeps the pointer on the stack - uck!). The
> optimizer isn't done correctly to realize that objects as arguments can be
> put in registers.

The reason it didn't do that is because a) you didn't tell it to, and b)
the language does not provide a concise way to do it. I'm trying to fix (b).

> > > I just don't see the general utility of this, especially if it involves
> > > new syntax.
> >
> > It has only special utility, not general utility. It is a way to simplify
> > what would otherwise be hard to explain code. It makes a concept, namely
> > negative offsets, directly accessible where it is only indirectly accessible
> > now.
>
> Have you tried alternatives like I posted? What about a macro that does
> the pointer addition (in case your compiler's optimizer isn't smart
> enough, or you're using C)?
>
>     #define make_mid_ptr( p )   ((void*) ((char*) (p) + sizeof (*(p)) / 2))
>
>     #define use_mid_ptr( type, p )  ((type*) ((char*) (p) - sizeof (type) / 2))
>
> // usage
>
>     Foo* foo = // ...
>
>     void* foo_mid = make_mid_ptr( foo );
>
>     use_mid_ptr( Foo, foo_mid )->bar = 1234;
>
> I'm not convinced this can't be wrapped in some current construct that
> makes it safe and convenient enough to use.

Of course this can be done. I've done similar things more times than I care
to remember. It's the excessive convolution that I'm trying to reduce.

> > And technically, the suggestion only introduces new semantics, not new
> > syntax. It gives a valid meaning to what is now a semantic error. That
> > might be a point against the proposal, rather than a point in its favor.
>
> That's what I call "new syntax and semantics". Something that isn't
> allowed now is new.

So we're quibbling over a missing word or two. I already said it was new (and
should be optional, that is NOT required)...

mtew@cds.duke.edu
---

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/23

Nice to hear from you...

"Clive D.W. Feather" wrote:
>
> In article <37B2D159.C5C781C4@cds.duke.edu>, Max TenEyck Woodbury
> <mtew@cds.duke.edu> writes
> >Aggregate: A group of related data items with distinct names and meanings
> >grouped together under a common designation.
>
> Warning: not the definition used in C.

Right. C lumps arrays, structs and unions together. You can use
negative indexes on arrays, so that didn't need to be covered. I
made the distinction for purposes of discussion.

> >Address: A number that, when dereferenced, produces a data item.
>
> Why must it be a number ?

Because the hardware does 'arithmetic' rather than 'logical' operations
on it. If it quacks (behaves) like a number, it's a number for all
practical purposes. There are designations for other storage like
registers, but they don't have addresses in the 'C' sense.

> >Not all aggregate components have addresses. In particular, bit fields (those
> >components with bit length specifications) do not have addresses.
>
> No, but C does have the concept of the address of an unnamed component
> holding the bit field.

Yep, but that isn't the same as the address of a bit field, and it doesn't
apply at all if the bit field is in a register.

> >The other
> >unaddressable data item, registers,
>
> Actually, registers do have addresses, you just can't obtain them.

Depends on the hardware and the implementation. The C standard says you
can't take the address of an object declared with the 'register' storage class.

> >The alignment requirement of an aggregate is the same as the most severe
> >alignment requirement of any of its components.
>
> No such rule in C.

OK. I'm oversimplifying. The presence of padding and some of the wording
about the library heap allocation routines returning aligned pointers has
implications here. The point was that you still had alignment and
padding even with negative offsets implemented.

> >As C and C++ are currently defined, all the offsets of aggregate components must
> >be positive or zero. Since most modern computers used signed offsets in their
> >instruction formats, this restriction can reduce program efficiency.
>
> Only in the rare case that the structure size is *slightly* larger than
> the maximum size of an index. This is a relatively rare situation, as
> other posts have pointed out.

Yep. It doesn't happen often. It does happen. This is a minor consideration,
but smaller problems have been addressed before.

> >2.  In contrast to the union of two structures, a union of a frame and a
> >structure would not share any storage even though the name of the union, the
> >name of the structure and the name of the frame would all designate the same
> >address.
>
> I see this as confusing rather than being a benefit. I would expect a
> union to overlap its components no matter what they are - the fact that
> this means that the offsets are all zero is a consequence of this rather
> than a requirement per se.

You'll notice I don't recommend implementing frames per se.

Negative offsets do require some thought to the meaning of unions. I believe
the results are more consistent if all the names refer to the same address
rather than the same storage. If you want to use frames and overlap storage
with a struct, wrap the frame in a struct first.

> >5. Adding another item to a frame would put the storage for that item at the
> >beginning of the storage area allocated for the frame, not at the end. This
> >makes frames similar to some implementations of stacks,
>
> and completely unlike others. I see this use of the term "frame" as thus
> more misleading than anything else.

The term is a reference to 'stack frame' which, in some implementations, puts
new items at the lowest address. The main idea is that 'frame's turn 'struct's
upside down.

> >A simpler solution is to allow the designation of a particular component of a
> >structure as the zero offset component. This avoids all the portability issues
> >associated with designating the offset of each component and is nowhere near as
> >complex as the introduction of a new kind of structure.
>
> Is it not ? Are structures compatible if they are identical except that
> their zero offset components are different ? Are they assignable ?

I'm not sure. I suspect that the answer should be 'yes'. If you do assignment
by a process equivalent to a call to memcpy, the implicit conversion to 'void *'
and its implicit offset adjustment would make the process work.

> >2.  A zero offset designation can only be applied to an addressable data
> >component of the structure.
>
> Why ? Why shouldn't the storage unit holding a given bit-field be the
> zero offset component ?

Because, technically, bit fields do NOT have addresses. If you designate
the storage unit holding the bit field properly, you could indeed use
it as the zero offset component.

> Why shouldn't the zero offset component be
> several bytes before the start of the structure, or after the end ?

I don't know. If you can be sure that such an address will pass muster
with the hardware, I don't see why not. But the same consideration applied
to arrays does not allow pointers beyond the address immediately following
the array or to storage before the array.
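For reference, the array rule being invoked here fits in a few lines of C++ (span_length is a made-up name for illustration): a pointer may be formed one element past the end of an array and compared or subtracted, but not dereferenced, while forming a pointer before the first element is undefined.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: count elements using the begin and one-past-the-end
// pointers of an array. Forming a + N is allowed; dereferencing it is not.
// Forming a - 1 (a pointer to "storage before the array") is undefined,
// so it never appears here.
template <class T, std::size_t N>
std::ptrdiff_t span_length(T (&a)[N]) {
  T* first = a;     // points at a[0]
  T* last = a + N;  // one past the end: valid to form and compare only
  return last - first;
}
```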

Do you have a good syntax that would preserve these options without
getting into the complete offset designation mess?

> Why shouldn't it be in the middle of a N byte hole ?

It could point to an otherwise unaddressable element, like the VTP in
C++ if the zero offset element started with such a thing. In practice,
if you can designate the aggregate component and it has an address, you
should be able to make it the zero offset component.

I'm glad you're asking these questions. It clarifies the problem being
addressed. It wasn't really stated was it? So, to clarify:

Usage (MAJOR):

Negative offsets can be used to portably reverse the effect of taking
the address of a component of a structure. Such a conversion must go through
the use of a 'void *' cast to assure that the offset adjustments are
done properly and to document the dangerous nature of such manipulations.

Usage (NOMINAL):

Interface to hardware, run-time and external systems that use negative
offsets.

Usage (MINOR):

To improve execution efficiency.

> If you see a need to be able to designate the zero offset component,
> why are these possibilities unreasonable ?

Some of them are perfectly reasonable. The one that creates problems
is designating a zero offset outside the structure and that is only
a problem in some implementations. If you can come up with a consistent
syntax, have at it.

> >A bit field can not be designated as a zero offset
> >component since it does not have an address.
>
> But the storage unit it resides in does.

Fine. Give that storage unit a designation and you can specify it as the
zero offset.

Negative offsets are confusing enough as they stand without dragging
in the whole bit address question. The problem comes when that storage
unit contains more than one named bit field. Since they all have the
same 'address', how do you distinguish which one is the zero offset bit?
The standard deliberately leaves that open! If you want to tackle that
problem, by all means have at it elsewhere, but it will only create
massive and unneeded confusion in this thread.

> >The use of a designated zero offset component is not a new idea. The Digital
> >Equipment Corporation software development group has used a language independent
> >data aggregate description (I forget its name) that included a base address
> >designation. It was rarely used mainly because it was seldom needed and lacked
> >high-level language support.
>
> So your one example of previous practice failed ?

It worked for BLISS if I remember correctly, but I'm not absolutely certain.
It definitely worked for their MACRO assembler and negative offsets were used
quite successfully there. The failure of 'C' to make provisions for negative
offsets was counted against 'C' when choosing an implementation language, but
was less important than the fact that more engineers knew BLISS than knew 'C'
at the time.

> >Also, the
> >offsetof macro
> >would return a ptrdiff_t, rather than a size_t.
>
> Oh great, change a whole load of semantics of existing code.

Since many implementations already use a pointer difference to implement
offsetof, the change would bring more systems into conformance rather
than the other way around. Further, size_t is often the largest integer
type commonly used on the machine, so the only difference between
size_t and ptrdiff_t is the handling of overflow conditions, and those are
commonly ignored. The only thing that would 'break' would be code that
uses the new feature.

> >However, the concept involved is simple enough
>
> I disagree. See above.
>
> >Since a zero width bit field is useless
>
> Wrong.

Cite a usage please.

I was under the impression that a reference to such a field
would violate a constraint, since it could not have any value
other than zero, and you cannot take its address since it is
a bit field.
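For reference, the conventional use: in both C and C++, an unnamed bit-field of width zero tells the implementation to start the next bit-field in a new allocation unit. It is required to be unnamed, which is why it can never be referenced. A small sketch (the struct names are invented; the exact sizes are implementation-defined):

```cpp
#include <cassert>

// Two 4-bit fields that an implementation may pack into one storage unit.
struct Packed {
  unsigned lo : 4;
  unsigned hi : 4;
};

// The unnamed zero-width bit-field forces 'hi' to start in a new
// allocation unit. It has no name, so it cannot be read or written,
// and like any bit-field it has no address.
struct Split {
  unsigned lo : 4;
  unsigned : 0;
  unsigned hi : 4;
};
```

On common ABIs Split comes out larger than Packed, which is the whole point: the zero-width member changes layout without ever being accessed.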

> >and bit fields can not be designated as
> >the zero offset component,
>
> See above.

As pointed out above, bit fields can NOT be the zero
offset component. They may be IN such a component, but
not BE that component.

> >This has the advantage of leaving the syntax of the language
> >virtually unchanged and only defines a meaning for what was previously an error
> >condition.
>
> This is often dangerous.

Agreed. I believe I said that or something equivalent.

> Better syntax would be, for example, to indicate the zero offset
> component by placing "&" after its identifier.

Hmm. That would be confused with reference syntax in C++.

mtew@cds.duke.edu
---

Author: Lisa Lippincott <lisa_lippincott@advisories.com>
Date: 1999/08/23

For what it's worth, I believe a standard-conforming implementation can
represent a pointer to a class type with a pointer which, in machine
terms, points to the middle of the structure.  This might be a reasonable
choice when using an instruction set with particular efficiencies for
small, signed offsets.

If, as a programmer, you want to use negative offsets in common
implementations, multiple inheritance provides a way:

class NegativePart
  {
   private:
      int a;
   public:
      int A() const  { return a; }
  };

class PositivePart
  {
   friend class BothParts;
   private:
      int b;
      PositivePart() : b( 0 ) {}   // private: only BothParts, a friend, can construct one
   public:
      int A() const;
      int B() const  { return b; }
  };

struct BothParts: public NegativePart, public PositivePart
  {
   public:
      using NegativePart::A;
  };

int PositivePart::A() const
  {
   return static_cast<const BothParts*>( this )->A();
  }

Now, except for construction and destruction, PositivePart has
the same interface as BothParts.  On common implementations, a
PositivePart* will (under the hood) point into the middle of a
BothParts.  If you pass around only PositivePart*, you get the
negative offsets you're looking for.
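A compilable variant of the same idea, with constructors filled in so it can actually be exercised. The names Header, Body and Record are invented for this sketch and stand in for NegativePart, PositivePart and BothParts. It also measures the pointer adjustment: viewed through its second base, a Record is addressed at a different byte than its Header part, so code holding only a Body* reaches the Header data at a (typically negative) offset.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names: Header plays NegativePart, Body plays PositivePart,
// Record plays BothParts.
class Header {
  int a_;
 protected:
  explicit Header(int a) : a_(a) {}
 public:
  int A() const { return a_; }
};

class Body {
  friend class Record;
  int b_;
  explicit Body(int b) : b_(b) {}  // private: only Record can construct a Body
 public:
  int A() const;                   // reaches back into the enclosing Record
  int B() const { return b_; }
};

class Record : public Header, public Body {
 public:
  Record(int a, int b) : Header(a), Body(b) {}
  using Header::A;
};

int Body::A() const {
  // Valid only because every Body is the Body base of some Record.
  return static_cast<const Record*>(this)->A();
}

// Byte distance from the Header subobject to the Body subobject. They are
// distinct non-empty subobjects, so this is never zero; on common
// implementations Header comes first and the distance is positive.
std::ptrdiff_t header_to_body(const Record& r) {
  const Header* h = &r;  // derived-to-base conversions adjust the pointer
  const Body* b = &r;
  return reinterpret_cast<const char*>(b) - reinterpret_cast<const char*>(h);
}
```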

                                                   --Lisa Lippincott
---

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/23

Paul Jarc wrote:
>
> Max TenEyck Woodbury <mtew@cds.duke.edu> writes:
> > #if defined(_NEGATIVE_OFFSETS_IMPLEMENTED)
> ...
> > #else
> > /* Since negative offsets are not implemented, stand on your ear... */
> > ...
> > #endif
>
> That gives you twice as much code to maintain, with more opportunity
> for variation between platforms.  What exactly is the problem with
> offsetof()?

There is no portable mechanism for combining the value of offsetof and
a pointer to get another pointer with the correct type. The widely used
conversion to and from char * is almost but not quite universally portable.
It's that 'almost' that is the problem. If it was portable, it would be in
the standard. There are also type safety issues.
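For concreteness, this is roughly what the char-pointer workaround looks like in C++ for an allocator that hands out the address of a member and later needs the enclosing block back. AllocBlock, to_user and from_user are hypothetical, loosely modeled on the alloc_block example elsewhere in the thread. The reverse step subtracts offsetof; it works on mainstream compilers for standard-layout types, but it leans on exactly the char-pointer conversion whose portability is being questioned here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical allocator block: a header the user never sees, followed by
// the area whose address is handed out.
struct AllocBlock {
  std::size_t size;
  AllocBlock* next;
  alignas(std::max_align_t) unsigned char alloc[64];
};

// Forward direction: give the user the address of the alloc member.
void* to_user(AllocBlock* block) { return block->alloc; }

// Reverse direction: step back by offsetof to recover the enclosing block.
// This is the widely used char-pointer idiom, not a conversion the
// language spells out for you.
AllocBlock* from_user(void* user) {
  unsigned char* p = static_cast<unsigned char*>(user);
  return reinterpret_cast<AllocBlock*>(p - offsetof(AllocBlock, alloc));
}
```

Note that nothing ties the type in from_user to the type the pointer came from; that is the type-safety gap mentioned above.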

Actually, the code that does not use negative offsets is usually at least
two times as complex as the negative offset code. The negative offset code
is the prototype, the documentation, of what should be happening. It is less
likely to need platform specific variations than the work-around code. I know
that documentation is often not maintained, but that doesn't mean it should
be that way.

mtew@cds.duke.edu
---

Author: torek@elf.bsdi.com (Chris Torek)
Date: 1999/08/23

(All quoted text snipped; chase "references" if you really want to look
at them.)

Even after reading all of this, I am not sure what the point is.

A compiler can, if it chooses, make all "struct xyz *" pointers
point to the middle of all "struct xyz"s.  Given a hypothetical
fast-signed-byte-offset machine and a structure with 256 bytes of
members, something like:

 struct a {
  char firstbyte;
  ...
  char lastbyte;
 };

 struct a *ap; ...
 ap = malloc(sizeof *ap);
 ...
 x = ap->firstbyte;
 ap->lastbyte = 99;

might compile to:

 # line 42: ap = malloc(sizeof *ap);
  mov $256,r1  # parameters go in r1 through r4
  call C$malloc  # functions are prefixed with C$
  add r1,$128,r5  # "struct ... *" points to middle
 ...
 # line 65: x = ap->firstbyte;
  movb -128(r5),r6
 # line 66: ap->lastbyte = 99;
  movb $99,127(r5)

and of course something like:

 memset(ap, 0, sizeof *ap);

would compile to:

 # line 47: memset(ap, 0, sizeof *ap);
  sub r5,$128,r1
  mov $0,r2
  mov $256,r3
  call C$memset

In fact, current compilers already do something much like this.
For instance, on V9 sparc (in 64-bit mode), the frame and stack
pointers are offset by 2047.  What is normally at "0(fp)" is really
at [%fp+2047]; what is nominally at -1028(fp) is really at
[%fp+2047-1028]; and (importantly) what is nominally at -5540(fp)
is really at [%fp+2047-5540] or [%fp-3493].  The reason this is
important is that the immediate offset in the [reg+imm] addressing
modes can only range from -4096..4095.  ("Nominally nonnegative"
offsets -- those from actual [%fp+2048] through actual [%fp+4095]
-- are used for things that tend to be smaller, so having the
"negative" area larger means more code can use the signed 13-bit
field, rather than using up another register.  Specifically, 128
bytes at [%fp+2047] hold register windows, and above that is mainly
any "extra" parameters beyond the first six.)
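The stack-bias arithmetic above can be checked mechanically. A short C++ sketch that assumes only what the paragraph states (a signed 13-bit displacement and a bias of 2047); the function names are made up:

```cpp
#include <cassert>

// SPARC [reg+imm] addressing holds the displacement in a signed 13-bit
// field, so it must lie in -4096..4095.
constexpr bool fits_simm13(long disp) {
  return disp >= -4096 && disp <= 4095;
}

// V9 64-bit stack bias: the hardware %fp sits 2047 bytes below the nominal
// frame pointer, so nominal offset n is encoded as 2047 + n.
constexpr long kStackBias = 2047;
constexpr long actual_disp(long nominal) { return kStackBias + nominal; }

static_assert(actual_disp(-5540) == -3493, "matches the example in the text");
```

With the bias, nominal offsets from -6143 through 2048 are reachable in one instruction, instead of -4096 through 4095.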

These compilers (gcc and the Sun compilers both) already use offset
pointers, yet those offsets are not visible to the C programmer.
If negative offsets are important to some machine architecture,
the compiler may optimize them in.  Although this "actual practice"
example is for the frame and stack pointers, the principle applies
equally well to all other pointers.
--
In-Real-Life: Chris Torek, Berkeley Software Design Inc
El Cerrito, CA Domain: torek@bsdi.com +1 510 234 3167
http://claw.bsdi.com/torek/  (not always up) I report spam to abuse@.
---

Author: postmast.root.admi.gov@iname.com (blargg)
Date: 1999/08/19

In article <37BB0AB9.8F60B9DD@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> wrote:

> blargg wrote:
> >...
> > Here is disassembly on my machine (a PowerPC):
> >
> > without
> >
> > 00000000: 98830000  stb      r4,0(r3)
> > 00000004: 3CA30001  addis    r5,r3,1        ; offset ptr for later accesses
> > 00000008: 98830001  stb      r4,1(r3)
> > 0000000C: 9885FFFE  stb      r4,-2(r5)      ; use offset ptr
> > 00000010: 9885FFFF  stb      r4,-1(r5)
> > 00000014: 4E800020  blr
> >
> > with
> >
> > 00000000: 3C630001  addis    r3,r3,1        ; pre-offset ptr to middle
> > 00000004: 38638000  subi     r3,r3,32768    ; (requires two instructions)
> > 00000008: 98838000  stb      r4,-32768(r3)  ; access off middle ptr
> > 0000000C: 98838001  stb      r4,-32767(r3)
> > 00000010: 98837FFE  stb      r4,32766(r3)
> > 00000014: 98837FFF  stb      r4,32767(r3)
> > 00000018: 4E800020  blr
> >
> > As you can see, the compiler was smart enough in the first case to cache
> > the pointer value offset to access the last elements.
>
> Actually, it created an auxiliary pointer (in r5) and used a negative
> offset.

Yeah yeah. My comment wasn't precise enough :-) Anyway, with this
architecture, you don't normally destroy values (hence the 3-operand
instructions), so the comment is understandable in that context.

> In the second routine, you and the compiler ended up trying to second
> guess each other and the result was an extra instruction.

Actually, that's the only way it could add 32768 to the pointer. A single
instruction can only add a value in the range -32768..32767, so it adds
0x10000 (the addis - add immediate shifted 16 bits left) and then "adds"
-32768 to it.
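The two-instruction sequence in the listing is the usual PowerPC way of adding a constant that doesn't fit in 16 signed bits: split it into a sign-extended low half for addi (subi is addi with the immediate negated) and an adjusted high half for addis. A C++ sketch of that split, with made-up names:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names. 'lo' is what addi adds (a signed 16-bit immediate);
// 'ha' is what addis adds after shifting it left 16 bits. 'ha' is adjusted
// so that ha * 65536 + lo equals the original displacement.
struct HaLo {
  std::int64_t ha;
  std::int64_t lo;
};

HaLo split_ha_lo(std::int32_t disp) {
  // Low 16 bits, taken through an unsigned type so the masking is well defined.
  std::int64_t low = static_cast<std::uint32_t>(disp) & 0xFFFFu;
  // Sign-extend: 0x8000..0xFFFF stand for -32768..-1.
  std::int64_t lo = low >= 0x8000 ? low - 0x10000 : low;
  // disp - lo is a multiple of 65536, so this division is exact.
  std::int64_t ha = (static_cast<std::int64_t>(disp) - lo) / 65536;
  return {ha, lo};
}
```

For 32768 this gives ha = 1 and lo = -32768, the addis 1 / subi 32768 pair in the "with" listing; for 65535 it gives ha = 1 and lo = -1, matching addis r5,r3,1 and -1(r5) in the "without" listing.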

> If you had
> passed in the mid-point pointer, you might have ended up with one
> instruction less instead of one instruction more.

I would have done that in the example, except the compiler I use doesn't
optimize that correctly (it keeps the pointer on the stack - uck!). The
optimizer isn't done correctly to realize that objects as arguments can be
put in registers.

> > I just don't see the general utility of this, especially if it involves
> > new syntax.
>
> It has only special utility, not general utility. It is a way to simplify
> what would otherwise be hard to explain code. It makes a concept, namely
> negative offsets, directly accessible where it is only indirectly accessible
> now.

Have you tried alternatives like I posted? What about a macro that does
the pointer addition (in case your compiler's optimizer isn't smart
enough, or you're using C)?

    #define make_mid_ptr( p )   ((void*) ((char*) (p) + sizeof (*(p)) / 2))

    #define use_mid_ptr( type, p )  ((type*) ((char*) (p) - sizeof (type) / 2))

// usage

    Foo* foo = // ...

    void* foo_mid = make_mid_ptr( foo );

    use_mid_ptr( Foo, foo_mid )->bar = 1234;

I'm not convinced this can't be wrapped in some current construct that
makes it safe and convenient enough to use.
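As one possible answer to that last question, a small C++ template can carry the type along with the middle pointer so it doesn't have to be restated at every use. This is a sketch under the same assumptions as the macros: MidPtr and the Foo type are hypothetical, and the char-pointer arithmetic is the familiar idiom rather than something the standard explicitly blesses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type standing in for the Foo in the usage example.
struct Foo {
  int bar;
  char payload[300];
};

// Hypothetical wrapper: stores the address of the middle byte of a T and
// converts back on access, like make_mid_ptr/use_mid_ptr but type-checked.
template <class T>
class MidPtr {
 public:
  explicit MidPtr(T* p) : mid_(reinterpret_cast<char*>(p) + sizeof(T) / 2) {}

  T* get() const { return reinterpret_cast<T*>(mid_ - sizeof(T) / 2); }
  T* operator->() const { return get(); }
  T& operator*() const { return *get(); }

  // Raw middle pointer, for code that wants signed displacements from it.
  void* mid() const { return mid_; }

 private:
  char* mid_;
};
```

Usage mirrors the macro version: construct MidPtr<Foo> from a Foo* once, then write m->bar = 1234 without naming the type again.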

> And technically, the suggestion only introduces new semantics, not new
> syntax. It gives a valid meaning to what is now a semantic error. That
> might be a point against the proposal, rather than a point in its favor.

That's what I call "new syntax and semantics". Something that isn't
allowed now is new.
---

Author: Francis Glassborow <francis@robinton.demon.co.uk>
Date: 1999/08/19

In article <user-1908991411210001@aus-as5-169.io.com>, blargg <postmast.
root.admi.gov@iname.com> writes
>Barry is simply saying that this particular (unnamed :-) CPU has an
>addressing mode where the memory reference is a pointer plus an 8-bit
>signed value. To access more than that, one would need to use a regular
>add instruction on the pointer itself, then use a memory access
>instruction (or perhaps it also has a 16-bit offset address mode which is
>slower - I don't use the processor so I don't know).

The IBM 1130 was somewhat like that.

Francis Glassborow      Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA          +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
---

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/19

blargg wrote:
>
> ...
>
> Yes, that's probably right. I suppose that's the 32-bit processor with a
> built-in 16-bit emulation mode for a design based on an 8-bit processor
> that was an extension of a 4-bit processor way back :-)  It's pretty silly
> to me that the instructions can be on any byte boundary (from what little
> I know about it - hey, I don't use machines based on it!). Talk about
> complex instruction decoding!
>
> As I said, this is really only an issue with archaic processor designs. I
> think most modern CPUs can offset by a 16-bit quantity without much
> significant (if any) penalty over an 8-bit offset or even no offset. On
> the two I'm familiar with (MC68K and PowerPC), 16-bit offsets are the
> norm, so allowing structure members to be at a negative offset wouldn't
> buy much, as people have commented in this thread.

Oh, PLEASE. The processor I was most sure of is the VAX. I still can't find
the right x86 (Pentium) data book, but I believe it also uses 8 bit signed
offsets in some addressing modes. CISC may be a bit dated, but it is hardly
'archaic' and there are a LARGE number of instances still in existence. (Oh,
I get it, you really DO mean the Pentium. Are you being deliberately snobby?)

And there is the problem of conceptual efficiency as well as execution
efficiency. The work arounds to get past the absence of negative offsets
are often more convoluted than this sentence is; documenting them is
a real nightmare.

I don't really expect to see this actually implemented. It should be an
option. What would be useful is being able to do something like:

...
struct alloc_block {
...
    char   alloc[];         /* address of this element to the user */
    } * block;
#if defined(_NEGATIVE_OFFSETS_IMPLEMENTED)
struct extern_alloc_block { /* same as alloc_block with negative offsets */
...
    char   alloc[] : 0;     /* address of this element from the user */
    } * from_user;
...
    /*****************************************************************/
    /* A void * always points to the beginning of the actual struct. */
    /* Use that transformation to get the internal representation    */
    /* pointer. Alternately, we could use the negative offset form   */
    /* throughout, but the other (real) version of the code can not  */
    /* rely on that.                                                 */
    /*****************************************************************/
    block = (alloc_block *)(void *)from_user;
...
#else
/* Since negative offsets are not implemented, stand on your ear... */
...
#endif

mtew@cds.duke.edu
---

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/19

Barry Margolin wrote:
> ...
>
> However, even on such a machine, I suspect that at least 99% of structure
> accesses would be to offsets under 128 bytes, so the impact of this
> efficiency problem would be minor.  Supporting negative offsets would
> merely double the range; I doubt there are enough references to offsets
> between 128 and 255 bytes to make the proposal worthwhile.  If a program
> makes extensive use of very large structures they're probably larger than
> 256 bytes as well, so they'll still need address arithmetic.  Negative
> offsets only aid those few programs that happen to have structures whose
> sizes are between 2^(n-1) and 2^n bytes in size.

Yes, but I'm also talking about conceptual efficiency. Negative offsets
have some explanatory power that could be used even if they are only
an optional part of the language.

mtew@cds.duke.edu
---

Author: nmm1@cus.cam.ac.uk (Nick Maclaren)
Date: 1999/08/19

In article <37BC0FDE.B585BD47@wizard.net>, James Kuyper <kuyper@wizard.net> writes:
|>
|> Barry Margolin wrote:
|> >
|> > He's talking about 8-bit offsets in indexed addressing modes, not 8-bit
|> > pointers.  He's also talking about efficiency; a machine with 8-bit offsets
|> > would be able to implement larger structures, but it would have to do it
|> > with pointer arithmetic, which would presumably be slower than indexed
|> > addressing.
|>
|> I've no idea how that works; none of the three assembly languages I know
|> had that feature. Can it be efficiently used to index to an arbitrary
|> point in a 65535-byte char array?

Yes, and has been.  The reason that you don't need offsets in indexing
(ANY offsets) is that instructions are effectively 'free' on most
modern RISC systems.  The pre-RISC RISC systems that used the scheme
and got high efficiencies often had the ability to pick up a constant
out of the instruction stream.

But I don't see that any of this is particularly relevant to the
languages concerned.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QG, England.
Email:  nmm1@cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679
---

Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 1999/08/19

In article <37B87C73.E7A3B4F6@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> writes
>> I don't know what problem you're trying to solve.
>It's not a problem. It's just an inconvenience.

All right, I don't know what inconvenience you're trying to solve.

[More substantive response in another post.]

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd.  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 |                      | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address
---

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/18

blargg wrote:
> <mtew@cds.duke.edu> wrote:
>> ...
>> While a structure larger than 127 bytes is fairly large, I would not call it
>> huge.
>
> 127? Where? I didn't see any mention of the particular value of "n". Maybe
> you were imagining that the whole world used the same processor you do :-)
>
> I figure on most modern processors, n will be at least 16. If you have a
> processor that only allows direct 8-bit signed offsets, then there will
> probably be many other performance issues to consider too, ones which
> would warp the language too much. 8 bit processors don't do too well with a
> language like C++ :-)

I may be mistaken, but I believe that at least one common 32 bit processor
can have byte offsets for some of its instructions. The documentation is at
home and I'm at work so I can't check it immediately.

mtew@cds.duke.edu


Author: Barry Margolin <barmar@bbnplanet.com>
Date: 1999/08/19

In article <37BA99C1.6E45DC28@wizard.net>,
James Kuyper Jr. <kuyper@wizard.net> wrote:
>Max TenEyck Woodbury wrote:
>> While a structure larger than 127 bytes is fairly large, I would not call it
>> huge.
>
>An implementation that uses 8-bit pointers would have trouble coming up
>with the "one program" that each implementation must be able to
>translate and execute, as required by the "Translation Limits" section
>of the draft C9X standard. That program must have at least one object
>containing 65535 bytes. I don't think that the C89 limit was much
>smaller.

He's talking about 8-bit offsets in indexed addressing modes, not 8-bit
pointers.  He's also talking about efficiency; a machine with 8-bit offsets
would be able to implement larger structures, but it would have to do it
with pointer arithmetic, which would presumably be slower than indexed
addressing.

However, even on such a machine, I suspect that at least 99% of structure
accesses would be to offsets under 128 bytes, so the impact of this
efficiency problem would be minor.  Supporting negative offsets would
merely double the range; I doubt there are enough references to offsets
between 128 and 255 bytes to make the proposal worthwhile.  If a program
makes extensive use of very large structures they're probably larger than
256 bytes as well, so they'll still need address arithmetic.  Negative
offsets only aid those few programs that happen to have structures whose
sizes are between 2^(n-1) and 2^n bytes in size.
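Barry's range argument in numbers, as a small C++ sketch (the function names are invented): with an n-bit signed displacement, a base pointer at the start of a structure can use only the non-negative half of the range, while a base pointer in the middle can use both halves.

```cpp
#include <cassert>

// Bytes reachable with one n-bit signed displacement when the base pointer
// is at the start of the object: offsets 0 .. 2^(n-1) - 1.
constexpr long reachable_from_start(int n) { return 1L << (n - 1); }

// Bytes reachable when the base pointer is in the middle:
// offsets -2^(n-1) .. 2^(n-1) - 1.
constexpr long reachable_from_middle(int n) { return 1L << n; }

// A middle pointer only helps objects too big for the first scheme but
// small enough for the second.
constexpr bool middle_pointer_helps(int n, long size) {
  return size > reachable_from_start(n) && size <= reachable_from_middle(n);
}
```

For n = 8 that window is 129 to 256 bytes; for n = 16 it includes the 65536-byte structure in the PowerPC listings elsewhere in the thread.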

--
Barry Margolin, barmar@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
---

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/19

"James Kuyper Jr." wrote:
>
> Max TenEyck Woodbury wrote:
>>...
>>
>> While a structure larger than 127 bytes is fairly large, I would not call it
>> huge.
>
> An implementation that uses 8-bit pointers would have trouble coming up
> with the "one program" that each implementation must be able to
> translate and execute, as required by the "Translation Limits" section
> of the draft C9X standard. That program must have at least one object
> containing 65535 bytes. I don't think that the C89 limit was much
> smaller.

Who said anything about the size of _pointers_? I'm talking about _offsets_,
which are related but distinctly different.

> The corresponding value in the C++ standard is 262144, but it's only a
> recommended value, and the standard doesn't actually require that the
> implementation's maximum object size even be positive. However, I'd be
> very impressed by any useful implementation of C++ that fit onto a
> platform that was limited to 8 bit addresses.

And this is relevant how?

mtew@cds.duke.edu

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/19

blargg wrote:
>...
> Here is disassembly on my machine (a PowerPC):
>
> without
>
> 00000000: 98830000  stb      r4,0(r3)
> 00000004: 3CA30001  addis    r5,r3,1        ; offset ptr for later accesses
> 00000008: 98830001  stb      r4,1(r3)
> 0000000C: 9885FFFE  stb      r4,-2(r5)      ; use offset ptr
> 00000010: 9885FFFF  stb      r4,-1(r5)
> 00000014: 4E800020  blr
>
> with
>
> 00000000: 3C630001  addis    r3,r3,1        ; pre-offset ptr to middle
> 00000004: 38638000  subi     r3,r3,32768    ; (requires two instructions)
> 00000008: 98838000  stb      r4,-32768(r3)  ; access off middle ptr
> 0000000C: 98838001  stb      r4,-32767(r3)
> 00000010: 98837FFE  stb      r4,32766(r3)
> 00000014: 98837FFF  stb      r4,32767(r3)
> 00000018: 4E800020  blr
>
> As you can see, the compiler was smart enough in the first case to cache
> the pointer value offset to access the last elements.

Actually, it created an auxiliary pointer (in r5) and used a negative
offset.

In the second routine, you and the compiler ended up trying to
second-guess each other and the result was an extra instruction. If you had
passed in the mid-point pointer, you might have ended up with one
instruction less instead of one instruction more.

> I just don't see the general utility of this, especially if it involves
> new syntax.

It has only special utility, not general utility. It is a way to simplify
what would otherwise be hard-to-explain code. It makes a concept, namely
negative offsets, directly accessible where it is only indirectly accessible
now.

And technically, the suggestion only introduces new semantics, not new
syntax. It gives a valid meaning to what is now a semantic error. That
might be a point against the proposal, rather than a point in its favor.

>...

mtew@cds.duke.edu

Author: postmast.root.admi.gov@iname.com (blargg)
Date: 1999/08/19

In article <37BB005B.3A745DFD@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> wrote:

> blargg wrote:
> > <mtew@cds.duke.edu> wrote:
> >> ...
> >> While a structure larger than 127 bytes is fairly large, I would not call it
> >> huge.
> >
> > 127? Where? I didn't see any mention of the particular value of "n". Maybe
> > you were imagining that the whole world used the same processor you do :-)
> >
> > I figure on most modern processors, n will be at least 16. If you have a
> > processor that only allows direct 8-bit signed offsets, then there will
> > probably be many other performance issues to consider too, ones which
> > would warp the language too much. 8 bit processors don't do too well with a
> > language like C++ :-)
>
> I may be mistaken, but I believe that at least one common 32 bit processor
> can have byte offsets for some of its instructions. The documentation is at
> home and I'm at work so I can't check it immediately.

Yes, that's probably right. I suppose that's the 32-bit processor with a
built-in 16-bit emulation mode for a design based on an 8-bit processor
that was an extension of a 4-bit processor way back :-)  It's pretty silly
to me that the instructions can be on any byte boundary (from what little
I know about it - hey, I don't use machines based on it!). Talk about
complex instruction decoding!

As I said, this is really only an issue with archaic processor designs. I
think most modern CPUs can offset by a 16-bit quantity without
significant (if any) penalty over an 8-bit offset or even no offset. On
the two I'm familiar with (MC68K and PowerPC), 16-bit offsets are the
norm, so allowing structure members to be at a negative offset wouldn't
buy much, as people have commented in this thread.

Author: James Kuyper <kuyper@wizard.net>
Date: 1999/08/19

Barry Margolin wrote:
>
> In article <37BA99C1.6E45DC28@wizard.net>,
> James Kuyper Jr. <kuyper@wizard.net> wrote:
> >Max TenEyck Woodbury wrote:
> >> While a structure larger than 127 bytes is fairly large, I would not call it
> >> huge.
> >
> >An implementation that uses 8-bit pointers would have trouble coming up
> >with the "one program" that each implementation must be able to
> >translate and execute, as required by the "Translation Limits" section
> >of the draft C9X standard. That program must have at least one object
> >containing 65535 bytes. I don't think that the C89 limit was much
> >smaller.
>
> He's talking about 8-bit offsets in indexed addressing modes, not 8-bit
> pointers.  He's also talking about efficiency; a machine with 8-bit offsets
> would be able to implement larger structures, but it would have to do it
> with pointer arithmetic, which would presumably be slower than indexed
> addressing.

I've no idea how that works; none of the three assembly languages I know
had that feature. Can it be efficiently used to index to an arbitrary
point in a 65535-byte char array?


Author: postmast.root.admi.gov@iname.com (blargg)
Date: 1999/08/19

In article <37BC0FDE.B585BD47@wizard.net>, James Kuyper
<kuyper@wizard.net> wrote:

> Barry Margolin wrote:
> >
> > In article <37BA99C1.6E45DC28@wizard.net>,
> > James Kuyper Jr. <kuyper@wizard.net> wrote:
> > >Max TenEyck Woodbury wrote:
> > >> While a structure larger than 127 bytes is fairly large, I would not call it
> > >> huge.
> > >
> > >An implementation that uses 8-bit pointers would have trouble coming up
> > >with the "one program" that each implementation must be able to
> > >translate and execute, as required by the "Translation Limits" section
> > >of the draft C9X standard. That program must have at least one object
> > >containing 65535 bytes. I don't think that the C89 limit was much
> > >smaller.
> >
> > He's talking about 8-bit offsets in indexed addressing modes, not 8-bit
> > pointers.  He's also talking about efficiency; a machine with 8-bit offsets
> > would be able to implement larger structures, but it would have to do it
> > with pointer arithmetic, which would presumably be slower than indexed
> > addressing.
>
> I've no idea how that works; none of the three assembly languages I know
> had that feature. Can it be efficiently used to index to an arbitrary
> point in a 65535-byte char array?

Barry is simply saying that this particular (unnamed :-) CPU has an
addressing mode where the memory reference is a pointer plus an 8-bit
signed value. To access more than that, one would need to use a regular
add instruction on the pointer itself, then use a memory access
instruction (or perhaps it also has a 16-bit offset address mode which is
slower - I don't use the processor so I don't know).
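The addressing mode described here can be turned into a rough cost model. The sketch is hypothetical throughout: it assumes the only indexed mode is base plus a signed 8-bit displacement, charges one instruction for an access whose displacement fits and two (an add to a scratch pointer, then the access) for one that does not, and ignores the reuse of an adjusted pointer that a good compiler would exploit.

```cpp
#include <cstddef>

// Does a displacement fit in a signed 8-bit field?
constexpr bool fits_s8(long disp) { return disp >= -128 && disp <= 127; }

// Instructions needed to access byte `offset` of an object through a base
// pointer that points `base` bytes into it: 1 if the displacement fits,
// otherwise 2 (adjust a scratch pointer, then access through it).
constexpr int access_cost(long offset, long base) {
    return fits_s8(offset - base) ? 1 : 2;
}

// Total cost of one access to each offset in `offsets`.
template <std::size_t N>
constexpr int total_cost(const long (&offsets)[N], long base) {
    int cost = 0;
    for (std::size_t i = 0; i < N; ++i) cost += access_cost(offsets[i], base);
    return cost;
}

// Hypothetical access pattern: the first and last two bytes of a 200-byte object.
constexpr long sample_offsets[] = {0, 1, 198, 199};
```

With the base at the start of the object, `total_cost(sample_offsets, 0)` is 6; with the base moved to byte 100, `total_cost(sample_offsets, 100)` is 4.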

[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 1999/08/19

In article <37B2D159.C5C781C4@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> writes
>Aggregate: A group of related data items with distinct names and meanings
>grouped together under a common designation.

Warning: not the definition used in C.

>Address: A number that, when dereferenced, produces a data item.

Why must it be a number?

>Not all aggregate components have addresses. In particular, bit fields (those
>components with bit length specifications) do not have addresses.

No, but C does have the concept of the address of an unnamed component
holding the bit field.

>The other
>unaddressable data item, registers,

Actually, registers do have addresses, you just can't obtain them.

>The alignment requirement of an aggregate is the same as the most severe
>alignment requirement of any of its components.

No such rule in C.

>As C and C++ are currently defined, all the offsets of aggregate components must
>be positive or zero. Since most modern computers use signed offsets in their
>instruction formats, this restriction can reduce program efficiency.

Only in the rare case that the structure size is *slightly* larger than
the maximum size of an index. This is a relatively rare situation, as
other posts have pointed out.

>2.  In contrast to the union of two structures, a union of a frame and a
>structure would not share any storage even though the name of the union, the
>name of the structure and the name of the frame would all designate the same
>address.

I see this as confusing rather than being a benefit. I would expect a
union to overlap its components no matter what they are - the fact that
this means that the offsets are all zero is a consequence of this rather
than a requirement per se.

>5. Adding another item to a frame would put the storage for that item at the
>beginning of the storage area allocated for the frame, not at the end. This
>makes frames similar to some implementations of stacks,

and completely unlike others. I see this use of the term "frame" as thus
more misleading than anything else.

>A simpler solution is to allow the designation of a particular component of a
>structure as the zero offset component. This avoids all the portability issues
>associated with designating the offset of each component and is nowhere near as
>complex as the introduction of a new kind of structure.

Is it not? Are structures compatible if they are identical except that
their zero offset components are different? Are they assignable?

>2.  A zero offset designation can only be applied to an addressable data
>component of the structure.

Why? Why shouldn't the storage unit holding a given bit-field be the
zero offset component? Why shouldn't the zero offset component be
several bytes before the start of the structure, or after the end? Why
shouldn't it be in the middle of an N-byte hole? If you see a need to be
able to designate the zero offset component, why are these possibilities
unreasonable?

>A bit field can not be designated as a zero offset
>component since it does not have an address.

But the storage unit it resides in does.

>The use of a designated zero offset component is not a new idea. The Digital
>Equipment Corporation software development group has used a language independent
>data aggregate description (I forget its name) that included a base address
>designation. It was rarely used mainly because it was seldom needed and lacked
>high-level language support.

So your one example of previous practice failed?

>Also, the
>offsetof macro
>would return a ptrdiff_t, rather than a size_t.

Oh great, change a whole load of semantics of existing code.

>However, the concept involved is simple enough

I disagree. See above.

>Since a zero width bit field is useless

Wrong.
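The objection can be illustrated directly: an unnamed bit-field of width zero is not useless, because it tells the compiler to start the next bit-field at the next allocation-unit boundary. The two structures below are a hypothetical illustration. Exact sizes are implementation-defined, but on common ABIs such as x86-64 with GCC or Clang, `sizeof(packed)` is 4 and `sizeof(with_break)` is 8.

```cpp
// Two 3-bit fields that normally share a single unsigned int.
struct packed {
    unsigned a : 3;
    unsigned b : 3;
};

// The unnamed zero-width bit-field forces b to start in a new
// allocation unit instead of sharing the one that holds a.
struct with_break {
    unsigned a : 3;
    unsigned   : 0;
    unsigned b : 3;
};
```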

>and bit fields can not be designated as
>the zero offset component,

See above.

>This has the advantage of leaving the syntax of the language
>virtually unchanged and only defines a meaning for what was previously an error
>condition.

This is often dangerous.

Better syntax would be, for example, to indicate the zero offset
component by placing "&" after its identifier.

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet Ltd.  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 |                      | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address



Author: postmast.root.admi.gov@iname.com (blargg)
Date: 1999/08/18

In article <37B87C73.E7A3B4F6@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> wrote:

> "Douglas A. Gwyn" wrote:
> >
> > I don't know what problem you're trying to solve.
> >
>
> It's not a problem. It's just an inconvenience. That's why
> I suggested it be an option, rather than a required part of
> the language.
>
> On very rare occasions negative offsets can be useful. The best
> known, but not the only, example is when writing the 'free' routine
> for heap management.

Assuming a decent compiler, one can easily enough write a small wrapper
that gives more efficient access to the members of a large structure:

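    #include <cstddef>

    // Holds a pointer to the middle of a T, so that members in either half
    // can be reached with signed displacements of at most sizeof(T)/2.
    template<typename T>
    class middle_ptr {
        char* middle;
        static std::size_t const middle_offset = sizeof (T) / 2;
    public:
        middle_ptr( T* p ) : middle( reinterpret_cast<char*>(p) + middle_offset ) { }

        operator T* () const { return reinterpret_cast<T*>(middle - middle_offset); }
        T* operator -> () const { return *this; }
    };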

// usage

    struct X {
        char c [32768];
        char d [32768];
    };

    void without( X* x, char value ) {
        x->c [0] = value;
        x->c [1] = value;
        x->d [32766] = value;
        x->d [32767] = value;
    }

    void with( X* raw_x, char value ) {
        middle_ptr<X> x( raw_x );
        x->c [0] = value;
        x->c [1] = value;
        x->d [32766] = value;
        x->d [32767] = value;
    }

Here is disassembly on my machine (a PowerPC):

without

00000000: 98830000  stb      r4,0(r3)
00000004: 3CA30001  addis    r5,r3,1        ; offset ptr for later accesses
00000008: 98830001  stb      r4,1(r3)
0000000C: 9885FFFE  stb      r4,-2(r5)      ; use offset ptr
00000010: 9885FFFF  stb      r4,-1(r5)
00000014: 4E800020  blr

with

00000000: 3C630001  addis    r3,r3,1        ; pre-offset ptr to middle
00000004: 38638000  subi     r3,r3,32768    ; (requires two instructions)
00000008: 98838000  stb      r4,-32768(r3)  ; access off middle ptr
0000000C: 98838001  stb      r4,-32767(r3)
00000010: 98837FFE  stb      r4,32766(r3)
00000014: 98837FFF  stb      r4,32767(r3)
00000018: 4E800020  blr

As you can see, the compiler was smart enough in the first case to cache
the pointer value offset to access the last elements.
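The offsets in these listings can be checked by hand. `x->d[32766]` lives at byte 32768 + 32766 = 65534, which the first listing reaches as 65536 (`addis r5,r3,1`) plus a displacement of -2. In the second listing the two-instruction preamble moves r3 forward by 65536 - 32768 = 32768 bytes, so every access becomes a displacement between -32768 and 32767. The helpers `split_offset()` and `from_middle()` are hypothetical illustrations of that arithmetic, not compiler or assembler APIs.

```cpp
#include <cstdint>

// offset == high * 65536 + low, with low in [-32768, 32767].
// `high` is what an addis-style instruction adds (in units of 65536);
// `low` is the signed 16-bit displacement of the following load or store.
struct Split {
    std::int32_t high;
    std::int32_t low;
};

constexpr Split split_offset(std::int32_t offset) {
    std::int32_t low = offset & 0xFFFF;   // low 16 bits as 0..65535
    if (low >= 0x8000) low -= 0x10000;    // reinterpret as a signed 16-bit value
    return Split{(offset - low) / 0x10000, low};
}

// Displacement of byte `offset` once the base pointer has been moved to `middle`.
constexpr std::int32_t from_middle(std::int32_t offset, std::int32_t middle) {
    return offset - middle;
}
```

For example, `split_offset(65534)` gives `{1, -2}`, matching `addis r5,r3,1` followed by `stb r4,-2(r5)`, and `from_middle(0, 32768)` and `from_middle(65535, 32768)` give -32768 and 32767, matching the outermost stores in the second listing.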

I just don't see the general utility of this, especially if it involves
new syntax.

If one wants to set the exact offset, middle_ptr<> could take the offset
as a template parameter,

    template<typename T,std::size_t middle_offset = sizeof (T) / 2>
    class middle_ptr {
        // ...

or use some sort of multiple inheritance.

    struct positive_offset_members { };

    template<typename T>
    class mid_ptr {
        positive_offset_members* ptr;
    public:
        mid_ptr( T* p ) : ptr( &static_cast<positive_offset_members&> (*p) ) { }

        operator T* () const { return &static_cast<T&> (*ptr); }

        T* operator -> () const { return *this; }
    };

// usage

    struct negative_offset {
        char c [32768];
    };

    struct X : negative_offset, positive_offset_members {
         char d [32768];
    };

    void f( X* raw_x, char value ) {
        mid_ptr<X> x( raw_x );
        x->c [0] = value;
        x->c [1] = value;
        x->d [32766] = value;
        x->d [32767] = value;
    }

I used reference casts in the static_cast<> to prevent compiler checks for
NULL, since I assume that ptr is never NULL. With the common base-class memory
layout, this will give the desired effect.
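Whether the multiple-inheritance version actually lands in the middle depends on where the compiler places the empty `positive_offset_members` base. The standard does not pin this down, and an empty base subobject is allowed to share its address with another subobject; compilers following the Itanium C++ ABI (GCC and Clang on most platforms) typically place such an empty base at offset 0, which would make `mid_ptr` point at `c` rather than at `d`. The sketch below repeats the post's structures and simply measures the two offsets so the layout can be checked on a given compiler; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct negative_offset { char c[32768]; };
struct positive_offset_members { };
struct X : negative_offset, positive_offset_members { char d[32768]; };

// Byte distance from the start of an X to its positive_offset_members base.
inline std::ptrdiff_t offset_of_positive_base(const X& x) {
    const char* whole = reinterpret_cast<const char*>(&x);
    const char* base = reinterpret_cast<const char*>(
        static_cast<const positive_offset_members*>(&x));
    std::ptrdiff_t diff = base - whole;
    // The base subobject must lie within the complete object.
    assert(diff >= 0 && diff <= static_cast<std::ptrdiff_t>(sizeof(X)));
    return diff;
}

// Byte distance from the start of an X to its member d.
inline std::ptrdiff_t offset_of_d(const X& x) {
    return x.d - reinterpret_cast<const char*>(&x);
}
```

On a compiler that applies the empty base optimization here, `offset_of_positive_base()` returns 0 while `offset_of_d()` returns 32768; a compiler that gives the empty base its own byte after `c` would typically report 32768 for the base, with `d` following it.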



Author: postmast.root.admi.gov@iname.com (blargg)
Date: 1999/08/18

In article <37B876C2.541BBA06@cds.duke.edu>, Max TenEyck Woodbury
<mtew@cds.duke.edu> wrote:

> Barry Margolin wrote:
> >
> > Max TenEyck Woodbury  <mtew@cds.duke.edu> wrote:
> > >As C and C++ are currently defined, all the offsets of aggregate
> > >components must be positive or zero. Since most modern computers use
> > >signed offsets in their instruction formats, this restriction can reduce
> > >program efficiency. The following considers ways to remove this
> > >restriction.
> >
> > How does this reduce efficiency?  The only impact I can think of is that it
> > limits the size of aggregates.  If the processor uses signed offsets,
> > aggregates are limited to 2^(n-1) bytes in size, rather than 2^n bytes
> > (where n is the size of the offset field in instructions), unless the
> > compiler inserts extra code for components with very large offsets.  I
> > suppose you could be assuming that the compiler does the latter, and that's
> > where the efficiency impact is.  But who defines structures that are so
> > huge that this is an issue?
>
> While a structure larger than 127 bytes is fairly large, I would not call it
> huge.

127? Where? I didn't see any mention of the particular value of "n". Maybe
you were imagining that the whole world used the same processor you do :-)

I figure on most modern processors, n will be at least 16. If you have a
processor that only allows direct 8-bit signed offsets, then there will
probably be many other performance issues to consider too, ones which
would warp the language too much. 8 bit processors don't do too well with a
language like C++ :-)

Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: 1999/08/18

Max TenEyck Woodbury wrote:
>
> Barry Margolin wrote:
> >
> > Max TenEyck Woodbury  <mtew@cds.duke.edu> wrote:
> > >As C and C++ are currently defined, all the offsets of aggregate
> > >components must be positive or zero. Since most modern computers use
> > >signed offsets in their instruction formats, this restriction can reduce
> > >program efficiency. The following considers ways to remove this
> > >restriction.
> >
> > How does this reduce efficiency?  The only impact I can think of is that it
> > limits the size of aggregates.  If the processor uses signed offsets,
> > aggregates are limited to 2^(n-1) bytes in size, rather than 2^n bytes
> > (where n is the size of the offset field in instructions), unless the
> > compiler inserts extra code for components with very large offsets.  I
> > suppose you could be assuming that the compiler does the latter, and that's
> > where the efficiency impact is.  But who defines structures that are so
> > huge that this is an issue?
>
> While a structure larger than 127 bytes is fairly large, I would not call it
> huge.

An implementation that uses 8-bit pointers would have trouble coming up
with the "one program" that each implementation must be able to
translate and execute, as required by the "Translation Limits" section
of the draft C9X standard. That program must have at least one object
containing 65535 bytes. I don't think that the C89 limit was much
smaller.

The corresponding value in the C++ standard is 262144, but it's only a
recommended value, and the standard doesn't actually require that the
implementation's maximum object size even be positive. However, I'd be
very impressed by any useful implementation of C++ that fit onto a
platform that was limited to 8 bit addresses.


Author: "Daniel M. Pfeffer" <p.f.e.f.f.e.r.d@internet-zahav.net>
Date: 1999/08/16

It is only necessary to use relative addressing to address a structure
element when the base of the structure is at a dynamic (i.e. non-static)
address. Neglecting the segment part of the address in segmented
architectures, we would have one N-bit value representing the address of the
base of the structure, and another N-bit value representing the offset to
the structure element. (If the structure is static, the linker can resolve
the address at link time.)

Assuming a two's-complement processor, adding two N-bit values will return
the same N-bit pattern irrespective of whether they are signed or unsigned.
Any compiler can therefore create code to represent structures of any size
up to the N-bit limit of the address space.
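This observation is easy to check with fixed-width types. The sketch assumes 16-bit addresses for brevity, and `add16()` and `as_bits()` are hypothetical helpers: adding the unsigned bit pattern 0xFFFE and adding the signed displacement -2 produce the same 16-bit result.

```cpp
#include <cstdint>

// Add a 16-bit displacement to a 16-bit address the way the hardware does:
// plain binary addition, keeping only the low 16 bits of the sum.
constexpr std::uint16_t add16(std::uint16_t base, std::uint16_t disp_bits) {
    return static_cast<std::uint16_t>(base + disp_bits);
}

// The bit pattern of a signed displacement. Conversion to an unsigned
// type is modular, so -2 becomes 0xFFFE.
constexpr std::uint16_t as_bits(std::int16_t disp) {
    return static_cast<std::uint16_t>(disp);
}
```

Both `add16(0x1234, 0xFFFE)` and `add16(0x1234, as_bits(-2))` yield 0x1232, which is why the choice of signed or unsigned offsets does not limit the size of a structure at the language level.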

Your proposal would only make sense on CPUs that have optimised versions of
instructions for when the relative offset fits inside a limited subrange of
the address space. The compiler may use these instructions to reference data
within a short distance of the structure base. If these instructions on a
particular CPU use a signed value, your 'frame' idea would double the size
of the structure that could be addressed by these optimised instructions.

It is a bad idea because:
    1. It adds a feature to the language which provides no extra
functionality _at_the_language_level_.
    2. The 'optimisation' that it does provide is only useful on a limited
subset of available CPUs.

Daniel Pfeffer

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/16

Barry Margolin wrote:
>
> Max TenEyck Woodbury  <mtew@cds.duke.edu> wrote:
> >As C and C++ are currently defined, all the offsets of aggregate
> >components must be positive or zero. Since most modern computers use
> >signed offsets in their instruction formats, this restriction can reduce
> >program efficiency. The following considers ways to remove this
> >restriction.
>
> How does this reduce efficiency?  The only impact I can think of is that it
> limits the size of aggregates.  If the processor uses signed offsets,
> aggregates are limited to 2^(n-1) bytes in size, rather than 2^n bytes
> (where n is the size of the offset field in instructions), unless the
> compiler inserts extra code for components with very large offsets.  I
> suppose you could be assuming that the compiler does the latter, and that's
> where the efficiency impact is.  But who defines structures that are so
> huge that this is an issue?

While a structure larger than 127 bytes is fairly large, I would not call it
huge.

mtew@cds.duke.edu



Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/16

"Douglas A. Gwyn" wrote:
>
> I don't know what problem you're trying to solve.
>

It's not a problem. It's just an inconvenience. That's why
I suggested it be an option, rather than a required part of
the language.

On very rare occasions negative offsets can be useful. The best
known, but not the only, example is when writing the 'free' routine
for heap management.

mtew@cds.duke.edu


Author: Barry Margolin <barmar@bbnplanet.com>
Date: 1999/08/17

In article <37B87C73.E7A3B4F6@cds.duke.edu>,
Max TenEyck Woodbury  <mtew@cds.duke.edu> wrote:
>On very rare occasions negative offsets can be useful. The best
>known, but not the only, example is when writing the 'free' routine
>for heap management.

I assume you're referring to the common practice of placing the size of a
malloc'ed block in the bytes preceding the block.  This doesn't require
negative offsets.  Instead, you allocate a structure that's something like

struct malloc_block_t {
  long size;
  char bytes[];
} block;

and then return &block.bytes.  The free(ptr) routine would then compute
ptr-offsetof(malloc_block_t, bytes).

--
Barry Margolin, barmar@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/17

Barry Margolin wrote:
>
> I assume you're referring to the common practice of placing the size of a
> malloc'ed block in the bytes preceding the block.  This doesn't require
> negative offsets.  Instead, you allocate a structure that's something like
>
> struct malloc_block_t {
>   long size;
>   char bytes[];
> } block;
>
> and then return &block.bytes.  The free(ptr) routine would then compute
> ptr-offsetof(malloc_block_t, bytes).
>

Yes, but it cannot do that last trick portably. There are portable ways
to do it, but they get a _little_ complicated to document.

struct alloced_header_t {
    size_t size;
    } * block, * from_user;
...
/* in alloc */
    return block + 1;
...
/* in free */
    block = from_user - 1;
...
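The header-before-block approach above can be fleshed out as follows. The function names `my_alloc`, `my_size`, and `my_free` are hypothetical, the underlying allocator is assumed to be `std::malloc`, and the `alignas(std::max_align_t)` on the header is an addition not present in the post: without it, `block + 1` might not be suitably aligned for the caller's data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Header stored immediately before each user block. alignas(std::max_align_t)
// makes sizeof(alloced_header_t) a multiple of the strictest fundamental
// alignment, so the address just past a header suits any ordinary object.
struct alignas(std::max_align_t) alloced_header_t {
    std::size_t size;
};
static_assert(sizeof(alloced_header_t) % alignof(std::max_align_t) == 0, "");

void* my_alloc(std::size_t n) {
    // Overflow of sizeof(alloced_header_t) + n is not checked in this sketch.
    void* raw = std::malloc(sizeof(alloced_header_t) + n);
    if (raw == nullptr) return nullptr;
    alloced_header_t* block = ::new (raw) alloced_header_t{n};
    return block + 1;                 // user pointer: first byte after the header
}

std::size_t my_size(void* from_user) {
    assert(from_user != nullptr);
    alloced_header_t* block = static_cast<alloced_header_t*>(from_user) - 1;
    return block->size;               // step back over the header to read it
}

void my_free(void* from_user) {
    if (from_user == nullptr) return;
    std::free(static_cast<alloced_header_t*>(from_user) - 1);
}
```

For example, `my_size(my_alloc(10))` is 10, and no `offsetof` is needed: stepping back one header with ordinary pointer arithmetic recovers the block.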

1. Heap management was only a well known example.

2. This is a feature most often useful in low level software, that is in
   implementing run-time and operating system routines. Workarounds do
   exist, but they are often awkward, unportable and hard to explain.

3. I am asking for an option so that if it is implemented by anybody, it
   will be implemented in the same way each time. I do not think this
   feature should be required.

mtew@cds.duke.edu


Author: Max TenEyck Woodbury <mtew@cds.duke.edu>
Date: 1999/08/12

Definitions --

Aggregate: A group of related data items with distinct names and meanings grouped together under a common designation.

Aggregate Component: One of the data items that make up an aggregate.

Array: A group of related data items with a single name and meaning, distinguished from each other only by an index. (Included only for contrast with aggregates.)

Address: A number that, when dereferenced, produces a data item.

Addressable unit: The data item retrieved when an address is dereferenced that does not overlap any other data item retrieved by a different address.

Pointer: A combination of an address and the type of data associated with that address.

Offset: The numeric difference between the address of a component of an aggregate and the address of the aggregate itself.

Union: an aggregate where the offset of all components is zero. Components share the same storage.

Structure: an aggregate where the offset of each component is unique and none of the components overlap.

Alignment requirements: Restrictions on addresses usually related to the size of the data item dereferenced.

Notes --

Not all aggregate components have addresses. In particular, bit fields (those components with bit length specifications) do not have addresses. The other unaddressable data items, registers, cannot be part of aggregates (but they can contain aggregates).

The alignment requirement of an aggregate is the same as the most severe alignment requirement of any of its components. For structures, unnamed and unreferencable data items may be added between named components to meet alignment requirements.

The addresses of structure components are not arbitrary. The order of addresses or offsets of the components must be the same as the order of the components in the structure description. When access to components is restricted, as in C++, the components may be grouped together by the degree of visibility of the component, with the more visible components preceding the less visible components.
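The padding and ordering described in these notes can be observed with `offsetof`. The struct `S` and the two padding constants are hypothetical. C++ guarantees that members with the same access control get increasing addresses in declaration order and that `sizeof(S)` is a multiple of `alignof(S)`; how much padding is inserted, and whether `alignof(S)` equals the strictest member alignment, is up to the ABI.

```cpp
#include <cstddef>

// Members with different alignment requirements, declared in this order.
struct S {
    char   c;
    double d;
    char   e;
};

// Unnamed padding the compiler inserted after c (before d) and after e.
constexpr std::size_t padding_after_c =
    offsetof(S, d) - (offsetof(S, c) + sizeof(char));
constexpr std::size_t trailing_padding =
    sizeof(S) - (offsetof(S, e) + sizeof(char));
```

On x86-64 with GCC or Clang, for instance, `d` lands at offset 8, `e` at 16, and `sizeof(S)` is 24, so both padding constants are 7.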

Discussion --

As C and C++ are currently defined, all the offsets of aggregate components must be positive or zero. Since most modern computers use signed offsets in their instruction formats, this restriction can reduce program efficiency. The following considers ways to remove this restriction.

The most direct solution to this problem would be to allow the structure designer to designate component offsets explicitly. While this would provide a much finer level of control than currently exists and would allow for the specification of negative offsets, the result would not be portable, would introduce many opportunities for error, and would be very difficult to maintain. In particular, changes in the size of the addressable unit and different alignment requirements between platforms would be left in the aggregate designer's hands. One of the reasons compilers were developed was to take care of this level of detail. So, this is not a good way to solve this problem.

A possible solution would be to introduce an alternate kind of structure where the offsets of the components are all negative. The ordering of component offsets would be the exact reverse of those for structures with positive offsets. For reasons that may be apparent to some, I will designate such a structure as a 'frame'.

Frames have a number of interesting properties:

1. The address of the frame, as designated by the name of the frame, is the address of the location that follows the frame and is not actually the address of any component in the frame. This is similar to being able to address, but not use, the first element beyond the end of an array.

2. In contrast to the union of two structures, a union of a frame and a structure would not share any storage even though the name of the union, the name of the structure and the name of the frame would all designate the same address.

3. A structure could contain a frame. The offsets of the frame's components would be positive or zero when taken with respect to the address of the structure.

4. A frame could contain a structure. The offsets of the structure's components would be negative when taken with respect to the address of the frame.

5. Adding another item to a frame would put the storage for that item at the beginning of the storage area allocated for the frame, not at the end. This makes frames similar to some implementations of stacks, and could be convenient for the internal management of automatic data by the compiler.
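Frames do not exist in C or C++, but their layout can be approximated today. The sketch below assumes a hypothetical frame `{ int32_t a; int32_t b; double c; }` and emulates it by declaring the components in reverse order inside an ordinary struct, then treating the address one past the end of that struct as the frame address (property 1). Every component then has a negative offset, and the first-declared frame component sits closest to the frame address. The names `demo_frame_storage`, `FRAME_OFFSET`, `frame_addr` and `frame_a` are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame { int32_t a; int32_t b; double c; } emulated in
   standard C: components are declared in REVERSE order inside an
   ordinary struct, so the first frame component has the highest address. */
struct demo_frame_storage {
    double  c;   /* last frame component: lowest address   */
    int32_t b;
    int32_t a;   /* first frame component: highest address */
};

/* Signed offset of a component measured from the frame address. */
#define FRAME_OFFSET(member)                                      \
    ((ptrdiff_t)offsetof(struct demo_frame_storage, member) -    \
     (ptrdiff_t)sizeof(struct demo_frame_storage))

/* The frame address is one past the end of the storage, just as
   &arr[N] is a valid (but not dereferenceable) address for an N-element array. */
static char *frame_addr(struct demo_frame_storage *s)
{
    return (char *)(s + 1);
}

/* Reach component a through the frame address and its negative offset. */
static int32_t *frame_a(char *fp)
{
    return (int32_t *)(fp + FRAME_OFFSET(a));
}
```

The order reversal is done by hand here; a compiler supporting frames would do it automatically, which is exactly the part the dummy-element approach discussed below cannot reproduce.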

In summary, it is possible to design and implement a new kind of structure as a new language component, but it would add quite a bit of complexity to the language and to a compiler for that language.

A simpler solution is to allow the designation of a particular component of a structure as the zero offset component. This avoids all the portability issues associated with designating the offset of each component and is nowhere near as complex as the introduction of a new kind of structure. It is even appropriate to restrict which kinds of components can be designated as the zero offset component:

1. Only one component of a structure may be designated as the zero offset component. Note that a structure can contain an instance of a structure which has a zero offset component that may or may not be the designated zero offset component of the containing structure. In other words, the existence of a designated zero offset component in a sub-structure does not preclude the designation of a zero offset component in the higher or lower level structure specification.

2. A zero offset designation can only be applied to an addressable data component of the structure. A bit field can not be designated as a zero offset component since it does not have an address. A static component or member function can not be designated as the zero offset component since neither is a data component of the structure itself. (In the case of a static data component, the static component is associated with but not part of the structure.)

3. Only one of the most visible components of a structure with component access restrictions can be designated as the zero offset component. This avoids making the visible interface to the structure dependent on non-visible details.
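What a compiler would do for a designated zero offset component can be approximated by hand with `offsetof`. In the sketch below, `payload` stands in for the designated component of a hypothetical `struct packet`; offsets measured from it are signed, so they are computed as `ptrdiff_t`. The macro and helper names are hypothetical. Note that `offsetof` cannot be applied to a bit field, which mirrors restriction 2 above.

```c
#include <stddef.h>

/* ZERO_OFFSET(T, z, m): offset of member m of T measured from member z,
   where z stands in for a (hypothetically) designated zero offset
   component. Members declared before z come out negative. */
#define ZERO_OFFSET(T, z, m) \
    ((ptrdiff_t)offsetof(T, m) - (ptrdiff_t)offsetof(T, z))

/* Hypothetical record: header before the base, trailer after it. */
struct packet {
    unsigned short length;      /* negative offset                  */
    unsigned short flags;       /* negative offset                  */
    char           payload[32]; /* designated zero offset component */
    unsigned int   checksum;    /* positive offset                  */
};

/* The "address of the packet" as seen by code that uses the base. */
static char *packet_base(struct packet *p)
{
    return (char *)p + offsetof(struct packet, payload);
}

/* Recover the start of the storage from the base address. */
static struct packet *packet_from_base(char *base)
{
    return (struct packet *)(base - offsetof(struct packet, payload));
}
```

The conversion pair `packet_base`/`packet_from_base` is the bookkeeping the proposal would move into the compiler.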

Note that adding a dummy element to the end of a structure and designating it as the zero offset component approximates most of the features of frames. The feature that cannot be duplicated is the reversal of the component order. Also, the dummy element can confuse anyone not familiar with the application and, unless it is zero-sized (which can also be confusing), it adds to the size of the structure.
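In C99 and later, a flexible array member is one standard way to get a trailing dummy that occupies no storage of its own. The sketch below (hypothetical names `rec_fam`, `rec_one`, `REC_OFF`, `rec_base`) compares it with a one-element dummy and shows that offsets measured from the dummy are negative but keep declaration order, i.e. no reversal. Trailing padding after the last real member can still make the two layouts the same size, so only `<=` is claimed.

```c
#include <stddef.h>

/* Trailing dummy as a C99 flexible array member: no storage of its own. */
struct rec_fam {
    int    id;
    double weight;
    char   end[];   /* hypothetical zero offset component */
};

/* Same idea with a one-element dummy, which does take up space. */
struct rec_one {
    int    id;
    double weight;
    char   end[1];
};

/* Offsets from the dummy: all negative, still in declaration order. */
#define REC_OFF(m) \
    ((ptrdiff_t)offsetof(struct rec_fam, m) - \
     (ptrdiff_t)offsetof(struct rec_fam, end))

/* Base address computed from the start of the object, so later
   negative offsets stay within the object's own bytes. */
static char *rec_base(struct rec_fam *r)
{
    return (char *)r + offsetof(struct rec_fam, end);
}
```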

The use of a designated zero offset component is not a new idea. The Digital Equipment Corporation software development group has used a language independent data aggregate description (I forget its name) that included a base address designation. It was rarely used mainly because it was seldom needed and lacked high-level language support.

Negative offsets are used effectively for the components of the library transfer vectors (jump tables) of the Amiga operating system. They are also an implicit part of the solution to the multiple inheritance problem associated with virtual functions in C++.
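A loose C model of the library-base idea: function vectors live below the base address and data lives at and above it, so callers index vectors with negative offsets. This is an illustration only; the real Amiga tables hold jump instructions rather than C function pointers, and `demo_library`, `lib_base` and `lib_call` are hypothetical names.

```c
#include <stddef.h>

typedef int (*lib_fn)(int);

/* Vectors BELOW the base, data AT and ABOVE it (illustrative layout). */
struct demo_library {
    lib_fn vectors[2];   /* vector n lives at -n * sizeof(lib_fn) */
    int    version;      /* the library base: offset 0            */
    int    open_count;
};

_Static_assert(offsetof(struct demo_library, version) == 2 * sizeof(lib_fn),
               "vectors must sit directly below the base");

static int lib_double(int x) { return 2 * x; }
static int lib_negate(int x) { return -x; }

/* The address handed to callers. */
static char *lib_base(struct demo_library *lib)
{
    return (char *)lib + offsetof(struct demo_library, version);
}

/* Call vector n (1-based) through its negative offset from the base. */
static int lib_call(char *base, int n, int arg)
{
    lib_fn const *slot =
        (lib_fn const *)(base - (ptrdiff_t)n * (ptrdiff_t)sizeof(lib_fn));
    return (*slot)(arg);
}
```

New vectors can be added below the existing ones without moving the data or renumbering the old vectors, which is one reason this layout is attractive.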

Any negative extent to structures can complicate the compilation of unions. With negative offsets, a compiler has to track both a negative and a positive extent where only a positive extent needs to be tracked if negative offsets are not allowed.
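The extra bookkeeping is small but real: each union member is described by a negative and a positive extent relative to the union's address, and the union's extent is the hull of all of them. The sketch below uses a hypothetical `struct extent` with half-open bounds `[lo, hi)`; for example, a structure occupying `[0, 8)` and a frame occupying `[-12, 0)` give a 20-byte union whose two members share no storage (property 2 of frames).

```c
#include <stddef.h>

/* Storage of one member relative to the union's address, half-open:
   lo <= 0 is the negative extent, hi >= 0 the positive extent.
   Hypothetical helper, not any compiler's real data structure. */
struct extent { ptrdiff_t lo, hi; };

/* The union must cover every member: take the lowest lo and highest hi. */
static struct extent union_extent(const struct extent *m, size_t n)
{
    struct extent u = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        if (m[i].lo < u.lo) u.lo = m[i].lo;
        if (m[i].hi > u.hi) u.hi = m[i].hi;
    }
    return u;
}
```

Without negative offsets `lo` is always zero and only `hi` needs tracking.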

There are a number of additional problems associated with any application-controlled negative component offset scheme. Most have to do with allocating and deallocating data structures where the address of the data structure is not the address of the start of the allocated storage area. For C++, with separate 'new' operators for each kind of data structure, this is not really a problem. For C, the problem is casts to and from type 'void *'. Adding the appropriate amount to, or subtracting it from, the pointer address at these points should obviate these problems. This has been done successfully in some implementations of C++. Also, the offsetof macro would return a ptrdiff_t rather than a size_t.
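A minimal sketch of the pointer adjustment at allocation and deallocation, assuming a hypothetical `struct node` whose designated zero offset component is `key`. The allocator hands out the biased address, `free` must get the original one back, and offsets measured from the base are signed, which is why `NODE_OFF` yields a `ptrdiff_t`. All names here are invented for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record whose "address" is that of its designated zero
   offset component `key`; `hdr` sits at a negative offset. */
struct node {
    long hdr;
    long key;    /* the designated zero offset component */
    long next;
};

#define NODE_BIAS offsetof(struct node, key)

/* offsetof-style value measured from the base: signed, hence ptrdiff_t. */
#define NODE_OFF(m) ((ptrdiff_t)offsetof(struct node, m) - (ptrdiff_t)NODE_BIAS)

/* Allocate, then return the biased (base) pointer, not malloc's pointer. */
static void *node_alloc(void)
{
    char *raw = malloc(sizeof(struct node));
    return raw ? raw + NODE_BIAS : NULL;
}

/* Undo the bias before handing the storage back to free(). */
static void node_free(void *base)
{
    if (base)
        free((char *)base - NODE_BIAS);
}
```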

The main remaining problem I am aware of is the use of hidden Virtual Function Table pointers in C++. In many cases these are put at offset zero for the convenience of implementing the run time system, although there are implementations where this is not a requirement. The simplest solution would be to restrict zero offset designators to data structures that do not require VFT pointers.

Summary --

Negative structure component offsets can be useful in improving the efficiency of applications, but are rarely critical to the design of an application. The least complicated method for allowing negative component offsets is to designate some component of the structure as the zero offset component. There are a number of problems associated with implementing negative component offsets both at compile and run time. In most cases, the work associated with solving these problems will not pay off in significant improvements in application design, so it is not a good idea to make this a required part of either C or C++. However, the concept involved is simple enough that a standard, possibly informal in nature, designating the syntax and semantics of such a specification would be useful.

Candidate syntax --

Since a named zero width bit field is an error (only an unnamed ': 0' is allowed, and it merely forces the next bit field to start a new allocation unit) and bit fields can not be designated as the zero offset component, ': 0' on a named component could be used to designate the zero offset component. This has the advantage of leaving the syntax of the language virtually unchanged and only defines a meaning for what was previously an error condition. It is only vaguely mnemonic (and some people would find it definitely confusing), but anything more meaningful would probably require the introduction of additional keywords.


Author: "Ken Hagan" <K.Hagan@thermoteknix.co.uk>
Date: 1999/08/13

Max TenEyck Woodbury wrote in message <37B2D159.C5C781C4@cds.duke.edu>...
>
>As C and C++ are currently defined, all the offsets of aggregate components
>must be positive or zero. Since most modern computers used signed offsets in
>their instruction formats, this restriction can reduce program efficiency.
>The following considers ways to remove this restriction.

For anything other than PODs, C++ is free to use either positive or negative
offsets. That is, "this" need not point to the beginning of the data. Also,
most modern computers use 32-bit offsets, so the range is not a problem. (The
16-bit Intel chips will wrap around within a segment, giving a 64K range.)
Since addition is pretty similar for signed and unsigned values, I am at a
loss to understand how this can "reduce efficiency".
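The point about addition can be checked directly: with modular (two's complement style) address arithmetic, adding the unsigned bit pattern of a negative displacement produces the same address as subtracting its magnitude. The helper `add_disp` is a hypothetical name; converting a negative `int32_t` to `uint32_t` is defined in C as reduction modulo 2^32.

```c
#include <stdint.h>

/* Add a signed displacement to a 32-bit address using only unsigned
   (modular) arithmetic, as a plain adder would. */
static uint32_t add_disp(uint32_t base, int32_t disp)
{
    return base + (uint32_t)disp;   /* (uint32_t)-16 == 0xFFFFFFF0 */
}
```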

Author: Barry Margolin <barmar@bbnplanet.com>
Date: 1999/08/13

In article <37B2D159.C5C781C4@cds.duke.edu>,
Max TenEyck Woodbury  <mtew@cds.duke.edu> wrote:
>As C and C++ are currently defined, all the offsets of aggregate
>components must be positive or zero. Since most modern computers used
>signed offsets in their instruction formats, this restriction can reduce
>program efficiency. The following considers ways to remove this
>restriction.

How does this reduce efficiency?  The only impact I can think of is that it
limits the size of aggregates.  If the processor uses signed offsets,
aggregates are limited to 2^(n-1) bytes in size, rather than 2^n bytes
(where n is the size of the offset field in instructions), unless the
compiler inserts extra code for components with very large offsets.  I
suppose you could be assuming that the compiler does the latter, and that's
where the efficiency impact is.  But who defines structures that are so
huge that this is an issue?
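The arithmetic behind the 2^(n-1) versus 2^n limit can be made concrete. An n-bit displacement field reaches 2^n distinct offsets whether it is signed or unsigned; the difference is where they lie relative to the base register. With a signed field (for example the 68000's 16-bit displacement mode), a base at the start of an aggregate can only use the positive half, while a base placed 2^(n-1) bytes into the aggregate can use all of it. The helpers below are hypothetical and keep `n` small enough (1 to 62) for `int64_t`.

```c
#include <stdint.h>

/* Inclusive range of offsets reachable with an n-bit displacement field. */
struct disp_range { int64_t lo, hi; };

static struct disp_range unsigned_disp(int n)
{
    struct disp_range r = { 0, ((int64_t)1 << n) - 1 };
    return r;
}

static struct disp_range signed_disp(int n)
{
    struct disp_range r = { -((int64_t)1 << (n - 1)),
                            ((int64_t)1 << (n - 1)) - 1 };
    return r;
}
```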

--
Barry Margolin, barmar@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Author: "Douglas A. Gwyn" <DAGwyn@null.net>
Date: 1999/08/13

I don't know what problem you're trying to solve.

