Topic: Pointer semantics


Author: seurer@rchland.ibm.com (Bill Seurer)
Date: 1996/10/16
In article <01bbb759$cf6e3820$9276adce@azguard>, bradds@concentric.net (Bradd W. Szonye) writes:
|>
|>   3.9  Types                                               [basic.types]
|>
|> 2 For any object type T, whether or not the object holds a  valid  value
|>   of  type T, the underlying bytes (_intro.memory_) making up the object
|>   can be copied into an array of char or unsigned char.12)
|>
|>   If the content of the array of char or unsigned char  is  copied  back
|>   into  the  object,  the  object  shall  subsequently hold its original
|>   value.

This will not work on hardware with tagged addresses.  If it really is
written in the standard this way the standard is in error.
--

- Bill Seurer     ID Tools and Compiler Development      IBM Rochester, MN
  Business: BillSeurer@vnet.ibm.com               Home: BillSeurer@aol.com
  WWW:  http://members.aol.com/BillSeurer
---
[ comp.std.c++ is moderated.  To submit articles: Try just posting with your
                newsreader.  If that fails, use mailto:std-c++@ncar.ucar.edu
  comp.std.c++ FAQ: http://reality.sgi.com/austern/std-c++/faq.html
  Moderation policy: http://reality.sgi.com/austern/std-c++/policy.html
  Comments? mailto:std-c++-request@ncar.ucar.edu
]





Author: clamage@taumet.eng.sun.com (Steve Clamage)
Date: 1996/10/17
In article 1i8a@news.rchland.ibm.com, seurer@rchland.ibm.com (Bill Seurer) writes:
>In article <01bbb759$cf6e3820$9276adce@azguard>, bradds@concentric.net (Bradd W. Szonye) writes:
>|>
>|>   3.9  Types                                               [basic.types]
>|>
>|> 2 For any object type T, whether or not the object holds a  valid  value
>|>   of  type T, the underlying bytes (_intro.memory_) making up the object
>|>   can be copied into an array of char or unsigned char.12)
>|>
>|>   If the content of the array of char or unsigned char  is  copied  back
>|>   into  the  object,  the  object  shall  subsequently hold its original
>|>   value.
>
>This will not work on hardware with tagged addresses.  If it really is
>written in the standard this way the standard is in error.

That is what is written in the current draft of the standard. For example,
given that operator== is defined for type T, and objects of type T with
the same value compare equal, consider this program fragment:

 T t = ...; // create a valid T
 T t2 = t;  // make a copy of t
 assert( t == t2 ); // presumably this assertion passes

 char buf[sizeof(T)];
 memcpy(buf, &t, sizeof(T)); // copy bytes out
 memcpy(&t, buf, sizeof(T)); // copy bytes back
 assert( t == t2 ); // what the quoted rule requires

The paragraph above says the second assertion must not fail. If an
implementation cannot support that requirement, I think it must also
fail to support other requirements on pointers, such as conversion to
and from void*. (The requirement doesn't say I have to be able to copy
the bytes to an arbitrary location and have a valid object as a result.
It just says I can copy the bytes of an object, and that the bytes of
the object determine its value.)
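Clamage's fragment can be made self-contained by wrapping it in a function template. This is a minimal sketch that assumes C++11 or later. The name bytesRoundTrip is hypothetical. The static_assert reflects the restriction to trivially copyable types in later standards (C++98 limited the paragraph to POD types), which covers int, double and raw pointers, the cases under discussion here.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copy the object representation of t out to a
// char array and back again, then report whether t still compares
// equal to an ordinary copy taken beforehand.
template <typename T>
bool bytesRoundTrip(T t) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "the byte-copy guarantee covers trivially copyable types");
    T t2 = t;                           // make a copy of t
    assert(t == t2);                    // presumably this assertion passes

    char buf[sizeof(T)];
    std::memcpy(buf, &t, sizeof(T));    // copy bytes out
    std::memcpy(&t, buf, sizeof(T));    // copy bytes back
    return t == t2;                     // what the quoted rule requires
}
```

Do not call it with a floating-point NaN. A NaN never compares equal to itself, so the ordinary-copy assertion would fire before any bytes are copied.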

Do you have in mind an implementation that supports all other requirements
on objects and pointers, and yet would not be able to compile and
execute the sample program?
---
Steve Clamage, stephen.clamage@eng.sun.com








Author: seurer@rchland.ibm.com (Bill Seurer)
Date: 1996/10/18
In article <199610171929.MAA23389@taumet.eng.sun.com>, clamage@taumet.eng.sun.com (Steve Clamage) writes:
|> That is what is written in the current draft of the standard. For example,
|> given that operator== is defined for type T, and objects of type T with
|> the same value compare equal, consider this program fragment:
|>
|>  T t = ...; // create a valid T
|>  T t2 = t;  // make a copy of t
|>  assert( t == t2 ); // presumably this assertion passes
|>
|>  char buf[sizeof(T)];
|>  memcpy(buf, &t, sizeof(T)); // copy bytes out
|>  memcpy(&t, buf, sizeof(T)); // copy bytes back
|>  assert( t == t2 ); // what the quoted rule requires
|>
|> The paragraph above says the second assertion must not fail. If an
|> implementation cannot support that requirement, I think it must also
|> fail to support other requirements on pointers, such as conversion to
|> and from void*. (The requirement doesn't say I have to be able to copy
|> the bytes to an arbitrary location and have a valid object as a result.
|> It just says I can copy the bytes of an object, and that the bytes of
|> the object determine its value.)
|>
|> Do you have in mind an implementation that supports all other requirements
|> on objects and pointers, and yet would not be able to compile and
|> execute the sample program?

I did some checking about the tagged architecture I was thinking of
(AS/400), and it turns out that memcpy will preserve the address tag bits,
but only if the array has the same alignment as the pointer.  I believe
that's a requirement of the hardware.  There's another memcpy-like function
that doesn't preserve the tags (and is a bit faster).
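Bill says the tags survive only when the byte array shares the pointer's alignment. One portable way to request that alignment is alignas, a C++11 feature that postdates this thread. The sketch below assumes alignas. AlignedBytes, save and restore are hypothetical names, and nothing here models the AS/400 tag-preserving copy itself.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical byte buffer whose alignment matches T's, so that source
// and destination of each memcpy sit on the same alignment boundary.
template <typename T>
struct AlignedBytes {
    alignas(T) unsigned char data[sizeof(T)];

    void save(const T& obj) { std::memcpy(data, &obj, sizeof(T)); }
    void restore(T& obj) const { std::memcpy(&obj, data, sizeof(T)); }
};

// The buffer is at least as strictly aligned as the pointer it stores.
static_assert(alignof(AlignedBytes<int*>) >= alignof(int*),
              "buffer alignment must match the pointer's");
```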
--

- Bill Seurer     ID Tools and Compiler Development      IBM Rochester, MN
  Business: BillSeurer@vnet.ibm.com               Home: BillSeurer@aol.com
  WWW:  http://members.aol.com/BillSeurer
---





Author: bradds@concentric.net (Bradd W. Szonye)
Date: 1996/10/14
C++ guarantees that you can copy the underlying bytes of any object into an
array of char or unsigned char (hereafter called a byte array). It further
guarantees that, if the bytes are copied back into the original object, that
object then contains its previous value. It also guarantees that copying the
bytes of a scalar into a *different* object of the same type gives that
other object the first object's value.

Now, if an object of class type is saved and restored in this manner, it
holds its original value, but that original value might no longer be valid.
For example, given

 struct mumble { int* foo; } blurgle;

if you allocate blurgle.foo, save the underlying bytes of blurgle, delete
blurgle.foo, then restore the underlying bytes of blurgle, blurgle.foo is
now an invalid pointer. That is, blurgle contains its 'original value,' but
that value is no longer entirely valid.
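The blurgle scenario can be run safely if the restored pointer is never dereferenced. This minimal sketch assumes the struct above and uses the hypothetical function name restoreAfterDelete. It compares bytes rather than pointer values, because inspecting the bytes of an invalid pointer is safe while using its value is not portable.

```cpp
#include <cassert>
#include <cstring>

struct mumble { int* foo; };

// Save blurgle's bytes, delete the pointee, then restore the bytes.
// Returns true if the restored representation matches the saved one
// bit for bit. blurgle.foo then holds its "original value", but that
// value no longer points to a live int and must not be dereferenced.
bool restoreAfterDelete() {
    mumble blurgle;
    blurgle.foo = new int(1);

    unsigned char saved[sizeof blurgle];
    std::memcpy(saved, &blurgle, sizeof blurgle);   // save the bytes

    delete blurgle.foo;                             // pointee is gone
    blurgle.foo = nullptr;

    std::memcpy(&blurgle, saved, sizeof blurgle);   // restore the bytes
    return std::memcmp(&blurgle, saved, sizeof blurgle) == 0;
}
```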

Can the same thing happen to a scalar? There is good reason to believe that
it might. For example, consider a floating-point scalar 'goo'. Suppose goo
contains an 'exceptional' value such as infinity, not-a-number, or a
denormal value. Providing floats with such values is definitely
'conforming.' Now suppose that the hardware provides a switch that can turn
such values into invalid values which cause havoc when used. If you then
apply the save-and-restore technique, flipping the switch between the save
and the restore can cause previously valid scalars to become invalid when
restored. In fact, merely copying the float out of storage could cause the
program to die. Please clarify whether this is in fact a conforming
implementation of floating point and, if not, why not.
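On an ordinary IEEE 754 implementation with default floating-point settings, the exceptional values Bradd lists survive a byte round trip intact. The sketch below assumes such an implementation. The hypothetical hardware switch cannot be demonstrated portably, so it is not modeled. roundTripFloat and sameBits are illustrative names. For NaN, only a bit-pattern comparison is meaningful, because NaN never compares equal to itself with ==.

```cpp
#include <cassert>
#include <cmath>
#include <cstring>
#include <limits>

// Copy a float's bytes out to a byte array and back into a new float.
float roundTripFloat(float goo) {
    unsigned char buf[sizeof goo];
    std::memcpy(buf, &goo, sizeof goo);
    float restored;
    std::memcpy(&restored, buf, sizeof restored);
    return restored;
}

// True if a and b have identical object representations.
bool sameBits(float a, float b) {
    return std::memcmp(&a, &b, sizeof a) == 0;
}
```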

Now suppose an implementation which implements pointers not as addresses
but as 'handles' to allocated blocks of memory. This can allow some
interesting and significant optimizations of the object layout model and
some sophisticated debugging techniques. The trouble is that it requires
some sort of automatic management of the handles themselves.
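To make the pointers-as-handles idea concrete, here is a minimal sketch under stated assumptions. A "pointer" is only an index into a table of blocks owned by the implementation, and the table recycles freed slots. Handle, HandleTable, allocate, release, isLive and deref are hypothetical names, and the sketch does not describe any real implementation. It shows that a handle whose bytes sit in a char array is only as good as the table entry it names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical: a "pointer" represented as a slot index, not an address.
struct Handle { std::size_t slot; };

class HandleTable {
public:
    Handle allocate(int value) {
        if (!freeSlots_.empty()) {              // recycle a released slot
            std::size_t s = freeSlots_.back();
            freeSlots_.pop_back();
            blocks_[s] = value;
            live_[s] = true;
            return Handle{s};
        }
        blocks_.push_back(value);
        live_.push_back(true);
        return Handle{blocks_.size() - 1};
    }
    void release(Handle h) {
        live_[h.slot] = false;
        freeSlots_.push_back(h.slot);
    }
    bool isLive(Handle h) const {
        return h.slot < live_.size() && live_[h.slot];
    }
    int& deref(Handle h) {
        assert(isLive(h));                      // debugging hook
        return blocks_[h.slot];
    }
private:
    std::vector<int> blocks_;
    std::vector<bool> live_;
    std::vector<std::size_t> freeSlots_;
};
```

Suppose the table releases or recycles a slot while the only copy of a handle lives in a byte array. Restoring those bytes then yields a handle that names whatever now occupies the slot. That is the hazard Bradd is asking about.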

Pointers are defined as scalars.

If you copy the underlying bytes of such a pointer into a byte array, the
language guarantees that copying those bytes into the original object or a
new object yields the original object's value, since pointers are scalars.
That doesn't cause trouble with most aspects of pointer semantics, so long
as your handle representation is sufficiently sophisticated. My question
is: does the language guarantee that such a (restored-from-raw-memory)
pointer is still valid? I'm especially concerned with this situation:

1 int* p = new int;
2 char buf[sizeof p];
3 memcpy(buf, &p, sizeof p);
4 p = NULL;
5 // do lots of arbitrary stuff here
6 memcpy(&p, buf, sizeof p);

At the end of the block, is p *guaranteed* to still be a valid pointer,
pointing to the integer allocated in line 1? Or does "killing" the pointer
in line 4 waive guarantees about the pointer's future validity? I don't
dispute that, in the presence of the line

3.5 int* pcopy = p;

the pointer *must* still be valid. But does the same guarantee hold
for a pointer (or other object) stored in an arbitrary byte array? I'm
guessing no, since a class-type object or floating-point number might be
restored to an 'invalid' former value in this way.
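For reference, here is the six-line fragment as a self-contained function, with Bradd's line numbers kept as comments. The sketch assumes a conventional address-based implementation, where the restored pointer is in practice still usable. Whether the standard *requires* that is exactly the open question, so the test below checks only what such an implementation does. restoreAfterNull is a hypothetical name, and the initial value 42 is added only so the result can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Lines 1-6 of the fragment wrapped in a function. Returns the restored
// pointer; the caller owns the int it points to.
int* restoreAfterNull() {
    int* p = new int(42);               // line 1
    char buf[sizeof p];                 // line 2
    std::memcpy(buf, &p, sizeof p);     // line 3
    p = NULL;                           // line 4: only buf holds the bytes now
    // line 5: do lots of arbitrary stuff here
    std::memcpy(&p, buf, sizeof p);     // line 6
    return p;
}
```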

I'm concerned about this situation, not because I'm afraid of code like the
above failing, but because I'm rather hoping that it is not guaranteed.
There are workarounds to the pointers-as-handles approach; for example, you
could keep sufficient information available to re-map a handle should it
ever 'disappear' in this way. Such a mapping is probably prohibitively
expensive, however. I'd *rather* see some sort of relaxing of the
object-copy rule:

  3.9  Types                                               [basic.types]

2 For any object type T, whether or not the object holds a  valid  value
  of  type T, the underlying bytes (_intro.memory_) making up the object
  can be copied into an array of char or unsigned char.12)

  If the content of the array of char or unsigned char  is  copied  back
  into  the  object,  the  object  shall  subsequently hold its original
  value.

To this I'd like to see added a (possibly non-normative) paragraph or note:

  However, the original value is not guaranteed to still be valid when
  copied back into the object, even if the value was valid when copied
  into the array of char or unsigned char. [Note: this is likely for
  objects of class, floating-point, or pointer type. --end note] The
  scalar types for which this guarantee does not hold are
  implementation-defined.

That would be sufficient verbiage to allow a pointers-as-handles
implementation to conform without greatly diluting the original meaning of
the clause. A later paragraph in the same section needs a similar loophole
to handle scalar-to-different-scalar copies. The idea is that assignment
between objects with statically known types guarantees continuously valid
values, while copying to an 'unchecked' byte array forgoes that guarantee
and so leaves room for aggressive compiler optimizations. This would not break existing implementations;
whether future translators actually took advantage of the loosened
restriction to implement "smarter dumb pointers" would become a
quality-of-implementation issue. I don't see this as a terrible problem,
since copying the raw bytes out of pointers has always been an unsafe
practice.

The problem is especially hairy because byte arrays are so heavily
overloaded in C and C++; there is no notion of 'this array is for storing
copies of objects' as opposed to 'this array contains a text string.' You
could add run-time checks to determine when objects were copied into such
arrays, but the checks would seriously degrade, for example, string
processing. I don't think the problem is tractable by static analysis, not
even given whole-program (as opposed to translation-unit) analysis.

I should note that I'm not thinking of garbage collection in the general
case. I'm looking at a more specific case of providing pointers with more
run-time information without increasing their actual data size. One way to
accomplish that is to implement pointers as automatically-managed 'handles'
with some hidden state. The object-copy rule is a significant barrier to
that implementation.

In particular, such pointers give the implementation a means of keeping
track of them; object-copy tends to defeat any means of tracking pointers. Keeping track
of the pointers in a program is the first step toward the distant goal of
optional (but not required) automatic memory management in general. It
also, as I said before, provides a good hook for other optimizations and
debugging techniques. I believe that adding a few 'weasel words' to the
object-copy rules is acceptable in pursuit of those goals.
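By analogy, a class can show why object-copy defeats tracking. The sketch assumes a hypothetical tracked pointer that counts every copy made through its constructors. TrackedPtr and liveCount are illustrative names, not part of any proposal, and real handle tracking would be hidden inside the implementation rather than written as a class. Copying the object's bytes into a char array bypasses the constructors, so the count no longer reflects how many copies of the pointer value exist. The standard also does not sanction copying such bytes back into a TrackedPtr, because the class is not trivially copyable.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical tracked pointer: each construction or copy is counted and
// each destruction uncounted, so liveCount() reports how many TrackedPtr
// objects currently exist.
class TrackedPtr {
public:
    explicit TrackedPtr(int* p) : p_(p) { ++count_; }
    TrackedPtr(const TrackedPtr& other) : p_(other.p_) { ++count_; }
    TrackedPtr& operator=(const TrackedPtr& other) {
        p_ = other.p_;                  // no new object, count unchanged
        return *this;
    }
    ~TrackedPtr() { --count_; }

    int* get() const { return p_; }
    static int liveCount() { return count_; }
private:
    int* p_;
    static int count_;
};

int TrackedPtr::count_ = 0;
```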

Is it too late to add this paragraph to the standard as an 'editorial
correction,' or does it have wider scope than that? Or do implementors
using this unconventional technique suffer the 'non-conforming' label until
C++200x? I'm not sure that turning this type of behavior 'off' is
implementable as a simple #pragma or compiler switch, so if it's an
extension, it's a tricky one.
--
Bradd W. Szonye
bradds@concentric.net
http://www.concentric.net/~bradds
---