Topic: bitwise logical operations on pointers


Author: rgp@seiko.mpd.tandem.com (Ramon Pantin)
Date: 24 Jul 92 04:39:46 GMT
Somewhere along the evolution of C, expressions of this type were made illegal:
 char *p = ...;
 p &= ~1;

the workaround:
 p = (char *) ((int)p & ~1);

makes my code:
 - harder to write
 - less portable because it assumes that there is an integral type
   (if not "int", "long") that has enough precision (bits) so that
   the casts will be harmless.
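
The workaround can be confined to one helper so the cast appears in a single place. A minimal sketch, assuming a C99 compiler that provides the optional `uintptr_t` type from `<stdint.h>` (it did not exist in 1992 and is still not guaranteed on every implementation); the name `align_down` is hypothetical and not from the post.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round p down to a multiple of `align`, which must be
 * a power of two. align_down(p, 2) corresponds to `p &= ~1` in the post.
 * Assumes uintptr_t exists and that converting a pointer to an integer and
 * back behaves like a flat address, which the language does not promise. */
char *align_down(char *p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (char *)((uintptr_t)p & ~(align - 1));
}
```

Building the mask at the full width of `uintptr_t` avoids depending on whether `int` or `long` is wide enough to hold the pointer.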

I understand that (in theory) any code that does bitwise logical
operations (|, &, ~, etc.) on pointers is not portable, because such
manipulations interpret the representation of the pointer.  In practice,
for the type of work that I do (writing operating systems) and for the class
of machines that I'm interested in (machines with large flat virtual
address spaces), I don't need to worry about machines with bizarre
addressing mechanisms.

Could the people on the ANSI C++ committee consider allowing these traditional
K&R expressions back into the ANSI C++ language?

Do I have to write a _proposal_ for this?

Does anybody know what the rationale was for making these expressions illegal
in ANSI C?

Ramon Pantin







Author: steve@taumet.com (Steve Clamage)
Date: Fri, 24 Jul 1992 17:52:09 GMT
rgp@seiko.mpd.tandem.com (Ramon Pantin) writes:

>Somewhere along the evolution of C, expressions of this type were made illegal:
> char *p = ...;
> p &= ~1;

>the workaround:
> p = (char *) ((int)p & ~1);

>makes my code:
> - harder to write
> - less portable because it assumes that there is an integral type
>   (if not "int", "long") that has enough precision (bits) so that
>   the casts will be harmless.

You could use 'unsigned long' rather than 'int' to improve the chances of
the code working.

The workaround is not less portable, since bitwise operations on pointers
are not portable at all.  Any such messing with pointers has to be
confined to machine-specific modules to have any hope of porting your
program.  If it makes sense to bit-twiddle a pointer, nothing prevents
a compiler from providing an extension to do so -- the Standard only
requires the compiler to have a way to diagnose such code as an error.

The reason the C Standard makes such twiddling illegal is that there is
no way to specify semantics for it (in the context of a programming-
language Standard).  It doesn't make any sense to me to say that a
construct is completely legal but has no defined semantics.

Those things in the C Standard which have undefined behavior are
those which are in general infeasible to detect at compile time.
--

Steve Clamage, TauMetric Corp, steve@taumet.com
Vice Chair, ANSI C++ Committee, X3J16




Author: rgp@mpd.tandem.com (Ramon Pantin)
Date: 25 Jul 92 00:11:36 GMT
In article <1992Jul24.175209.17306@taumet.com> steve@taumet.com (Steve Clamage) writes:
>rgp@seiko.mpd.tandem.com (Ramon Pantin) writes:
>
>>Somewhere along the evolution of C, expressions of this type were made illegal:
>> char *p = ...;
>> p &= ~1;
>
>>the workaround:
>> p = (char *) ((int)p & ~1);
>
>>makes my code:
>> - harder to write
>> - less portable because it assumes that there is an integral type
>>   (if not "int", "long") that has enough precision (bits) so that
>>   the casts will be harmless.
>
>You could use 'unsigned long' rather than 'int' to improve the chances of
>the code working.

I already said 'if not "int", "long"' in my previous paragraph.

>
>The workaround is not less portable, since bitwise operations on pointers
>are not portable at all.

What an argument!  With K&R compilers and off-the-shelf 32-bit
processors (MIPS, Sparc, 88K, i[34]86 in 32-bit mode, VAX, NS-32X32,
RS/6000, etc.; a _long_ etc. here) it is perfectly possible to write a
portable Virtual Memory Subsystem where the portable code (i.e. MMU
independent code) uses bitwise logical operations on pointers to
split an address into a page frame number and the
offset within the page.  The same code could also be used on a processor
where pointers are 64 bits and both "ints" and "longs" are 32 bits
(for example, with a compiler that keeps "longs" at 32 bits because some
non-portable code would break if they were 64 bits).  If I use the
"workaround" there, my pointers get truncated to 32 bits, so it really
is _less_ portable.  Of course, a workaround based on a union of a pointer
and an array of some integral type could also be used (even uglier code).
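
The address split described above can be sketched as follows. This is an illustration, not code from the post: the 4 KiB page size (`VM_PAGE_SHIFT` of 12), the macro names and the function names are all assumptions, and `uintptr_t` is the optional C99 type, used here so the cast does not truncate when pointers are wider than `long`.

```c
#include <assert.h>
#include <stdint.h>

#define VM_PAGE_SHIFT 12u                               /* assumed: 4 KiB pages */
#define VM_PAGE_SIZE  ((uintptr_t)1 << VM_PAGE_SHIFT)
#define VM_PAGE_MASK  (VM_PAGE_SIZE - 1)

/* Page number: the address bits above the in-page offset. */
uintptr_t page_number(const void *addr)
{
    return (uintptr_t)addr >> VM_PAGE_SHIFT;
}

/* Offset of the address within its page. */
uintptr_t page_offset(const void *addr)
{
    return (uintptr_t)addr & VM_PAGE_MASK;
}

/* Recombine a page number and an in-page offset into an address value. */
uintptr_t make_address(uintptr_t page, uintptr_t offset)
{
    assert(offset < VM_PAGE_SIZE);
    return (page << VM_PAGE_SHIFT) | offset;
}
```

On an implementation where pointers are 64 bits and `long` is 32 bits, replacing `uintptr_t` with `unsigned long` in this sketch would discard the upper half of the address, which is the truncation the paragraph describes.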

> ... Any such messing with pointers has to be
>confined to machine-specific modules to have any hope of porting your
>program.

For the class of machines that I'm interested in, my K&R code is portable
among them, no need to call it machine-specific in this case.

> ...  If it makes sense to bit-twiddle a pointer, nothing prevents
>a compiler from providing an extension to do so -- the Standard only
>requires the compiler to have a way to diagnose such code as an error.

Most ANSI-C compilers that I have tried this on simply say something
like "error: operators of different types".  Most compiler writers
seem to simply try to implement what the standard dictates and nothing
else.

>
>The reason the C Standard makes such twiddling illegal is that there is
>no way to specify semantics for it (in the context of a programming-
>language Standard).

The semantics would be "the logical bitwise operation is performed
on the raw bits of the pointer storage with the semantics of the
operation being exactly the same as the semantics for the operation
on an unsigned integral type on that machine."
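
One way to read the proposed semantics in terms the language already defines is to apply the mask to the bytes of the pointer's stored representation. A minimal sketch; `ptr_and_bytes` is a hypothetical name, and what the resulting pointer refers to (if anything) still depends on the machine's pointer format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: AND each byte of the stored pointer with the
 * corresponding byte of `mask`. The mask layout follows the machine's byte
 * order and pointer format, so callers must build it per platform. */
void *ptr_and_bytes(void *p, const unsigned char mask[sizeof(void *)])
{
    unsigned char raw[sizeof(void *)];
    size_t i;

    assert(mask != NULL);
    memcpy(raw, &p, sizeof raw);      /* copy out the raw bits */
    for (i = 0; i < sizeof raw; i++)
        raw[i] &= mask[i];
    memcpy(&p, raw, sizeof raw);      /* store the modified bits back */
    return p;
}
```

A mask of all-ones bytes returns the original pointer; any other mask yields a bit pattern whose meaning the language itself does not define.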

>... It doesn't make any sense to me to say that a
>construct is completely legal but has no defined semantics.

How strong was the opposition within ANSI C with respect to keeping
these traditional K&R semantics?  Were these semantics discarded
because of some purist sense of language design, or was there some
vendor with bizarre pointers strongly opposed to the acceptance
of these semantics?

>Those things in the C Standard which have undefined behavior are
>those which are in general infeasible to detect at compile time.
>--
>
>Steve Clamage, TauMetric Corp, steve@taumet.com
>Vice Chair, ANSI C++ Committee, X3J16

Thanks for your feedback.

Ramon Pantin




Author: steve@taumet.com (Steve Clamage)
Date: Sat, 25 Jul 1992 16:39:09 GMT
rgp@mpd.tandem.com (Ramon Pantin) writes:

>>The workaround is not less portable, since bitwise operations on pointers
>>are not portable at all.

>What an argument!  With K&R compilers and off-the-shelf 32-bit
>processors (MIPS, Sparc, 88K, i[34]86 in 32-bit mode, VAX, NS-32X32,
>RS/6000, etc.; a _long_ etc. here) it is perfectly possible to write a
>portable Virtual Memory Subsystem where the portable code (i.e. MMU
>independent code) uses bitwise logical operations on pointers to
>perform the splitting of an address into a page frame number and the
>offset within the page.

Unfortunately, or perhaps fortunately, these are not the only machines
in use with C and C++ compilers.  As discussed in the FAQ list for
comp.lang.c, there are other machines, old and new, to which your code
is not portable.  Some of these machines are designed for safe access to
memory via pointers, and I would hope to see more, not fewer, of these
designs in the future.

>Now, the same code could be used on a processor
>where pointers are 64 bits and both "ints" and "longs" are 32 bits,
>for example, with a compiler where some non-portable code would be
>broken if "longs" were 64 bits.  Now if I use the "workaround", then
>my pointers would get truncated to 32 bits; it is really _less_ portable.

Well, let me turn your own argument back on you:  I don't care about
machines with 64-bit pointers, so I don't care if casting a pointer
to a long doesn't work on them.  (You forgot to mention machines with
48-bit pointers, but I don't care about them either.)  Do you find this
attitude inappropriate?  So do I.  I also don't think it makes sense
to say whether one unportable construct is "less portable" than another
(in the context of a language Standard).

>For the class of machines that I'm interested in, my K&R code is portable
>among them, no need to call it machine-specific in this case.

Again, there are other machines that other programmers care about.  A
question which must be addressed is which machines shall be disparaged
by the language Standard.  What will be the impact of making it
impossible to make a conforming compiler on those machines?

>>The reason the C Standard makes such twiddling illegal is that there is
>>no way to specify semantics for it (in the context of a programming-
>>language Standard).

>The semantics would be "the logical bitwise operation is performed
>on the raw bits of the pointer storage with the semantics of the
>operation being exactly the same as the semantics for the operation
>on an unsigned integral type on that machine."

No, that is not a language semantic specification, that is an
implementation specification.  If I have a pointer to an object
of a specified type, the Standard defines what it means to perform
various operations on it.  I can dereference it and get the object.
I can add 1 to it and get the address of the next object in the
array of objects (if there is one).

Suppose we allow:
 T* p = ...;
 int i = ...;
 ... p & i ...
What is the type of this expression?  It isn't T*, since in general it
will not point to an object of type T.  Is it void*?  If so, what can
I do with this void*?  Can I cast it to some type on which operations may
be performed?  Remember that guarantees about casting from void* apply
only when casting a pointer value back to its original type.

I don't think you can find sensible (in a language Standard context)
answers to these questions.  Your answers have to make sense on all
machines for which it is otherwise possible to have a conforming C or
C++ compiler.  Otherwise all you could say is "undefined".
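
The typing problem can be made concrete. In the sketch below, the masked address is kept as an integer rather than given the type `int *`, and a separate check reports whether it happens to coincide with the start of an array element. It assumes the optional C99 `uintptr_t` and a flat address space in which integer values order the same way as addresses; `mask_address` and `is_element_start` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clear the bits selected by `clear` in p's address.
 * The result is an integer, not an int *, because nothing guarantees that
 * an int object starts at the masked address. */
uintptr_t mask_address(const int *p, uintptr_t clear)
{
    return (uintptr_t)p & ~clear;
}

/* Returns 1 if `addr` is the first byte of one of the n ints at `base`,
 * 0 otherwise. Only meaningful under the flat-address assumption above. */
int is_element_start(uintptr_t addr, const int *base, size_t n)
{
    uintptr_t first = (uintptr_t)base;

    assert(base != NULL);
    if (addr < first || addr - first >= n * sizeof *base)
        return 0;
    return (addr - first) % sizeof *base == 0;
}
```

Whether `mask_address(&a[3], 15)` lands on an element of an array `a` depends on how the array happens to be aligned, so a rule giving `p & i` the type `T*` would promise something the value cannot always deliver.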
--

Steve Clamage, TauMetric Corp, steve@taumet.com
Vice Chair, ANSI C++ Committee, X3J16