Thread

Topic: Endian Functions

Author: "kanze" <kanze@gabi-soft.fr>
Date: Tue, 18 Jul 2006 10:16:32 CST

Frederick Gotham wrote:
> kanze posted:

> > There are places in the C++ (but not the C) standard where
> > the non-trapping guarantees for unsigned char are extended
> > to plain char.

> Chapter and Verse... ?

   3.9/2:
    For any complete POD object type T, whether or not the
    object holds a valid value of type T, the underlying bytes
    making up the object can be copied into an array of char or
    unsigned char.  If the content of the array of char or
    unsigned char is copied back into the object, the object
    shall subsequently hold its original value.

Since an array of unsigned char is a POD, anything that can be
in an unsigned char can be written into a char and read back
without loss of information (and without trapping).
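
A minimal sketch of the 3.9/2 round trip, using plain char as the
intermediate storage. The struct Sample and the function names are
hypothetical, introduced only for this illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical POD type, used only for illustration.
struct Sample {
    int    id;
    double weight;
};

// Saves the bytes of obj in an array of plain char, clobbers obj,
// then copies the saved bytes back into the same object.
Sample save_and_restore(Sample obj)
{
    char saved[sizeof(Sample)];
    std::memcpy(saved, &obj, sizeof obj);   // object -> array of char

    obj.id = -1;                            // overwrite the original value
    obj.weight = 0.0;

    std::memcpy(&obj, saved, sizeof obj);   // array of char -> object
    return obj;
}

// Usage: the object holds its original value again after the round trip.
void demo_save_and_restore()
{
    Sample original = {42, 2.5};
    Sample restored = save_and_restore(original);
    assert(restored.id == 42 && restored.weight == 2.5);
}
```

The buffer is deliberately char rather than unsigned char, since that is
the case the quoted paragraph adds over the C wording.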

The rough equivalent to this in the C standard is 6.2.6.1/4,
where it says "[...] The value may be copied into an object of
type unsigned char [n] (e.g., by memcpy);[...]"  No mention of
char.  Interestingly enough, however, the following paragraph
says "Certain object representations need not represent a value
of the object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not
have character type, the behavior is undefined."  Looks like
they couldn't make up their minds either:-).

In practice, I will always use unsigned char for raw memory.
For one reason or another, I don't feel sure about the other
types.  In practice, however, you're really just as safe with
char, if for no other reason than the fact that C++ uses a
basic_streambuf<char> for reading and writing binary data, and
doesn't have a basic_streambuf<unsigned char>.  And the presence
of specific bit patterns that you cannot read and write from
disk in binary mode isn't acceptable from a quality of
implementation point of view, no matter what the standard says
(or can be made to say with a little bit of forcing).

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]

Author: "kanze" <kanze@gabi-soft.fr>
Date: Mon, 17 Jul 2006 09:40:47 CST

kuyper@wizard.net wrote:
> Frederick Gotham wrote:

> > Does this mean that if you want to access an object in C++
> > as if it were an array of bytes, then you MUST use unsigned
> > char rather than signed char or plain char, because the
> > latter two can have a "bad value"?

> You've got it.

Almost.  There are places in the C++ (but not the C) standard
where the non-trapping guarantees for unsigned char are extended
to plain char.  Presumably, on machines where signed char has
trapping values, plain char must be congruent to unsigned char.

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34



Author: "kanze" <kanze@gabi-soft.fr>
Date: Mon, 17 Jul 2006 09:41:29 CST

johnchx2@yahoo.com wrote:
> kanze wrote:

> > Given that, I'd say that there is a definite intention that
> > signed char can have trapping representations.

> That would make sense, but I'm not sure how to square it with
> 3.9.1/7, which says, "The representations for integral types
> shall define values by use of a pure binary numeration
> system."

Taken too literally, wouldn't that also mean that 2's complement
is illegal?  The bits in a negative number in 2's complement
certainly aren't what I would understand as a "pure binary
numeration system" -- bit 1 doesn't have the value 2^1.

> How can you have a trap representation when all the object
> representation bits participate in the value representation,
> and the value representation is a pure binary numeration?

Some numeric values trap?  The C standard explicitly says that
if 2's complement (or signed magnitude) is used, the bit pattern
of sign bit 1 and all other bits 0 may trap.  (For 1's
complement, the bit pattern of all 1 bits may trap.)
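
To make the bit patterns concrete, here is a small sketch that
interprets an 8-bit pattern under each of the three signed
representations. It assumes 8-bit bytes, and the function names are
hypothetical; nothing here actually traps, it only shows which values
the patterns in question would otherwise stand for.

```cpp
#include <cassert>

// Each function interprets an 8-bit pattern (0x00..0xFF) under one of
// the three signed representations. Assumes CHAR_BIT == 8.

int as_twos_complement(unsigned char bits)
{
    return bits < 0x80 ? bits : static_cast<int>(bits) - 256;
}

int as_ones_complement(unsigned char bits)
{
    // A set sign bit means "negate the inverted low seven bits".
    return bits < 0x80 ? bits : -static_cast<int>(~bits & 0x7F);
}

int as_sign_magnitude(unsigned char bits)
{
    // A set sign bit means "negate the low seven bits".
    return bits < 0x80 ? bits : -static_cast<int>(bits & 0x7F);
}

// The patterns the C standard allows to be trap representations.
void demo_candidate_trap_patterns()
{
    assert(as_twos_complement(0x80) == -128); // sign bit 1, other bits 0
    assert(as_sign_magnitude(0x80) == 0);     // would be negative zero
    assert(as_ones_complement(0xFF) == 0);    // all bits 1: negative zero
}
```

In 2's complement the pattern 0xFE comes out as -2, which is the sense
in which the individual bits of a negative number do not carry their
usual positional weights.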

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


Author: fgothamNO@SPAM.com (Frederick Gotham)
Date: Mon, 17 Jul 2006 18:04:46 GMT

kanze posted:

> There are places in the C++ (but not the C) standard where the
> non-trapping guarantees for unsigned char are extended to plain char.


Chapter and Verse... ?



--

Frederick Gotham


Author: kuyper@wizard.net
Date: Mon, 17 Jul 2006 16:52:41 CST

johnchx2@yahoo.com wrote:
> kuyper@wizard.net wrote:
>
> > No, you're being under-literal. The relevant section says that "The
> > representations of integral types shall define values by use of a pure
> > binary numeration system."  Bit patterns which don't represent anything
> > aren't covered by that statement.
>
> I don't agree.  The language specifically does *not* say "...define
> NUMERIC values by use of..."  So it would appear that you're assuming
> that a trap value isn't a value, simply because it doesn't represent a
> number.  I don't see any support for that view.

The C++ standard contains no definition for the term "trap
representation". The C standard does. Neither language has a definition
of a "trap value". The C standard defines a trap representation by the
fact that it does not represent a value. The fact that such a
representation can "trap" under certain circumstances is covered in the
C standard as a consequence of the representation being a trap
representation, not as the defining characteristic.

Given the close family connections between C and C++, I've chosen to
assume that the references to "traps" in this thread correspond to the
traps as defined in C, even though they don't exist as such in C++. For
the most part, the people who argued against explicit description of
trap representations in C99 did so by saying that it was redundant,
that the C90 wording already implied the same things that the C99
changes made explicit. The relevant C++ wording is essentially
equivalent (for this purpose) to the corresponding C90 wording.

> For example, an uninitialized POD is said to have an "indeterminate
> initial value" (8.5/9).  Nothing permits it to have "no value."

In the C standard, which is the only one that actually defines trap
representations, an "indeterminate value" is defined as "either an
unspecified value, or a trap representation". This is clearly a bad
definition, given that representations and values are two distinct
categories, and that trap representations, by definition, do not
represent values. However, this was nonetheless deliberate - I believe
that it was defined that way to allow sections that refer to an
"indeterminate value" to remain unchanged, when the standard was
modified to explicitly cover trap representations.

I'd strongly recommend that if and when C++ takes on board the concept
of trap representations, it change the sections that currently say
that an object has an indeterminate value to say instead that it
contains an indeterminate representation.


Author: "Bob Bell" <belvis@pacbell.net>
Date: Thu, 13 Jul 2006 16:04:40 CST

johnchx2@yahoo.com wrote:
> Bob Bell wrote:
> >
> > 4.1 Lvalue-to-rvalue conversion.
>
> Oh, that!  :-)
>
> Just to clarify, 4.1 gets into the question of what you can do with the
> lvalue once you've got it.  I thought you were arguing that merely
> forming the lvalue (dereferencing the pointer) yielded undefined
> behavior.  Which, under your reading of 5.2.10/7, it should.

Let me clarify a bit. You cited 3.10/15, which seems to say among other
things that accessing a T object (where T is not char) using a char
lvalue is valid. I asked if you knew of any clause that says what the
result is. You then asked whether there is a clause saying what you get
when accessing an object of type T through a T lvalue, and said that if I
found one, you'd be able to find language to answer my question. I pointed
out 4.1. My reading of 5.2.10/7 doesn't come into that response, since
the question to which "4.1" was the answer was about a T object and a T
lvalue.

So I ask again: is there a clause somewhere that says what the result
of accessing a T object through a char lvalue is?

> My difficulty with that reading is that, strictly speaking, it makes
> any use of reinterpret_cast (other than as the argument to the inverse
> reinterpret cast) undefined.  For example:
>
>   char* p2 = reinterpret_cast<char*>(p1);
>
> yields undefined behavior, because nothing guarantees that the result
> of the reinterpret_cast can be stored to a pointer variable.

The very first sentence of 5.2.10/7 says, "A pointer to an object can
be explicitly converted to a pointer to an object of different type."
That seems a pretty clear statement that this line of code is
well-defined. (As for whether p1 can be stored in p2, you're right --
nothing (not even 5.2.10/10) says that.) Next, 5.2.10/7 says that I can
convert the pointer back:

   // Assuming the type p1 points to is T:
   assert(p1 == reinterpret_cast<T*>(p2));

I can imagine an implementation where p2 never held the bit pattern of
p1, yet this still works.
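
As a sketch of the round trip described in 5.2.10/7, with a
hypothetical type Widget standing in for T. Note that it stores the
intermediate pointer in a variable, which is the very step under
dispute in this thread; mainstream implementations accept it.

```cpp
#include <cassert>

// Hypothetical object type, for illustration only.
struct Widget { int value; };

// Converts a Widget* to char* and back again. The alignment
// requirement of char is no stricter than that of Widget, so the
// condition stated in 5.2.10/7 is met.
Widget* round_trip(Widget* p1)
{
    char* p2 = reinterpret_cast<char*>(p1);
    return reinterpret_cast<Widget*>(p2);
}

// Usage: the recovered pointer compares equal to the original.
void demo_round_trip()
{
    Widget w = {7};
    assert(round_trip(&w) == &w);
}
```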

> I don't think anyone believes that was the intent, so 5.2.10/7 cannot
> mean what it appears -- on literal reading -- to say.

I believe the intent of 5.2.10/7 is exactly what it says: it allows the
conversion of T1 pointers to T2, such that the original T1 pointer can
be recovered if certain conditions are met. I don't see how this can be
interpreted to make all reinterpret_casts of pointers undefined.

> Since the
> literal reading is a dead letter, the live question is, what *can* you
> do with the resulting pointer value.  And 5.2.10/10 would seem to
> answer that question.

5.2.10/7 seems to answer that question too; thus the confusion.

> I do agree that the language in 5.2.10/7 is misleading and could
> certainly use some cleaning up.

What would you suggest?

> Now, to look at the question of what can be done with the resulting
> lvalue...there's an interesting open DR on 4.1:
>
>   http://www2.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#240
>
> The DR is interesting both for what it says and for what it doesn't.
> It says (in part):
>
>   There's no exception made for unsigned char types.
>   The wording in 3.9.1  basic.fundamental was carefully crafted
>   to allow use of unsigned char to access uninitialized data so that
>   memcpy and such could be written in C++ without undefined
>   behavior, but this statement undermines that intent.
>
> which confirms the intent to make it possible to read more or less
> arbitrary bytes in memory via an lvalue of type unsigned char, just as
> memcpy() would.  The interesting omission is that the DR is entirely
> concerned with the issue of reading *uninitialized* memory; it makes no
> mention of the type congruence language ("If the object to which the
> lvalue refers is not an object of type T...").
>
> But there's still no way to implement memcpy() in C++ if we're not
> allowed to read initialized memory of non-unsigned-char type through
> lvalues of type unsigned char.  Why no mention of this in the DR?  It
> would appear that there's a sort of unspoken assumption that this *is*
> permitted under the current rules.  But why?

I don't know. Perhaps the DR author didn't think of it. Speculating on
the motivations of a DR author is not the same as reading the standard
to determine what's undefined behavior or not.
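
For reference, the kind of function at issue looks roughly like the
sketch below. The name copy_bytes is hypothetical, and whether the
reads are formally sanctioned for every object type is exactly the open
question here; the sketch only shows that every access goes through an
unsigned char lvalue.

```cpp
#include <cassert>
#include <cstddef>

// A byte-wise copy in the spirit of memcpy. Every byte is read and
// written through an unsigned char lvalue, one of the accesses that
// 3.10/15 lists explicitly.
void* copy_bytes(void* dest, const void* src, std::size_t count)
{
    unsigned char*       d = static_cast<unsigned char*>(dest);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i != count; ++i)
        d[i] = s[i];
    return dest;
}

// Usage: copying the bytes of an initialized int reproduces its value.
void demo_copy_bytes()
{
    int source = 12345;
    int target = 0;
    copy_bytes(&target, &source, sizeof source);
    assert(target == 12345);
}
```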

> I suspect that the clue is 3.9/4, which says, "The object
> representation of an object of type T is the sequence of N unsigned
> char objects taken up by the object of type T, where N equals
> sizeof(T)."  This would seem to imply that the underlying bytes of an
> object really are -- and may legally be accessed as -- unsigned char
> objects.

Except that 3.9/4 applies to non-POD types as well, and I've always
understood (perhaps incorrectly) that reading them as bytes (e.g.,
using memcpy to copy a non-POD type) is undefined behavior.

> Now, there's a certain slipperiness in the standard about whether it's
> legal to access the underlying bytes of an object via an lvalue of type
> char.  The intent seems to be that you can (i.e. all the rules and
> special exceptions that appear intended to support this use encompass
> both char and unsigned char), but I'm not sure that the last link in
> the chain is actually there.

That's the only conclusion I can reach about 5.2.10/10. It seems that
there is an intent to allow dereferencing of pointers obtained with
reinterpret_cast; it's the language that's lacking.

The only way I can interpret 5.2.10/10 is that since it says the effect
of

   T1 t1;
   T2& t2(reinterpret_cast<T2&>(t1));

is equivalent to

   T1 t1;
   T2& t2(*reinterpret_cast<T2*>(&t1));

then the first would be well-defined or undefined whenever the second
is well-defined or undefined. So when is the second well-defined?
5.2.10/7 says "never". However, other clauses (pointed out by you and
others) seem to imply that there is an intent to allow it in some
cases. It's hard for me to regard an implied intention as carrying more
weight than the clear statement of 5.2.10/7.
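
A small sketch of the equivalence stated in 5.2.10/10, with a
hypothetical struct Pair as T1 and unsigned char as T2. It only
compares the addresses of the two resulting lvalues and reads nothing
through them, so it takes no position on what may be done with them.

```cpp
#include <cassert>

// Hypothetical object type, for illustration only.
struct Pair { int first; int second; };

// Forms an unsigned char lvalue from t1 in both ways described in
// 5.2.10/10 and reports whether they designate the same location.
bool same_target(Pair& t1)
{
    unsigned char& by_reference = reinterpret_cast<unsigned char&>(t1);
    unsigned char& by_pointer   = *reinterpret_cast<unsigned char*>(&t1);
    return &by_reference == &by_pointer;
}

// Usage: both forms refer to the same byte.
void demo_same_target()
{
    Pair p = {1, 2};
    assert(same_target(p));
}
```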

Bob


Author: johnchx2@yahoo.com
Date: Fri, 14 Jul 2006 00:25:08 CST

Bob Bell wrote:

> So I ask again: is there a clause somewhere that says what the result
> of accessing a T object through a char lvalue is?

Actually, my revised view is that you aren't accessing an object of
class T, but rather an object of type unsigned char, under 3.9/4.  That
would at least authorize reading it through an unsigned char lvalue
under 4.1.

> > My difficulty with that reading is that, strictly speaking, it makes
> > any use of reinterpret_cast (other than as the argument to the inverse
> > reinterpret cast) undefined.  For example:
> >
> >   char* p2 = reinterpret_cast<char*>(p1);
> >
> > yields undefined behavior, because nothing guarantees that the result
> > of the reinterpret_cast can be stored to a pointer variable.
>
> The very first sentence of 5.2.10/7 says, "A pointer to an object can
> be explicitly converted to a pointer to an object of different type."
> That seems a pretty clear statement that this line of code is
> well-defined. (As for whether p1 can be stored in p2, you're right --
> nothing (not even 5.2.10/10) says that.)

I should have said "any use of *the result of* reinterpret_cast (other
than..."  The cast in my example is fine, it's the initialization of p2
that yields UB.

The point I'm trying to make is that this interpretation renders any
non-trivial use of reinterpret_cast undefined.  You can't store the
result in a variable, you can't pass the result to a function, you
literally can't do anything with it except *immediately* cast it back
to the original type.  The one and only well-defined use of
reinterpret_cast looks like this:

  reinterpret_cast<T1*>(reinterpret_cast<T2*>(p));

where p has type T1*.  (Well, you could also just perform the cast and
discard the result.  I'd call that trivial as well.)

Is there some non-trivial usage that I'm overlooking?

> I believe the intent of 5.2.10/7 is exactly what it says: it allows the
> conversion of T1 pointers to T2, such that the original T1 pointer can
> be recovered if certain conditions are met. I don't see how this can be
> interpreted to make all reinterpret_casts of pointers undefined.

It doesn't make the casts undefined, but it restricts them to trivial
uses.

> > I do agree that the language in 5.2.10/7 is misleading and could
> > certainly use some cleaning up.
>
> What would you suggest?
>

Replacing the second sentence with something like:

  The value returned by reinterpret_cast is unspecified, except that
  reinterpret_cast<T1*>(reinterpret_cast<T2*>(p)) == p if p
  has type T1*, T1 and T2 are object types, and the alignment
  requirements of T2 are no stricter than those of T1.  The
  result of dereferencing a pointer returned by
  reinterpret_cast<T1*>(p) is an lvalue of type T1 designating
  the object designated by *p.  (Note: the uses of the
  resulting lvalue are subject to 3.10/15 and 4.1.)

> > I suspect that the clue is 3.9/4, which says, "The object
> > representation of an object of type T is the sequence of N unsigned
> > char objects taken up by the object of type T, where N equals
> > sizeof(T)."  This would seem to imply that the underlying bytes of an
> > object really are -- and may legally be accessed as -- unsigned char
> > objects.
>
> Except that 3.9/4 applies to non-POD types as well, and I've always
> understood (perhaps incorrectly) that reading them as bytes (e.g.,
> using memcpy to copy a non-POD type) is undefined behavior.
>

I don't think that's necessarily UB.  With PODs, there are other
guarantees that don't apply to non-PODs.  You can memcpy() a POD into
an array of char or unsigned char and memcpy() the bytes back to the
original location and get the original value.  You can memcpy() one POD
"over" another of the same type, and the target object will hold the
same value as the source.  Neither of these is true of non-PODs.  But I
don't think there's any restriction on simply examining the underlying
bytes of a non-POD.  You can even scribble over the top of a non-POD
without necessarily generating UB (3.8/4).
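
The two POD guarantees can be sketched as follows. Point and the
function names are hypothetical; examining the bytes through unsigned
char is shown for a POD only, since that is the uncontroversial case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical POD type, for illustration only.
struct Point { int x; int y; };

// Copies one POD "over" another of the same type.
void overwrite(Point& target, const Point& source)
{
    std::memcpy(&target, &source, sizeof target);
}

// Counts the bytes in which two objects differ, examining each
// object representation through unsigned char.
std::size_t differing_bytes(const Point& a, const Point& b)
{
    const unsigned char* pa = reinterpret_cast<const unsigned char*>(&a);
    const unsigned char* pb = reinterpret_cast<const unsigned char*>(&b);
    std::size_t count = 0;
    for (std::size_t i = 0; i != sizeof(Point); ++i)
        if (pa[i] != pb[i])
            ++count;
    return count;
}

// Usage: after the copy the target holds the source's value.
void demo_overwrite()
{
    Point a = {1, 2};
    Point b = {3, 4};
    overwrite(b, a);
    assert(b.x == 1 && b.y == 2);
    assert(differing_bytes(a, b) == 0);
}
```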


Author: dave@boost-consulting.com (David Abrahams)
Date: Fri, 14 Jul 2006 15:47:56 GMT

johnchx2@yahoo.com writes:

> The point I'm trying to make is that this interpretation renders any
> non-trivial use of reinterpret_cast undefined.  You can't store the
> result in a variable, you can't pass the result to a function, you
> literally can't do anything with it except *immediately* cast it
> back to the original type.  The one and only well-defined use of
> reinterpret_cast looks like this:
>
>   reinterpret_cast<T1*>(reinterpret_cast<T2*>(p));
>
> where p has type T1*.  (Well, you could also just perform the cast and
> discard the result.  I'd call that trivial as well.)

Which makes that interpretation almost certainly wrong,
i.e. inconsistent with the intent of the standard.

However, it is true that there's not much you can do portably with the
result of reinterpret_cast.  The right way to accomplish such a
conversion is:

   static_cast<T2*>( static_cast<void*>(p) );

(using implicit_cast for the inner cast would be better of course).
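
A sketch of that double static_cast applied to the subject of this
thread, byte order. The helper names bytes_of and is_little_endian are
hypothetical, and the sketch assumes that reading the resulting
unsigned char pointer is permitted, which is what the rest of the
thread debates.

```cpp
#include <cassert>

// Returns a pointer to the first byte of *p, going through void*
// with two static_casts rather than a single reinterpret_cast.
template <typename T>
const unsigned char* bytes_of(const T* p)
{
    return static_cast<const unsigned char*>(static_cast<const void*>(p));
}

// True when the lowest-order byte of an unsigned int is stored first.
bool is_little_endian()
{
    const unsigned int one = 1;
    return bytes_of(&one)[0] == 1;
}

// Usage: the first byte of a one-byte object is the object itself.
void demo_bytes_of()
{
    const unsigned char c = 0xAB;
    assert(*bytes_of(&c) == 0xAB);
}
```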

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com


Author: johnchx2@yahoo.com
Date: Fri, 14 Jul 2006 10:46:35 CST

kuyper@wizard.net wrote:
> johnchx2@yahoo.com wrote:
> > kuyper@wizard.net wrote:
> > Perhaps I'm being over-literal, but I don't see any exception to the
> > rule that integral types define their values by use of a pure binary
> > numeration system.  It doesn't say "...except for trap or invalid
> > values."
>
> No, you're being under-literal. The relevant section says that "The
> representations of integral types shall define values by use of a pure
> binary numeration system."  Bit patterns which don't represent anything
> aren't covered by that statement.

I don't agree.  The language specifically does *not* say "...define
NUMERIC values by use of..."  So it would appear that you're assuming
that a trap value isn't a value, simply because it doesn't represent a
number.  I don't see any support for that view.

For example, an uninitialized POD is said to have an "indeterminate
initial value" (8.5/9).  Nothing permits it to have "no value."  But
since we know that an uninitialized int (for instance) is allowed to
have a "trapping bit pattern," it seems reasonable to conclude that a
trapping value is indeed a value.


Author: "SuperKoko" <tabkannaz@yahoo.fr>
Date: Fri, 14 Jul 2006 10:43:55 CST

Frederick Gotham wrote:
> SuperKoko posted:
>
>
> > Not *signed char* ! The standard says it for *char* and *unsigned char*
> > only!
>
>
> 3.9.1.1
>
> A char, a signed char, and an unsigned char occupy the same amount of
> storage and have the same alignment requirements; that is, they have the
> same object representation. For character types, all bits of the object
> representation participate in the value representation.
>
It is not sufficient!
In fact, 3.10-15 gives the answer.

"
15 If a program attempts to access the stored value of an object through
   an lvalue of other than one of the following types the behavior is
   undefined 25):

  --the dynamic type of the object,

  --a cv-qualified version of the dynamic type of the object,

  --a type that is the signed or unsigned type corresponding to the
    dynamic type of the object,

  --a type that is the signed or unsigned type corresponding to a
    cv-qualified version of the dynamic type of the object,

  --an aggregate or union type that includes one of the aforementioned
    types among its members (including, recursively, a member of a
    subaggregate or contained union),

  --a type that is a (possibly cv-qualified) base class type of the
    dynamic type of the object,

  --a char or unsigned char type.

  _________________________
  25) The intent of this list is to specify those circumstances in which
      an object may or may not be aliased.
"
Accessing any value through a char lvalue or unsigned char lvalue is
legal, but accessing an object whose dynamic type is neither signed char
nor unsigned char through a signed char lvalue has undefined behavior.
I don't know why conv.lval doesn't include this list... I have always
seen that as a defect in the standard.

Similarly, even on platforms where long and int have the same size,
object representation and value representation, and even when this is
well documented, accessing an int via a long lvalue has UB.
With GCC 4.0.2 on i386 GNU/Linux:
#include <iostream>

// On this platform int and long have the same size and value
// representation.

// Copies p into *pp.
void assign(void** pp,void* p) {*pp=p;}

int main() {
 typedef int base_type;
 typedef long alias_type;

 base_type c=0;
 void* first_p=&c;
 void* other_p;
 assign(&other_p,first_p);
 alias_type* ap=static_cast<alias_type*>(other_p);
 c=1;
 *ap=2;  // UB: writes to an int object through a long lvalue

 std::cout << c << '\n';
}

Compiled with:

g++ --pedantic-errors -Wall -Wextra -O2 alias_int.cpp

The output is 1

With -O1, the output is 2
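
For contrast, a sketch of a way to move a representation between two
same-sized types without forming an lvalue of the wrong type. The
helper copy_representation is hypothetical, and it assumes a C++11
compiler for static_assert; the copy itself goes through memcpy, which
does not depend on the optimization level.

```cpp
#include <cassert>
#include <cstring>

// Copies the object representation of `from` into a new object of
// type To. Both types must have the same size; To must be a POD.
template <typename To, typename From>
To copy_representation(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    To to;
    std::memcpy(&to, &from, sizeof to);
    return to;
}

// Usage: a non-negative value has the same representation in int and
// unsigned int, so it survives the copy in both directions.
void demo_copy_representation()
{
    unsigned int u = 0x1234u;
    int i = copy_representation<int>(u);
    assert(i == 0x1234);
    assert(copy_representation<unsigned int>(i) == u);
}
```

On a platform where sizeof(long) == sizeof(int), the same helper could
be instantiated as copy_representation<int>(some_long); elsewhere the
static_assert rejects it at compile time.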


> > Now, the real question is : Does reinterpret_cast produce a *valid*
> > pointer pointing at the *first byte* of the object?
>
>
> I think we have to use common-sense here. Also, the following tells us that
> any object's address can be stored in a char*.
>
>
> 5.2.10.7
>
> A pointer to an object can be explicitly converted to a pointer to an
> object of different type. Except that converting an rvalue of type "pointer
> to T1" to the type "pointer to T2" (where T1 and T2 are object types and
> where the alignment requirements of T2 are no stricter than those of T1)
> and back to its original type yields the original pointer value, the result
> of such a pointer conversion is unspecified.
>
>
Here common sense doesn't help... 5.2.10-7 doesn't say anything about the
pointer produced by reinterpret_cast... If there were no 5.2.10-10, it
would mean that the behavior is undefined.
But I admit (as I said in a previous post) that 5.2.10-10 implies that
using the reinterpret_cast'ed pointer is correct, even if WG21's wording
is very confusing.


Author: "SuperKoko" <tabkannaz@yahoo.fr>
Date: Fri, 14 Jul 2006 10:43:25 CST

Manfred von Willich wrote:
> Okay, perhaps no-one will take this seriously, but here goes: a
> suggestion for cleaning up the char type as used for accessing raw
> memory.
>
> As can be seen from this thread, the types char and signed/unsigned
> have been coopted to do the work of representing memory for the purpose
> of copying PODs etc., and we can see that due to the natural semantics
> attached to char types, a lot of confusion is occurring.
>
Yes. I think that char and unsigned char have these special properties
for historical reasons: compatibility with existing pre-standard code.

> Would it not make sense to introduce a new type, e.g. __octet (or
> __mem etc.), that has as its sole purpose copying and
> equality-comparing POD data?
I have already thought of that, though *byte* would be a much more
appropriate name.
In human language, "octet" means exactly 8 bits, while a byte can have
fewer or more.
For the C and C++ standards, a byte has an implementation-defined
number of bits.
So, it should be __byte.
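
The distinction is visible from within a program through CHAR_BIT. A
short sketch, with a hypothetical helper name:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Number of bits occupied by an object of type T. sizeof counts
// bytes, and CHAR_BIT is the implementation-defined number of bits
// in a byte (at least 8).
template <typename T>
std::size_t bits_in_object()
{
    return sizeof(T) * CHAR_BIT;
}

// Usage: a char is exactly one byte, however many bits that is.
void demo_bits_in_object()
{
    assert(sizeof(char) == 1);
    assert(bits_in_object<char>() == static_cast<std::size_t>(CHAR_BIT));
    assert(bits_in_object<unsigned char>() >= 8);
}
```

An implementation with CHAR_BIT equal to 16 would have bytes that are
not octets, yet sizeof(char) would still be 1.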

Of course, for backward compatibility reasons, we can't remove the
current semantics of char and unsigned char; otherwise an incredibly
huge amount of code would break.

> Only operators =, ==, !=, new, address-of
> may be used with it (no inequality or arithmetic operators).  The type
> __octet* would allow standard pointer arithmetic.  All pointers to POD
> could be implicitly converted to __octet*, and __octet* could be
> implicitly converted to void*.  New functions replacing memcpy and
> memcmp should use __octet* and __octet const* as appropriate.  Also,
> static_cast would allow casting of void* to __octet* and any other
> pointer, and __octet* to POD types, but not __octet* to a pointer to a
> non-POD type.  (When one is dealing at this low level, leading
> underscores should not be seen as ugly - after all, we use it only to
> dig into the guts of the hardware, and we should be reminded of this.)
>
> Advantages include:
>  * Introducing the new type has no impact on code that does not make
> reference to it.
>  * The semantics of char and its variations can be decoupled from their
> mem-like uses.
>  * The compiler can keep track of PODness, providing additional
> type-safety.
>  * Fewer uses of reinterpret_cast are needed - bytes of a POD can be
> accessed using an implicit cast, hence safer code.
>  * PODness can be directly compile-time checked in a template (by
> casting a pointer to the object implicitly to __octet*).
>
I don't know if it is worth the complication... Since it is quite
low-level, only a few programmers need to use that. In that case,
unsigned char is not that bad.
It doesn't really bring anything "new" in the language, so, most people
would prefer to use the old "unsigned char" to be more compatible with
older compilers.


Author: kanze.james@neuf.fr (James Kanze)
Date: Fri, 14 Jul 2006 18:25:17 GMT

johnchx2@yahoo.com wrote:
 > Alberto Ganesh Barbati wrote:
 >> johnchx2@yahoo.com ha scritto:
 >>> Alberto Ganesh Barbati wrote:
 >>>> Frederick Gotham ha scritto:
 >>>>> A char has no trap representations, and so it is safe to
 >>>>> access the bytes of any object in memory as if it were a
 >>>>> char, signed char or unsigned char:
 >>>> Could you please quote the clause in the C++ standard that
 >>>> guarantees that?
 >>> 3.9.1/1, which says, "For character types, all bits of the object
 >>> representation participate in the value representation."
 >> So what?

 > Well, that's the language that guarantees that char has no
 > trap representation.

Except that it doesn't.  The fact that all bits participate in
the value representation doesn't mean that there can't be
trapping values.  All of the bits in an IEEE float participate
in the value representation, but that doesn't prevent the
existence of trapping NaN's.  And the C standard explicitly says
that a value with all (non-padding) bits 1 can trap in 1's
complement, as can a value with a sign bit 1 and all other bits
0 in 2's complement or signed magnitude.  (In 1's complement or
signed magnitude, these values would otherwise correspond to a
negative 0.)
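
A sketch of the float case: the bytes of a NaN can be moved through an
array of unsigned char without any floating-point operation touching
the value. The function name is hypothetical, and the sketch falls back
to a quiet NaN where the implementation reports no signaling one.

```cpp
#include <cassert>
#include <cstring>
#include <limits>

// Copies the bytes of a NaN into an unsigned char array and back,
// then compares the two object representations byte for byte.
bool nan_survives_byte_copy()
{
    typedef std::numeric_limits<float> limits;
    float original = limits::has_signaling_NaN ? limits::signaling_NaN()
                                               : limits::quiet_NaN();

    unsigned char raw[sizeof(float)];
    std::memcpy(raw, &original, sizeof original);   // float -> bytes

    float copy;
    std::memcpy(&copy, raw, sizeof copy);           // bytes -> float

    return std::memcmp(&copy, &original, sizeof original) == 0;
}

// Usage: the representation is preserved exactly.
void demo_nan_byte_copy()
{
    assert(nan_survives_byte_copy());
}
```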

--
James Kanze                                    kanze.james@neuf.fr
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34


Author: "mark" <markw65@gmail.com>
Date: Fri, 14 Jul 2006 14:11:38 CST

johnchx2@yahoo.com wrote:
> kuyper@wizard.net wrote:
> > johnchx2@yahoo.com wrote:
> > > kuyper@wizard.net wrote:
> > > Perhaps I'm being over-literal, but I don't see any exception to the
> > > rule that integral types define their values by use of a pure binary
> > > numeration system.  It doesn't say "...except for trap or invalid
> > > values."
> >
> > No, you're being under-literal. The relevant section says that "The
> > representations of integral types shall define values by use of a pure
> > binary numeration system."  Bit patterns which don't represent anything
> > aren't covered by that statement.
>
> I don't agree.  The language specifically does *not* say "...define
> NUMERIC values by use of..."  So it would appear that you're assuming
> that a trap value isn't a value, simply because it doesn't represent a
> number.  I don't see any support for that view.

3.9/4 [...] For POD types, the value representation is a set of bits in
the object representation that determines a value, which is one
discrete element of an implementation-defined set of values.

In other words, if the implementation says it's not a value, it's not a
value.

Mark Williams


Author: "Bob Bell" <belvis@pacbell.net>
Date: Sat, 15 Jul 2006 01:20:21 CST Raw View

johnchx2@yahoo.com wrote:
> Bob Bell wrote:
> > > My difficulty with that reading is that, strictly speaking, it makes
> > > any use of reinterpret_cast (other than as the argument to the inverse
> > > reinterpret cast) undefined.  For example:
> > >
> > >   char* p2 = reinterpret_cast<char*>(p1);
> > >
> > > yields undefined behavior, because nothing guarantees that the result
> > > of the reinterpret_cast can be stored to a pointer variable.
> >
> > The very first sentence of 5.2.10/7 says, "A pointer to an object can
> > be explicitly converted to a pointer to an object of different type."
> > That seems a pretty clear statement that this line of code is
> > well-defined. (As for whether p1 can be stored in p2, you're right --
> > nothing (not even 5.2.10/10) says that.)
>
> I should have said "any use of *the result of* reinterpret_cast (other
> than..."  The cast in my example is fine, it's the initialization of p2
> that yields UB.
>
> The point I'm trying to make is that this interpretation renders any
> non-trivial use of reinterpret_cast undefined.  You can't store the
> result in a variable, you can't pass the result to a function, you
> literally can't do anything with it except *immediately* cast it back
> to the original type.  The one and only well-defined use of
> reinterpret_cast looks like this:
>
>   reinterpret_cast<T1*>(reinterpret_cast<T2*>(p));
>
> where p has type T1*.  (Well, you could also just perform the cast and
> discard the result.  I'd call that trivial as well.)
>
> Is there some non-trivial usage that I'm overlooking?

I think you're taking 5.2.10/7 more literally than I was, but I see
what you mean. I'm now convinced that there is an intention to allow
things like this:

   // T is a POD type:
   T x;
   char a[sizeof(T)];
   for (int i = 0; i != sizeof(T); ++i)
      a[i] = *(reinterpret_cast<char*>(&x) + i);

While preventing things like this:

   // T1 and T2 are non-POD class types; T2 has an "f" member function.
   T1 x;
   T2* y(reinterpret_cast<T2*>(&x));
   y->f();

In addition, 5.2.10/7 seems intended to allow this:

   T1 x;
   T2* y(reinterpret_cast<T2*>(&x));
   T1* x2(reinterpret_cast<T1*>(y));
   assert(&x == x2);

This seems allowable as well, although I'm not sure what it's good for:

   // T is a non-POD type:
   T x;
   char a[sizeof(T)];
   for (int i = 0; i != sizeof(T); ++i)
      a[i] = *(reinterpret_cast<char*>(&x) + i);
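
As a minimal sketch of the POD case shown earlier, assuming a hypothetical
POD type Sample and a hypothetical helper round_trip (neither is from the
thread): the bytes of an object are read through unsigned char, stored in a
buffer, and copied back, which is the round trip that 3.9/2 describes.
Whether the per-byte read is strictly sanctioned is exactly what this thread
is arguing about, so this illustrates the apparent intent and proves nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical POD type, used only for illustration.
struct Sample {
    int    id;
    double weight;
};

// Copy the bytes of `src` into a buffer through unsigned char lvalues,
// then copy the buffer into a second object of the same type.
inline Sample round_trip(const Sample& src)
{
    unsigned char buffer[sizeof(Sample)];
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&src);
    for (std::size_t i = 0; i != sizeof(Sample); ++i)
        buffer[i] = p[i];              // byte-wise read of the object

    Sample dst;
    std::memcpy(&dst, buffer, sizeof dst);   // copy the bytes back
    return dst;
}
```

Calling round_trip on a Sample holding {42, 1.5} should give back an object
whose members compare equal to the original's.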

The standard, being written in a human language by humans, is
necessarily imperfect; I think what's going on is that we've hit one
of those imperfections. (For that matter, me being a human also means
that I might be the one with the imperfection.) Perhaps this intention
could be expressed more clearly; perhaps the language you suggested
would be enough to make it clear. On the other hand, it's just as
possible that this really isn't a problem at all, because the
intention is understood well enough by those who need to (e.g.,
implementors). I'll leave that judgment up to the readers of this
thread (if there are any left, that is ;-) ).

It's been an educational discussion; thanks for not giving up on me.

Bob


Author: "SuperKoko" <tabkannaz@yahoo.fr>
Date: Sat, 15 Jul 2006 13:06:26 CST Raw View

James Kanze wrote:
> johnchx2@yahoo.com wrote:
>  > Alberto Ganesh Barbati wrote:
>  >> johnchx2@yahoo.com ha scritto:
>  >>> Alberto Ganesh Barbati wrote:
>  >>>> Frederick Gotham ha scritto:
>  >>>>> A char has no trap representations, and so it is safe to
>  >>>>> access the bytes of any object in memory as if it were a
>  >>>>> char, signed char or unsigned char:
>  >>>> Could you please quote the clause in the C++ standard that guarantees
>  >>>> that?
>  >>> 3.9.1/1, which says, "For character types, all bits of the object
>  >>> representation participate in the value representation."
>  >> So what?
>
>  > Well, that's the language that guarantees that char has no
>  > trap representation.
>
> Except that it doesn't.  The fact that all bits participate in
> the value representation doesn't mean that there can't be
> trapping values.  All of the bits in an IEEE float participate
> in the value representation, but that doesn't prevent the
> existence of trapping NaNs.  And the C standard explicitly says
> that a value with all (non-padding) bits 1 can trap in 1's
> complement, as can a value with a sign bit 1 and all other bits
> 0 in 2's complement or signed magnitude.  (In 1's complement or
> signed magnitude, these values would otherwise correspond to a
> negative 0.)
>
I agree.
Actually, reading the paragraph:
"For character types, all bits of the object representation
  participate in the value representation.  For unsigned character
  types, all possible bit patterns of the value representation
  represent numbers.  These requirements do not hold for other types."

For unsigned character types, all bit patterns represent *numbers*...
For signed char, this requirement doesn't hold.
The fact that all bits participate in the value representation only
means that there are no padding bits.

The C99 standard is much more clear about trap representations.
"  3.17.2
1 indeterminate value
  either an unspecified value or a trap representation
  3.17.3
1 unspecified value
  valid value of the relevant type where this International Standard
  imposes no requirements on which value is chosen in any instance
  NOTE     An unspecified value cannot be a trap representation.

6.2.6 Representations of types
6.2.6.1 General
5 Certain object representations need not represent a value of the
  object type. If the stored value of an object has such a
  representation and is read by an lvalue expression that does not
  have character type, the behavior is undefined. If such a
  representation is produced by a side effect that modifies all or
  any part of the object by an lvalue expression that does not have
  character type, the behavior is undefined.41) Such a representation
  is called a trap representation.
"
Note: Even if character types, for the WG14, can have trap
representations (object representations that don't represent any
value), reading a character via an lvalue-to-rvalue conversion doesn't
have UB.

"  6.2.6.2 Integer types
1 For unsigned integer types other than unsigned char, the bits of the
  object representation shall be divided into two groups: value bits
  and padding bits (there need not be any of the latter). If there are
  N value bits, each bit shall represent a different power of 2 between
  1 and 2^(N-1), so that objects of that type shall be capable of
  representing values from 0 to 2^N - 1 using a pure binary
  representation; this shall be known as the value representation. The
  values of any padding bits are unspecified.44)
44) Some combinations of padding bits might generate trap
    representations, for example, if one padding bit is a parity bit.
    Regardless, no arithmetic operation on valid values can generate a
    trap representation other than as part of an exceptional condition
    such as an overflow, and this cannot occur with unsigned types. All
    other combinations of padding bits are alternative object
    representations of the value specified by the value bits.
"
So, here, the WG14 is pretty clear (unfortunately the WG21 is not):
unsigned char has no padding bits.
Other unsigned types may have padding bits, and some combinations of
padding bits and other bits might be trap representations.
But, for an unsigned type having no padding bits, there is no trap
representation, because:
"If there are N value bits, each bit shall represent a different
  power of 2 between 1 and 2^(N-1), so that objects of that type
  shall be capable of representing values from 0 to 2^N - 1 using a
  pure binary representation"

"2 For signed integer types, the bits of the object representation
  shall be divided into three groups: value bits, padding bits, and the
  sign bit. There need not be any padding bits; there shall be exactly
  one sign bit. Each bit that is a value bit shall have the same value
  as the same bit in the object representation of the corresponding
  unsigned type (if there are M value bits in the signed type and N in
  the unsigned type, then M <= N). If the sign bit is zero, it shall
  not affect the resulting value. If the sign bit is one, the value
  shall be modified in one of the following ways:
    - the corresponding value with sign bit 0 is negated (sign and
      magnitude);
    - the sign bit has the value -(2^N) (two's complement);
    - the sign bit has the value -(2^N - 1) (ones' complement).
  Which of these applies is implementation-defined, as is whether the
  value with sign bit 1 and all value bits zero (for the first two), or
  with sign bit and all value bits 1 (for ones' complement), is a trap
  representation or a normal value. In the case of sign and magnitude
  and ones' complement, if this representation is a normal value it is
  called a negative zero.
"

You see that (for C99) there can be padding bits for signed types.
The WG21 says that there are no padding bits (knowing that padding bits
may give trap representations) for character types (even signed char).
But, it doesn't mean that there is no trap representation (this point
is different for unsigned types)!
The WG14 explicitly says that there might be implementation-defined
trap representations depending only on the object representation of
sign bit+value bits!
So, I think that (since the WG21 says nothing about it) there might
also be trap representations for signed char.
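
The difference between the two cases can be probed on a given
implementation. What follows is only a sketch, with hypothetical function
names, and it assumes that CHAR_BIT is smaller than the width of long: it
counts the numbers each type can represent and compares that count with the
2^CHAR_BIT available bit patterns. A false result for signed char would mean
that one pattern is either a negative zero or a trap representation; the
count alone cannot tell which.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// unsigned char: every bit pattern is a number, so the count of
// representable values should be exactly 2^CHAR_BIT.
inline bool unsigned_char_uses_all_patterns()
{
    const unsigned long count =
        static_cast<unsigned long>(std::numeric_limits<unsigned char>::max()) + 1UL;
    return count == (1UL << CHAR_BIT);
}

// signed char: true only when min..max covers 2^CHAR_BIT distinct numbers
// (as on two's complement with no trap value). False means one pattern is
// not a distinct number: a negative zero or a trap representation.
inline bool signed_char_uses_all_patterns()
{
    const long lo = std::numeric_limits<signed char>::min();
    const long hi = std::numeric_limits<signed char>::max();
    return (hi - lo + 1L) == (1L << CHAR_BIT);
}
```

On a typical two's complement implementation both functions would be
expected to return true, but only the first is required by the wording
quoted above.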

Auxiliary note: For the WG14, a structure can't have any trap
representation:
"6 When a value is stored in an object of structure or union type,
  including in a member object, the bytes of the object representation
  that correspond to any padding bytes take unspecified values.42) The
  value of a structure or union object is never a trap representation,
  even though the value of a member of the structure or union object
  may be a trap representation.
"
It probably means that assigning a structure which has a trap
representation in one of its members to another structure of the same
type, even if it "reads" the structure, does not have UB.

Well, I think that the WG21 should read the C99 standard to improve the
wording of the C++ standard.


Author: johnchx2@yahoo.com
Date: Sat, 15 Jul 2006 13:09:52 CST Raw View

mark wrote:

> 3.9/4 [...] For POD types, the value representation is a set of bits in
> the object representation that determines a value, which is one
> discrete element of an implementation-defined set of values.
>
> In other words, if the implementation says it's not a value, it's not a
> value.

I don't think that proves what you think it proves.  :-)

When I read the words, "...the value representation is a set of
bits...that determines a value," I don't see authorization for a value
representation that does NOT determine a value.

Absent a value representation that doesn't determine a value, either
the "trap value" is inside the set of implementation-defined values, or
there is no value representation for the "trap value."  If the latter,
then signed char can't have a trap representation, since all the bits
of a signed char's object representation participate in the value
representation.  If the former, then the "trap value" has a value
representation, but that conflicts with the requirement that integral
types use a pure binary numeration to represent their values, which
scheme provides no way to encode the concept "trap."

I think that there are certainly clues elsewhere in the standard that
there was an intent to allow signed char to have a trap value, but it
would seem (to my over-pedantic eye) to be disallowed by an
(over-literal) reading of 3.9.1/7.


Author: James Kanze <kanze.james@neuf.fr>
Date: Sat, 15 Jul 2006 14:39:48 CST Raw View

johnchx2@yahoo.com wrote:
 > mark wrote:

 >> 3.9/4 [...] For POD types, the value representation is a set
 >> of bits in the object representation that determines a value,
 >> which is one discrete element of an implementation-defined
 >> set of values.

 >> In other words, if the implementation says it's not a value,
 >> it's not a value.

 > I don't think that proves what you think it proves.  :-)

 > When I read the words, "...the value representation is a set
 > of bits...that determines a value," I don't see authorization
 > for a value representation that does NOT determine a value.

But the standard does allow "trapping values".  So the fact that
all of the bits are used to determine a value doesn't mean that
that value cannot trap.

 > Absent a value representation that doesn't determine a value,
 > either the "trap value" is inside the set of
 > implementation-defined values, or there is no value
 > representation for the "trap value."  If the latter, then
 > signed char can't have a trap representation, since all the
 > bits of a signed char's object representation participate in
 > the value representation.  If the former, then the "trap
 > value" has a value representation, but that conflicts with the
 > requirement that integral types use a pure binary numeration
 > to represent their values, which scheme provides no way to
 > encode the concept "trap."

The C standard gives some explicit counter examples: in 1's
complement, it is implementation defined whether all bits set is
a trapping value, or a negative 0.  Even in 2's complement, it
is implementation defined whether the value with the sign bit
set, and all other bits reset, traps, or represents a negative
value whose positive value cannot be represented.

 > I think that there are certainly clues elsewhere in the
 > standard that there was an intent to allow signed char to have
 > a trap value, but it would seem (to my over-pedantic eye) to
 > be disallowed by an (over-literal) reading of 3.9.1/7.

I think you're reading too much into "value representation" and
"pure binary numeration".  In fact, the standard is quite
explicit about it (§3.9.1/1): "For unsigned character types, all
possible bit patterns of the value representation represent
numbers.  THESE REQUIREMENTS DO NOT HOLD FOR OTHER TYPES."

--
James Kanze                                    kanze.james@neuf.fr
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34


Author: "Manfred von Willich" <manfred@techniroot.co.za>
Date: Sun, 16 Jul 2006 08:42:54 CST Raw View

SuperKoko wrote:
> Yes. I think that char and unsigned char have these special properties
> for historical reasons : compatibility with existing pre-standard code.
>
> > Would it not make sense to to introduce a new type, e.g. __octet (or
> > __mem etc.), that has as its sole purpose copying and
> > equality-comparing POD data?
> I have already thought of that, though *byte* would be a much more
> adequate name.
> In human language, "octet" means exactly 8 bits, while byte can have
> less or more.
> For the C and C++ standards, a byte has an implementation-defined
> number of bits.
> So, it should be __byte.
>
Yes - I was not focussing on the size or actual reserved word choice -
though perhaps I should have used a different example keyword.

> Of course for backward compatibility reasons, we can't remove the
> actual semantics of char and unsigned char, otherwise an incredibly
> huge amount of code would break.
>
> > [...]
> >
> I don't know if it is worth the complication... Since it is quite
> low-level, only a few programmers need to use that. In that case,
> unsigned char is not that bad.
> It doesn't really bring anything "new" in the language, so, most people
> would prefer to use the old "unsigned char" to be more compatible with
> older compilers.
>
The combination of having to retain this backward compatibility and the
addition of a new type (hence complication) will almost certainly doom
this idea (and similarly many others) to never become actuality.  I
guess I am trying to influence the development of the language into a
less hamstrung direction by increasing awareness of such issues.  This
is really silliness on my part - most of the people in a forum such as
this are fully aware of these issues, including the reasons for them to
remain as they are.  This is not intended to detract from the areas
that have changed and will change.

However, I (and I suppose many others) find the historically imposed
complexities and constraints of programming in C++ tedious and
time-consuming (most of my "learning" time is spent on discovering
precisely these oddities, and how to express fully something in C++
that should be straightforward).

Manfred


Author: fgothamNO@SPAM.com (Frederick Gotham)
Date: Tue, 11 Jul 2006 16:54:19 GMT Raw View

SuperKoko posted:


> Not *signed char* ! The standard says it for *char* and *unsigned char*
> only!


3.9.1.1

A char, a signed char, and an unsigned char occupy the same amount of
storage and have the same alignment requirements; that is, they have the
same object representation. For character types, all bits of the object
representation participate in the value representation.


> Anyway, I agree that it is safe to access the bytes of an object via a
> char* or an unsigned char*, if these char pointers are really pointing
> to a byte of the representation of the object.
> Of course, the value of the char read is unspecified, but uniquely
> depend on the bits of the byte.
> So, it is safe to copy a POD object via a char pointer, from one memory
> location to another (if alignment requirements are correct... For
> instance, to assign one POD object to another POD object).
>
> Now, the real question is : Does reinterpret_cast produce a *valid*
> pointer pointing at the *first byte* of the object?


I think we have to use common sense here. Also, the following tells us that
any object's address can be stored in a char*.


5.2.10.7

A pointer to an object can be explicitly converted to a pointer to an
object of different type. Except that converting an rvalue of type “pointer
to T1” to the type “pointer to T2” (where T1 and T2 are object types and
where the alignment requirements of T2 are no stricter than those of T1)
and back to its original type yields the original pointer value, the result
of such a pointer conversion is unspecified.


--

Frederick Gotham


Author: kuyper@wizard.net
Date: Tue, 11 Jul 2006 14:49:17 CST Raw View

SuperKoko wrote:
.
> IIRC the C99 standard has requirements on signed char representations,
> but not the C++ standard.

The C99 standard says that signed types can have trap representations,
which are defined as not representing a valid value (they are not,
oddly enough, defined by the fact that they normally "trap" when
accessed). However, it says that the behavior of code which accesses an
object containing a trap representation is undefined, only if it is
accessed through an lvalue that is not of character type. Therefore, if
a signed char object contains a trap representation, accessing it
retrieves an unspecified value, but the behavior is not undefined.


Author: "Bob Bell" <belvis@pacbell.net>
Date: Tue, 11 Jul 2006 22:05:13 CST Raw View

Frederick Gotham wrote:
> Bob Bell posted:
>
> > I don't see how to get past these difficulties to allow Frederick
> > Gotham's code to be portable.
>
> I see this part of the discussion as merely a formality, because I
> instinctively know that there's nothing wrong with the code.

The problem with instincts is that they're not the standard.

[snip some quotes about the C++ memory model]

None of these quotes state that it's OK to *read* the bytes of an
object.

Here's another quote:

4.1/1

   An lvalue (3.10) of a non-function, non-array type T can be
   converted to an rvalue. If T is an incomplete type, a program
   that necessitates this conversion is ill-formed. If the object to
   which the lvalue refers is not an object of type T and is not an
   object of a type derived from T, or if the object is uninitialized,
   a program that necessitates this conversion has undefined
   behavior. If T is a non-class type, the type of the rvalue is the
   cv-unqualified version of T. Otherwise, the type of the rvalue is
   T.

Here's a snippet from your program:

    do *p_index++ = *p++;
    while( p != p_over );

In this context, the "*p++" converts an unsigned char lvalue to an
unsigned char rvalue. According to 4.1/1 above, that's undefined
behavior, because the object p points to is not an unsigned char.
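
For comparison, here is a sketch of an endian helper that avoids the
explicit per-byte lvalue-to-rvalue conversion on the original object by
going through std::memcpy, which a footnote to 3.9/2 names as one way of
copying the underlying bytes. The name reverse_bytes is hypothetical, and
the sketch assumes that unsigned int has no padding bits, so that every
reordering of its bytes is still a valid value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>

// Hypothetical helper: return `value` with the order of its bytes reversed.
inline unsigned int reverse_bytes(unsigned int value)
{
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);    // copy out the object representation
    std::reverse(bytes, bytes + sizeof value);   // swap the byte order
    std::memcpy(&value, bytes, sizeof value);    // copy the reordered bytes back
    return value;
}
```

Applying reverse_bytes twice should return the original value, whatever the
byte order of the machine.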

Bob


Author: "Bob Bell" <belvis@pacbell.net>
Date: Tue, 11 Jul 2006 22:05:03 CST Raw View

johnchx2@yahoo.com wrote:
> Bob Bell wrote:
> > johnchx2@yahoo.com wrote:
> > > Bob Bell wrote:
> > >
> > > > Unless there's some other clause that defines what happens when you
> > > > dereference the T2*, it sounds like undefined behavior to me.
> > >
> > > 5.2.10/10 says "The result is an lvalue that refers to the same object
> > > as the source lvalue, but with a different type."  3.10/15 governs
> > > whether accessing an object of one type through an lvalue of another
> > > type engenders undefined behavior.
> >
> > Interesting. Are you aware of any clause anywhere that states what the
> > result of accessing an object through a char lvalue would be?
>
> I'm not sure.  Perhaps if you can point to the clause in the standard
> that tells us the result of accessing an object of type T through an
> lvalue of type T (or const T, or an accessible base class of T) is, I
> can find the corresponding language for accessing an object of type T
> through an lvalue of type char.  :-)

4.1 Lvalue-to-rvalue conversion.

> My point is simply that I'm not even sure what such a clause might say
> if there were one.

It could say something like "accessing an object through a char lvalue
returns an implementation-defined value".

> > In any case, I'm not convinced; 5.2.10/10 equates reinterpret_cast of a
> > reference with the result of dereferencing an equivalent pointer, but
> > nowhere is the result of dereferencing such a pointer defined.
>
> It says that the result of a reinterpret_cast to reference type is the
> same as the result of dereferencing the result of a reinterpret_cast to
> the corresponding pointer type.  It also says that the result of the
> reinterpret_cast to reference type is an lvalue designating the
> original object.  That looks like a definition (of *both*) to me.

I understand the logic, and in all honesty, I keep going back and forth
about it.

> My reasoning is that if the result of A is the same as the result of B,
> and if the result of A is X, then we can conclude that the result of B
> is X.

Unless A says that the result can't be X, in which case we have a
contradiction. 5.2.10/7 makes it clear that the only thing that can be
done with such a pointer is to convert it back. The language is "Except
that [converting the pointer back might work], the result of such a
pointer conversion is unspecified." It is clearly listing the one and
only thing that can be done with such a pointer.

I think that if it were legal to use such a pointer to initialize a
reference, then the sentence would instead read something like "Except
that [converting the pointer back might work] and [dereferencing it in
order to initialize a reference would work], the result of such a
pointer conversion is unspecified."

> The only alternative I see is to assert that *all* reinterpret_casts to
> reference type exhibit undefined behavior.  Is that what you mean?

It seems clear to me that the intention of the standard is for
reinterpret_cast of references to have some meaning; the wording of
5.2.10/7 and 5.2.10/10 make it unclear to me what that meaning is.

> I suspect that the language in 5.2.10/7 is causing the confusion.  It
> says, "...the result of such a pointer conversion is unspecified."  And
> I'm not contradicting that: the *value* of the pointer is not
> specified.  That is, the bit-pattern of the pointer is not specified.

But that isn't what it says. It uses the word "result", not "value."

> (In particular, it need not be the same bit-pattern as the original
> pointer, even if it denotes the same machine address; type-tagged
> pointer types are legal.)  But that doesn't prevent a later paragraph
> from defining the result of dereferencing the pointer.  And that's what
> 5.2.10/10 does.

I would agree with this interpretation if the sentence in question from
5.2.10/7 didn't begin by describing the only allowed use of a
reinterpret_cast pointer.

I agree there is a problem, but it seems to me the easiest way to solve
it is to simply remove the "That is..." sentence from 5.2.10/10. Since
this sentence is attempting to clarify the sentence before it, it
really isn't necessary; without it, 5.2.10/10 says everything it needs
to say. The only thing this sentence adds is confusion over whether
there's a case where it's OK to dereference a pointer obtained from
reinterpret_cast.

Bob


Author: "kanze" <kanze@gabi-soft.fr>
Date: Wed, 12 Jul 2006 10:20:09 CST Raw View

Frederick Gotham wrote:
> SuperKoko posted:

> > Not *signed char* ! The standard says it for *char* and
> > *unsigned char* only!

> 3.9.1.1

> A char, a signed char, and an unsigned char occupy the same
> amount of storage and have the same alignment requirements;
> that is, they have the same object representation. For
> character types, all bits of the object representation
> participate in the value representation.

In sum, there cannot be any padding bits.  The C99 standard
explicitly says that all signed integral types (including thus
signed char) can have trapping representations, based uniquely
on the sign and value bits (and thus, even if "all bits [...]
participate in the value representation").  The C++ standard, as
far as I can see, doesn't forbid this either.

Historically, the C++ standard is based on C90; as far as I
know, the only intentional difference relevant here is that C++
extends the non-trapping guarantee to char, and not just to
unsigned char.  Other than that, I'm pretty sure that the intent
was to be the same as C.  And the intent of C99, here, was
simply to be clearer, without really changing anything
fundamental.  Given that, I'd say that there is a definite
intention that signed char can have trapping representations.
And that the absence of anything forbidding such representations
is intentional, and expresses this intent.

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


Author: fgothamNO@SPAM.com (Frederick Gotham)
Date: Wed, 12 Jul 2006 19:23:01 GMT Raw View

kanze posted:


> In sum, there cannot be any padding bits.  The C99 standard
> explicitly says that all signed integral types (including thus
> signed char) can have trapping representations, based uniquely
> on the sign and value bits (and thus, even if "all bits [...]
> participate in the value representation").  The C++ standard, as
> far as I can see, doesn't forbid this either.


Okay I'm slightly confused here.

[ Firstly let's assume we're working with 8-Bit bytes. ]

C provides the following guarantees:

    (1) unsigned char shall contain no padding.
    (2) unsigned char shall have no "bad values".

C++ provides the following guarantees:

    (1) unsigned char shall contain no padding.
    (2) unsigned char shall have no "bad values".

    (3) signed char shall contain no padding.

    (4) char shall contain no padding.


A particular excerpt from the Standard implies to me that a char in C++ can
in fact have a "bad value":

3.9.1.1
    For unsigned character types, all possible bit patterns of the value
representation represent numbers. These requirements do not hold for other
types.

Does this mean that if you want to access an object in C++ as if it were an
array of bytes, then you MUST use unsigned char rather than signed char or
plain char, because the latter two can have a "bad value"?
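
A minimal sketch of the unsigned char approach being asked about, for illustration only. The helper names object_bytes and from_bytes are hypothetical (they come from neither the Standard nor this thread), and T is assumed to be a POD type. Only unsigned char is used to look at the storage, which is the case the quoted 3.9.1/1 wording covers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: returns a copy of the bytes that make up `obj`.
// The storage is only ever read through unsigned char.
template <typename T>
std::vector<unsigned char> object_bytes(const T& obj)
{
    const unsigned char* first =
        reinterpret_cast<const unsigned char*>(&obj);
    return std::vector<unsigned char>(first, first + sizeof(T));
}

// Hypothetical helper: rebuilds a T from bytes produced by object_bytes.
// Assumes T is a POD type, so copying its bytes back restores its value.
template <typename T>
T from_bytes(const std::vector<unsigned char>& bytes)
{
    assert(bytes.size() == sizeof(T));
    T result;
    std::memcpy(&result, bytes.data(), sizeof(T));
    return result;
}
```

Calling from_bytes<int>(object_bytes(v)) for an int v gives back the value of v, whatever the individual byte values turn out to be on the platform.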

I've seen LOADS of code that presumes a char has no bad values.

(Just for argument's sake, a "bad value" for a char could be negative zero:

    1000 0000)


Forgive me if I'm mistaken, but I vaguely remember having read something to
the effect of "the value representation bits shall not take part in
trapping". If there were such a guarantee, then everything would be fine
and dandy.


--

Frederick Gotham

---

Author: johnchx2@yahoo.com
Date: Wed, 12 Jul 2006 14:42:55 CST Raw View

kanze wrote:

> Given that, I'd say that there is a definite
> intention that signed char can have trapping representations.

That would make sense, but I'm not sure how to square it with 3.9.1/7,
which says, "The representations for integral types shall define values
by use of a pure binary numeration system."

How can you have a trap representation when all the object
representation bits participate in the value representation, and the
value representation is a pure binary numeration?

---

Author: "Manfred von Willich" <manfred@techniroot.co.za>
Date: Wed, 12 Jul 2006 14:44:32 CST Raw View

Okay, perhaps no-one will take this seriously, but here goes: a
suggestion for cleaning up the char type as used for accessing raw
memory.

As can be seen from this thread, the types char and signed/unsigned char
have been co-opted to do the work of representing memory for the purpose
of copying PODs etc., and we can see that due to the natural semantics
attached to char types, a lot of confusion is occurring.

Would it not make sense to introduce a new type, e.g. __octet (or
__mem etc.), that has as its sole purpose copying and
equality-comparing POD data?  Only operators =, ==, !=, new, address-of
may be used with it (no inequality or arithmetic operators).  The type
__octet* would allow standard pointer arithmetic.  All pointers to POD
could be implicitly converted to __octet*, and __octet* could be
implicitly converted to void*.  New functions replacing memcpy and
memcmp should use __octet* and __octet const* as appropriate.  Also,
static_cast would allow casting of void* to __octet* and any other
pointer, and __octet* to POD types, but not __octet* to a pointer to a
non-POD type.  (When one is dealing at this low level, leading
underscores should not be seen as ugly - after all, we use it only to
dig into the guts of the hardware, and we should be reminded of this.)

Advantages include:
 * Introducing the new type has no impact on code that does not make
reference to it.
 * The semantics of char and its variations can be decoupled from their
mem-like uses.
 * The compiler can keep track of PODness, providing additional
type-safety.
 * Fewer uses of reinterpret_cast are needed - bytes of a POD can be
accessed using an implicit cast, hence safer code.
 * PODness can be directly compile-time checked in a template (by
casting a pointer to the object implicitly to __octet*).
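
For comparison, C++17 later added std::byte, which covers part of this idea: it is a distinct type with no arithmetic operators, and it may be used to access the storage of any object. The sketch below is only an approximation of the proposal, not the proposal itself. The names as_bytes, as_writable_bytes, byte_copy and byte_equal are hypothetical, a C++17 compiler is assumed, and std::byte still needs an explicit reinterpret_cast where __octet* would convert implicitly.

```cpp
#include <cassert>
#include <cstddef>

// View an object as read-only raw bytes (explicit cast required).
template <typename T>
const std::byte* as_bytes(const T& object)
{
    return reinterpret_cast<const std::byte*>(&object);
}

// View an object as writable raw bytes.
template <typename T>
std::byte* as_writable_bytes(T& object)
{
    return reinterpret_cast<std::byte*>(&object);
}

// Hypothetical memcpy replacement expressed in terms of std::byte*.
// The ranges are assumed not to overlap.
std::byte* byte_copy(std::byte* dest, const std::byte* src, std::size_t count)
{
    assert(count == 0 || (dest != nullptr && src != nullptr));
    for (std::size_t i = 0; i < count; ++i)
        dest[i] = src[i];
    return dest;
}

// Hypothetical memcmp-style equality test; uses only == and !=.
bool byte_equal(const std::byte* a, const std::byte* b, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        if (a[i] != b[i])
            return false;
    return true;
}
```

Note that byte_equal compares padding bytes as well, so it is only a reliable equality test for types without padding.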

Manfred

---

Author: kuyper@wizard.net
Date: Wed, 12 Jul 2006 16:20:37 CST Raw View

johnchx2@yahoo.com wrote:
> kanze wrote:
>
> > Given that, I'd say that there is a definite
> > intention that signed char can have trapping representations.
>
> That would make sense, but I'm not sure how to square it with 3.9.1/7,
> which says, "The representations for integral types shall define values
> by use of a pure binary numeration system."
>
> How can you have a trap representation when all the object
> representation bits participate in the value representation, and the
> value representation is a pure binary numeration?

If every bit participates in the value representation, that doesn't
imply that every possible bit pattern is a valid one. The existence of
bit patterns that don't represent any number doesn't change the fact
that the ones that do represent a number are interpreted as binary
numbers.

---

Author: "mark" <markw65@gmail.com>
Date: Wed, 12 Jul 2006 16:22:24 CST Raw View

johnchx2@yahoo.com wrote:
> kanze wrote:
>
> > Given that, I'd say that there is a definite
> > intention that signed char can have trapping representations.
>
> That would make sense, but I'm not sure how to square it with 3.9.1/7,
> which says, "The representations for integral types shall define values
> by use of a pure binary numeration system."
>
> How can you have a trap representation when all the object
> representation bits participate in the value representation, and the
> value representation is a pure binary numeration?

Easy.

For example, with an n-bit representation, and fewer than 2^(n-1)
trap-representations then clearly every bit participates, and we can
trivially arrange that every value has a pure binary representation.

Of course, depending on which trap-representations we pick, we could
violate other constraints... but if we choose sign-bit=1, value-bits=0
as the (only) trap representation, everything would be fine.
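
To make the "sign-bit=1, value-bits=0" case concrete, here is a small sketch that decodes 8-bit patterns by hand. It models the patterns in ordinary unsigned values rather than relying on the host's own signed char, and the function names are hypothetical. In sign-magnitude the pattern 1000 0000 decodes to zero a second time (negative zero); in one's complement the second zero is 1111 1111. Either is a pattern an implementation could plausibly treat as a trap representation while every other pattern still reads as a pure binary number.

```cpp
#include <cassert>

// Decodes an 8-bit pattern as sign-magnitude: bit 7 is the sign,
// bits 0..6 hold the magnitude in pure binary.
int decode_sign_magnitude(unsigned pattern)
{
    assert(pattern <= 0xFFu);
    int magnitude = static_cast<int>(pattern & 0x7Fu);
    return (pattern & 0x80u) ? -magnitude : magnitude;
}

// Decodes an 8-bit pattern as one's complement: when the sign bit is
// set, the magnitude is the bitwise complement of the low 7 bits.
int decode_ones_complement(unsigned pattern)
{
    assert(pattern <= 0xFFu);
    if (pattern & 0x80u)
        return -static_cast<int>(~pattern & 0x7Fu);
    return static_cast<int>(pattern);
}
```

Both decode_sign_magnitude(0x00) and decode_sign_magnitude(0x80) yield 0, so 256 bit patterns cover only 255 distinct values; the same holds for decode_ones_complement with 0x00 and 0xFF.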

Mark Williams

---

Author: kuyper@wizard.net
Date: Wed, 12 Jul 2006 16:41:04 CST Raw View

Frederick Gotham wrote:
.
> Does this mean that if you want to access an object in C++ as if it were an
> array of bytes, then you MUST use unsigned char rather than signed char or
> plain char, because the latter two can have a "bad value"?

You've got it.

> I've seen LOADS of code that presumes a char has no bad values.

There's a lot of bad code out there, and a lot of code which correctly
relies upon platform specific guarantees of 2's complement arithmetic.

---

Author: johnchx2@yahoo.com
Date: Wed, 12 Jul 2006 17:25:23 CST Raw View

Bob Bell wrote:
>
> 4.1 Lvalue-to-rvalue conversion.
>

Oh, that!  :-)

Just to clarify, 4.1 gets into the question of what you can do with the
lvalue once you've got it.  I thought you were arguing that merely
forming the lvalue (dereferencing the pointer) yielded undefined
behavior.  Which, under your reading of 5.2.10/7, it should.

My difficulty with that reading is that, strictly speaking, it makes
any use of reinterpret_cast (other than as the argument to the inverse
reinterpret cast) undefined.  For example:

  char* p2 = reinterpret_cast<char*>(p1);

yields undefined behavior, because nothing guarantees that the result
of the reinterpret_cast can be stored to a pointer variable.

I don't think anyone believes that was the intent, so 5.2.10/7 cannot
mean what it appears -- on literal reading -- to say.  Since the
literal reading is a dead letter, the live question is, what *can* you
do with the resulting pointer value.  And 5.2.10/10 would seem to
answer that question.

I do agree that the language in 5.2.10/7 is misleading and could
certainly use some cleaning up.

Now, to look at the question of what can be done with the resulting
lvalue...there's an interesting open DR on 4.1:

  http://www2.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#240

The DR is interesting both for what it says and for what it doesn't.
It says (in part):

  There's no exception made for unsigned char types.
  The wording in 3.9.1  basic.fundamental was carefully crafted
  to allow use of unsigned char to access uninitialized data so that
  memcpy and such could be written in C++ without undefined
  behavior, but this statement undermines that intent.

which confirms the intent to make it possible to read more or less
arbitrary bytes in memory via an lvalue of type unsigned char, just as
memcpy() would.  The interesting omission is that the DR is entirely
concerned with the issue of reading *uninitialized* memory; it makes no
mention of the type congruence language ("If the object to which the
lvalue refers is not an object of type T...").

But there's still no way to implement memcpy() in C++ if we're not
allowed to read initialized memory of non-unsigned-char type through
lvalues of type unsigned char.  Why no mention of this in the DR?  It
would appear that there's a sort of unspoken assumption that this *is*
permitted under the current rules.  But why?

I suspect that the clue is 3.9/4, which says, "The object
representation of an object of type T is the sequence of N unsigned
char objects taken up by the object of type T, where N equals
sizeof(T)."  This would seem to imply that the underlying bytes of an
object really are -- and may legally be accessed as -- unsigned char
objects.
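
A minimal sketch of the kind of memcpy() implementation under discussion, assuming the reading above is right and the bytes of any object may be accessed as unsigned char. The name copy_bytes is hypothetical, and overlapping ranges are not handled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical memcpy-like function: copies `count` bytes from `src`
// to `dest`, touching the storage only through unsigned char lvalues.
void* copy_bytes(void* dest, const void* src, std::size_t count)
{
    assert(count == 0 || (dest != nullptr && src != nullptr));
    unsigned char* d = static_cast<unsigned char*>(dest);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < count; ++i)
        d[i] = s[i];  // each read and write is of an unsigned char
    return dest;
}
```

Copying a double with copy_bytes(&y, &x, sizeof x) reads initialized memory of non-unsigned-char type through unsigned char lvalues, which is exactly the access whose status is in question here.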

Now, there's a certain slipperiness in the standard about whether it's
legal to access the underlying bytes of an object via an lvalue of type
char.  The intent seems to be that you can (i.e. all the rules and
special exceptions that appear intended to support this use encompass
both char and unsigned char), but I'm not sure that the last link in
the chain is actually there.

---

Author: johnchx2@yahoo.com
Date: Wed, 12 Jul 2006 18:50:08 CST Raw View

kuyper@wizard.net wrote:
> johnchx2@yahoo.com wrote:
>
> > How can you have a trap representation when all the object
> > representation bits participate in the value representation, and the
> > value representation is a pure binary numeration?
>
> If every bit participates in the value representation, that doesn't
> imply that every possible bit pattern is a valid one. The existence of
> bit patterns that don't represent any number doesn't change the fact
> that the ones that do represent a number are interpreted as binary
> numbers.

Perhaps I'm being over-literal, but I don't see any exception to the
rule that integral types define their values by use of a pure binary
numeration system.  It doesn't say "...except for trap or invalid
values."

But at the same time, 3.9.1/1 does make a point to restrict the
requirement that all bit patterns represent numbers to unsigned char.
So maybe the intent is clearer than I thought.

---

Author: kuyper@wizard.net
Date: Thu, 13 Jul 2006 01:27:14 CST Raw View

johnchx2@yahoo.com wrote:
> kuyper@wizard.net wrote:
> > johnchx2@yahoo.com wrote:
.
> Perhaps I'm being over-literal, but I don't see any exception to the
> rule that integral types define their values by use of a pure binary
> numeration system.  It doesn't say "...except for trap or invalid
> values."

No, you're being under-literal. The relevant section says that "The
representations of integral types shall define values by use of a pure
binary numeration system."  Bit patterns which don't represent anything
aren't covered by that statement.

The stereotypical example is a signed type, using 1's complement or
sign/magnitude representation. These representations have two different
ways of representing 0. The C99 standard is much clearer on this point
than the C++ one, but I suspect that the intent was the same: that an
implementation which treats one of those two representations as a trap
representation would be a legal one.

---

Author: "SuperKoko" <tabkannaz@yahoo.fr>
Date: Thu, 13 Jul 2006 09:29:04 CST Raw View

Frederick Gotham wrote:
> SuperKoko posted:
>
>
> > Not *signed char* ! The standard says it for *char* and *unsigned char*
> > only!
>
>
> 3.9.1.1
>
> A char, a signed char, and an unsigned char occupy the same amount of
> storage and have the same alignment requirements; that is, they have the
> same object representation. For character types, all bits of the object
> representation participate in the value representation.
>
It is not sufficient!
In fact, 3.10-15 gives the answer.

"
15 If a program attempts to access the stored value of an object through
   an lvalue of other than one of the following types the behavior is
   undefined 25):

  --the dynamic type of the object,

  --a cv-qualified version of the dynamic type of the object,

  --a type that is the signed or unsigned type corresponding to the
    dynamic type of the object,

  --a type that is the signed or unsigned type corresponding to a
    cv-qualified version of the dynamic type of the object,

  --an aggregate or union type that includes one of the aforementioned
    types among its members (including, recursively, a member of a
    sub-aggregate or contained union),

  --a type that is a (possibly cv-qualified) base class type of the
    dynamic type of the object,

  --a char or unsigned char type.

  _________________________
  25) The intent of this list is to specify those circumstances in which
      an object may or may not be aliased.
"
Accessing any value through a char lvalue or unsigned char lvalue is
legal, but accessing an object whose dynamic type is neither signed char
nor unsigned char through a signed char lvalue has undefined behavior.
I don't know why conv.lval doesn't include this list... I have always
seen that as a defect in the standard.

Similarly, even on platforms where long and int have the same size,
representation and value representation, and even when this is well
documented, accessing an int via a long lvalue has UB.
With GCC 4.0.2 on i386 GNU/Linux:
#include <iostream>

// on this platform int and long have the same size & value representation.
void assign(void** pp,void* p) {*pp=p;}
int main() {
 typedef int base_type;
 typedef long alias_type;

 base_type c=0;
 void* first_p=&c;
 void* other_p;
 assign(&other_p,first_p);
 alias_type* ap=static_cast<alias_type*>(other_p);
 c=1;
 *ap=2;

 std::cout << c << '\n';
}

Compiled with:

g++ --pedantic-errors -Wall -Wextra -O2 alias_int.cpp

The output is 1

With -O1, the output is 2
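
One way to get the intended effect without the aliasing violation is to copy the bytes instead of writing through a pointer of the other type. The following is a sketch under stated assumptions: the helper name store_representation is hypothetical, a C++11 compiler is assumed for static_assert and the type traits, and the result is only meaningful when both types share the same value representation for the value being stored.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical helper: overwrites the storage of `target` with the
// object representation of `value`. std::memcpy accesses the storage
// as bytes, so no lvalue of the "wrong" type is ever used.
template <typename To, typename From>
void store_representation(To& target, const From& value)
{
    static_assert(sizeof(To) == sizeof(From),
                  "both types must have the same size");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "both types must be trivially copyable");
    std::memcpy(&target, &value, sizeof(To));
}
```

On a platform where int and long have the same size, store_representation(c, 2L) would replace the *ap=2 assignment above; where the sizes differ, the static_assert rejects the call at compile time.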


> > Now, the real question is : Does reinterpret_cast produce a *valid*
> > pointer pointing at the *first byte* of the object?
>
>
> I think we have to use common-sense here. Also, the following tells us that
> any object's address can be stored in a char*.
>
>
> 5.2.10.7
>
> A pointer to an object can be explicitly converted to a pointer to an
> object of different type. Except that converting an rvalue of type "pointer
> to T1" to the type "pointer to T2" (where T1 and T2 are object types and
> where the alignment requirements of T2 are no stricter than those of T1)
> and back to its original type yields the original pointer value, the result
> of such a pointer conversion is unspecified.
>
>
Here common sense doesn't help... 5.2.10-7 doesn't say anything on the
pointer produced by reinterpret_cast... If there were no 5.2.10-10 it
would mean that behavior is undefined.
But, I admit (as I said in a previous post) that 5.2.10-10 implies that
using the reinterpret_cast'ed pointer is correct, even if the wording of
the WG21 is very confusing.
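
For reference, the one thing 5.2.10-7 does spell out is the round trip. A minimal sketch of that guarantee, with a hypothetical struct Sample and function round_trips used purely for illustration (char has no stricter alignment than any other object type, so the condition in the paragraph is met):

```cpp
#include <cassert>

struct Sample { int value; };

// Returns true when converting to char* and back yields the original
// pointer value, which is what 5.2.10-7 guarantees. The intermediate
// char* is never dereferenced here.
bool round_trips(Sample* original)
{
    assert(original != nullptr);
    char* raw = reinterpret_cast<char*>(original);
    Sample* restored = reinterpret_cast<Sample*>(raw);
    return restored == original;
}
```

Whether raw itself may be dereferenced is the separate question that 5.2.10-10 and 3.10-15 have to answer.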

---

Author: nagle@animats.com (John Nagle)
Date: Fri, 7 Jul 2006 01:20:43 GMT Raw View

Bob Bell wrote:
> Frederick Gotham wrote:
>
>>Bob Bell posted:
>>
>>
>>
>>>The way you use reinterpret_cast is undefined behavior; the code is
>>>not portable.
>>
>>
>>It is indeed portable, because any object's address can reliably be stored
>>in a char*.

     Actually, no.  A pointer to a function cannot be stored in a "char *".
Nor can a pointer to a function member.

    John Nagle

---

Author: fgothamNO@SPAM.com (Frederick Gotham)
Date: Fri, 7 Jul 2006 01:23:59 GMT Raw View

Alberto Ganesh Barbati posted:

> Frederick Gotham ha scritto:
>> Bob Bell posted:
>>
>>> I was talking about the fact that after reinterpret_cast'ing the
>>> pointer, you then dereference it.
>>
>> A char has no trap representations, and so it is safe to access the
>> bytes of any object in memory as if it were a char, signed char or
>> unsigned char:
>>
>
> Could you please quote the clause in the C++ standard that guarantees
> that?


I was working off common-sense more than anything. Any object of any type
is stored in memory as a finite, fixed-length sequence of bytes, whereby
a byte consists of CHAR_BIT bits.

If the memory is ours to access, then there is no reason why we can't
access it in whatever way we please. Take a look at the following snippet
for instance:

    #include <cstdlib>

    int main()
    {
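        // 1024 bytes, so that the offsets used below (87 doubles,
        // 123 pointers) stay inside the allocation with 8-byte types.
        void * const p = std::malloc(1024);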

        if(!p) return -1;

        int *p_int = (int*)p;
        double *p_double = (double*)p;
        void **p_voidptr = (void**)p;

        *p_int = 7;
        *p_double = 42.3;
        *p_voidptr = p_double + 3;

        p_int += 54;

        p_double += 87;

        p_voidptr += 123;


        *p_int = 4235;
        *p_double = 35.2352;
        *p_voidptr = &p_voidptr;

        std::free(p);
    }


I don't know if the Standard spells it out in black and white that you
can access memory however you like, but I don't think it has to.



> PS: C-style casts are of no help, as they are defined in terms of
> const_cast/static_cast/reinterpret_cast (5.4/5).


The new-style casts take up too much horizontal screen space.


--

Frederick Gotham

---

Author: johnchx2@yahoo.com
Date: Thu, 6 Jul 2006 20:23:39 CST Raw View

Alberto Ganesh Barbati wrote:
> Frederick Gotham ha scritto:
> > A char has no trap representations, and so it is safe to access the bytes
> > of any object in memory as if it were a char, signed char or unsigned
> > char:
> >
>
> Could you please quote the clause in the C++ standard that guarantees
> that?

3.9.1/1, which says, "For character types, all bits of the object
representation participate in the value representation."

In addition, 3.10/15 (7th bullet) allows access to the value of an
object of any type through an lvalue of type char or unsigned char.

---

Author: johnchx2@yahoo.com
Date: Thu, 6 Jul 2006 20:28:49 CST Raw View

Bob Bell wrote:

> I'm glad to hear that that's the case on your platform of choice.
> Nevertheless, reinterpret_cast'ing a pointer to a different type of
> pointer and then dereferencing it is undefined behavior. See section
> 5.2.10 in the standard.

I did, and 5.2.10/10 says exactly the opposite.  In particular, it
says:

  That is, a reference cast reinterpret_cast<T&>(x) has the same
  effect as the conversion *reinterpret_cast<T*>(&x) with the
  built-in & and * operators.  The result is an lvalue that refers
  to the same object as the source lvalue, but with a different
  type.

If reinterpret_cast<T&>(x) has a well-defined effect, as described
here, and this is the "same effect" as reinterpret_cast<T*>(&x), I
don't see why we should think that *reinterpret_cast<T*>(&x) yields
undefined behavior.

---

Author: James Dennett <jdennett@acm.org>
Date: Fri, 7 Jul 2006 09:17:59 CST Raw View

John Nagle wrote:
> Bob Bell wrote:
>> Frederick Gotham wrote:
>>
>>> Bob Bell posted:
>>>
>>>
>>>
>>>> The way you use reinterpret_cast is undefined behavior; the code is
>>>> not portable.
>>>
>>>
>>> It is indeed portable, because any object's address can reliably be
>>> stored
>>> in a char*.
>
>     Actually, no.  A pointer to a function cannot be stored in a "char *".
> Nor can a pointer to a function member.

Functions are not "objects" in the language of the C++ standard.

-- James

---

Author: fgothamNO@SPAM.com (Frederick Gotham)
Date: Fri, 7 Jul 2006 14:15:44 GMT Raw View

John Nagle posted:


>>>It is indeed portable, because any object's address can reliably be stored
>>>in a char*.
>
>      Actually, no.  A pointer to a function cannot be stored in a "char *".
> Nor can a pointer to a function member.


While we're being pedantic, I wonder if you can call a function an
"object"... ?

Can you store the address of a function in a void*?

I believe the Standard says something about char* and void* being the same...
?
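
These questions can be checked mechanically with the type traits that C++11 added later, well after this thread. The sketch below assumes a C++11 compiler; example_function is a hypothetical name. It shows that a function type is not an object type, and that a function pointer does not implicitly convert to void* or char*, while an object pointer converts implicitly to void* but not to char*.

```cpp
#include <type_traits>

void example_function() {}

// A function type is not an object type; int is.
static_assert(!std::is_object<decltype(example_function)>::value,
              "functions are not objects");
static_assert(std::is_object<int>::value, "int is an object type");

// Object pointers convert implicitly to void*, but not to char*.
static_assert(std::is_convertible<int*, void*>::value,
              "int* converts implicitly to void*");
static_assert(!std::is_convertible<int*, char*>::value,
              "int* needs a cast to become char*");

// Function pointers convert implicitly to neither.
static_assert(!std::is_convertible<void (*)(), void*>::value,
              "no implicit conversion from function pointer to void*");
static_assert(!std::is_convertible<void (*)(), char*>::value,
              "no implicit conversion from function pointer to char*");
```

The traits only describe implicit conversions; what an explicit reinterpret_cast between function and object pointers does is a separate matter and is not covered by this sketch.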


--

Frederick Gotham

---

Author: alfps@start.no ("Alf P. Steinbach")
Date: Fri, 7 Jul 2006 14:18:08 GMT Raw View

* John Nagle:
> Bob Bell wrote:
>> Frederick Gotham wrote:
>>
>>> Bob Bell posted:
>>>
>>>> The way you use reinterpret_cast is undefined behavior; the code is
>>>> not portable.
>>>
>>> It is indeed portable, because any object's address can reliably be
>>> stored in a char*.
>
>     Actually, no.  A pointer to a function cannot be stored in a "char *".
> Nor can a pointer to a function member.

A function isn't an object.

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

---

Author: "Bob Bell" <belvis@pacbell.net>
Date: Fri, 7 Jul 2006 20:58:00 CST Raw View

johnchx2@yahoo.com wrote:
> Bob Bell wrote:
>
> > I'm glad to hear that that's the case on your platform of choice.
> > Nevertheless, reinterpret_cast'ing a pointer to a different type of
> > pointer and then dereferencing it is undefined behavior. See section
> > 5.2.10 in the standard.
>
> I did, and 5.2.10/10 says exactly the opposite.  In particular, it
> says:
>
>   That is, a reference cast reinterpret_cast<T&>(x) has the same
>   effect as the conversion *reinterpret_cast<T*>(&x) with the
>   built-in & and * operators.  The result is an lvalue that refers
>   to the same object as the source lvalue, but with a different
>   type.
>
> If reinterpret_cast<T&>(x) has a well-defined effect, as described
> here, and this is the "same effect" as reinterpret_cast<T*>(&x), I
> don't see why we should think that *reinterpret_cast<T*>(&x) yields
> undefined behavior.

5.2.10/10 doesn't say that using the new reference is OK, it says you
get the same result you would get if you were casting and then
dereferencing the equivalent pointers.

5.2.10/7 says:

   A pointer to an object can be explicitly converted to a pointer to
   an object of different type. Except that converting an rvalue of
   type "pointer to T1" to the type "pointer to T2" (where T1 and T2
   are object types and where the alignment requirements of T2 are
   no stricter than those of T1) and back to its original type yields
   the original pointer value, the result of such a pointer conversion
   is unspecified.

The way I read this, it says that if you reinterpret_cast a T1* to a
T2*, then depending on what T1 and T2 are you might be able to
reinterpret_cast it back to T1* and get the original pointer value, but
that's it. Notably, it doesn't define what happens if you dereference
the T2*. 1.3.12 says (among other things), "Undefined behavior may also
be expected when this International Standard omits the description of
any explicit definition of behavior."

Unless there's some other clause that defines what happens when you
dereference the T2*, it sounds like undefined behavior to me.

The way I read 5.2.10/10, it's designed to allow round-trip casting, as
in:

struct T1 {
   char x;
};

struct T2 {
   double y;
};

void F()
{
   T1 t1;

   T2& t2(reinterpret_cast<T2&>(t1));

   t2.y = 0.0;  // undefined behavior.

   T1& t3(reinterpret_cast<T1&>(t2));

   t3.x = 0;  // OK, stores 0 in t1.x
}

Bob

---

Author: johnchx2@yahoo.com
Date: Sat, 8 Jul 2006 17:47:31 CST Raw View

Bob Bell wrote:

> Unless there's some other clause that defines what happens when you
> dereference the T2*, it sounds like undefined behavior to me.

5.2.10/10 says "The result is an lvalue that refers to the same object
as the source lvalue, but with a different type."  3.10/15 governs
whether accessing an object of one type through an lvalue of another
type engenders undefined behavior.

---

Author: AlbertoBarbati@libero.it (Alberto Ganesh Barbati)
Date: Mon, 10 Jul 2006 07:58:50 GMT Raw View

johnchx2@yahoo.com ha scritto:
> Alberto Ganesh Barbati wrote:
>> Frederick Gotham ha scritto:
>>> A char has no trap representations, and so it is safe to access the bytes
>>> of any object in memory as if it were a char, signed char or unsigned
>>> char:
>>>
>> Could you please quote the clause in the C++ standard that guarantees
>> that?
>
> 3.9.1/1, which says, "For character types, all bits of the object
> representation participate in the value representation."

So what?

> In addition, 3.10/15 (7th bullet) allows access to the value of an
> object of any type through an lvalue of type char or unsigned char.

It doesn't say so. It says that any access other than the specified list
shall produce undefined behaviour, but, strictly speaking, it doesn't
say what happens when one requirement of the list is satisfied.

Ganesh

---

Author: johnchx2@yahoo.com
Date: Mon, 10 Jul 2006 11:30:44 CST Raw View

Alberto Ganesh Barbati wrote:
> johnchx2@yahoo.com ha scritto:
> > Alberto Ganesh Barbati wrote:
> >> Frederick Gotham ha scritto:
> >>> A char has no trap representations, and so it is safe to access the bytes
> >>> of any object in memory as if it were a char, signed char or unsigned
> >>> char:
> >>>
> >> Could you please quote the clause in the C++ standard that guarantees
> >> that?
> >
> > 3.9.1/1, which says, "For character types, all bits of the object
> > representation participate in the value representation."
>
> So what?

Well, that's the language that guarantees that char has no trap
representation.  Which was part of the claim for which you requested
the supporting citations.

>
> > In addition, 3.10/15 (7th bullet) allows access to the value of an
> > object of any type through an lvalue of type char or unsigned char.
>
> It doesn't say so. It says that any access other than the specified list
> shall produce undefined behaviour, but, strictly speaking, it doesn't
> say what happens when one requirement of the list is satisfied.

True enough, but the strict reading begs the question of intent.  What
would be the point of a clause which says nothing but, "This undefined
operation doesn't yield undefined behavior under this clause?"  In
other words, why make an exception if the exception is already UB?

---

Author: "Bob Bell" <belvis@pacbell.net>
Date: Mon, 10 Jul 2006 16:39:07 CST Raw View

johnchx2@yahoo.com wrote:
> Bob Bell wrote:
>
> > Unless there's some other clause that defines what happens when you
> > dereference the T2*, it sounds like undefined behavior to me.
>
> 5.2.10/10 says "The result is an lvalue that refers to the same object
> as the source lvalue, but with a different type."  3.10/15 governs
> whether accessing an object of one type through an lvalue of another
> type engenders undefined behavior.

Interesting. Are you aware of any clause anywhere that states what the
result of accessing an object through a char lvalue would be?

In any case, I'm not convinced; 5.2.10/10 equates reinterpret_cast of a
reference with the result of dereferencing an equivalent pointer, but
nowhere is the result of dereferencing such a pointer defined.
Specifically, it doesn't state that dereferencing a T2* obtained by
reinterpret_cast yields a T2 lvalue, so I don't see how 3.10/15 comes
into play. Further, it seems obvious to me that the intention of
5.2.10/7 is that pointers that have been cast using reinterpret_cast
cannot be dereferenced without invoking undefined behavior. I don't see
how to get past these difficulties to allow Frederick Gotham's code to
be portable.

It's possible that I'm still missing something, though.

Bob

---

Author: fgothamNO@SPAM.com (Frederick Gotham)
Date: Mon, 10 Jul 2006 23:52:36 GMT Raw View

Bob Bell posted:

> I don't see how to get past these difficulties to allow Frederick
> Gotham's code to be portable.


I see this part of the discussion as merely a formality, because I
instinctively know that there's nothing wrong with the code. Here are a few
quotes from the Standard that might sway your way of thinking.


1.7.1

The fundamental storage unit in the C++ memory model is the byte. A byte is
at least large enough to contain any member of the basic execution character
set and is composed of a contiguous sequence of bits, the number of which is
implementation-defined. The least significant bit is called the low-order
bit; the most significant bit is called the high-order bit. The memory
available to a C++ program consists of one or more sequences of contiguous
bytes. Every byte has a unique address.

1.8.5

Unless it is a bit-field (9.6), a most derived object shall have a non-zero
size and shall occupy one or more bytes of storage.

3.9.5

Object types have alignment requirements. The alignment of a complete object
type is an implementation-defined integer value representing a number of
bytes; an object is allocated at an address that meets the alignment
requirements of its object type.

5.3.3.2

When sizeof is applied to a class, the result is the number of bytes in an
object of that class including any padding required for placing objects of
that type in an array.

3.9.2.3

A valid value of an object pointer type represents either the address of a
byte in memory or a null pointer.


Even before C++, people have been playing around with memory as if it were
"simply bytes" in C. Why? Because memory is just bytes!


--

Frederick Gotham

---

Author: johnchx2@yahoo.com
Date: Mon, 10 Jul 2006 18:52:07 CST Raw View

Bob Bell wrote:
> johnchx2@yahoo.com wrote:
> > Bob Bell wrote:
> >
> > > Unless there's some other clause that defines what happens when you
> > > dereference the T2*, it sounds like undefined behavior to me.
> >
> > 5.2.10/10 says "The result is an lvalue that refers to the same object
> > as the source lvalue, but with a different type."  3.10/15 governs
> > whether accessing an object of one type through an lvalue of another
> > type engenders undefined behavior.
>
> Interesting. Are you aware of any clause anywhere that states what the
> result of accessing an object through a char lvalue would be?

I'm not sure.  Perhaps if you can point to the clause in the standard
that tells us the result of accessing an object of type T through an
lvalue of type T (or const T, or an accessible base class of T) is, I
can find the corresponding language for accessing an object of type T
through an lvalue of type char.  :-)

My point is simply that I'm not even sure what such a clause might say
if there were one.

> In any case, I'm not convinced; 5.2.10/10 equates reinterpret_cast of a
> reference with the result of dereferencing an equivalent pointer, but
> nowhere is the result of dereferencing such a pointer defined.

It says that the result of a reinterpret_cast to reference type is the
same as the result of dereferencing the result of a reinterpret_cast to
the corresponding pointer type.  It also says that the result of the
reinterpret_cast to reference type is an lvalue designating the
original object.  That looks like a definition (of *both*) to me.

My reasoning is that if the result of A is the same as the result of B,
and if the result of A is X, then we can conclude that the result of B
is X.

The only alternative I see is to assert that *all* reinterpret_casts to
reference type exhibit undefined behavior.  Is that what you mean?

I suspect that the language in 5.2.10/7 is causing the confusion.  It
says, "...the result of such a pointer conversion is unspecified."  And
I'm not contradicting that: the *value* of the pointer is not
specified.  That is, the bit-pattern of the pointer is not specified.
(In particular, it need not be the same bit-pattern as the original
pointer, even if it denotes the same machine address; type-tagged
pointer types are legal.)  But that doesn't prevent a later paragraph
from defining the result of dereferencing the pointer.  And that's what
5.2.10/10 does.
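
To make the equivalence stated in 5.2.10/10 concrete, here is a minimal
sketch. The function name is made up for illustration and is not from
the thread. It forms an unsigned char lvalue both ways and checks that
the two designate the same byte. It only illustrates the wording; it
does not settle whether reading through the result is well defined,
which is the job of 3.10/15.

```cpp
// Hypothetical helper, not from the thread.
template <class T>
bool BothCastsDesignateSameByte(T const &x)
{
    // Reference form of the cast.
    unsigned char const &viaReference =
        reinterpret_cast<unsigned char const &>(x);

    // Pointer form, using the built-in & and * operators.
    unsigned char const &viaPointer =
        *reinterpret_cast<unsigned char const *>(&x);

    // 5.2.10/10 says both forms have the same effect, so the two
    // lvalues should have the same address.
    return &viaReference == &viaPointer;
}
```

Called as BothCastsDesignateSameByte(someInt), this should return true
on a conforming implementation.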

---

Author: "SuperKoko" <tabkannaz@yahoo.fr>
Date: Tue, 11 Jul 2006 10:23:12 CST Raw View

Frederick Gotham wrote:
> Bob Bell posted:
>
>
> > I was talking about the fact that after reinterpret_cast'ing the
> > pointer, you then dereference it.
>
>
> A char has no trap representations, and so it is safe to access the bytes
> of any object in memory as if it were a char, signed char or unsigned
> char:
>
Not *signed char* ! The standard says it for *char* and *unsigned char*
only!
I'm pretty sure that the signedness of char is left
implementation-defined in the standard so that machines where
signed char has trap representations, or multiple representations of
the same value, can use unsigned char for char and still be compliant.
IIRC the C99 standard has requirements on signed char representations,
but not the C++ standard.

Anyway, I agree that it is safe to access the bytes of an object via a
char* or an unsigned char*, if these char pointers are really pointing
to a byte of the representation of the object.
Of course, the value of the char read is unspecified, but it depends
only on the bits of the byte.
So, it is safe to copy a POD object via a char pointer, from one memory
location to another (if alignment requirements are correct... For
instance, to assign one POD object to another POD object).

Now, the real question is : Does reinterpret_cast produce a *valid*
pointer pointing at the *first byte* of the object?

For a very long time, I thought the answer was no... That's why I
only used static_cast<char*>(static_cast<void*>(ptr)), which seems
sensible if we read 3.8-5 (basic.life, paragraph 5).
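
For reference, the two-step cast described above might be wrapped up
like this. It is only a sketch: the helper name is hypothetical, and it
uses unsigned char rather than char, which is a choice made here and not
something the post prescribes.

```cpp
// Hypothetical helper: pointer to the first byte of obj, obtained by
// going through void const* with static_cast only (no reinterpret_cast).
template <class T>
unsigned char const *FirstByteViaVoid(T const &obj)
{
    // Object pointer to void const*: a standard implicit conversion.
    void const *raw = &obj;

    // void const* to unsigned char const*: allowed by static_cast.
    return static_cast<unsigned char const *>(raw);
}
```

A call such as FirstByteViaVoid(someInt) yields a pointer that compares
equal to &someInt once both are converted to void const*.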

5.2.10 seems to say that the mapping of reinterpret_cast is
implementation-defined, and paragraph 7 doesn't say anything about
the result of a reinterpret_cast from pointer to object type to pointer
to another object type. It only says what happens when converting back
to the original pointer type (subject to the alignment
requirements).

I initially thought that it was everything we could assume about
reinterpret_cast...
And it was very plausible that a machine could produce an invalid pointer.
For instance, on a hypothetical 32-bit machine (which can't natively
address 8-bit bytes, only 32-bit words), it is possible to write
a compiler using 32-bit bytes... But it is also possible to write a
compiler using 8-bit bytes, cutting each 32-bit word into 4 bytes.
In that case, pointers to 32-bit words (probably int) will be
represented with the canonical representation of the machine, and have
an alignment requirement of 4.
But, char* and void* must have a special representation.

Two main possibilities: having 64-bit char* pointers, where one of
the two words contains a number in the range [0,3] indicating which
sub-byte of the word is pointed to.

Another possibility: having 32-bit char* pointers, equal to
(pointer_to_word << 4)+subbyte_index (pseudocode).
In that case, it might be sensible to have a reinterpret_cast which
doesn't change the representation of the pointer at all... And
reinterpret_casting an int* to char* would yield an incorrect pointer!
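
The second encoding can be simulated on an ordinary machine to see why
an unchanged bit pattern goes wrong. This is a toy model, not a real
compiler's pointer layout: all names are hypothetical, and it shifts by
2 (two bits are enough for an index in [0,3]) where the pseudocode above
shifts by 4. The top two bits of the word address are simply lost in
this sketch.

```cpp
#include <cstdint>

// Toy model of a 32-bit byte pointer on a word-addressed machine.
struct BytePointer { std::uint32_t bits; };

// Pack a word address and a sub-byte index (0..3) into one value.
inline BytePointer MakeBytePointer(std::uint32_t wordAddress, unsigned subbyte)
{
    BytePointer p = { (wordAddress << 2) | (subbyte & 3u) };
    return p;
}

// Recover the word address from a packed byte pointer.
inline std::uint32_t WordOf(BytePointer p) { return p.bits >> 2; }

// Recover the sub-byte index from a packed byte pointer.
inline unsigned SubbyteOf(BytePointer p) { return p.bits & 3u; }
```

With this model, MakeBytePointer(100, 3) decodes back to word 100,
sub-byte 3. If the bits of a word pointer to address 100 are reused
unchanged as a BytePointer, WordOf reports word 25, which is the
"incorrect pointer" case.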

But now, I see that 5.2.10, paragraph 10 seems to say that the pointer
(or reference) must point (or refer) to the original lvalue.

johnchx2@yahoo.com wrote:
>  That is, a reference cast reinterpret_cast<T&>(x) has the same
>  effect as the conversion *reinterpret_cast<T*>(&x) with the
>  built-in & and * operators.  The result is an lvalue that refers
>  to the same object as the source lvalue, but with a different
>  type.

Theoretically this paragraph is normative and implies that
reinterpret_cast works the same with pointers as it does with
references.
But IMHO it is worth a defect report: this type of "backward
normative implication" is weird...
This requirement should be moved from paragraph 10 to paragraph 7.

In that case, everything would be clear, and it would be correct to
reinterpret_cast any pointer to char* and access it (of course, the
resulting char still has an unspecified value).

---

Author: "kanze" <kanze@gabi-soft.fr>
Date: Tue, 4 Jul 2006 12:20:56 CST Raw View

Steve Clamage wrote:

> There is a standard Unix facility, also available in Linux,
> called XDR (eXternal Data Representation). It allows transfer
> of data between different systems.

XDR is, I believe, also an Internet standard.  As such, I expect
you can find some support for it on any machine which you can
connect to the Internet.  How much, on the other hand, might
depend, as it isn't the most widely used standard:-).

Note that xdr is a representation layer protocol, used by higher
layers, like RPC.

> Run "man xdr" for a description.

> Briefly, the sender marshalls the data values into a standard
> linear interchange format, and the receiver un-marshalls the
> data into its internal format.

> The problem of data transfer is not limited to one programming
> language, so it seems more appropriate for it to be
> standardized by something like the Single Unix Standard.

Also, of course, it isn't a universal protocol.  Having XDR
support in the language (or elsewhere) doesn't help if the other
side is speaking BER encoded ASN.1.  Perhaps the biggest
argument against standardizing such things is the fact that
there are so many to choose from.  (Of course, XDR has the
advantage of being one of the simplest, and a lot of protocols
use some subset of it, officially as such, or because what they
specify happens to coincide with XDR, at least for the common
types.)

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

---

Author: "Bob Bell" <belvis@pacbell.net>
Date: Wed, 5 Jul 2006 16:49:39 CST Raw View

Frederick Gotham wrote:
> Goalie_Ca posted:
>
> > I have done some searching and it appears that there isn't any standard
> > way to convert numbers from big endian to little endian and determine
> > if it is a big endian or little endian machine.
>
>
> I recently posted fully-portable code for determining the endianness of a
> machine. Here's the google link:
>
> http://groups.google.ie/group/comp.std.c++/msg/320642c7b4a21366?hl=en&

The way you use reinterpret_cast is undefined behavior; the code is not
portable.

Bob

---

Author: fgothamNO@SPAM.com (Frederick Gotham)
Date: Thu, 6 Jul 2006 01:32:02 GMT Raw View

Bob Bell posted:


> The way you use reinterpret_cast is undefined behavior; the code is
> not portable.


It is indeed portable, because any object's address can reliably be stored
in a char*.

But if you'd like to use old-style casts, then go ahead:

    char const *p = (char const*)&guinea_pig;

--

Frederick Gotham

---

Author: "Bob Bell" <belvis@pacbell.net>
Date: Thu, 6 Jul 2006 09:28:35 CST Raw View

Frederick Gotham wrote:
> Bob Bell posted:
>
>
> > The way you use reinterpret_cast is undefined behavior; the code is
> > not portable.
>
>
> It is indeed portable, because any object's address can reliably be stored
> in a char*.
>
> But if you'd like to use old-style casts, then go ahead:
>
>     char const *p = (char const*)&guinea_pig;

I was talking about the fact that after reinterpret_cast'ing the
pointer, you then dereference it.

Bob

---

Author: fgothamNO@SPAM.com (Frederick Gotham)
Date: Thu, 6 Jul 2006 15:24:11 GMT Raw View

Bob Bell posted:


> I was talking about the fact that after reinterpret_cast'ing the
> pointer, you then dereference it.


A char has no trap representations, and so it is safe to access the bytes
of any object in memory as if it were a char, signed char or unsigned
char:


#include <iostream>
#include <string>

template<class T>
void PrintObjectBytes( T const &obj )
{
    unsigned char const *p = (unsigned char const *)&obj;

    unsigned char const * const p_over = (unsigned char const *)(&obj + 1);

    do std::cout << (unsigned)*p++;
    while(p != p_over);
}

int main()
{
    std::string obj;

    PrintObjectBytes(obj);



    std::cout << '\n';



    unsigned char array[] = {1,2,3,4,5,6,7,8,9};

    PrintObjectBytes(array);
}



--

Frederick Gotham

---

Author: AlbertoBarbati@libero.it (Alberto Ganesh Barbati)
Date: Thu, 6 Jul 2006 18:25:22 GMT Raw View

Frederick Gotham ha scritto:
> Bob Bell posted:
>
>> I was talking about the fact that after reinterpret_cast'ing the
>> pointer, you then dereference it.
>
> A char has no trap representations, and so it is safe to access the bytes
> of any object in memory as if it were a char, signed char or unsigned
> char:
>

Could you please quote the clause in the C++ standard that guarantees
that? I could not find it. The best I could find is this one, which is
quite close but still doesn't say that you can directly dereference
reinterpret_cast<char*>(ptr):

3.9/2: For any object (other than a base-class subobject) of POD type T,
whether or not the object holds a valid value of type T, the underlying
bytes (1.7) making up the object can be copied into an array of char or
unsigned char.

So, unless I'm missing something, in order to be fully-portable you have
to *copy* the object in a char array before inspecting it. Notice that
doing so does not require a reinterpret_cast.
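
A minimal sketch of that copy-then-inspect approach follows. The helper
name is invented for illustration; it assumes T is a POD type, as 3.9/2
requires, and 8-bit bytes so that two hex digits per byte are enough.

```cpp
#include <cstddef>
#include <cstring>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: returns the object representation of obj as a
// hex string, lowest address first.
template <class T>
std::string ObjectBytesAsHex(T const &obj)
{
    // Copy the underlying bytes into an array of unsigned char,
    // which is the operation 3.9/2 explicitly permits.
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &obj, sizeof(T));

    // Inspect the copy, never the original object.
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i != sizeof(T); ++i)
        out << std::setw(2) << static_cast<unsigned>(bytes[i]);
    return out.str();
}
```

For an array unsigned char a[] = {1, 2, 255}, ObjectBytesAsHex(a) gives
"0102ff". For multi-byte types the result depends on the platform.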

Ganesh

PS: C-style casts are of no help, as they are defined in terms of
const_cast/static_cast/reinterpret_cast (5.4/5).

---

Author: "Bob Bell" <belvis@pacbell.net>
Date: Thu, 6 Jul 2006 15:27:18 CST Raw View

Frederick Gotham wrote:
> Bob Bell posted:
>
> > I was talking about the fact that after reinterpret_cast'ing the
> > pointer, you then dereference it.
>
> A char has no trap representations, and so it is safe to access the bytes
> of any object in memory as if it were a char, signed char or unsigned
> char:

I'm glad to hear that that's the case on your platform of choice.
Nevertheless, reinterpret_cast'ing a pointer to a different type of
pointer and then dereferencing it is undefined behavior. See section
5.2.10 in the standard.

Bob

---

Author: "Goalie_Ca" <goalieca@gmail.com>
Date: Sun, 2 Jul 2006 12:53:42 CST Raw View

I have done some searching and it appears that there isn't any standard
way to convert numbers from big endian to little endian and determine
if it is a big endian or little endian machine. There appears to be a
version for short and int in various network libraries. People appear
to invent their own every time for every data type from short to
double. Some are high performance and some are naive. Some machines
like itanium are actually capable of running natively in either but
must be compiled with a specific flag.

Is there perhaps a reason why it is not in the standard libraries or in
any of the technical reports?

---

Author: fgothamNO@SPAM.com (Frederick Gotham)
Date: Sun, 2 Jul 2006 23:42:55 GMT Raw View

Goalie_Ca posted:

> I have done some searching and it appears that there isn't any standard
> way to convert numbers from big endian to little endian and determine
> if it is a big endian or little endian machine.


I recently posted fully-portable code for determining the endianness of a
machine. Here's the google link:

http://groups.google.ie/group/comp.std.c++/msg/320642c7b4a21366?hl=en&


If you want fully-portable code to convert between endiannesses, then post
to comp.lang.c++ and I'll help you out.
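
The linked post is not reproduced in this thread, so the following is
not that code. It is only a sketch of one common way to probe byte
order, with names chosen here for illustration. It copies the bytes out
with memcpy instead of dereferencing a cast pointer, and it assumes
unsigned has no padding bits.

```cpp
#include <cstring>

// Hypothetical result type for the probe.
enum ByteOrder { LittleEndian, BigEndian, OtherEndian };

inline ByteOrder DetectByteOrder()
{
    unsigned const probe = 1u;

    // Copy the object representation into unsigned char storage.
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);

    // With a one-byte unsigned the question has no meaning.
    if (sizeof probe == 1) return OtherEndian;

    // The value 1 lives in the lowest-addressed byte on little-endian
    // machines and in the highest-addressed byte on big-endian ones.
    if (bytes[0] == 1) return LittleEndian;
    if (bytes[sizeof probe - 1] == 1) return BigEndian;
    return OtherEndian;  // mixed or otherwise unusual layouts
}
```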


--

Frederick Gotham

---

Author: stephen.clamage@sun.com (Steve Clamage)
Date: Sun, 2 Jul 2006 23:44:22 GMT Raw View

Goalie_Ca wrote:

> I have done some searching and it appears that there isn't any standard
> way to convert numbers from big endian to little endian and determine
> if it is a big endian or little endian machine. There appears to be a
> version for short and int in various network libraries. People appear
> to invent their own every time for every data type from short to
> double. Some are high performance and some are naive. Some machines
> like itanium are actually capable of running natively in either but
> must be compiled with a specific flag.
>
> Is there perhaps a reason why it is not in the standard libraries or in
> any of the technical reports?

The data transfer problem is seldom limited to endianness. The sizes of
basic types might differ, and other aspects of data format -- especially
for floating-point numbers, and the padding of structs or arrays.

There is a standard Unix facility, also available in Linux, called XDR
(eXternal Data Representation). It allows transfer of data between
different systems.  Run "man xdr" for a description.

Briefly, the sender marshalls the data values into a standard linear
interchange format, and the receiver un-marshalls the data into its
internal format.
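
As a rough illustration of marshalling, and not the actual XDR library
interface, a 32-bit unsigned integer can be written as four octets with
the most significant one first, which is the layout XDR uses for its
integers. The function names below are hypothetical, and std::uint32_t
assumes a C++11 or later compiler.

```cpp
#include <cstdint>

// Hypothetical helper: write value as 4 octets, most significant first.
inline void MarshalU32(std::uint32_t value, unsigned char out[4])
{
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFFu);
    out[3] = static_cast<unsigned char>(value & 0xFFu);
}

// Hypothetical helper: rebuild the value from the 4 octets.
inline std::uint32_t UnmarshalU32(unsigned char const in[4])
{
    return (static_cast<std::uint32_t>(in[0]) << 24)
         | (static_cast<std::uint32_t>(in[1]) << 16)
         | (static_cast<std::uint32_t>(in[2]) << 8)
         |  static_cast<std::uint32_t>(in[3]);
}
```

Because the code works on values with shifts and never looks at the
host's memory layout, it gives the same octets on any host byte order.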

The problem of data transfer is not limited to one programming language,
so it seems more appropriate for it to be standardized by something like
the Single Unix Standard.

---
Steve Clamage, stephen.clamage@sun.com

---

Author: "Goalie_Ca" <goalieca@gmail.com>
Date: Mon, 3 Jul 2006 04:03:29 CST Raw View

Steve Clamage wrote:
> Goalie_Ca wrote:
>
> > I have done some searching and it appears that there isn't any standard
> > way to convert numbers from big endian to little endian and determine
> > if it is a big endian or little endian machine. There appears to be a
> > version for short and int in various network libraries. People appear
> > to invent their own every time for every data type from short to
> > double. Some are high performance and some are naive. Some machines
> > like itanium are actually capable of running natively in either but
> > must be compiled with a specific flag.
> >
> > Is there perhaps a reason why it is not in the standard libraries or in
> > any of the technical reports?
>
>
> The data transfer problem is seldom limited to endianness. The sizes of
> basic types might differ, and other aspects of data format -- especially
> for floating-point numbers, and the padding of structs or arrays.
>
> There is a standard Unix facility, also available in Linux, called XDR
> (eXternal Data Representation). It allows transfer of data between
> different systems.  Run "man xdr" for a description.
>
> Briefly, the sender marshalls the data values into a standard linear
> interchange format, and the receiver un-marshalls the data into its
> internal format.
>
> The problem of data transfer is not limited to one programming language,
> so it seems more appropriate for it to be standardized by something like
> the Single Unix Standard.
>
> ---
> Steve Clamage, stephen.clamage@sun.com
>

What about the simple case where you need to read/mmap a large binary
file? Perhaps this file was downloaded from a server or retrieved off
of a network drive. Some binary files have the endianness in the spec
itself and users are required to convert depending on the machine. Many
libraries, including vtk [1], reinvent the wheel to solve the problem. I
know I have a library of my own handy but it is not high-performance.
A standard library facility could possibly make use of machine
instructions for speed.
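
For illustration, the kind of wheel that keeps getting reinvented might
look like the sketch below. It is a plain, unoptimized byte reversal
with a made-up name, not vtk's implementation. It is meant for integer
types; reversing the bytes of a floating-point value can produce a bit
pattern that is not safe to use as a value.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns value with its bytes in reverse order.
template <class T>
T ByteSwapped(T value)
{
    // Work on a copy of the object representation.
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));

    // Reverse the byte sequence and copy it back.
    std::reverse(bytes, bytes + sizeof(T));
    std::memcpy(&value, bytes, sizeof(T));
    return value;
}
```

For example, ByteSwapped<std::uint32_t>(0x01020304u) yields 0x04030201,
and applying it twice gives back the original value.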

I suppose if C++ were to get a standard networking class, like the one on
the wishlist [2], then endianness features would be required as well.

[1] http://www.vtk.org/doc/nightly/html/classvtkByteSwap.html
[2] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1901.html

---

Author: cbarron3@ix.netcom.com (Carl Barron)
Date: Mon, 3 Jul 2006 15:21:54 GMT Raw View

Goalie_Ca <goalieca@gmail.com> wrote:

> I have done some searching and it appears that there isn't any standard
> way to convert numbers from big endian to little endian and determine
> if it is a big endian or little endian machine. There appears to be a
> version for short and int in various network libraries. People appear
> to invent their own every time for every data type from short to
> double. Some are high performance and some are naive. Some machines
> like itanium are actually capable of running natively in either but
> must be compiled with a specific flag.
>
> Is there perhaps a reason why it is not in the standard libraries or in
> any of the technical reports?
>
 Caring about the byte order of integral types means the source is
for/from an external device, since otherwise the order is unimportant as
long as it is fixed. If from some other source in the wrong order then
A use text any integer of size N bytes can be sent as a text string in
Hex with M *k

---

Author: nagle@animats.com (John Nagle)
Date: Mon, 3 Jul 2006 15:27:13 GMT Raw View

Goalie_Ca wrote:
> I have done some searching and it appears that there isn't any standard
> way to convert numbers from big endian to little endian and determine
> if it is a big endian or little endian machine. There appears to be a
> version for short and int in various network libraries. People appear
> to invent their own every time for every data type from short to
> double. Some are high performance and some are naive. Some machines
> like itanium are actually capable of running natively in either but
> must be compiled with a specific flag.
>
> Is there perhaps a reason why it is not in the standard libraries or in
> any of the technical reports?

    It's in the Single Unix Specification, Version 2.  See

     http://www.opengroup.org/onlinepubs/007908799/xns/htonl.html

    for "arpa/inet.h".

Whether those primitives belong in the language specification is an interesting
issue.
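
A small sketch of those primitives in use, assuming a POSIX system that
provides <arpa/inet.h>. The wrapper name is hypothetical; only htonl
comes from the header (ntohl is its inverse).

```cpp
#include <arpa/inet.h>  // POSIX: htonl, ntohl
#include <stdint.h>
#include <cstring>

// Hypothetical helper: the byte that would go on the wire first after
// converting hostValue to network byte order.
inline unsigned char FirstByteInNetworkOrder(uint32_t hostValue)
{
    uint32_t const wire = htonl(hostValue);

    // Look at the converted value's bytes through a copy.
    unsigned char bytes[sizeof wire];
    std::memcpy(bytes, &wire, sizeof wire);
    return bytes[0];
}
```

Network byte order is big-endian, so FirstByteInNetworkOrder(0x01020304)
is 0x01 whatever the host's own byte order.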

    John Nagle

---