Topic: Just want to clarify the whole "char" thing


Author: Frederick Gotham <fgothamNO@SPAM.com>
Date: Thu, 13 Jul 2006 16:49:12 CST
I realise that none of the following types may contain padding bits, and
that all of their bits must take part in value representation:

    signed char
    unsigned char

(As "char" must map to one of the above, this also applies to "char".)

What I would like to clarify, however, is the possibility of a "signed char"
(and thus plain char) having an invalid value, or "trap value".

Below, I reproduce code which prints out an object's bytes by using an
"unsigned char". If I were to change it to "signed char" or plain char,
could the code potentially invoke undefined behaviour?

#include <ostream>

template<class T>
void PrintObjectBytes(T const &obj,std::ostream &os)
{
    unsigned char const * const p_over =
        reinterpret_cast<unsigned char const *>(&obj + 1);

    unsigned char const *p =
        reinterpret_cast<unsigned char const *>(&obj);

    do os << (unsigned)*p++ << "  ";
    while(p != p_over);

    os << '\n';
}

#include <iostream>
#include <vector>

int main()
{
    std::vector<double> obj1;

    int obj2[5] = {2,3,4,5,6};

    PrintObjectBytes(obj1, std::cout);
    PrintObjectBytes(obj2, std::cout);
}


--

Frederick Gotham

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: "SuperKoko" <tabkannaz@yahoo.fr>
Date: Fri, 14 Jul 2006 13:35:07 CST
Frederick Gotham wrote:
> I realise that none of the following types may contain padding bits, and
> that all of their bits must take part in value representation:
>
>     signed char
>     unsigned char
>
> (As "char" must map to one of the above, this also applies to "char".)
>
> What I would like to clarify, however, is the possibility of a "signed char"
> (and thus plain char) having an invalid value, or "trap value".
>
> Below, I reproduce code which prints out an object's bytes by using an
> "unsigned char". If I were to change it to "signed char" or plain char,
> could the code potentially invoke undefined behaviour?
>
> #include <ostream>
>
> template<class T>
> void PrintObjectBytes(T const &obj,std::ostream &os)
> {
>     unsigned char const * const p_over =
>         reinterpret_cast<unsigned char const *>(&obj + 1);
>
>     unsigned char const *p =
>         reinterpret_cast<unsigned char const *>(&obj);
>
>     do os << (unsigned)*p++ << "  ";
>     while(p != p_over);
>
>     os << '\n';
> }
>
> #include <iostream>
> #include <vector>
>
> int main()
> {
>     std::vector<double> obj1;
>
>     int obj2[5] = {2,3,4,5,6};
>
>     PrintObjectBytes(obj1, std::cout);
>     PrintObjectBytes(obj2, std::cout);
> }
>

Using "signed char" leads to UB, even if the value representation
would be OK, because the compiler may assume that a signed char
cannot alias an object of a non-char type.

The relevant paragraph is 3.10-15 [basic.lval]:

"15 If a program attempts to access the stored value of an object
  through an lvalue of other than one of the following types the
  behavior is undefined(25):

  --the dynamic type of the object,

  --a cv-qualified version of the dynamic type of the object,

  --a type that is the signed or unsigned type corresponding to the
    dynamic type of the object,

  --a type that is the signed or unsigned type corresponding to a
    cv-qualified version of the dynamic type of the object,

  --an aggregate or union type that includes one of the aforementioned
    types among its members (including, recursively, a member of a
    subaggregate or contained union),

  --a type that is a (possibly cv-qualified) base class type of the
    dynamic type of the object,

  --a char or unsigned char type.

  _________________________
  25) The intent of this list is to specify those circumstances in which
  an object may or may not be aliased."

For unsigned char or char, I tend to think that it's OK, even though
4.1 [conv.lval] seems to say that it is UB:

"If the object to which the lvalue refers is not an object of type T
  and is not an object of a type derived from T, or if the object is
  uninitialized, a program that necessitates this conversion has
  undefined behavior."

Actually, 4.1 is not clear, since it doesn't say what *the object to
which the lvalue refers* means. Is it the dynamic type of the object?
Or is it any type through which 3.10-15 says that the object can be
accessed?






Author: kanze.james@neuf.fr (James Kanze)
Date: Fri, 14 Jul 2006 19:47:16 GMT
Frederick Gotham wrote:
 > I realise that none of the following types may contain padding
 > bits, and that all of their bits must take part in value
 > representation:

 >     signed char
 >     unsigned char

 > (As "char" must map to one of the above, this also applies to
 > "char".)

It doesn't "map" to one of the above; it is always a distinct
type.  I think you mean that it must have the same
representation as one of the above.

 > What I would like to clarify, however, is the possibility of a
 > "signed char" (and thus plain char) having an invalid value,
 > or "trap value".

The C99 standard explicitly says it can, and explains how.
(More exactly, it says that a signed integral type can have
trapping values even in the absence of padding bits.)

From what I understand, this part of the C99 standard is meant
to simply state more precisely and unambiguously what the
earlier version (and the C++ standard) says.

Also, as far as I can see, there is nothing in either standard
to even suggest that this is not the case.  The only guarantee
one has for signed char, as opposed to other signed integral
types, is that it has no padding bits.  But that's irrelevant;
an IEEE float has no padding bits either, but it explicitly has
trapping representations.

 > Below, I reproduce code which prints out an object's bytes by
 > using an "unsigned char". If I were to change it to "signed
 > char" or plain char, could the code potentially invoke
 > undefined behaviour?

 > #include <ostream>

 > template<class T>
 > void PrintObjectBytes(T const &obj,std::ostream &os)
 > {
 >     unsigned char const * const p_over =
 >         reinterpret_cast<unsigned char const *>(&obj + 1);

 >     unsigned char const *p =
 >         reinterpret_cast<unsigned char const *>(&obj);
 >
 >     do os << (unsigned)*p++ << "  ";
 >     while(p != p_over);

 >     os << '\n';
 > }

If you change the type to signed char, you have undefined
behavior.  I once conceptually designed a machine in which it
would crash.  (1's complement, all bits one on memory read
crashes, and all new memory automatically initialized with all
bits 1.  The idea is that the hardware catches uninitialized
memory reads.)  I don't think any such machine has ever been
built, however.

Changing the type to signed char, or even plain char on my
machine, does cause the output to be somewhat different than
what you probably expect (a number of bytes are displayed with 8
digits), but that's a different problem.

--
James Kanze                                    kanze.james@neuf.fr
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34
