Topic: Uninitialised Unsigned Integral == Okay


Author: "kanze" <kanze@gabi-soft.fr>
Date: Wed, 17 May 2006 12:37:27 CST
"Tomás" wrote:
> Hyman Rosen posted:

> > NULL@NULL.NULL wrote:

> >> Now, here's the bit pattern for the doors I want open, and
> >> the ones I want closed: 1111 0011 1001 1111 But... oops...
> >> wait a minute... certain combinations are invalid!

> > No. You misunderstand. Any type aside from chars may have
> > extra bits in its object representation aside from those
> > that form its value representation. Unsigned short always
> > has at least sixteen bits in its value representation, and
> > all combinations of those bits are legal. But it may have
> > extra bits that are unaffected by numerical operations - you
> > would be able to get at them by using memcpy into unsigned
> > char arrays.

> So if it is said that an unsigned integral type is 16-Bit, all
> it means is that you have 16 "value representation" bits? The
> integral type may in fact be comprised of 32 bits in memory
> (hypothetically) ?

It depends on what you say.  If you say that a type has 16 bits
in its value representation, yes.  If you say that it has 16
bits in its object representation, it means something else.
Just saying that a type has 16 bits doesn't say anything.
(FWIW: I've seen a lot of Fortran implementations where integers
had 32 bits in their object representation, but only 16 in their
value representation.)

> Looking at the following code:

> /* Assuming CHAR_BIT == 8

>    Assuming a short is 16-Bit.

Sixteen bits of what?  Object representation, or value representation?
> */

> typedef unsigned char u8;

> typedef unsigned short u16;

> Does all of this imply that the following expression need not
> be true:

> sizeof( u16 ) == sizeof ( u8 )

It would usually imply that this expression is not true.  It's
probably true on some signal processors, but not on any machine
I've ever worked on.

> Okay, so I'm starting to acquiesce to the whole idea of
> "16-Bit != 16 bits", but it leaves one question lingering on
> my tongue:

>     Is it not woefully inefficient to have invalid bit
> patterns? I'd expect as much from a dumbed-down language like
> Java for instance, but not from C++.

If it's what the hardware does, then you really can't avoid it,
now, can you?  (It's also used to some degree by some debugging
tools -- Purify associates a certain number of bits with each
pointer, for example, and checks them on each access.  And that
does slow things down -- enough that I've never heard of an
application being delivered with Purify linked in.)

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: "kanze" <kanze@gabi-soft.fr>
Date: Wed, 17 May 2006 13:51:18 CST
johnchx2@yahoo.com wrote:
> Tomás wrote:

> > Here's a quotation from page 82 of the Standard:-

> > 3.9.1.4 :

> > /* Begin Quotation */
> > Unsigned integers, declared unsigned, shall obey the laws of
> > arithmetic modulo 2^n where n is the number of bits in the
> > value representation of that particular size of integer.
> > (This implies that unsigned arithmetic does not overflow
> > because a result that cannot be represented by the resulting
> > unsigned integer type is reduced modulo the number that is
> > one greater than the largest value that can be represented
> > by the resulting unsigned integer type.)
> > /* End Quotation */

> > From this, we can conclude the following:

> > 1) 16-Bit unsigned integers obey the laws of arithmetic
> > modulo 65 536.
> > 2) The number by which the figure is reduced modulo is one
> > greater than the largest value that can be represented, and
> > therefore, the max value can be calculated from 65 536 - 1,
> > yielding the max value as: 65 535.

> > Mathematically speaking, if a 16-Bit unsigned integral type
> > can store values in the range 0 to 65 535 inclusive, then
> > each unique bit-pattern has a corresponding valid value.
> > Therefore, there can be no "invalid" bit pattern.

> I think that this logic runs into trouble with the difference
> between the value representation and the object representation
> (3.9/4).

Quite.  The current C standard (C99) makes this a lot clearer;
an object representation may contain padding bits which are not
part of its value, and these padding bits may be required to
have certain values, which may be incorrectly set if the value
has not been initialized.  And there has been at least one
implementation where this was the case.

> Character types have a special requirement (3.9.1/1)
> that all bits in their object representation participate in
> their value representation.  Therefore, any bit pattern in a
> char represents a valid value of char.

Not in C -- this is an extension in C++.  In C, the standard
explicitly allows negative 0 (in 1's complement and signed
magnitude representation) to be a trapping value -- this
obviously doesn't affect unsigned char, but it does mean that
signed char, and plain char if it is signed, can have a trapping
representation.

> Longer numeric types don't have such a requirement.  Nothing
> in the standard prohibits an implementation in which
> sizeof(unsigned short) == 3, of which only 2 bytes participate
> in the value representation, with the remaining bits available
> for creating special trapping patterns.

The intent was, at least at one time, to allow implementations
on a tagged architecture, like the Unisys ex-Burroughs series A,
where each "object" was associated with bits containing
information concerning its type.  This affects both unions and
uninitialized variables.  Writing to an integer field in a union
uses an instruction which sets the flags saying that the object
contains an integer.  Trying to read from the field as a float
would trigger a hardware trap.  Similarly, the bits in an
uninitialized variable were randomly set -- if the combination
happened to correspond to a floating point, and you read an
integer, you got a hardware trap.

I don't know how this machine handled character accesses.  (It
was a word addressed machine.)  But presumably it could be made
to work -- I'm pretty sure that members of the C committee, when
the standard was being written in the 1980's, were aware of this
architecture.








Author: "kanze" <kanze@gabi-soft.fr>
Date: Wed, 17 May 2006 13:51:16 CST
"Tomás" wrote:
> > Longer numeric types don't have such a requirement.  Nothing
> > in the standard prohibits an implementation in which
> > sizeof(unsigned short) == 3, of which only 2 bytes
> > participate in the value representation, with the remaining
> > bits available for creating special trapping patterns.

> So is regular bitmasking unreliable in C++?

Why would you say that?  The special bits don't participate in
the value representation of the mask (which also has integral
type) either.  As long as you don't violate the type system
(except for the special exception of unsigned char), and don't
try to read uninitialized memory, bits not participating in the
value representation are invisible.








Author: "kanze" <kanze@gabi-soft.fr>
Date: Wed, 17 May 2006 14:04:39 CST
kuyper@wizard.net wrote:

    [...]
> Unsigned char is, I believe, one of the few types that is not
> allowed to have unused bits. I know this is stated explicitly
> for the C standard, and I think it's also specified, less
> clearly, in the C++ standard.

The text in the current C standard (C99) is considered by the C
committee to be a clarification of the intent of that in C90.
C++ is based on the C90 standard, and with the possible
exception of allowing arbitrary access through char as well as
through unsigned char, I don't believe that there were any
intentional differences with regards to the representation of
integral types.

    [...]

> There are DEC platforms where the natural byte size is 9 bits.

s/are/were/.

There are still Unisys platforms where this is the case.  Except
that I'm not sure whether talking about the natural byte size on
these platforms makes sense.  The machines are word addressed,
with 36 bit words.  The tradition in the PDP-10 world was to put
5 seven bit bytes in a word, with one bit left over.  This isn't
legal in C or in C++, and the classical C implementations did
use 9 bit bytes.

(The byte manipulation instructions -- or was it the pointers
themselves -- in a PDP-10 had a field which specified the size
of the bytes.  The instructions which incremented a byte pointer
would add this size to the bit offset of the pointer, then if
the results were greater than or equal to 36, set the bit offset
to 0, and increment the word address.  The bit offset was on the
high order bits, so if p had the type char*, (unsigned)p <
(unsigned)(p + 1) was often false.)

> On such a platform, there are many different ways C++ could be
> implemented, and one of those options is to emulate 8, 16, 32,
> and 64 bit types by leaving one bit per byte unused.

Maybe, but that would probably have considerable runtime
overhead.  The PDP-10 implementation of C used 9 bit bytes, and
I presume that that is also what the C compiler for Unisys 2200
uses.

> I've also heard of at least one platform which had little or
> no hardware support for integer arithmetic, but only a
> floating point processor. I've heard that there was at least
> one implemenation for such a platform that used the mantissa
> portion of a floating point number to represent integer types,
> leaving the exponent portion unused, so
> sizeof(long)==sizeof(double), and LONG_MAX was something
> bizarre like 2^50-1.

It's not quite that.  I suspect that you're thinking of the
Unisys née Burroughs Series A.  48 bit words, but type tagged.
There were no separate instructions for floating point or
integer, but rather the instruction looked at the word, and
decided according to its contents.  A mantissa field of all 0's
meant that it was an integer; otherwise it was floating point.
(There were additional tagging options to specify pointers,
etc.)








Author: "kanze" <kanze@gabi-soft.fr>
Date: Wed, 17 May 2006 14:06:18 CST
"Tomás" wrote:
> Seungbeom Kim posted:

> > And what does that buy you?  What useful things could you do
> > with values of uninitialized objects?

> struct ManufacturerInfo {
>     bool has_id;
>     unsigned long id;
> };
>
> ManufacturerInfo SomeFunc()
> {
>      ManufacturerInfo info;
>
>      /* Let's try to retrieve their ID from a database.
>         And let's assume that the retrieval failed.
>      */
>
>     info.has_id = false;
>     return info;
> }

> If there were no invalid bit patterns, the above function
> wouldn't have to worry about the calling function accessing
> the "id" member. Of course, "SomeFunc" could simply set it to
> zero, but that would be inefficient.

That's actually an interesting example.  The above code has
undefined behavior, regardless of what the user code does, since
you cannot copy an uninitialized value (in theory, and except
with memcpy and the like.)

And while I don't know about this exact example, I have had
problems in the past with code like:

    struct
    {
        char            name[ 20 ] ;
        char            id[ 15 ] ;
        //  ...
    } ;

being used for disk layout, and the fields being initialized
using strcpy.  I got error messages from the system about
reading uninitialized memory.








Author: "kanze" <kanze@gabi-soft.fr>
Date: Thu, 18 May 2006 00:56:44 CST
"Tomás" wrote:
> > Does all of this imply that the following expression need not be true:

> > sizeof( u16 ) == sizeof ( u8 )

> Meant to write:

>     sizeof(u16) == 2 * sizeof( u8 )

The important point here is that sizeof(T) * CHAR_BIT doesn't
necessarily give you the log2 of
std::numeric_limits<T>::max()+1 (supposing the latter
representable).  Unless T is unsigned char.  Thus, on a Unisys
series A, short, int and long all have a sizeof of 6, CHAR_BIT
is 8, but UINT_MAX is 2^40 - 1, and not 2^48 - 1.








Author: kuyper@wizard.net
Date: Thu, 18 May 2006 00:58:21 CST
kanze wrote:
.
> Not in C -- this is an extension in C++.  In C, the standard
> explicitly allows negative 0 (in 1's complement and signed
> magnitude representation) to be a trapping value -- this
> obviously doesn't affect unsigned char, but it does mean that
> signed char, and plain char if it is signed, can have a trapping
> representation.

Oddly enough, trap representations aren't defined by the fact that they
trap, they are actually defined by the fact that they don't represent a
valid value. The term "trap" refers to the fact that any attempt to
actually use the value of an object containing a trap representation
(in C++ terms, this would correspond to an lvalue->rvalue conversion)
has undefined behavior - except if it is accessed through an lvalue of
character type. The C standard doesn't explain what happens if you try
to access a trap representation of a character type, but it clearly
prohibits the operation from trapping.

I have never quite figured out what to make of that. Some people have
argued that the absence of any definition of the behavior renders the
behavior implicitly undefined, and that makes a bit of sense, but that
interpretation makes the exception for character types pointless.






Author: "Tomás" <NULL@NULL.NULL>
Date: Fri, 12 May 2006 12:15:31 CST
Jeff Rife posted:

>> Is there not a requirement whereby an X-Bit unsigned integral type
>> can:
>>
>> a) Hold 2^X unique values
>>
>> b) Store values 0 to ( 2^X - 1 ) inclusive
>
> For "signed char" and "unsigned char" (and thus, "char"), there appears
> to be this requirement based on the wording in 3.9.1.1, but "these
> requirements do not hold for other types".  There are no changes to
> this section in the draft concerning this wording, so I'd say it's not
> gonna change.


[Note to moderator: This is my second time sending this; I sent it yesterday
but it never showed up. Please discard the previous post if you end up
letting this one through.]

Here's a quotation from page 82 of the Standard:-

3.9.1.4 :

/* Begin Quotation */
Unsigned integers, declared unsigned, shall obey the laws of arithmetic
modulo 2^n where n is the number of bits in the value representation of that
particular size of integer.
(This implies that unsigned arithmetic does not overflow because a result
that cannot be represented by the resulting unsigned integer type is reduced
modulo the number that is one greater than the largest value that can be
represented by the resulting unsigned integer type.)
/* End Quotation */


From this, we can conclude the following:

1) 16-Bit unsigned integers obey the laws of arithmetic modulo 65 536.
2) The number by which the figure is reduced modulo is one greater than the
largest value that can be represented, and therefore, the max value can be
calculated from 65 536 - 1, yielding the max value as: 65 535.

Mathematically speaking, if a 16-Bit unsigned integral type can store values
in the range 0 to 65 535 inclusive, then each unique bit-pattern has a
corresponding valid value. Therefore, there can be no "invalid" bit pattern.

And so the following code can't possibly fail:

#include <iostream>

int main()
{
    unsigned char a;
    unsigned short b;
    unsigned c;
    unsigned long d;

    std::cout << a << b << c << d;
}


I propose that the Standard should explicitly state that there's nothing
wrong with reading the value of an uninitialised object whose type is one of
the unsigned integral types.


-Tomás






Author: johnchx2@yahoo.com
Date: 12 May 2006 20:30:08 GMT
Tomás wrote:

> Here's a quotation from page 82 of the Standard:-
>
> 3.9.1.4 :
>
> /* Begin Quotation */
> Unsigned integers, declared unsigned, shall obey the laws of arithmetic
> modulo 2^n where n is the number of bits in the value representation of that
> particular size of integer.
> (This implies that unsigned arithmetic does not overflow because a result
> that cannot be represented by the resulting unsigned integer type is reduced
> modulo the number that is one greater than the largest value that can be
> represented by the resulting unsigned integer type.)
> /* End Quotation */
>
>
> From this, we can conclude the following:
>
> 1) 16-Bit unsigned integers obey the laws of arithmetic modulo 65 536.
> 2) The number by which the figure is reduced modulo is one greater than the
> largest value that can be represented, and therefore, the max value can be
> calculated from 65 536 - 1, yielding the max value as: 65 535.
>
> Mathematically speaking, if a 16-Bit unsigned integral type can store values
> in the range 0 to 65 535 inclusive, then each unique bit-pattern has a
> corresponding valid value. Therefore, there can be no "invalid" bit pattern.

I think that this logic runs into trouble with the difference between
the value representation and the object representation (3.9/4).
Character types have a special requirement (3.9.1/1) that all bits in
their object representation participate in their value representation.
Therefore, any bit pattern in a char represents a valid value of char.

Longer numeric types don't have such a requirement.  Nothing in the
standard prohibits an implementation in which sizeof(unsigned short) ==
3, of which only 2 bytes participate in the value representation, with
the remaining bits available for creating special trapping patterns.







Author: hyrosen@mail.com (Hyman Rosen)
Date: Fri, 12 May 2006 21:21:50 GMT
Tomás wrote:
> Mathematically speaking, if a 16-Bit unsigned integral type can store
> values in the range 0 to 65 535 inclusive, then each unique bit-pattern
> has a corresponding valid value. Therefore, there can be no "invalid"
> bit pattern.

Incorrect. There is no requirement on integers that their
object representation have the same number of bits as their
value representation (3.9.1/1). It would be legal according
to the standard to augment integers with an "initialized"
bit such that attempting to access an uninitialized integer
would result in a trap.






Author: NULL@NULL.NULL ("Tomás")
Date: Fri, 12 May 2006 21:57:49 GMT
> Longer numeric types don't have such a requirement.  Nothing in the
> standard prohibits an implementation in which sizeof(unsigned short) ==
> 3, of which only 2 bytes participate in the value representation, with
> the remaining bits available for creating special trapping patterns.

So is regular bitmasking unreliable in C++?

Let's say I have a mansion, and this mansion has 16 doors.

I have a computer automated system which keeps track of which doors are
open, and which are closed.

Furthermore, I can open and close doors automatically by supplying a
particular function with a bitmask.

Firstly, I want a 16-Bit type:

typedef unsigned short u16; /* On this platform */


Now, here's the bit pattern for the doors I want open, and the ones I
want closed:

1111 0011 1001 1111


But... oops... wait a minute... certain combinations are invalid!

Lastly, is there ANY platform you know of where you haven't got the full
range of an unsigned integral type?

-Tomás






Author: hyrosen@mail.com (Hyman Rosen)
Date: Sat, 13 May 2006 01:58:56 GMT
NULL@NULL.NULL wrote:
> Now, here's the bit pattern for the doors I want open, and the ones I want
> closed:
> 1111 0011 1001 1111
> But... oops... wait a minute... certain combinations are invalid!

No. You misunderstand. Any type aside from chars may have extra
bits in its object representation aside from those that form its
value representation. Unsigned short always has at least sixteen
bits in its value representation, and all combinations of those
bits are legal. But it may have extra bits that are unaffected
by numerical operations - you would be able to get at them by
using memcpy into unsigned char arrays.






Author: "Greg Herlihy" <greghe@pacbell.net>
Date: Fri, 12 May 2006 21:01:14 CST
Tomás wrote:
> Jeff Rife posted:
>
> >> Is there not a requirement whereby an X-Bit unsigned integral type
> >> can:
> >>
> >> a) Hold 2^X unique values
> >>
> >> b) Store values 0 to ( 2^X - 1 ) inclusive
> >
> > For "signed char" and "unsigned char" (and thus, "char"), there appears
> > to be this requirement based on the wording in 3.9.1.1, but "these
> > requirements do not hold for other types".  There are no changes to
> > this section in the draft concerning this wording, so I'd say it's not
> > gonna change.
>
> Here's a quotation from page 82 of the Standard:-
>
> 3.9.1.4 :
>
> /* Begin Quotation */
> Unsigned integers, declared unsigned, shall obey the laws of arithmetic
> modulo 2^n where n is the number of bits in the value representation of that
> particular size of integer.
> (This implies that unsigned arithmetic does not overflow because a result
> that cannot be represented by the resulting unsigned integer type is reduced
> modulo the number that is one greater than the largest value that can be
> represented by the resulting unsigned integer type.)
> /* End Quotation */
>
>
> From this, we can conclude the following:
>
> 1) 16-Bit unsigned integers obey the laws of arithmetic modulo 65 536.
> 2) The number by which the figure is reduced modulo is one greater than the
> largest value that can be represented, and therefore, the max value can be
> calculated from 65 536 - 1, yielding the max value as: 65 535.
>
> Mathematically speaking, if a 16-Bit unsigned integral type can store values
> in the range 0 to 65 535 inclusive, then each unique bit-pattern has a
> corresponding valid value. Therefore, there can be no "invalid" bit pattern.

The number of bits in the "value representation" may be less than the
number of bits stored in the type. Therefore a 16-bit type may have
only, say, 14 bits of value represented - with the two additional bits
serving as flags for some kind of trap mechanism or other
implementation-defined purpose.

> And so the following code can't possibly fail:
>
> #include <iostream>
>
> int main()
> {
>     unsigned char a;
>     unsigned short b;
>     unsigned c;
>     unsigned long d;
>
>     std::cout << a << b << c << d;
> }

Because the values of b, c and d are indeterminate they might not
contain valid value representations for their respective types.
Accessing them in this state can therefore lead to undefined behavior.

Greg







Author: francis@robinton.demon.co.uk (Francis Glassborow)
Date: Sat, 13 May 2006 14:11:48 GMT
In article <20060512221133.1235210518@mscan3.ucar.edu>, Hyman Rosen
<hyrosen@mail.com> writes
>NULL@NULL.NULL wrote:
>> Now, here's the bit pattern for the doors I want open, and the ones I
>> want closed:
>> 1111 0011 1001 1111
>> But... oops... wait a minute... certain combinations are invalid!
>
>No. You misunderstand. Any type aside from chars may have extra
>bits in its object representation aside from those that form its
>value representation. Unsigned short always has at least sixteen
>bits in its value representation, and all combinations of those
>bits are legal. But it may have extra bits that are unaffected
>by numerical operations - you would be able to get at them by
>using memcpy into unsigned char arrays.

And it is entirely valid, and perhaps even useful, to have a non-value
bit in an object's representation that identifies the fact that the
object has not been initialised.

#include <iostream>
#include <ostream>
void foo(int&);

int main(){
   int i;
   foo(i);
   std::cout << i;
}

Is, I believe, entirely allowed to fail at execution time, perhaps with
a diagnostic such as:

'Output of uninitialised int attempted.'

(And the purpose of foo() above is to remind readers that sometimes the
compiler cannot diagnose the failure to initialise a variable. Compare:

void foo(int &){ return; }
with
void foo(int & i){i = 0;} )




--
Francis Glassborow      ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects






Author: NULL@NULL.NULL ("Tomás")
Date: Sat, 13 May 2006 14:13:30 GMT
Hyman Rosen posted:

> NULL@NULL.NULL wrote:
>> Now, here's the bit pattern for the doors I want open, and the ones I
>> want closed: 1111 0011 1001 1111
>> But... oops... wait a minute... certain combinations are invalid!
>
> No. You misunderstand. Any type aside from chars may have extra
> bits in its object representation aside from those that form its
> value representation. Unsigned short always has at least sixteen
> bits in its value representation, and all combinations of those
> bits are legal. But it may have extra bits that are unaffected
> by numerical operations - you would be able to get at them by
> using memcpy into unsigned char arrays.


So if it is said that an unsigned integral type is 16-Bit, all it means is
that you have 16 "value representation" bits? The integral type may in fact
be comprised of 32 bits in memory (hypothetically)?


Looking at the following code:


/* Assuming CHAR_BIT == 8

   Assuming a short is 16-Bit.
*/


typedef unsigned char u8;

typedef unsigned short u16;


Does all of this imply that the following expression need not be true:

sizeof( u16 ) == sizeof ( u8 )


Okay, so I'm starting to acquiesce to the whole idea of "16-Bit != 16 bits",
but it leaves one question lingering on my tongue:

    Is it not woefully inefficient to have invalid bit patterns? I'd expect
as much from a dumbed-down language like Java for instance, but not from
C++.


-Tomás






Author: NULL@NULL.NULL ("Tomás")
Date: Sat, 13 May 2006 16:22:18 GMT
> Does all of this imply that the following expression need not be true:
>
> sizeof( u16 ) == sizeof ( u8 )

Meant to write:

    sizeof(u16) == 2 * sizeof( u8 )


-Tomás






Author: wevsr@nabs.net (Jeff Rife)
Date: Sat, 13 May 2006 22:18:10 GMT
Raw View
Tomás (NULL@NULL.NULL) wrote in comp.std.c++:
>     Is it not woefully inefficient to have invalid bit patterns? I'd expect
> as much from a dumbed-down language like Java for instance, but not from
> C++.

The extra check bits don't even have to be available to the C++
implementation.  They could be like ECC memory bits that aren't available
without hardware-dependent system calls.  So, you'd allocate a 16-bit
value that's all that is allocated in real memory, but somewhere else
some bits are set aside to determine the validity of that memory.

Likewise, for "checked" or "debug" builds, the implementation could add
extra bits to allow more rigorous testing for errors, but then remove
them for "release" builds.  It could even lie to any code that asks and
say that sizeof(int) == 4 even though it allocates extra bits somewhere
as check bits.  Since the implementation controls all access to memory,
this would work, even if you try to roll your own raw memory copy function.

--
Jeff Rife |
          | http://www.netfunny.com/rhf/jokes/99/Apr/columbine.html






Author: kuyper@wizard.net
Date: Sat, 13 May 2006 17:15:16 CST
Raw View
"Tom   s" wrote:
> Hyman Rosen posted:
>
> > NULL@NULL.NULL wrote:
> >> Now, here's the bit pattern for the doors I want open, and the ones I
> >> want closed: 11110 0011 1001 1111
> >> But... oops... wait a minute... certain combinations are invalid!
> >
> > No. You misunderstand. Any type aside from chars may have extra
> > bits in its object representation aside from those that form its
> > value representation. Unsigned short always has at least sixteen
> > bits in its value representation, and all combinations of those
> > bits are legal. But it may have extra bits that are unaffected
> > by numerical operations - you would be able to get at them by
> > using memcpy into unsigned char arrays.
>
>
> So if it is said that an unsigned integral type is 16-Bit, all it means is
> that you have 16 "value representation" bits? ...

Unfortunately, you can't guarantee which number the person is referring
to; having the number of value bits equal to the number of
representation bits is so commonplace that most people don't bother
distinguishing them. As a result, on those rare occasions when you do
need to distinguish them, the only way to be certain you're
communicating clearly with someone else is to specify explicitly
whether you're talking about a type with 16 value bits or a type with
16 representation bits.

> ... The integral type may in fact
> be comprised of 32 bits in memory (hypothetically) ?

Yes, that is permitted.

> Looking at the following code:
>
>
> /* Assuming CHAR_BIT == 8
>
>    Assuming a short is 16-Bit.
> */
>
>
> typedef unsigned char u8;
>
> typedef unsigned short u16;
>
>
> Does all of this imply that the following expression need not be true:
>
> sizeof( u16 ) == sizeof ( u8 )

Unsigned char is, I believe, one of the few types that is not allowed
to have unused bits. I know this is stated explicitly for the C
standard, and I think it's also specified, less clearly, in the C++
standard. However, if you were talking about an unsigned short int with
16 value bits and an unsigned int with 32 value bits, yes, it is
permitted for sizeof(u16) == sizeof(u32).

>     Is it not woefully inefficient to have invalid bit patterns? I'd expect
> as much from a dumbed-down language like Java for instance, but not from
> C++.

C++ shares the C heritage of deliberately underspecifying the language,
allowing a fully conforming, efficient implementation on a much wider
variety of platforms than more strongly specified languages allow. There
are DEC platforms where the natural byte size is 9 bits. On such a
platform, there are many different ways C++ could be implemented, and one
of those options is to emulate 8-, 16-, 32-, and 64-bit types by leaving
one bit per byte unused.

I've also heard of at least one platform that had little or no hardware
support for integer arithmetic, only a floating-point processor. I've
heard that there was at least one implementation for such a platform that
used the mantissa portion of a floating-point number to represent integer
types, leaving the exponent portion unused, so
sizeof(long)==sizeof(double), and LONG_MAX was something bizarre like
2^50-1. Sorry; I can't give you more specific details on that
implementation - I have no personal experience with it. The important
thing is that such an implementation could be fully conforming.
Therefore, if you write code that wouldn't work on such an
implementation, it's not fully portable.







Author: bart@ingen.ddns.info (Bart van Ingen Schenau)
Date: Sun, 14 May 2006 02:03:02 GMT
Raw View
Tomás wrote:

> Hyman Rosen posted:
>
>> NULL@NULL.NULL wrote:
>>> Now, here's the bit pattern for the doors I want open, and the ones
>>> I want closed: 11110 0011 1001 1111
>>> But... oops... wait a minute... certain combinations are invalid!
>>
>> No. You misunderstand. Any type aside from chars may have extra
>> bits in its object representation aside from those that form its
>> value representation. Unsigned short always has at least sixteen
>> bits in its value representation, and all combinations of those
>> bits are legal. But it may have extra bits that are unaffected
>> by numerical operations - you would be able to get at them by
>> using memcpy into unsigned char arrays.
>
> So if it is said that an unsigned integral type is 16-Bit, all it
> means is that you have 16 "value representation" bits? The integral
> type may in fact be comprised of 32 bits in memory (hypothetically) ?

That depends on who's doing the talking. :-)
Sometimes people will mean the storage-size (effectively
CHAR_BIT*sizeof(T) ), and sometimes they are referring to the number of
bits in the value representation (which defines the range of numbers
you can store).

For a typical PC, the distinction does not really matter, because those
processors don't (currently) use padding bits in their types.

<snip>
> Okay, so I'm starting to acquiesce to the whole idea of "16-Bit != 16
> bits", but it leaves one question lingering on my tongue:
>
>     Is it not woefully inefficient to have invalid bit patterns? I'd
> expect as much from a dumbed-down language like Java for instance,
> but not from C++.

That entirely depends on how the underlying hardware uses the various
types.
Imagine, for example, a processor that interprets each 8-bit byte as 7
value bits and a parity bit. At a cost, you can disable the parity
checking and use the full 8 bits as data bits.

I can easily see a C++ implementation for such a processor that has the
following characteristics:

- CHAR_BIT == 8
- char, unsigned char and signed char trigger the special (expensive)
8-bit data instructions
- sizeof(unsigned short) == 3
- USHRT_MAX == 2097151 (2^21 - 1; 21 data bits + 3 padding/parity bits)

Note: The only way that you can find out where the padding bits are
located within an unsigned short object is by looking at that object
as an array of unsigned char. The normal bitwise operations will act
as if the 21 data bits are contiguous.

>
> -Tomás
>
Bart v Ingen Schenau
--
a.c.l.l.c-c++ FAQ: http://www.comeaucomputing.com/learn/faq
c.l.c FAQ: http://www.eskimo.com/~scs/C-faq/top.html
c.l.c++ FAQ: http://www.parashift.com/c++-faq-lite/






Author: musiphil@bawi.org (Seungbeom Kim)
Date: Sun, 14 May 2006 18:28:02 GMT
Raw View
Tomás wrote:
>
> And so the following code can't possibly fail:
>
> #include <iostream>
>
> int main()
> {
>     unsigned char a;
>     unsigned short b;
>     unsigned c;
>     unsigned long d;
>
>     std::cout << a << b << c << d;
> }
>
>
> I propose that the Standard should explicitly state that there's nothing
> wrong with reading the value of an uninitialised object whose type is one
> of the unsigned integral types.

And what does that buy you?
What useful things could you do with values of uninitialized objects?

--
Seungbeom Kim






Author: NULL@NULL.NULL ("Tom s")
Date: Sun, 14 May 2006 22:50:25 GMT
Raw View
Seungbeom Kim posted:

> And what does that buy you?
> What useful things could you do with values of uninitialized objects?


struct ManufacturerInfo {

    bool has_id;

    unsigned long id;

};


ManufacturerInfo SomeFunc()
{
     ManufacturerInfo info;

     /* Let's try to retrieve their ID from a database.
        And let's assume that the retrieval failed.
     */

    info.has_id = false;

    return info;
}


If there were no invalid bit patterns, the above function wouldn't have to
worry about the calling function accessing the "id" member. Of course,
"SomeFunc" could simply set it to zero, but that would be inefficient.

(Assuming that every Manufacturer ID is valid, including 0).

-Tomás






Author: musiphil@bawi.org (Seungbeom Kim)
Date: Tue, 16 May 2006 14:53:39 GMT
Raw View
NULL@NULL.NULL wrote:
>
> struct ManufacturerInfo {
>     bool has_id;
>     unsigned long id;
> };
>
> ManufacturerInfo SomeFunc()
> {
>      ManufacturerInfo info;
>
>      /* Let's try to retrieve their ID from a database.
>         And let's assume that the retrieval failed.
>      */
>
>     info.has_id = false;
>     return info;
> }
>
> If there were no invalid bit patterns, the above function wouldn't have to
> worry about the calling function accessing the "id" member. Of course,
> "SomeFunc" could simply set it to zero, but that would be inefficient.

I have a question: does returning a struct with an uninitialized member
by value cause undefined behaviour (because it involves copying)?

In that case, SomeFunc() will have to initialize info.id to whatever
value to avoid UB - which seems to be more work than really necessary.

--
Seungbeom Kim






Author: NULL@NULL.NULL ("Tom s")
Date: Wed, 17 May 2006 06:07:08 GMT
Raw View
Seungbeom Kim posted:

> I have a question: does returning a struct with an uninitialized member
> by value cause undefined behaviour (because it involves copying)?
>
> In that case, SomeFunc() will have to initialize info.id to whatever
> value to avoid UB - which seems to be more work than really necessary.


Well it seems we've discovered that we can, in fact, have invalid bit
patterns for an unsigned integer type, so I'd say yes, it would be UB to
copy it, just like so:

struct Monkey { unsigned a; unsigned long b; };

int main()
{
    Monkey object1;

    Monkey object2 = object1; /* UB */
}


-Tomás
