Thread

Topic: Standard way for converting byte stream to a signed two's

Author: James Kanze <james.kanze@gmail.com>
Date: Tue, 20 Nov 2007 16:49:08 CST

On Nov 19, 4:50 pm, Sebastian Redl <e0226...@student.tuwien.ac.at>
wrote:
> petek1976 wrote:
> > unsigned long ExtractUInt32M(const unsigned char *apnOctets)
> > {
> >   const unsigned char *lpnOctets = apnOctets;
> >   unsigned long lnValue = *(lpnOctets++);
> >   lnValue = (lnValue << 8) | *(lpnOctets++);
> >   lnValue = (lnValue << 8) | *(lpnOctets++);
> >   lnValue = (lnValue << 8) | *(lpnOctets);
> >   return lnValue;
> >  }

> 1) unsigned char isn't guaranteed to be 8 bits large.
> 2) unsigned long isn't guaranteed to be 32 bits large.

> Of the two, the latter is the realistic issue. GCC in 64-bit
> mode has a 64-bit long. char is 8 bits on pretty much every
> platform, except for some embedded systems that only support
> word addressing and have 16- or 32-bit chars.

Both are realistic issues.  But in this case, what counts is the
external format: if the external format is four 8-bit bytes, high
byte first, that's what he's got to read.  All that his code
above requires is that unsigned long be at least 32 bits (which
is guaranteed), and that any unused upper bits of an unsigned
char be 0 (and it's easy to get around that requirement by
adding an "& 0xFF" in the appropriate places).
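
A minimal sketch of the masked variant described above.  The name
ExtractUInt32Masked is hypothetical (it is not from the original
post), and the code assumes the external format is four octets,
most significant first:

```cpp
// Sketch: read four octets, most significant first, into an unsigned long.
// ExtractUInt32Masked is an illustrative name, not part of the original code.
// The "& 0xFFUL" discards any bits above the low 8 on platforms where
// unsigned char is wider than 8 bits.
unsigned long ExtractUInt32Masked(const unsigned char* octets)
{
    unsigned long value = 0;
    for (int i = 0; i < 4; ++i) {
        value = (value << 8) | (octets[i] & 0xFFUL);
    }
    return value;
}
```

For example, the octets 0x12 0x34 0x56 0x78 yield 0x12345678.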

> > In line 8 the process is completed by subtracting 2^31
> > again, which will result in the correct negative value.

> There is no guarantee that numbers use 2's complement either.
> (Again, more of a theoretical issue.)

No.  There are machines still being sold today which use 1's
complement.  However, again, what's important for his code to
work is that the external format use two's complement.

There is one possible problem with the signed values, however.
A two's complement signed 32-bit int isn't guaranteed to be
representable in a long: -2147483648 is a possible value, and
LONG_MIN is only guaranteed to be -2147483647 or lower.

> For all of their closeness to the system, you still have to
> rely on platform-specific behaviour when you fiddle with raw
> binary data in C++.

Much less than you'd imagine.  About all that you really need is
some sort of guarantee that all of the values in the external
format will fit in some type.  You could get around that by
systematically using a user defined Int32, but that's a lot of
extra work (and probably a significant performance overhead) for
a problem that most people won't have in practice.

--
James Kanze (GABI Software)             email:james.kanze@gmail.com
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]

Author: petek1976 <pete.karousos@earthlink.net>
Date: Mon, 26 Nov 2007 11:47:03 CST

I should have been a little more clear. I was making the assumption
that my ExtractSInt32M was taking in a series of unsigned octet
values. That is the precondition to my routine. Since C++ guarantees
that a non-negative integer type is simply a binary number, I wanted
the ability to treat the unsigned char as a number between 0 and 255
if I needed to. Also, I knew there was a slight portability problem
with subtracting 2147483648 since, as you point out, the guaranteed
minimum is -2147483647. However, if the machine is two's complement, is
it really possible that -2147483648 will not be representable? How can
this be? If this number really cannot be represented, what is a more
portable approach? Breaking the subtraction into two steps?




Author: James Kanze <james.kanze@gmail.com>
Date: Tue, 27 Nov 2007 14:21:26 CST

On Nov 26, 6:47 pm, petek1976 <pete.karou...@earthlink.net> wrote:
> On Nov 20, 5:49 pm, James Kanze <james.ka...@gmail.com> wrote:

    [...]
> > Much less than you'd imagine.  About all that you really need is
> > some sort of guarantee that all of the values in the external
> > format will fit in some type.  You could get around that by
> > systematically using a user defined Int32, but that's a lot of
> > extra work (and probably a significant performance overhead) for
> > a problem that most people won't have in practice.

> I should have been a little more clear. I was making the assumption
> that my ExtractSInt32M was taking in a series of unsigned octet
> values. That is the precondition to my routine.

I understood that.

> Since C++ guarantees
> that a non-negative integer type is simply a binary number, I wanted
> the ability to treat the unsigned char as a number between 0 and 255
> if I needed to. Also, I knew there was a slight portability problem
> with subtracting 2147483648 since, as you point out, the guaranteed
> minimum is -2147483647. However, if the machine is two's complement, is
> it really possible that -2147483648 will not be representable?

If the machine is two's complement, no.  But there's no
guarantee that the machine is two's complement.  (Of course, the
only machine I know of around today that isn't two's complement
uses 36 bit words, so you're OK.)

> How can this be? If this number really cannot be represented
> what is a more portable approach?

If you have a machine with 32-bit 1's complement or signed
magnitude longs, and it doesn't yet support long long, you have
a real problem representing values which are represented by
32-bit signed 2's complement integers in an external format.  In
short, it can't be done: you have 2^32-1 different values at
your disposal, and you need to be able to represent 2^32
different values.
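
A program can at least detect this situation.  The sketch below is
an illustration only; LongHoldsAllInt32 is a hypothetical name.  It
relies solely on LONG_MIN from <climits>, and writes the bound as
-2147483647L so that the literal 2147483648 never appears:

```cpp
#include <climits>

// Sketch: returns true when long can hold every value of a 32-bit
// two's complement integer, i.e. when LONG_MIN lies strictly below
// -2147483647.  LongHoldsAllInt32 is an illustrative name.
// On a 32-bit 1's complement or signed magnitude long, LONG_MIN
// would be exactly -2147483647 and this would return false.
bool LongHoldsAllInt32()
{
    return LONG_MIN < -2147483647L;
}
```

On the two's complement machines discussed here, the check is
expected to succeed.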

Realistically, of course, it's not clear that you need such
portability.  Realistically, all new architectures will use 2's
complement.  I'm not aware of there ever having been a machine
with a 32-bit word size that used anything other than 2's
complement; the 1's complement and signed magnitude machines I've
heard of have all had larger word sizes (36, 48 or more).  So if
no such machine has ever existed, and no such machine ever will,
do you really have to consider it?

> Breaking the subtraction into two steps?

You have to do that anyway at the source level, since you cannot
portably write the constant 2147483648: where long is 32 bits, the
literal does not fit in any signed type, and you cannot expect the
program to compile.
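
One way the two-step subtraction might look is sketched below.
DecodeInt32 is a hypothetical name, not the original
ExtractSInt32M, and the code assumes that long can actually
represent -2147483648 (true on any two's complement machine, not
guaranteed by the standard):

```cpp
// Sketch: convert a 32-bit two's complement bit pattern, held in an
// unsigned long, to a signed long.  DecodeInt32 is an illustrative name.
// Assumption: long can represent -2147483648; otherwise the final
// subtraction overflows for the input 0x80000000.
long DecodeInt32(unsigned long bits)
{
    bits &= 0xFFFFFFFFUL;                 // keep only the low 32 bits
    if (bits < 0x80000000UL) {
        return static_cast<long>(bits);   // sign bit clear: value fits as is
    }
    // Sign bit set: the value is bits - 2^32, computed here as
    // (bits - 2^31) - 2^31, with the second 2^31 split into
    // 2147483647 + 1 so that no signed literal exceeds 2147483647.
    long low = static_cast<long>(bits - 0x80000000UL);  // 0 .. 2147483647
    return low - 2147483647L - 1;
}
```

With this, 0xFFFFFFFF decodes to -1 and 0x80000000 decodes to
-2147483647 - 1.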

--
James Kanze (GABI Software)             email:james.kanze@gmail.com
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
