Topic: "byte" in C++?


Author: clamage@Eng.Sun.COM (Steve Clamage)
Date: 16 Dec 1994 19:12:40 GMT
In article AFE@fwrdc.rtsg.mot.com, jiwaniw@pts.mot.com (Jefrem Iwaniw) writes:

>
>: So what is the solution?  Is it trying to find a way to get GCC to give
>: me 2 byte integers?  Or is it doing all file io by just using chars?
>
>Have you considered using a short?  It should evaluate to a 16-bit
>integer on most implementations.

Since this is comp.STD.c++, we have to note that type short may have
any number of bits, as long as it has at least 16. The original poster
wanted to know how he could guarantee a 16-bit data type. The answer is,
"You can't." Assuming 16-bit shorts is not portable.

---
Steve Clamage, stephen.clamage@eng.sun.com






Author: matt@physics7.berkeley.edu (Matt Austern)
Date: 17 Dec 1994 07:04:38 GMT
In article <3csor8$f2m@engnews2.Eng.Sun.COM> clamage@Eng.Sun.COM (Steve Clamage) writes:

> Since this is comp.STD.c++, we have to note that type short may have
> any number of bits, as long as it has at least 16. The original poster
> wanted to know how he could guarantee a 16-bit data type. The answer is,
> "You can't."

That's not quite true: you can use a short (or an int, or a long), and
mask out all but the lowest 16 bits.
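
(A minimal sketch of that approach; the function name is illustrative:)

	/* Sketch: keep only the lowest 16 bits of a value, whatever the
	   actual width of the underlying type. */
	unsigned long low16(unsigned long x)
	{
	    return x & 0xFFFFUL;
	}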
--

                               --matt




Author: inkari@snakemail.hut.fi (Juha Inkari)
Date: 19 Dec 1994 00:21:46 GMT
>   any number of bits, as long as it has at least 16. The original poster
>   wanted to know how he could guarantee a 16-bit data type. The answer is,
>   "You can't." Assuming 16-bit shorts is not portable.

One can, however, test whether 16 bits is supported by the compiler (it
often is). The evil preprocessor symbols in <limits.h> will tell you
the minimum and maximum values supported by the basic types.

#include <limits.h>

#if USHRT_MAX == 65535
typedef unsigned short u16bits;
#else
#error Get a compiler that supports 16 bit native types.
#endif

Instead of giving an error message, one could write a class that handles
the 16 bits with a bit field, or that masks everything above the low 16
bits out of the wider unsigned short type.
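
(A minimal sketch of the masking variant; the class name is illustrative,
and it assumes only that unsigned short has at least 16 bits, which the
standard does guarantee:)

 class U16 {
     unsigned short v;   // at least 16 bits, possibly more
 public:
     U16(unsigned long x = 0) : v((unsigned short)(x & 0xFFFFUL)) {}
     operator unsigned long() const { return v & 0xFFFFUL; }
 };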

--
/* Juha.Inkari@hut.fi */




Author: jiwaniw@pts.mot.com (Jefrem Iwaniw)
Date: Thu, 15 Dec 1994 18:50:38 GMT
David Beavon (beavond@river.it.gvsu.edu) wrote:
: It seems strange to me that there are no integer variables in C that can
: be a constant number of bytes from one compiler to the next or from one
: machine to the next.  For example, I will show you where I have run into
: problems with this.

: Say you are working with a GIF file decoder and you know that bytes 7 and
: 8 contain the screen width as a two byte integer.  The person who has
: written the code for the GIF file decoder assumes two byte integers and
: simply freads from the file into the int variable "width" at the seventh
: byte.  This works fine if you *can* assume two byte integers.  What if
: GCC 2.6 gives me four byte ints, however?  Then I read a false value in
: and at the same time advance my file pointer too far.  Bad news, I tell you.

: So what is the solution?  Is it trying to find a way to get GCC to give
: me 2 byte integers?  Or is it doing all file io by just using chars?

Have you considered using a short?  It should evaluate to a 16-bit
integer on most implementations.


: BTW, I compile in MSDOS with a Pentium.
: Thanks in advance.
: --
: ---------------------------
:   David Beavon (hack)
:   beavond@river.it.gvsu.edu

--
-Jefrem Iwaniw
 Motorola
 Global Paging Control Systems
 **************************************************
 * These opinions are mine, and don't necessarily *
 * represent those of my employer in any way.     *
 **************************************************




Author: beavond@river.it.gvsu.edu (David Beavon)
Date: 4 Dec 1994 05:03:31 GMT
It seems strange to me that there are no integer variables in C that can
be a constant number of bytes from one compiler to the next or from one
machine to the next.  For example, I will show you where I have run into
problems with this.

Say you are working with a GIF file decoder and you know that bytes 7 and
8 contain the screen width as a two byte integer.  The person who has
written the code for the GIF file decoder assumes two byte integers and
simply freads from the file into the int variable "width" at the seventh
byte.  This works fine if you *can* assume two byte integers.  What if
GCC 2.6 gives me four byte ints, however?  Then I read a false value in
and at the same time advance my file pointer too far.  Bad news, I tell you.

So what is the solution?  Is it trying to find a way to get GCC to give
me 2 byte integers?  Or is it doing all file io by just using chars?

BTW, I compile in MSDOS with a Pentium.
Thanks in advance.
--
---------------------------
  David Beavon (hack)
  beavond@river.it.gvsu.edu




Author: matt@physics7.berkeley.edu (Matt Austern)
Date: 04 Dec 1994 07:23:20 GMT
In article <3brij3$eac@news.it.gvsu.edu> beavond@river.it.gvsu.edu (David Beavon) writes:

> It seems strange to me that there are no integer variables in C that can
> be a constant number of bytes from one compiler to the next or from one
> machine to the next.

It's a fact of life that word sizes vary from machine to machine.
Some machine architectures have 16-bit words, some have 32-bit words,
and some have 36-bit words.  A standard can't ignore that fact.

On some machines, the hardware support for (say) a 32-bit integral
data type just isn't there.  It's better for the C standard (and, by
extension, the C++ standard) to admit that machine architectures vary,
rather than dictating requirements for an implementation that, on
some architectures, just can't be satisfied.
--

                               --matt




Author: maxtal@physics.su.OZ.AU (John Max Skaller)
Date: Tue, 6 Dec 1994 14:59:52 GMT
In article <3brij3$eac@news.it.gvsu.edu> beavond@river.it.gvsu.edu (David Beavon) writes:
>It seems strange to me that there are no integer variables in C that can
>be a constant number of bytes from one compiler to the next or from one
>machine to the next.

 unsigned long readLittleEndian2Byte(FILE *);
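
(One possible body for that prototype, as a sketch: it assumes the two
bytes are stored low order first, as on Intel hardware, and leaves error
handling to the caller:)

 #include <stdio.h>

 unsigned long readLittleEndian2Byte(FILE *f)
 {
     int lo = getc(f);   /* getc() returns int, so EOF stays detectable */
     int hi = getc(f);
     if (lo == EOF || hi == EOF)
         return 0;       /* caller should test feof()/ferror() */
     return (unsigned long)lo | ((unsigned long)hi << 8);
 }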

--
        JOHN (MAX) SKALLER,         INTERNET:maxtal@suphys.physics.su.oz.au
 Maxtal Pty Ltd,
        81A Glebe Point Rd, GLEBE   Mem: SA IT/9/22,SC22/WG21
        NSW 2037, AUSTRALIA     Phone: 61-2-566-2189




Author: swf@elsegundoca.ncr.com (Stan Friesen)
Date: Thu, 8 Dec 1994 17:34:33 GMT
In article <3brij3$eac@news.it.gvsu.edu>, beavond@river.it.gvsu.edu (David Beavon) writes:
|>
|> Say you are working with a GIF file decoder and you know that bytes 7 and
|> 8 contain the screen width as a two byte integer.  The person who has
|> written the code for the GIF file decoder assumes two byte integers ...

Then he wrote the code wrong.  This is not portable.

|>  This works fine if you *can* assume two byte integers.

No it doesn't.  It *also* assumes that the two bytes in the file are
in the same order as the native 2-byte integers on the machine.

This is not a justified assumption either.

In the GIF file format, are the 16-bit integers stored high order byte
first or low order byte first?  You need to know this.

[Below I will assume low order byte first, as that is the Intel
hardware ordering].

|>  What if
|> GCC 2.6 gives me four byte ints, however?  Then I read a false value in
|> and at the same time advance my file pointer too far.  Bad news, I tell you.

Not if you code the reader correctly.  The portable way is to do something
like the following:

 int   b1, b2;   /* int, not char: getc() returns int, and a plain
                    char may be signed, mangling bytes above 127 */
 short ans;
 FILE  *fp;

 b1 = getc(fp);
 b2 = getc(fp);
 ans = b1 + 256*b2;
[strictly speaking you need to check for EOF after each byte].

And, the file needs to be *written* in a similar manner, to make sure
that the output byte order is correct.
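
(A matching writer, as a sketch, emitting the low order byte first; the
name is illustrative:)

 #include <stdio.h>

 void put16le(unsigned int val, FILE *fp)
 {
     putc(val & 0xFF, fp);          /* low order byte first */
     putc((val >> 8) & 0xFF, fp);
 }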
|>
|> So what is the solution?  Is it trying to find a way to get GCC to give
|> me 2 byte integers?  Or is it doing all file io by just using chars?

The last.  This is the *only* portable way, as byte order varies, not
just machine word size.

--
swf@elsegundoca.ncr.com  sarima@netcom.com

The peace of God be with you.




Author: tohoyn@janus.otol.fi (Tommi Höynälänmaa)
Date: 26 Nov 1994 08:27:00 GMT
 Although the standard doesn't specify the sizes of the integer
types (and it shouldn't), I think that the language should also have
integer types whose size is fixed. This would help writing portable code,
because programmers sometimes need such types. Nowadays it's common for
programmers to assume that e.g. byte == 8 bits, but the standard doesn't
guarantee this. The language could have types like int8, int16, int32
(and uint8, uint16, uint32), whose size in bits would be the same in every
implementation. If the target machine didn't support the needed word
length, then the next greater length could be used. One alternative would
also be that arbitrary bit sizes could be specified for integer types, e.g
  int<size in bits> i;
  and
  uint<size in bits> j;

    Tommi Höynälänmaa





Author: maxtal@physics.su.OZ.AU (John Max Skaller)
Date: Sat, 26 Nov 1994 16:36:25 GMT
In article <3b6rgk$193@tethys.otol.fi> tohoyn@janus.otol.fi (Tommi Höynälänmaa) writes:
> Although the standard doesn't specify the sizes of the integer
>types (and it shouldn't), I think that the language should also have
>integer types whose size is fixed.

 ISO C specified minimum sizes.

 C++ does not.

 Neither language supports the kind of specification
Pascal provides -- user defined integral range minima.

 Portable and fixed sized integers make sense for
an architecture specific binding of the Standard -- NOT
the Standard itself.

>because programmers sometimes need such types. Nowadays it's common for
>programmers to assume that e.g. byte == 8 bits, but the standard doesn't
>guarantee this.

 It can't. Some machines do not have 8 bit bytes.

>length, then the next greater length could be used. One alternative would
>also be that arbitrary bit sizes could be specified for integer types, e.g
>  int<size in bits> i;
>  and
>  uint<size in bits> j;

 How about:

 int j : 16;

which you can already write?
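
(As a sketch: a bit-field declarator of that width is accepted only
inside a class or struct, e.g.)

 struct Int16 {
     int j : 16;   // 16-bit bit-field member
 };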


--
        JOHN (MAX) SKALLER,         INTERNET:maxtal@suphys.physics.su.oz.au
 Maxtal Pty Ltd,
        81A Glebe Point Rd, GLEBE   Mem: SA IT/9/22,SC22/WG21
        NSW 2037, AUSTRALIA     Phone: 61-2-566-2189




Author: "Ronald F. Guilmette" <rfg@rahul.net>
Date: 27 Nov 1994 09:52:51 GMT
In article <CzvvGq.Bx4@ucc.su.OZ.AU>,
John Max Skaller <maxtal@physics.su.OZ.AU> wrote:
>In article <3b6rgk$193@tethys.otol.fi> tohoyn@janus.otol.fi (Tommi Höynälänmaa) writes:
>> Although the standard doesn't specify the sizes of the integer
>>types (and it shouldn't), I think that the language should also have
>>integer types whose size is fixed.
>
> ISO C specified minimum sizes.
>
> C++ does not.

Ah, excuse me John, but I guess I haven't been paying close enough attention.

Isn't <limits.h> and all of the rules applicable thereto imported (by
reference) into the (draft) C++ standard?

--

-- Ron Guilmette, Sunnyvale, CA ---------- RG Consulting -------------------
---- E-mail: rfg@segfault.us.com ----------- Purveyors of Compiler Test ----
-------------------------------------------- Suites and Bullet-Proof Shoes -




Author: maxtal@physics.su.OZ.AU (John Max Skaller)
Date: Sun, 27 Nov 1994 13:06:26 GMT
In article <3b9ktj$pks@hustle.rahul.net> "Ronald F. Guilmette" <rfg@rahul.net> writes:
>In article <CzvvGq.Bx4@ucc.su.OZ.AU>,
>John Max Skaller <maxtal@physics.su.OZ.AU> wrote:
>>In article <3b6rgk$193@tethys.otol.fi> tohoyn@janus.otol.fi (Tommi Höynälänmaa) writes:
>>> Although the standard doesn't specify the sizes of the integer
>>>types (and it shouldn't), I think that the language should also have
>>>integer types whose size is fixed.
>>
>> ISO C specified minimum sizes.
>>
>> C++ does not.
>
>Ah, excuse me John, but I guess I haven't been paying close enough attention.
>
>Isn't <limits.h> and all of the rules applicable thereto imported (by
>reference) into the (draft) C++ standard?

 We have limits.h, yes. And even some classes
suited to testing the limits in templates -- which are nearly
LIA-1 compliant as well.

 But no, all the rules applicable thereto
are not imported; there are no required minimum or maximum
limits at all in the WD.

 As per Munich, implementation quantities must be
specified by the vendor -- but there are no restrictions
on them.

 So a conformance tester cannot disqualify a compiler
because it does not support 3 levels of nested brackets,
provided the vendor documentation says "supports
2 levels of nested brackets" or whatever.


--
        JOHN (MAX) SKALLER,         INTERNET:maxtal@suphys.physics.su.oz.au
 Maxtal Pty Ltd,
        81A Glebe Point Rd, GLEBE   Mem: SA IT/9/22,SC22/WG21
        NSW 2037, AUSTRALIA     Phone: 61-2-566-2189




Author: schuenem@Informatik.TU-Muenchen.DE (Ulf Schuenemann)
Date: 29 Nov 1994 14:51:01 GMT
In article <CzvvGq.Bx4@ucc.su.OZ.AU>, maxtal@physics.su.OZ.AU (John Max Skaller) writes:
[..]
|>  How about:
|>
|>  int j : 16;
|>
|> which you can already write?

I thought this is called a bit-field and works only for declaring
members of a structure, not e.g. for global variables,
function parameters, etc.

So can I define:

 typedef signed  :8 sbyte;
 typedef unsigned:8 ubyte;
 typedef signed  :16 sint16;
 typedef unsigned:16 uint16;
 typedef signed  :32 sint32;
 typedef unsigned:32 uint32;

???

Ulf Schuenemann

--------------------------------------------------------------------
Ulf Schünemann
Institut für Informatik, Technische Universität München.
email: schuenem@informatik.tu-muenchen.de




Author: kanze@us-es.sel.de (James Kanze US/ESC 60/3/141 #40763)
Date: 23 Nov 1994 19:19:16 GMT
In article <smeyersCzJB7w.Ipo@netcom.com> smeyers@netcom.com (Scott
Meyers) writes:

|> Page 56 of the ARM points out that there is no such thing as a "byte" in
|> C++;  all we know is that sizeof(char) == 1.  As I understand it, this is
|> different from ANSI C, which does define what a byte is.  I was recently
|> told by a member of the ISO committee that the type "unsigned char" is
|> defined to be a "byte" in C++.  I don't yet have a copy of the latest WP
|> (I'm waiting until the actions from Valley Forge are added), so could
|> someone confirm that C++ now defines a byte?

Unless something changed at the last meeting, the only reference to
"byte" in the index of the working paper is in section 5.3.3, where it
says: "The sizeof operator yields th size, in bytes, of its
operand.  [...]  A byte is unspecified by the language except in terms
of the value of sizeof; sizeof(char) is 1..."
--
James Kanze      Tel.: (+33) 88 14 49 00     email: kanze@lts.sel.alcatel.de
GABI Software, Sarl., 8 rue des Francs-Bourgeois, F-67000 Strasbourg, France
Conseils en informatique industrielle --
                              -- Beratung in industrieller Datenverarbeitung






Author: mskuhn@cip.informatik.uni-erlangen.de (Markus Kuhn)
Date: Thu, 24 Nov 1994 17:57:52 GMT
mcastle@umr.edu (Mike Castle) writes:

>I don't have the ARM, so I look this up myself.  But, what
>happens on a system that uses unicode by default?  Or is this a
>non-problem for programming languages?



Author: smeyers@netcom.com (Scott Meyers)
Date: Sat, 19 Nov 1994 21:47:56 GMT
Page 56 of the ARM points out that there is no such thing as a "byte" in
C++;  all we know is that sizeof(char) == 1.  As I understand it, this is
different from ANSI C, which does define what a byte is.  I was recently
told by a member of the ISO committee that the type "unsigned char" is
defined to be a "byte" in C++.  I don't yet have a copy of the latest WP
(I'm waiting until the actions from Valley Forge are added), so could
someone confirm that C++ now defines a byte?

Thanks,

Scott




Author: mcastle@umr.edu (Mike Castle)
Date: 20 Nov 1994 06:08:51 GMT
In article <smeyersCzJB7w.Ipo@netcom.com>,
Scott Meyers <smeyers@netcom.com> wrote:
>Page 56 of the ARM points out that there is no such thing as a "byte" in
>C++;  all we know is that sizeof(char) == 1.  As I understand it, this is

I don't have the ARM, so I can't look this up myself.  But, what
happens on a system that uses unicode by default?  Or is this a
non-problem for programming languages?

*pondering what happens if we try to read in an array of char of
2 byte characters*

mrc
--
Mike Castle .-=NEXUS=-.  Life is like a clock:  You can work constantly
  mcastle@cs.umr.edu     and be right all the time, or not work at all
   mcastle@umr.edu       and be right at least twice a day.  -- mrc
    We are all of us living in the shadow of Manhattan.  -- Watchmen




Author: matt@physics2.berkeley.edu (Matt Austern)
Date: 20 Nov 1994 07:27:13 GMT
In article <3amp5j$27r@hptemp1.cc.umr.edu> mcastle@umr.edu (Mike Castle) writes:

> I don't have the ARM, so I can't look this up myself.  But, what
> happens on a system that uses unicode by default?  Or is this a
> non-problem for programming languages?

Who ever said that a byte was 8 bits?

There's nothing in the C++ "standard" (or in the C standard) that
guarantees that the language supports a fundamental 8-bit data type.
Which is just as it should be, considering that there are some
computers where the machine architecture has no 8-bit data type.  (I'm
thinking, for example, of some of the machines with 36-bit words.)
--

                               --matt




Author: maxtal@physics.su.OZ.AU (John Max Skaller)
Date: Mon, 21 Nov 1994 21:44:43 GMT
In article <smeyersCzJB7w.Ipo@netcom.com> smeyers@netcom.com (Scott Meyers) writes:
>Page 56 of the ARM points out that there is no such thing as a "byte" in
>C++;  all we know is that sizeof(char) == 1.  As I understand it, this is
>different from ANSI C, which does define what a byte is.  I was recently
>told by a member of the ISO committee that the type "unsigned char" is
>defined to be a "byte" in C++.  I don't yet have a copy of the latest WP
>(I'm waiting until the actions from Valley Forge are added), so could
>someone confirm that C++ now defines a byte?

 There is an intent, enacted or not, in BOTH C and C++,
that at least unsigned char "acts" as a byte. In particular,
if you can somehow legally alias some object with an array of
unsigned char, then copying stuff around with unsigned char
is the same as using memcpy -- other types may not have bits
in them that unsigned chars cannot "see".

 For example, a 7 bit unsigned char (size 1) on a system with
a 15 bit integer  (size 2) would not be conforming.
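
(A sketch of that equivalence: a byte-wise copy through unsigned char
behaves like memcpy(), since every bit pattern is a valid unsigned char
value; the function name is illustrative:)

 void copyBytes(void *dst, const void *src, unsigned n)
 {
     unsigned char *d = (unsigned char *)dst;
     const unsigned char *s = (const unsigned char *)src;
     while (n--) *d++ = *s++;
 }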

 As far as I know, there is no guarantee in C++ that this
be true of "char". It certainly isn't for signed char.

 Exactly what this means depends entirely on what aliasing
is well defined: if none, it makes no difference anyhow.

 However, an uninitialised unsigned char may be copied
without error and will have an unspecified value.
(I.e. it WILL be one of the values an unsigned char can take).
That means all possible bitpatterns of its store must correspond to
legal values. As such, it is the only type with that property:

 void f() {
  char x[100];
  strcpy(x,"Hello");                   // needs <string.h>
  char y[100];
  for(int i=0; i<100; ++i) y[i]=x[i];  // copies x[6]..x[99], which are
                                       // uninitialised
 }

This fragment is undefined! Note also that ALL members of structs
which are not unsigned chars MUST be initialised if a compiler
generated copy constructor (or assignment) is executed -- or that too will
be undefined.

 struct X { int x, y; } a, b;
 a.x=1;
 b = a; // error: undefined behaviour copying a.y

This makes sense: copying

 struct P { T* t; float f; }

might be optimised by using registers -- and fail if the members
t and f were not properly initialised.  This means either initialise
all members -- or write a copy constructor that calls memcpy().
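
(A sketch of the second option; the member types are illustrative, with
char* standing in for the unspecified T*:)

 #include <string.h>

 struct P {
     char *t;
     float f;
     P() {}                 // members deliberately left uninitialised
     P(const P &other) { memcpy(this, &other, sizeof(P)); }
 };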


--
        JOHN (MAX) SKALLER,         INTERNET:maxtal@suphys.physics.su.oz.au
 Maxtal Pty Ltd,
        81A Glebe Point Rd, GLEBE   Mem: SA IT/9/22,SC22/WG21
        NSW 2037, AUSTRALIA     Phone: 61-2-566-2189




Author: Chuck Allison <72640.1507@CompuServe.COM>
Date: 22 Nov 1994 20:17:57 GMT
>I don't have the ARM, so I can't look this up myself.  But, what
>happens on a system that uses unicode by default?  Or is this a
>non-problem for programming languages?
>
>*pondering what happens if we try to read in an array of char of
>2 byte characters*

Sounds like an array of wchar_t to me.
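
(That is, something like the following sketch, assuming the
implementation's wchar_t is wide enough for the character set at hand:)

 #include <stddef.h>   /* declares wchar_t in C; C++ makes it built in */

 wchar_t text[] = L"wide characters";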

-- Chuck Allison

--
Chuck Allison
Compuserve: 72640,1507
INTERNET: 72640.1507@compuserve.com