Thread

Topic: Big-endian/little-endian identifier

Author: Alex Bache <bache@sil.com>
Date: 1997/02/28

Paul Black wrote:
>
> Hmmmm... You need to know the endianness of the target platform to
> write portable code? With shifting and masking, code to deal
> with extracting bytes from ints and longs can be made portable.
> For reassembly, use shifting and adding. It probably isn't as fast
> as knowing the endianness and accessing the data through a "char *"
> but it is portable.
>

Yes.  That was my first thought, but then I thought that if I knew the
way the numbers are arranged I could speed things up a bit.  It was with
reference to a CRC function I was writing.

Anyway, I suppose the alternative to shifting and masking would be to
convert to network byte order using htonl(), perform the operations using
char * types, then convert back to host byte order using ntohl().
However, if the bit ordering of the machine is reversed, you couldn't do
this, could you?  I mean compare a byte of the converted value, accessed
through a char *, to a number such as

if (value[1] == 0xff)

So I suppose the only way of making it truly portable would be a
draconian use of htonl() and htons() calls on everything?
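
A minimal sketch of the htons()/ntohs() round trip, assuming a POSIX system where these functions are declared in <arpa/inet.h> and assuming 8-bit bytes. The helper names network_octet16 and round_trip16 are hypothetical. The converted value is copied into a byte array so that individual octets can be compared against constants such as 0xff.

```cpp
#include <arpa/inet.h>  // htons/ntohs (assumption: POSIX; other systems use another header)
#include <cassert>
#include <cstdint>
#include <cstring>

// Return octet `index` (0 = lowest address) of `host_value` after it has
// been converted to network (big-endian) byte order.
inline unsigned char network_octet16(std::uint16_t host_value, unsigned index) {
    assert(index < 2);
    const std::uint16_t net = htons(host_value);
    unsigned char bytes[2];
    std::memcpy(bytes, &net, sizeof bytes);  // inspect the object representation
    return bytes[index];
}

// Convert to network order and back again; the value should be unchanged.
inline std::uint16_t round_trip16(std::uint16_t host_value) {
    return ntohs(htons(host_value));
}
```

With the value 0x01FF, octet 0 is 0x01 and octet 1 is 0xFF on any host for which the assumptions hold. Bits inside a byte are not individually addressable in C++, so the order in which the hardware numbers them should not affect a comparison such as value[1] == 0xff.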

Anyway, enough on this topic.  It's pretty off-subject.

Alex.
---
[ comp.std.c++ is moderated.  To submit articles: try just posting with      ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu         ]
[ FAQ:      http://reality.sgi.com/employees/austern_mti/std-c++/faq.html    ]
[ Policy:   http://reality.sgi.com/employees/austern_mti/std-c++/policy.html ]
[ Comments? mailto:std-c++-request@ncar.ucar.edu                             ]

Author: herbs@cntc.com (Herb Sutter)
Date: 1997/02/28

"Paul Black" <paul.black@vf.vodafone.co.uk> wrote:
>With shifting and masking, code to deal
>with extracting bytes from ints and longs can be made portable.

I know you specified integral types, but note first that this is definitely
untrue for floating-point types (mainly because of other representational
differences, such as #bits in mantissa).  It is almost as untrue for
integral types, because even though current platforms use pretty compatible
representations, you still need to deal with size issues which are
inherently nonportable.

If you want portability, why reinvent the wheel?  Use ASN.1 or equivalent.

---
Herb Sutter (mailto:herbs@cntc.com)

Current Network Technologies Corp.
2695 North Sheridan Way, Suite 150, Mississauga ON Canada   L5K 2N6
Tel 416-805-9088   Fax 905-822-3824
---

Author: "Sean L. Palmer" <seanpalmer@mindspring.com>
Date: 1997/03/03

> I think it would be nice to have some sort of way of determining at
> compile time if the machine stores numbers internally as big or little
> endian numbers.  E.g. the number 0x01FF would be stored thus:

If you ask me, the best addition would be a standard header which
implemented reading/storing various kinds of "portable" data types to/from
the machine-specific data types.  It would allow me to save an int as an
int32 for instance if 32 bits was sufficient to hold it, and read it back
as an int.  I could specify whether it should be little-endian or
big-endian, or another format if one is known.  Similar simple conversions
for ieee floats and strings would help too.

For instance,

// This header is typical of one found on an Intel platform.  Other
// platforms might support a different conversion set.
// Many of these simple formats should be supported on all platforms if
// possible, so that simple binary files would be portable.

class StorageWriter {
  void* data;  // other platforms might need a bit index as well
  unsigned bytesremaining;
public:
  StorageWriter(void* data, unsigned bytes);
  enum ikind { unknown_int, little_endian, big_endian };
  void store_int64(__int64 val, ikind=little_endian);
  void store_int32(long val, ikind=little_endian);
  void store_int16(short val, ikind=little_endian);
  void store_int8(signed char val, ikind=little_endian);
  void store_unsigned64(unsigned __int64 val, ikind=little_endian);
  void store_unsigned32(unsigned long val, ikind=little_endian);
  void store_unsigned16(unsigned short val, ikind=little_endian);
  void store_unsigned8(unsigned char val, ikind=little_endian);
  // enumerators share the class scope, so "unknown" cannot be used twice
  enum fkind { unknown_real, ieee, bcd4 };
  void store_real32(float val, fkind=ieee);
  void store_real64(double val, fkind=ieee);
  void store_real80(long double val, fkind=ieee);
  enum skind { asciiz, unsigned8lenprefix, unsigned16lenprefix,
               unsigned32lenprefix };
  void store_string8(string val, skind=asciiz);
  void store_string16(wstring val, skind=asciiz);
};
class StorageReader {
  const void* data;
  unsigned bytesremaining;
public:
  StorageReader(const void* data, unsigned bytes);
  enum ikind { unknown_int, little_endian, big_endian };
  __int64 load_int64(ikind=little_endian);
  long load_int32(ikind=little_endian);
  short load_int16(ikind=little_endian);
  signed char load_int8(ikind=little_endian);
  unsigned __int64 load_unsigned64(ikind=little_endian);
  unsigned long load_unsigned32(ikind=little_endian);
  unsigned short load_unsigned16(ikind=little_endian);
  unsigned char load_unsigned8(ikind=little_endian);
  enum fkind { unknown_real, ieee };
  float load_real32(fkind=ieee);
  double load_real64(fkind=ieee);
  long double load_real80(fkind=ieee);
  enum skind { asciiz, unsigned8lenprefix, unsigned16lenprefix,
               unsigned32lenprefix };
  string load_string8(skind=asciiz);
  wstring load_string16(skind=asciiz);
};



Of course, the actual type returned would vary over platform, but would be
sufficient to hold the required number of bits.

This would probably come with some typedefs which would map platform types
which had sufficient storage capability onto specific int sizes.

For instance, on the intel platform:

typedef unsigned char unsigned8;
typedef unsigned short unsigned16;
typedef unsigned int unsigned32;
typedef unsigned __int64 unsigned64;

typedef signed char signed8;
typedef signed short signed16;
typedef signed int signed32;
typedef signed __int64 signed64;

typedef float real32;
typedef double real64;
typedef long double real80;

Of course if the platform has no way to actually hold a kind of value,
their headers would be missing those types.  I presume the standard could
allow this so that they would still be standard compliant even though they
had no int64 type or real80 type for instance.  This way if the hardware
had no ieee float type, and the vendor decided not to supply an ieee
library, then they could omit the real32, real64, real80 types and
conversions and any programs that tried to do i/o with those types would
not compile, which is good because they wouldn't work in any case.

The main purpose of this file would be to allow a program to use
semi-standard data types for reading and writing binary files for
conversion to other platforms.  If the header supported the format, all
would be well.  I think for little and big-endian integer formats and ieee
flavors, this should not be a problem for most platforms to write
conversion functions for, no matter what the hardware int size is, unless
of course there just is no type available which will hold the data in
question in which case it could use a struct of some kind or force the user
to supply a type.
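
One way such conversion functions might be written, as a rough sketch only: the classes ByteWriter and ByteReader and the enum byte_order below are hypothetical names, cover just one 32-bit unsigned type, and assume 8-bit bytes and the availability of std::uint32_t. Because the octets are produced by shifting the value, the same code should behave identically whatever the host's own byte order is.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

enum byte_order { little_endian, big_endian };

// Writes fixed-format integers into a caller-supplied buffer.
class ByteWriter {
    unsigned char* pos_;
    std::size_t remaining_;
public:
    ByteWriter(unsigned char* buf, std::size_t bytes) : pos_(buf), remaining_(bytes) {}

    void store_unsigned32(std::uint32_t v, byte_order order = little_endian) {
        if (remaining_ < 4) throw std::length_error("buffer full");
        for (int i = 0; i < 4; ++i) {
            // Octet i in the file is the i-th least (or most) significant one.
            const int shift = (order == little_endian) ? 8 * i : 8 * (3 - i);
            pos_[i] = static_cast<unsigned char>((v >> shift) & 0xFFu);
        }
        pos_ += 4;
        remaining_ -= 4;
    }
};

// Reads the same format back into the host's native type.
class ByteReader {
    const unsigned char* pos_;
    std::size_t remaining_;
public:
    ByteReader(const unsigned char* buf, std::size_t bytes) : pos_(buf), remaining_(bytes) {}

    std::uint32_t load_unsigned32(byte_order order = little_endian) {
        if (remaining_ < 4) throw std::out_of_range("buffer exhausted");
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i) {
            const int shift = (order == little_endian) ? 8 * i : 8 * (3 - i);
            v |= static_cast<std::uint32_t>(pos_[i]) << shift;
        }
        pos_ += 4;
        remaining_ -= 4;
        return v;
    }
};
```

Storing 0x01020304 as big_endian should produce the octets 01 02 03 04 in the buffer, and as little_endian the octets 04 03 02 01; reading with the matching order returns the original value.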


I think most people here would agree that the need for the ability to
portably handle binary files is real.  For simple formats such as these
which are in widespread use, it seems silly to not even allow their
conversion, even though hardware limits might preclude their actual use
directly.

The above code was just a quick hack and probably needs help, but I'd
appreciate any comments about the idea.
---

Author: d96-mst@nada.kth.se (Mikael Ståldal)
Date: 1997/03/03

In article <3316fbc2.70158312@herbs>, herbs@cntc.com (Herb Sutter) wrote:

>If you want portability, why reinvent the wheel?  Use ASN.1 or equivalent.

Maybe one needs to write code according to an already-existing
specification that uses specified byte-order and not ASN.1. For example
read/write a binary file, or a binary network protocol.
---

Author: john@interlog.com (John R MacMillan)
Date: 1997/02/26

|Define this structure in a union with a long and there you have it - a
|portable way of accessing parts of a long without having to do shift
|operations explicitly.
|
|Would there be any problems with this approach?

Well, it's not really portable, since accessing a union through a
different member than was last stored is not portable (implementation
defined in C with one exception that's not relevant here; it's not as
clear to me what the effect is in C++, though my reading of the April95
DWP implies it is not allowed).

Also, it rather restricts the sizes of the integers (what do I call the
``3rd left from middle'' byte in my 16 byte long?  what if I refer to
that member name on an implementation that only has 4 bytes?).

The current restrictions are not really all that restrictive, and a
good, portable solution seems difficult or impossible.
---

Author: "Paul Black" <paul.black@vf.vodafone.co.uk>
Date: 1997/02/26

Alex Bache <bache@sil.com> wrote:
> I think it would be nice to have some sort of way of determining at
> compile time if the machine stores numbers internally as big or little
> endian numbers.  E.g. the number 0x01FF would be stored thus:
>
> Offset      0    1
>
> Big E      01   ff
> Little E   ff   01
>
>
> There are cases where you need to know this, in order to write portable
> code.  Sure, you can write a function to work it out, but you can't
> determine it at compile time.

Hmmmm... You need to know the endianness of the target platform to
write portable code? With shifting and masking, code to deal
with extracting bytes from ints and longs can be made portable.
For reassembly, use shifting and adding. It probably isn't as fast
as knowing the endianness and accessing the data through a "char *"
but it is portable.
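
A minimal sketch of the shift-and-mask approach for a 32-bit value, assuming std::uint32_t is available; the function names put_be32 and get_be32 are hypothetical. The octets are written most significant first, and only the value of the integer is used, never its layout in memory.

```cpp
#include <cstdint>

// Split a 32-bit value into four octets, most significant first.
// Only arithmetic on the value is used, so host byte order is irrelevant.
inline void put_be32(std::uint32_t v, unsigned char out[4]) {
    out[0] = static_cast<unsigned char>((v >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((v >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((v >> 8) & 0xFFu);
    out[3] = static_cast<unsigned char>(v & 0xFFu);
}

// Reassemble the value by shifting each octet into place and combining.
inline std::uint32_t get_be32(const unsigned char in[4]) {
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
            static_cast<std::uint32_t>(in[3]);
}
```

For the value 0x000001FF this should yield the octets 00 00 01 FF, and get_be32 on those octets returns 0x000001FF again.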

Paul
---

Author: Branko Cibej <branko.cibej@hermes.si>
Date: 1997/02/27

Alex Bache wrote:
> Okay then, if there are such complexities as 2134 or something like
> that, why not
> have some sort of structure you could use to extract the bytes making up
> a long/short?
[definitions of struct byte_order deleted]
> This could even be extended to machines with byte sizes other than 8
> bits, using bit fields.
>
> Define this structure in a union with a long and there you have it - a
> portable way of accessing parts of a long without having to do shift
> operations explicitly.
>
> Would there be any problems with this approach?

1) AFAIK, the result of writing one type into a union and reading
another type out of it is undefined, which means the following code
fragment is not portable:

    union { long n; struct byte_order bo; } unendify;
    unendify.bo.lsb = 7;
    unendify.bo.msb = unendify.bo.lsb;  // ok, but...
    ++unendify.n;   // ...undefined

2) The representation of bitfields within a struct is
implementation-defined, therefore code that depends on a certain
representation is not portable.
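
As an alternative to the union, the bytes of a long can be copied out with memcpy, which avoids reading a union member other than the one last written. This is only a sketch with hypothetical helper names (bytes_of, long_from); which byte ends up at which index still depends on the implementation, so it addresses point 1 but not the representation issues.

```cpp
#include <cstring>

// Copy the object representation of a long into a byte array.
// The copy itself is well defined; the order of the bytes is not portable.
inline void bytes_of(long value, unsigned char (&out)[sizeof(long)]) {
    std::memcpy(out, &value, sizeof value);
}

// Rebuild a long from bytes previously produced by bytes_of on the
// same implementation.
inline long long_from(const unsigned char (&in)[sizeof(long)]) {
    long value;
    std::memcpy(&value, in, sizeof value);
    return value;
}
```

A round trip such as long_from applied to the output of bytes_of(7L, ...) should give back 7 on the same machine, but the byte array should not be written to a file and read on a different platform.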


That's not even taking into account type size, bit order, etc. which may
vary considerably, as other posters have already pointed out.
--
------------------------------------------------------------------------
Branko Cibej      HERMES SoftLab, Litijska 51, 1000 Ljubljana,  Slovenia
brane@hermes.si   phone: (++386 61) 186 53 49  fax: (++386 61) 186 52 70
------------------------------------------------------------------------
---

Author: Alex Bache <bache@sil.com>
Date: 1997/02/24

I think it would be nice to have some sort of way of determining at
compile time if the machine stores numbers internally as big or little
endian numbers.  E.g. the number 0x01FF would be stored thus:

Offset      0    1

Big E      01   ff
Little E   ff   01


There are cases where you need to know this, in order to write portable
code.  Sure, you can write a function to work it out, but you can't
determine it at compile time.

If you're processing a lot of data in a small loop (as in a CRC check
for instance) it's quite a bit of extra cost to do something like

if (big_endian)
{
   process_big_endian();
}
else
{
   process_little_endian();
}

for each iteration of the loop.  How about having a standard #define in
one of the libraries so you can do

#ifdef BIG_ENDIAN
  process_big_endian();
#else
  process_little_endian();
#endif
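
Without such a #define, the run-time test can at least be made once, outside the loop. The sketch below assumes 8-bit bytes and that std::uint16_t exists; host_is_little_endian and sum_low_bytes are hypothetical names, and the loop body is a stand-in for real per-item work such as a CRC step.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Run-time probe: look at the first byte in memory of a known 16-bit value.
inline bool host_is_little_endian() {
    const std::uint16_t probe = 0x01FF;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 0xFF;
}

// Sum the low-order byte of every 16-bit word by direct byte access.
inline unsigned long sum_low_bytes(const std::uint16_t* words, std::size_t n) {
    // Decide once, before the loop, where the low-order byte lives.
    const std::size_t low_offset = host_is_little_endian() ? 0 : 1;
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(words);
    unsigned long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += bytes[2 * i + low_offset];  // no endianness test in here
    return total;
}
```

For the words 0x01FF and 0x0203 the result should be 0xFF + 0x03 on either kind of host, since the offset is chosen to match the machine.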

It's probably not much to do with the C++ language, more the
preprocessor, but it would be nice to have it in the standard.

Alex Bache.
---

Author: Stephen.Clamage@Eng.Sun.COM (Steve Clamage)
Date: 1997/02/24

In article 2111DAF8@sil.com, Alex Bache <bache@sil.com> writes:
>I think it would be nice to have some sort of way of determining at
>compile time if the machine stores numbers internally as big or little
>endian numbers.  ...
>There are cases where you need to know this, in order to write portable
>code.  Sure, you can write a function to work it out, but you can't
>determine it at compile time.

The suggestion assumes that numbers are either big-endian or little-
endian. That is not a requirement. I once used a system where 4-byte
integers were stored not as 1-2-3-4 or 4-3-2-1, but as 2-1-4-3. Also,
some floating-point on the VAX stores the most-significant byte near
the middle of the storage allocated for it, much like those funny
integers. Once loaded into registers, the bytes get straightened
out -- load/store always does the right thing, as does arithmetic,
whether in registers or directly in memory.

You only get into trouble when accessing individual bytes directly
in memory, in effect casting from int* or double* to char*.

The relationship between the address order of bytes in memory and
the significance of the bytes is not limited to two possibilities.
Add to this the fact that it is common for bytes to be 9, 32, or 36
bits. I think that finding a way to give you all the information you
really need in the high-level language is not going to succeed.

The alternatives would seem to be
1. status quo -- you are on your own;
2. restrict C++ to platforms with "nice" characteristics (as Java does);
3. provide language support for only part of the job.

Actually #3 is also status quo, but we could argue about how much
additional support is appropriate. I suppose we could add something
that described byte order as one of "big-endian, little-endian, neither".

Hmmm. What if short, int, and long did not all have the same endianness?
In my example above, shorts were "little-endian" but longs were "neither".
Should we include floating-point in the endianness specification? VAX
integers are all "little-endian", but floating-point varies -- floats
are "little-endian", doubles are "neither", if I remember correctly.

Having "neither" as a possibility means that portable code will still
have to use other means if "neither" is true. It seems to me the
endianness addition doesn't buy very much.
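
A three-way classification of this kind can be worked out per type at run time. The following is a sketch under stated assumptions: T is an unsigned integer type with no padding bits, and the names endianness and classify are hypothetical. It writes a value whose bytes are all distinct and checks where each byte landed.

```cpp
#include <climits>
#include <cstddef>
#include <cstring>

enum endianness { big_endian, little_endian, neither };

// Classify how one unsigned integer type T is laid out in memory.
template <typename T>
endianness classify() {
    const std::size_t n = sizeof(T);
    T probe = 0;
    for (std::size_t i = 0; i < n; ++i)  // byte of significance i holds i + 1
        probe |= static_cast<T>(static_cast<T>(i + 1) << (CHAR_BIT * i));

    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &probe, n);

    bool little = true;
    bool big = true;
    for (std::size_t i = 0; i < n; ++i) {
        if (bytes[i] != i + 1) little = false;  // least significant byte first
        if (bytes[i] != n - i) big = false;     // most significant byte first
    }
    // A one-byte type passes both tests and is reported as little_endian.
    if (little) return little_endian;
    if (big) return big_endian;
    return neither;
}
```

Calling classify<unsigned short>() and classify<unsigned long>() separately allows for the case where the two types do not share one byte order. C++20 later added std::endian in <bit>, which likewise leaves room for a native order that is neither big nor little.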
---
Steve Clamage, stephen.clamage@eng.sun.com
---

Author: jgamble@ripco.com (John M. Gamble)
Date: 1997/02/25

In article <331192A0.2111DAF8@sil.com>, Alex Bache  <bache@sil.com> wrote:
>I think it would be nice to have some sort of way of determining at
>compile time if the machine stores numbers internally as big or little
>endian numbers.  E.g. the number 0x01FF would be stored thus:
>
>Offset      0    1
>
>Big E      01   ff
>Little E   ff   01
>
[snip]
>
>For each iteration of the loop.  How about having a standard #define in
>one of the
>libraries so you can do
>
>#ifdef BIG_ENDIAN
>  process_big_endian();
>#else
>  process_little_endian();
>#endif
>

This is probably more appropriate for a "Standard C" newsgroup.
Also, simple big/little endian distinction is no longer sufficient.
We have 4-byte integers now, with 8-byte integers looming.
A "byte-order constant" might be more useful.  E.g,

#define END16 21 /* Little endian */
#define END32 2143

which would give the order of the bytes.  This could be extended
for longer integer types.
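
A value in the same style can be computed at run time, though not by the preprocessor. This sketch assumes one reading of the constants above: digits are listed in memory order, with 1 naming the most significant byte. It also assumes 8-bit bytes; byte_order_signature32 is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>

// Build a decimal "byte-order constant" for 32-bit integers.
// Digit k (left to right) names the byte found at memory offset k,
// where 1 is the most significant byte and 4 the least significant.
inline int byte_order_signature32() {
    const std::uint32_t probe = 0x01020304u;  // byte number n carries the value n
    unsigned char bytes[4];
    std::memcpy(bytes, &probe, sizeof bytes);
    return bytes[0] * 1000 + bytes[1] * 100 + bytes[2] * 10 + bytes[3];
}
```

Under that reading a big-endian host would report 1234, a little-endian host 4321, and a host with the mixed order mentioned in this thread 2143.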

 -john

jgamble@ripco.com

February 28 1997: Last day libraries can order catalogue cards
from the Library of Congress.
---

Author: Branko Cibej <branko.cibej@hermes.si>
Date: 1997/02/25

Alex Bache wrote:
> I think it would be nice to have some sort of way of determining at
> compile time if the machine stores numbers internally as big or little
> endian numbers.  E.g. the number 0x01FF would be stored thus:
>
> Offset      0    1
>
> Big E      01   ff
> Little E   ff   01

The problem of byte (and bit) order can't be solved with a simple
preprocessor symbol. The standard doesn't even require that bytes are
octets. Consider how the number could be stored on a machine with 9-bit
bytes:

    Offset      0    1

    Big E      1ff  N/A
    Little E   1ff  N/A

or on one with 7-bit bytes:

    Offset      0    1

    Big E      03   7f
    Little E   7f   03
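
The figures in these tables can be reproduced with a little arithmetic. The sketch below uses a hypothetical helper, split_into_bytes, that cuts a value into groups of a given width, least significant group first; it only models the arithmetic and says nothing about any real machine.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Split `value` into `count` groups of `bits_per_byte` bits each,
// least significant group first (the little-endian storage order).
inline std::vector<unsigned> split_into_bytes(unsigned long value,
                                              unsigned bits_per_byte,
                                              unsigned count) {
    assert(bits_per_byte > 0 && bits_per_byte < sizeof(unsigned long) * CHAR_BIT);
    const unsigned long mask = (1UL << bits_per_byte) - 1UL;
    std::vector<unsigned> groups;
    for (unsigned i = 0; i < count; ++i) {
        groups.push_back(static_cast<unsigned>(value & mask));
        value >>= bits_per_byte;
    }
    return groups;
}
```

For 0x01FF, a width of 7 gives the groups 7f and 03, a width of 8 gives ff and 01, and a width of 9 gives the single group 1ff. Reversing the sequence gives the big-endian order.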

There is also a question of generality here -- i.e., would a language or
library extension for 'ordering' bytes be useful in a large number of
applications? Personally I don't believe so. IMHO this problem falls
into the domain of low-level programming, where portability depends on
much more than just the definition of C++ (e.g., operating system
requirements, hardware dependencies, etc.)

--
------------------------------------------------------------------------
Branko Cibej      HERMES SoftLab, Litijska 51, 1000 Ljubljana,  Slovenia
brane@hermes.si   phone: (++386 61) 186 53 49  fax: (++386 61) 186 52 70
------------------------------------------------------------------------
---

Author: herbs@cntc.com (Herb Sutter)
Date: 1997/02/25

Alex Bache <bache@sil.com> wrote:
>I think it would be nice to have some sort of way of determining at
>compile time if the machine stores numbers internally as big or little
>endian numbers.

As you note, the reason you'd want this is for binary value portability.
However, note that for floating-point numbers even knowing endianness is
not enough... there can be arbitrary representational differences beyond
simple byte ordering.  Is it really useful to have just an "endian" flag?

Seems to me it's better just to stick with ASN1 or other standards.

---
Herb Sutter (herbs@cntc.com)

Current Network Technologies Corp.
3100 Ridgeway, Suite 42, Mississauga ON Canada L5L 5M5
Tel 416-805-9088  Fax 905-608-2611
---

Author: "Joe Halpin" <jhalpin@nortel.ca>
Date: 1997/02/25

In article <331192A0.2111DAF8@sil.com>, Alex Bache  <bache@sil.com> wrote:
>I think it would be nice to have some sort of way of determining at
>compile time if the machine stores numbers internally as big or little
>endian numbers.  E.g. the number 0x01FF would be stored thus:

[...]

>There are cases where you need to know this, in order to write portable
>code.  Sure, you can write a function to work it out, but you can't
>determine it at compile time.

There are already ways to do this, although they're not part of the
language. For example, the makefile should know what kind of machine
it's building things on, and can define a macro as needed. We
sometimes wrap the makefile in a script which determines the machine
architecture, and defines the macro as needed before running make.
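
On the source side, code using such a build-supplied macro might look like the sketch below. The macro names HOST_BIG_ENDIAN and HOST_LITTLE_ENDIAN and the function store_be32 are hypothetical; the makefile or wrapper script would pass one of the macros, for instance as a -D option. The sketch assumes 8-bit bytes and falls back to portable shifting when the build gives no hint.

```cpp
#include <cstdint>
#include <cstring>

#if defined(HOST_BIG_ENDIAN) && defined(HOST_LITTLE_ENDIAN)
#error "define at most one of HOST_BIG_ENDIAN and HOST_LITTLE_ENDIAN"
#endif

// Write a 32-bit value as four octets, most significant first.
inline void store_be32(std::uint32_t v, unsigned char out[4]) {
#if defined(HOST_BIG_ENDIAN)
    std::memcpy(out, &v, 4);  // memory layout already matches
#elif defined(HOST_LITTLE_ENDIAN)
    unsigned char tmp[4];
    std::memcpy(tmp, &v, 4);  // reverse the four bytes
    out[0] = tmp[3];
    out[1] = tmp[2];
    out[2] = tmp[1];
    out[3] = tmp[0];
#else
    // No hint from the build: use shifts, which work on any host.
    out[0] = static_cast<unsigned char>((v >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((v >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((v >> 8) & 0xFFu);
    out[3] = static_cast<unsigned char>(v & 0xFFu);
#endif
}
```

Whichever branch is compiled, storing 0x01020304 should produce the octets 01 02 03 04, provided the macro passed by the build actually matches the machine.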

Joe
---

Author: Alex Bache <bache@sil.com>
Date: 1997/02/25

Okay then, if there are such complexities as 2134 or something like
that, why not have some sort of structure you could use to extract the
bytes making up a long/short?

Big-endian Example:

// For a long (1,2,3,4)

struct byte_order
{
   unsigned char msb;
   unsigned char top_middle;
   unsigned char bottom_middle;
   unsigned char lsb;
};


Weird architecture (2,1,4,3)

struct byte_order
{
   unsigned char top_middle;
   unsigned char msb;
   unsigned char lsb;
   unsigned char bottom_middle;
};


This could even be extended to machines with byte sizes other than 8
bits, using bit fields.

Define this structure in a union with a long and there you have it - a
portable way of accessing parts of a long without having to do shift
operations explicitly.

Would there be any problems with this approach?

Alex Bache.
---

Author: David R Tribble <david.tribble@central.beasys.com>
Date: 1997/02/25

Alex Bache <bache@sil.com> suggested:
> I think it would be nice to have some sort of way of determining at
> compile time if the machine stores numbers internally as big or little
> endian numbers.
> ...
>
> #ifdef BIG_ENDIAN
>   process_big_endian();
> #else
>   process_little_endian();
> #endif

I suggested that very thing a few years ago (Jun 95) to the ANSI C committee.
I suggested that several macros be added to some of the standard headers
(such as <limits.h>) that specify characteristics of the implementation,
such as byte order, native character set, bitfield fill order, CPU model
name, compiler vendor name, number of preprocessor macros allowed, etc.
I called this proposal "Machine Characteristics".

It was rejected (actually it was shelved), because "no champion (sponsor)
could be found for it".

You can download the original proposal from this FTP location:
    ftp://ftp.dmk.com/DMK/sc22wg14/c9x/To-be-considered/
The filename is "machine-characteristics.txt.gz", which is an ASCII text
file in 'gzip' form.
---

Author: herbs@cntc.com (Herb Sutter)
Date: 1997/02/25

Alex Bache <bache@sil.com> wrote:
>Okay then, if there are such complexities as 2134 or something like
>that, why not

The problem isn't as easy to solve as you may think.  ANYTHING may be
different... not just the byte order of the internal representation, but
the bitwise meaning of the internal representation (esp. for floating-point
types) and particularly the size of the builtin type.

>have some sort of structure you could use to extract the bytes making up
>a long/short?
>
>Big-endian Example:
>
>// For a long (1,2,3,4)
>
>struct byte_order
>{
>   unsigned char msb;
>   unsigned char top_middle;
>   unsigned char bottom_middle;
>   unsigned char lsb;
>};

This assumes that a long is four bytes.  It's not, necessarily.  And even
if it is, the ordering of the bytes may not be the only difference... for
example, what if my particular implementation decided to have the sign bit
last, not first?  For floating-point types, what if my implementation uses
two more bits for the mantissa than yours does?
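
Some of these properties can at least be inspected from within the program, using <climits> and std::numeric_limits. The sketch below is illustrative only; representation_report and inspect_representation are hypothetical names, and a passing check does not by itself make two platforms binary-compatible.

```cpp
#include <climits>
#include <limits>

// A few representation facts that code such as the struct above relies on.
struct representation_report {
    bool octet_bytes;         // true if a byte has exactly 8 bits
    bool long_is_4_bytes;     // true if sizeof(long) == 4
    bool float_is_ieee;       // true if float claims IEC 559 (IEEE 754) format
    int float_mantissa_bits;  // mantissa digits of float, in its radix
};

inline representation_report inspect_representation() {
    representation_report r;
    r.octet_bytes = (CHAR_BIT == 8);
    r.long_is_4_bytes = (sizeof(long) == 4);
    r.float_is_ieee = std::numeric_limits<float>::is_iec559;
    r.float_mantissa_bits = std::numeric_limits<float>::digits;
    return r;
}
```

On an implementation where float is IEEE single precision, float_mantissa_bits is typically 24; an implementation with a wider or narrower mantissa would report a different number, which is exactly the kind of difference an endianness flag cannot express.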

>Weird architecture (2,1,4,3)
>
>struct byte_order
>{
>   unsigned char top_middle;
>   unsigned char msb;
>   unsigned char lsb;
>   unsigned char bottom_middle;
>};

Ditto.

>This could even be extended to machines with byte sizes other than 8
>bits, using bit fields.
>
>Define this structure in a union with a long and there you have it - a
>portable way of accessing parts of a long without having to do shift
>operations explicitly.
>
>Would there be any problems with this approach?

Yes. :-)

---
Herb Sutter (mailto:herbs@cntc.com)

Current Network Technologies Corp.
2695 North Sheridan Way, Suite 150, Mississauga ON Canada   L5K 2N6
Tel 416-805-9088   Fax 905-822-3824
---

Author: jefolts@oasis.novia.net (Jeff Folts)
Date: 1997/02/25

Steve Clamage (Stephen.Clamage@Eng.Sun.COM) wrote:
> In article 2111DAF8@sil.com, Alex Bache <bache@sil.com> writes:
> >I think it would be nice to have some sort of way of determining at
> >compile time if the machine stores numbers internally as big or little
> >endian numbers.  ...
> >There are cases where you need to know this, in order to write portable
> >code.  Sure, you can write a function to work it out, but you can't
> >determine it at compile time.
>
> The suggestion assumes that numbers are either big-endian or little-
> endian. That is not a requirement. I once used a system where 4-byte
> integers were stored not as 1-2-3-4 or 4-3-2-1, but as 2-1-4-3. Also,
> some floating-point on the VAX stores the most-significant byte near
> the middle of the storage allocated for it, much like those funny
> integers. Once loaded into registers, the bytes get straightened
> out -- load/store always does the right thing, as does arithmetic,
> whether in registers or directly in memory.
>
> You only get into trouble when accessing individual bytes directly
> in memory, in effect casting from int* or double* to char*.
>
> The relationship between the address order of bytes in memory and
> the significance of the bytes is not limited to two possibilities.
> Add to this the fact that it is common for bytes to be 9, 32, or 36
> bits. I think that finding a way to give you all the information you
> really need in the high-level language is not going to succeed.
>
> The alternatives would seem to be
> 1. status quo -- you are on your own;
> 2. restrict C++ to platforms with "nice" characteristics (as Java does);
> 3. provide language support for only part of the job.
>
> Actually #3 is also status quo, but we could argue about how much
> additional support is appropriate. I suppose we could add something
> that described byte order as one of "big-endian, little-endian, neither".
>
> Hmmm. What if short, int, and long did not all have the same endianness?
> In my example above, shorts were "little-endian" but longs were "neither".
> Should we include floating-point in the endianness specification? VAX
> integers are all "little-endian", but floating-point varies -- floats
> are "little-endian", doubles are "neither", if I remember correctly.
>
> Having "neither" as a possibility means that portable code will still
> have to use other means if "neither" is true. It seems to me the
> endianness addition doesn't buy very much.

. . . and the answer to your question may be . . .

Your goal to optimize performance with an implementation that happens to
be portable as well is probably an unrealistic goal in this case.

Do the best you can to implement your design independent of influences
such as hardware and compilers, etc.  This, of course, usually means a
slower implementation compared to dependent, non-portable code, but
only worry about performance after you have determined it to be an issue.

If performance of your portable version of the implementation is a problem,
you are probably stuck handling it yourself via platform/compiler
dependent makefiles and #ifdef's and all that mess.  There are too many
different influences out there to standardize anything on this
problem.   B-)
---