Topic: B. Stroustrup, sizeof(int) and sizeof(char); DEFECT IN C++98


Author: "David Thompson" <david.thompson1@worldnet.att.net>
Date: Wed, 1 May 2002 17:14:03 GMT
James Kuyper Jr. <kuyper@wizard.net> wrote :
> Daniel Miller wrote:
....
> >    byte#2: one opinion is that when C++98 and C standards refer to
> > "byte", they are referring to an implementation defined unit somehow
> > related to representing a minimum-sized char-type on that processor
> > architecture. ...
>
> Almost correct. It's the minimum-sized addressable unit, and it must be
> able to hold every element of the basic execution character set.

Yes.

> For instance, there have been machines with a word size of 36 bits, and
> configurable byte sizes; the byte size could be set as low as 5 bits,
> allowing 7 bytes per word. That mode allowed only for capital letters
> and punctuation - there was no room even for digits, much less lower
> case. By your description of byte#2, a C++ implementation on such a
> machine would be required to use it in the 5-bit mode.

If you are (and even if you aren't) thinking of the PDP-10, which was
the most widely used machine with this characteristic, at least in
academia in the early days of C, it supports all byte sizes 1-36 bits
(though 19-35 always take a full word and hence aren't very useful).
The only then-common 5-bit code was IA2 "Baudot", which had one
(uppercase) alphabet, digits, a dozen-odd punctuation marks, and a
handful of controls, using two shift states, LTRS (letters) and
FIGS (figures), whose effect should be obvious.

> However, what the
> standard actually requires is that char must be able to hold at least 96
> different values, and that unsigned char have a range which implicitly
> requires at least 8 bits per byte.

That's the minimum basic _source_ char set (for C++; one less for C).
The minimum basic _execution_ char set (for both) is 99 plus null = 100.
This doesn't alter your conclusion.

> I've never heard anyone indicate
> whether there was ever a C implementation for that machine, and there
> almost certainly was not a C++ implementation. However, there could
> have been. For such an implementation, the mode which put 4 9-bit bytes
> in a word would have been the most logical configuration.
>
I believe there was never a DEC C compiler for the -10, but there was a
user-community one (I think University of Utah??).  Per reports on
alt.sys.pdp10, however, there is _now_ (as of a few months ago)
a gcc port -- and three(!) emulators you can run it on, if you don't
have one of the very few remaining real or hardware-clone -10s.

....
> Historically, for as long as there's been a C standard, it's explicitly
> defined a byte in a way that allows for it to be larger than 8 bits. The
> C++ standard merely continued that tradition.
>
And, as I think someone has noted, even before the C standard;
indeed even before K&R_1_ was published, on the GE 635.

--
- David.Thompson 1 now at worldnet.att.net




---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Ron Natalie <ron@sensor.com>
Date: Wed, 1 May 2002 20:19:52 GMT

David Thompson wrote:
>

> > For instance, there have been machines with a word size of 36 bits, and
> > configurable byte sizes;

> If you are (and even if you aren't) thinking of the PDP-10, which was
> the most widely used machine with this characteristic, at least in
> academia in the early days of C, it supports all byte sizes 1-36 bits
> (though 19-35 always take a full word and hence aren't very useful).
>

The IBM 7094 begat both the PDP 10 (DEC 10/20) and the Univac 1100
series.  Both of these had the rather flexible partial word sizes.
As a matter of fact, not only could you bust up the 36 bit word into
arbitrarily sized bytes (if you want to call them that), you didn't
even have to make them all the same size.






Author: James Kanze <kanze@gabi-soft.de>
Date: Fri, 19 Apr 2002 17:11:29 GMT
Daniel Miller <daniel.miller@tellabs.com> writes:

|>     The debate on this thread is coming to resemble a purely academic
|>  debating society regarding the ontology versus phenomenology of
|>  certain words.  Let us pragmatically refocus on identifying
|>  defects/ambiguities in the C++98 standard and how to fix them.

|>     Please allow me to summarize where we stand thus far:

|>     byte#1: one opinion is that when C++98 and C standards refer to
|>  "byte", they are referring to a strictly 8-bit byte

|>     byte#2: one opinion is that when C++98 and C standards refer to
|>  "byte", they are referring to an implementation defined unit somehow
|>  related to representing a minimum-sized char-type on that processor
|>  architecture.  Some variants of this opinion would permit a byte
|>  (i.e., char) of 6-bits to 15-bits.  Other variants of this opinion
|>  would permit a byte (i.e., char) of nearly any size: 16-bits, 32-bits,
|>  64-bits, 128-bits, ad infinitum.

I've not actually seen either of these opinions, except at the very
beginning of the thread.  Both the C and the C++ standards explicitly
define what they mean by byte, and both explicitly say that it is NOT
necessarily 8 bits.  (See ISO 9899, 3.6 and ISO 14882, 1.7.)

Having established what the C and the C++ standards mean by byte, the
thread has thus drifted to the question of what the word "normally"
means, outside of the standard.

|>     Obviously the C++98 standard can be read in two drastically
|>  different and substantially incongruent ways: byte#1 and byte#2.
|>  The C++98 standard does not explicitly define the term "byte" nor
|>  does it normatively reference a standard which itself in turn
|>  explicitly defines the term "byte".

This is simply false.  See the sections mentioned above.  In
particular, from the C++ standard (second sentence of 1.7): "A byte
[...] is a contiguous sequence of bits, the number of which is
implementation-defined."  I don't see what could be clearer.
(Elsewhere, the standard states that the sizeof operator returns the
size in bytes, that sizeof(unsigned char) must be 1, and that an
unsigned char must be able to hold all of the values in the range
0...255.  These requirements, taken together, mean that a byte must have
at least 8 bits.)
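
(To make the deduction concrete, a small program -- purely an
illustration, and of course its output is implementation-defined --
can display the quantities involved:)

    #include <climits>   // CHAR_BIT, UCHAR_MAX
    #include <iostream>

    int main()
    {
        // sizeof(unsigned char) is 1 by definition.  CHAR_BIT is the
        // number of bits in a byte; since UCHAR_MAX must be at least
        // 255, CHAR_BIT must be at least 8.
        std::cout << "bits per byte:  " << CHAR_BIT << '\n'
                  << "UCHAR_MAX:      " << UCHAR_MAX << '\n'
                  << "bits in an int: " << sizeof(int) * CHAR_BIT << '\n';
        return 0;
    }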

|>     DEFECT: C++98's ambiguous use of the term "byte" without
|>  providing an explicit definition which selects exactly one of the
|>  alternative definitions of "byte" is itself a fundamental defect
|>  from which a series of troublesome alternative interpretations (and
|>  thus troublesome alternative compiler implementations) may flow.

No defect.  You just haven't bothered reading the standard, or even the
preceding posts in this thread.

|>     I see at least two ways of resolving this:

|>     resolution#1: Omit any & all mention of the word "byte".  In C++0x
|>  and in any C++98 corrigenda, strictly use only the word "octet"
|>  instead of C++98's "byte".

Except that the intention is precisely NOT to require strictly 8 bits,
but 8 or more bits.  For whatever reasons, the intent of the C and the
C++ standard is to allow efficient implementations on any conceivable
hardware, including one with 36 bit words and 9 bit bytes (hardware
which has actually existed).

|>     resolution#2: Explicitly pick one of the two alternative
|>  definitions of "byte": byte#1 or byte#2.  Explicitly define "byte"
|>  in C++0x and in any C++98 corrigenda.

|>     Obviously some people who staunchly subscribe to byte#2 would
|>  consider resolution#1 as moving away from their position.  Likewise,
|>  if byte#1 were to be chosen as part of implementing resolution#2,
|>  some of those people who staunchly subscribe to byte#2 would
|>  consider resolution#2=byte#1 as moving away from their position.

At least within this thread, I don't think that there have been any
arguments that the C/C++ standards should change so that they would not
allow an effective implementation on machines which don't directly
support 8 bit bytes.

This is a separate argument.  It is, IMHO, a reasonable argument -- I
don't think that there are any machines capable of a hosted environment
sold today that have anything but 8 bit bytes.  I think that the last
one sold was probably long enough ago that it need not be considered.
(But I am far from sure about this.)  On the other hand, there ARE DSP's
today which define the size of a byte as 32 bits (and sizeof(int) as 1);
any change should allow these to continue to exist.  And I think that
there would be considerable resistance to a change which made the basic
language requirements different for hosted and free-standing
environments.
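
(Note, too, that code which genuinely needs exactly 8-bit units -- for a
network protocol, say -- can already extract octets portably with shifts
and masks, whatever CHAR_BIT happens to be.  A minimal sketch; the
function name is mine, not from any library:)

    #include <iostream>

    // Store the four low-order octets of a value, least significant
    // first.  This works whether a char is 8, 9, or 32 bits wide.
    void put_octets(unsigned long v, unsigned char out[4])
    {
        for (int i = 0; i < 4; ++i)
            out[i] = static_cast<unsigned char>((v >> (8 * i)) & 0xFFu);
    }

    int main()
    {
        unsigned char buf[4];
        put_octets(0x12345678UL, buf);
        for (int i = 0; i < 4; ++i)
            std::cout << std::hex << static_cast<unsigned>(buf[i]) << ' ';
        std::cout << '\n';   // prints: 78 56 34 12
        return 0;
    }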

|>     Because of this thread's volume of seemingly-endless debate about
|>  what the word "byte" is, I expect to see this defect added to the
|>  official C++98 defect list.  I expect to see this defect resolved in
|>  a C++98 corrigendum which then is folded into C++0x.  If C++98 meant
|>  for "byte" (char) to be an 8-bit byte = octet, then explicitly
|>  define "byte" with such strictness.  If C++98 meant for "byte"
|>  (char) to be 6-bits to 15-bits, up to 16-bits, up to 32-bits, up to
|>  64-bits, up to 128-bits, or so forth, then explicitly define "byte"
|>  with such rich semantics.

|>     Note that some byte#2-oriented postings on this thread have been
|>  tantamount to redefining/hijacking C's/C++'s historically 8(ish)-bit
|>  char to be UTF16/UTF32/UCS2/UCS4-capable for non-UTF8 Unicode.
|>  Character encoding schemes composed of value-sets whose size is
|>  greater than 255 graphemes (e.g., Unicode, ISO/IEC 10646) are the
|>  purpose for which wchar_t has always been intended.

I'm curious as to where you got the ideas about C's "historically 8-bit
char".  In Kernighan and Richie, "The C Programming Language", 1978
(page 34), the authors explicitely state that the sizes of the data
types are not defined by the language, and include a table of some
current implementations which includes a 9 bit byte.  In 1978, the word
byte certainly did not have any implications of 8 bits, as there were
still many machines on the market which had other size bytes.

Note that this is all irrelevant to Tom Plunket's points, to which I was
responding.  He and I do not disagree about what the standard says, or
should say, but about the state of the *evolution* of the general
meaning of byte.  I think we both agree that it didn't originally mean 8
bits, and we both agree that in 50 years or more, it will definitely mean
a unit of 8 bits -- barring some unforeseeable historical quirk.  I
think we also both agree that given this evolution, it would be best if
C and C++ found another term.

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481






Author: "Thomas Krog" <rick@kampsax.dtu.dk>
Date: Tue, 23 Apr 2002 17:23:51 GMT
>This is simply false.  See the sections mentioned above.  In
>particular, from the C++ standard (second sentence of 1.7): "A byte
>[...] is a contiguous sequence of bits, the number of which is
>implementation-defined."  I don't see what could be clearer.
>(Elsewhere, the standard states that the sizeof operator returns the
>size in bytes, that sizeof(unsigned char) must be 1, and that an
>unsigned char must be able to hold all of the values in the range
>0...255.  These requirements, taken together, mean that a byte must have
>at least 8 bits.)

Does this mean that it is impossible to write portable C++ code which
prints how many bits the program has allocated with the new operator?







Author: Ron Natalie <ron@sensor.com>
Date: Tue, 23 Apr 2002 17:38:35 GMT

Thomas Krog wrote:

> Does this mean that it is impossible to write a portable c++ code which
> prints how many bits the program has allocated with the new operator?

    #include <climits>     // CHAR_BIT
    #include <cstddef>     // size_t
    #include <iostream>
    #include <new>         // bad_alloc
    using namespace std;

    void PrintAllocation(size_t n) {
        try {
            char* p = new char[n];
            delete [] p;
            cout << "Allocated " << (n * CHAR_BIT) << " bits.\n";
        } catch (const bad_alloc&) {
            cout << "Failed to allocate " << (n * CHAR_BIT) << " bits.\n";
        }
    }

Perfectly portable, but what it prints is implementation-specific (it
depends on CHAR_BIT): PrintAllocation(10) reports 80 bits where CHAR_BIT
is 8, but would report 90 on a 9-bit-char implementation.






Author: "Thomas Krog" <rick@kampsax.dtu.dk>
Date: Tue, 23 Apr 2002 18:50:51 GMT
Thanks - I had not heard of the CHAR_BIT constant before.







Author: Daniel Miller <daniel.miller@tellabs.com>
Date: Thu, 18 Apr 2002 21:40:40 GMT
   The debate on this thread is coming to resemble a purely academic debating society
regarding the ontology versus phenomenology of certain words.  Let us
pragmatically refocus on identifying defects/ambiguities in the C++98 standard
and how to fix them.

   Please allow me to summarize where we stand thus far:

   byte#1: one opinion is that when C++98 and C standards refer to "byte", they
are referring to a strictly 8-bit byte

   byte#2: one opinion is that when C++98 and C standards refer to "byte", they
are referring to an implementation defined unit somehow related to representing
a minimum-sized char-type on that processor architecture.  Some variants of this
opinion would permit a byte (i.e., char) of 6-bits to 15-bits.  Other variants
of this opinion would permit a byte (i.e., char) of nearly any size: 16-bits,
32-bits, 64-bits, 128-bits, ad infinitum.


   Obviously the C++98 standard can be read in two drastically different and
substantially incongruent ways: byte#1 and byte#2.  The C++98 standard does not
explicitly define the term "byte" nor does it normatively reference a standard
which itself in turn explicitly defines the term "byte".

   DEFECT:  C++98's ambiguous use of the term "byte" without providing an
explicit definition which selects exactly one of the alternative definitions of
"byte" is itself a fundamental defect from which a series of troublesome
alternative interpretations (and thus troublesome alternative compiler
implementations) may flow.

   I see at least two ways of resolving this:

   resolution#1: Omit any & all mention of the word "byte".  In C++0x and in any
C++98 corrigenda, strictly use only the word "octet" instead of C++98's "byte".

   resolution#2: Explicitly pick one of the two alternative definitions of
"byte": byte#1 or byte#2.  Explicitly definite "byte" in C++0x and in any C++98
corrigenda.

   Obviously some people who staunchly subscribe to byte#2 would consider
resolution#1 as moving away from their position.  Likewise, if byte#1 were to be
chosen as part of implementing resolution#2, some of those people who staunchly
subscribe to byte#2 would consider resolution#2=byte#1 as moving away from their
position.

   Because of this thread's volume of seemingly-endless debate about what the
word "byte" is, I expect to see this defect added to the official C++98 defect
list.  I expect to see this defect resolved in a C++98 corrigendum which then is
folded into C++0x.  If C++98 meant for "byte" (char) to be an 8-bit byte =
octet, then explicitly define "byte" with such strictness.  If C++98 meant for
"byte" (char) to be 6-bits to 15-bits, up to 16-bits, up to 32-bits, up to
64-bits, up to 128-bits, or so forth, then explicitly define "byte" with such
rich semantics.

   Note that some byte#2-oriented postings on this thread have been tantamount
to redefining/hijacking C's/C++'s historically 8(ish)-bit char to be
UTF16/UTF32/UCS2/UCS4-capable for non-UTF8 Unicode.  Character encoding schemes
composed of value-sets whose size is greater than 255 graphemes (e.g.,
Unicode, ISO/IEC 10646) are the purpose for which wchar_t has always been intended.

   Or equivalently, the "supreme court" of C++ needs to decide normatively how
the C++98 "constitution" is to be interpreted regarding "byte", char, and
sizeof(char).

James Kanze wrote:

> Tom Plunket <tomas@fancy.org> writes:
>
> |>  James Kanze wrote:
>
> |>  > The word "byte" has never meant eight bits.  Historically...
>
> |>  Ironically, words mean whatever they get used to mean, and as long
> |>  as a word has a definition that is understood by the involved
> |>  parties, that definition is valid regardless of what is "official".
>
> True, but words are used within distinct communities.  Here, we are
> talking of a specialized technical community; how the man on the street
> uses the word (or if he has even heard of it) is irrelevant: when we use
> the word stack, or loop, in this forum, it generally also has a meaning
> quite different from that used by the man on the street.
>
>     [...]
> |>  Popular usage of the word "byte" does mean "eight bits" or "octet",
> |>  regardless of what ISO says and regardless of what IBM once did 40
> |>  or 50 years ago.
>
> I'm not sure that there is a popular usage of the word "byte".  If so,
> it is very recent, and probably is 8 bits.  But that is separate from
> the technical usage, just as the use of stack or loop with regards to
> programming is different from other uses.
>
> |>  Merriam-Webster currently defines a byte to be "a group of eight
> |>  binary digits...", and since dictionaries get definitions from
> |>  popular usage, we can assume that this definition is what most
> |>  people use as their definition of "byte".  This does not mean that
> |>  the ISO is wrong, of course, it just means that they are defining
> |>  byte to be something other than the popular usage.
>
> And that Merriam-Webster is giving a general definition, and not a
> technical one.  IMHO, if they don't mention its use with a meaning
> other than 8 bits, they are wrong; the two uses are related, and
> presenting one without the other is highly misleading, since the
> definition they do give "sounds" technical.  They might, of course,
> label my usage as "technical", or give some other indication that it is
> not the everyday usage.
>
> With regards to the technical meaning, it is significant to note that
> technical documents in which the unit must be 8 bits (descriptions of
> network protocols, etc.) do NOT use the word byte, but octet.
>
> |>  As an example, a "nice" girl referred to a prostitute in Victorian
> |>  England.  The meaning of "nice" has morphed over the years; the only
> |>  thing defining it was popular usage and understanding of what the
> |>  word meant.
>
> A good dictionary will still give this meaning, indicating, of course,
> that it is archaic.
>
> I would agree that we are in a situation where the word byte is changing
> meaning, and 50 years from now, it probably will mean 8 bits.  For the
> moment, even if many people assume 8 bits, the word is still
> occasionally used for other sizes, and still retains to some degree its
> older meaning.  (This is, of course, *why* it isn't used in protocol
> descriptions.)
>
> |>  > The fact that machines with bytes of other than 8 bits have become
> |>  > rare doesn't negate the fact that when you do talk of them, the
> |>  > word "byte" doesn't mean 8 bits.  And the distinction is still
> |>  > relevant.  -- look at any of the RFC's, for example, and you'll
> |>  > find that when 8 bits is important, the word used is octet, and
> |>  > not byte.
>
> |>  Yes; the distinction is still relevant in that they need to define
> |>  these words to something other than the popular definition.  This
> |>  doesn't make the standards and RFCs wrong, just anachronistic.  ;)
>
> Not even anachronistic.  Just more precise and more technical than
> everyday usage.
>
> In the case of the C/C++, the use is a bit special, even with regards to
> the older meaning.  I'd actually favor a different word here, but I
> don't have any suggestions.
>
> And what about the use in the library section, where there is a question
> of multi-byte characters -- I've never heard anyone use anything else
> but "multi-byte characters" when referring to the combining codes in 16
> bit Unicode, for example.  So at least in this compound word, byte has
> retained a more general meaning.
>
> In the case of the RFC's and the various standards for the OSI
> protocols, I see no reason to switch from "octet" to "byte".  The word
> "octet" is well established, and is precise, and makes it 100% clear
> that exactly 8 bits are involved.  Even if "byte" is generally
> understood to be 8 bits, why choose the less precise word?
>
>






Author: Ron Natalie <ron@sensor.com>
Date: Thu, 18 Apr 2002 23:08:34 GMT

Daniel Miller wrote:

>
>    Note that some byte#2-oriented postings on this thread have been tantamount
> to redefining/hijacking C's/C++'s historically 8(ish)-bit char to be
> UTF16/UTF32/UCS2/UCS4-capable for non-UTF8 Unicode.  Character encoding schemes
> composed of value-sets whose size is greater than 255 graphemes (e.g.,
> Unicode, ISO/IEC 10646) are the purpose for which wchar_t has always been intended.

You would do better to avoid making inflammatory statements like "hijack" in
proposals that you are trying to push on people.

Frankly, if chars were designed just to hold characters, then allowing them
to be something larger on a native, let's say, UTF16 machine
would be reasonable.

However, char plays double duty as the "minimal addressable memory unit."
As much as one would wish to redefine char to a larger size, one cannot
do so without losing the ability to address something smaller.
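
(That double duty shows up in the familiar idiom of inspecting an
object's representation through unsigned char, which is possible
precisely because nothing smaller than a char is addressable.  A sketch
for illustration only; dump_bytes is just a name I made up:)

    #include <cstddef>
    #include <iostream>

    // Print the bytes of any object, lowest address first.  unsigned
    // char is the unit of addressability, whatever its width.
    template <typename T>
    void dump_bytes(const T& obj)
    {
        const unsigned char* p =
            reinterpret_cast<const unsigned char*>(&obj);
        for (std::size_t i = 0; i < sizeof obj; ++i)
            std::cout << static_cast<unsigned>(p[i]) << ' ';
        std::cout << '\n';
    }

    int main()
    {
        int x = 1;
        dump_bytes(x);   // e.g. "1 0 0 0" on a little-endian machine
        return 0;
    }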

If you want to fix the terminology to allow the same latitude currently
allowed (16 bit chars, let's say), that's fine.  If you want to somehow
restrict chars to 8 bits you have two problems:

1. You then need to fix the fact that wchar_t is not fully supported in C++.
2. You are still deciding that a certain class of machines that has had C/C++
   compilers implemented for them are no longer allowed a conforming implementation
   because of the infeasibility of an exactly 8 bit char size on them.






Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Fri, 19 Apr 2002 01:05:39 GMT
Daniel Miller wrote:
>
>    The debate on this thread is coming to resemble a purely academic debating society
> regarding the ontology versus phenomenology of certain words.  Let us
> pragmatically refocus on identifying defects/ambiguities in the C++98 standard
> and how to fix them.
>
>    Please allow me to summarize where we stand thus far:
>
>    byte#1: one opinion is that when C++98 and C standards refer to "byte", they
> are referring to a strictly 8-bit byte

Keep in mind that people can only sustain that opinion by ignoring the
explicit definitions provided in those standards.

>    byte#2: one opinion is that when C++98 and C standards refer to "byte", they
> are referring to an implementation defined unit somehow related to representing
> a minimum-sized char-type on that processor architecture. ...

Almost correct. It's the minimum-sized addressable unit, and it must be
able to hold every element of the basic execution character set.
However, it needn't be a character type as far as the architecture is
concerned, and it can't be the minimum-sized char-type on that
architecture, if the minimum-sized char-type is less than 8 bits.

For instance, there have been machines with a word size of 36 bits, and
configurable byte sizes; the byte size could be set as low as 5 bits,
allowing 7 bytes per word. That mode allowed only for capital letters
and punctuation - there was no room even for digits, much less lower
case. By your description of byte#2, a C++ implementation on such a
machine would be required to use it in the 5-bit mode. However, what the
standard actually requires is that char must be able to hold at least 96
different values, and that unsigned char have a range which implicitly
requires at least 8 bits per byte. I've never heard anyone indicate
whether there was ever a C implementation for that machine, and there
almost certainly was not a C++ implementation. However, there could
have been. For such an implementation, the mode which put 4 9-bit bytes
in a word would have been the most logical configuration.

> ... Some variants of this
> opinion would permit a byte (i.e., char) of 6-bits to 15-bits. ...


6 or 7 bit bytes would violate requirements specifically laid out by
the standards. Note: the 8-bit limit is not explicit; it's derived from
the requirements on UCHAR_MAX. Those requirements are not explicitly
part of the C++ standard, but are instead incorporated by reference from
the C standard.

> ... Other variants
> of this opinion would permit a byte (i.e., char) of nearly any size: 16-bits,
> 32-bits, 64-bits, 128-bits, ad infinitum.

Yes; the C and C++ standard explicitly allow for an unspecified, and
therefore arbitrarily large, number of bits per byte.

>    Obviously the C++98 standard can be read in two drastically different and
> substantially incongruent ways: byte#1 and byte#2.


No - it cannot be read to match byte#1; the people who support that
point of view have failed to read the relevant clauses. Modulo the
corrections I've given above, byte#2 is pretty much exactly what the
standard actually says.

What's at issue here is not whether the standards mean what they
explicitly say about what a byte is; what's at issue is whether they
should say something different, or use different terminology to say it.

> ... The C++98 standard does not
> explicitly define the term "byte" nor does it normatively reference a standard
> which itself in turn explicitly defines the term "byte".

Completely false. See section 1.7p1 in the C++ standard. In particular,
pay close attention to the last part of the second sentence, which makes
the number of bits in a byte explicitly implementation-defined. See
section 3.6 of the C99 standard. In particular, pay close attention to
Note 2, in 3.6p3. The note is, of course, non-normative, but it
explicitly and correctly points out the absence of a size specification
for a byte in the normative section of the text, making it clear that
this absence was intentional. See section 5.2.4.2.1 of the C99 standard,
for the limits on the valid ranges of character types, which implicitly
require that a char be at least 8 bits. Pay particular attention to
paragraph 2 of that section.

....
>    resolution#2: Explicitly pick one of the two alternative definitions of
> "byte": byte#1 or byte#2.  Explicitly definite "byte" in C++0x and in any C++98
> corrigenda.

Already achieved, without any change to the standard.

....
>    Note that some byte#2-oriented postings on this thread have been tantamount
> to redefining/hijacking C's/C++'s historically 8(ish)-bit char to be

Historically, for as long as there's been a C standard, it's explicitly
defined a byte in a way that allows for it to be larger than 8 bits. The
C++ standard merely continued that tradition.
