Topic: Number of bits in a char?


Author: James Kuyper <jameskuyper@verizon.net>
Date: Sat, 21 Mar 2009 14:06:07 CST
ejstans wrote:
>
> Thanks for the clarifications, I understand now! I didn't have access
> to a C standard so I was fumbling in the dark...

You can get n1256.pdf for free. You can find it in
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/pre-kona-2007.tar.gz>.

N1256 is a combination of C99 plus all three technical corrigenda to
C99 that have been approved so far. All of its base documents are
official, but N1256 is only a committee draft. As a result, it is
freely available to the general public, whereas the official standard
costs money. However, since it merges all four base documents, N1256
is actually more useful than the official standard.

Technically, the fact that N1256 is a draft means that you need to
cross-check anything it says against the base documents. However,
according to the committee's editor, Lawrence Jones:

> For the record, there is one known error in N1256 (other than
> "September" being misspelled in the header): The predefined macro
> __STDC_MB_MIGHT_NEQ_WC__ should be in 6.10.8p2 (conditional macros)
> rather than p1 (required macros).

--
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@netlab.cs.rpi.edu]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: ejstans <j.l.olsson@gmail.com>
Date: Mon, 16 Mar 2009 17:47:29 CST
Hello,

Reading the C++ FAQ lite ( http://www.parashift.com/c++-faq-lite ) it
is said in [26.4] that C++ guarantees a byte must always have at least
8 bits. Reading through the ISO/IEC 14882 standard, I don't find an
explicit reference to this. What I do find is that a byte is defined
to be able to contain any member of the basic execution character set
(1.7), which I interpret as containing 100 characters, and hence would
require at least 7 bits.
Is there a definition that states 8 bits somewhere that I have missed?
One guess I have is that maybe the requirement comes from eg UCHAR_MAX
being defined that way?
But I don't see where UCHAR_MAX is defined, is this from the C
standard?

Also, is char and byte equivalent?






Author: James Kuyper <jameskuyper@verizon.net>
Date: Tue, 17 Mar 2009 14:46:35 CST
ejstans wrote:
> Hello,
>
> Reading the C++ FAQ lite ( http://www.parashift.com/c++-faq-lite ) it
> is said in [26.4] that C++ guarantees a byte must always have at least
> 8 bits. Reading through the ISO/IEC 14882 standard, I don't find an
> explicit reference to this.

The relevant bit isn't actually in the C++ standard. The C standard
specifies that <limits.h> must #define a macro named CHAR_BIT, whose
meaning is given as "number of bits for smallest object that is not a
bit-field (byte)", and whose minimum permitted value is specified to
be 8.

The C++ standard cross-references the C standard, requiring that
<limits.h> must be supported, and that <climits> must define/declare
the same identifiers. The C++ standard mentions that CHAR_BIT is one of
the macros that those headers must define, but leaves its specification
up to the C standard.
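
A minimal sketch of how that guarantee can be checked at compile time. It is not taken from either standard: it assumes a C++11 compiler (for static_assert and constexpr), and bits_in is a hypothetical helper name introduced only for illustration.

```cpp
#include <climits>  // CHAR_BIT, incorporated from C's <limits.h>

// The C standard requires CHAR_BIT to be at least 8, so this should
// hold on any conforming implementation.
static_assert(CHAR_BIT >= 8, "a byte must have at least 8 bits");

// Hypothetical helper (not a standard facility): the number of bits in
// the object representation of T, i.e. its size in bytes times CHAR_BIT.
template <typename T>
constexpr int bits_in() {
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}
```

On a typical desktop platform bits_in<char>() is 8, but the only portable statement is that it equals CHAR_BIT and is at least 8.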

> What I do find is that a byte is defined
> to be able to contain any member of the basic execution character set
> (1.7),

It goes on to say (in n2723.pdf, at least - I don't know whether it says
this in the current standard): "and the eight-bit code units of the
Unicode UTF-8 encoding form".

....
> But I don't see where UCHAR_MAX is defined, is this from the C
> standard?

Like CHAR_BIT, the details of UCHAR_MAX are incorporated from the C
standard by reference, rather than in explicit detail.

> Also, is char and byte equivalent?

No, though the two concepts are closely related. The meaning of the term
"byte" in C++ is defined by the very same section of the C++ standard
that you have already cited; the fact that the word "byte" is given in
italics is an ISO convention indicating that the context in which it
occurs constitutes the definition of that term.

The key distinction is that a char is a C++ data type, while a byte is a
unit for measuring storage requirements. The size of that unit is
defined to be the amount of storage space required by a 'char'. This
important connection between those two concepts is not provided in the
definition of 'byte', nor in the definition of 'char'. It is, oddly
enough, provided in the definition of the sizeof operator:

5.3.3p1: "The sizeof operator yields the number of bytes in the object
representation of its operand. ... sizeof(char), sizeof(signed char) and
sizeof(unsigned char) are 1."
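
The connection made by 5.3.3p1 can be sketched in code. This is only an illustration, assuming a C++11 compiler; size_in_chars is a hypothetical name, not something the standard provides.

```cpp
#include <cstddef>  // std::size_t

// Guaranteed by the definition of sizeof: every character type
// occupies exactly one byte, however many bits that byte has.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(sizeof(signed char) == 1, "sizeof(signed char) is 1");
static_assert(sizeof(unsigned char) == 1, "sizeof(unsigned char) is 1");

// Hypothetical helper: the size of T counted in chars. Because
// sizeof(char) is 1, this is the same number as sizeof(T) in bytes.
template <typename T>
constexpr std::size_t size_in_chars() {
    return sizeof(T) / sizeof(char);
}
```

So "byte" remains a unit of measurement and "char" a type, but measuring any object in one gives the same count as measuring it in the other.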






Author: Pavel Minaev <int19h@gmail.com>
Date: Tue, 17 Mar 2009 14:45:47 CST
On Mar 16, 4:47 pm, ejstans <j.l.ols...@gmail.com> wrote:
> Hello,
>
> Reading the C++ FAQ lite (http://www.parashift.com/c++-faq-lite) it
> is said in [26.4] that C++ guarantees a byte must always have at least
> 8 bits. Reading through the ISO/IEC 14882 standard, I don't find an
> explicit reference to this. What I do find is that a byte is defined
> to be able to contain any member of the basic execution character set
> (1.7), which I interpret as containing 100 characters, and hence would
> require at least 7 bits.
> Is there a definition that states 8 bits somewhere that I have missed?
> One guess I have is that maybe the requirement comes from eg UCHAR_MAX
> being defined that way?
> But I don't see where UCHAR_MAX is defined, is this from the C
> standard?

UCHAR_MAX (and CHAR_BIT) are mentioned as being defined in <climits>,
for which the C++ standard says (18.2.2[lib.c.limits]):

"The contents are the same as the Standard C library header
<limits.h>."

Looking in the C Standard, we find (5.2.4.2.1):

"... implementation-defined values shall be equal or greater in
magnitude (absolute value) to those shown, with the same sign.

#define CHAR_BIT 8
#define SCHAR_MIN -127
#define SCHAR_MAX 127
#define UCHAR_MAX 255"
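
Since the quoted values are minimum magnitudes rather than exact values, a portable program can rely only on inequalities. A small sketch, assuming a C++11 compiler; meets_c_minimums is a hypothetical helper name used for illustration.

```cpp
#include <climits>  // SCHAR_MIN, SCHAR_MAX, UCHAR_MAX, CHAR_BIT

// Each limit must be at least as large in magnitude as the value the
// C standard shows, with the same sign.
static_assert(SCHAR_MIN <= -127, "SCHAR_MIN must be -127 or lower");
static_assert(SCHAR_MAX >= 127, "SCHAR_MAX must be 127 or higher");
static_assert(UCHAR_MAX >= 255, "UCHAR_MAX must be 255 or higher");

// Hypothetical helper: true when all four quoted minimums are met.
constexpr bool meets_c_minimums() {
    return CHAR_BIT >= 8 && SCHAR_MIN <= -127 &&
           SCHAR_MAX >= 127 && UCHAR_MAX >= 255;
}
```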







Author: Bart van Ingen Schenau <bart@ingen.ddns.info>
Date: Tue, 17 Mar 2009 15:36:34 CST
ejstans wrote:

> Hello,
>
> Reading the C++ FAQ lite ( http://www.parashift.com/c++-faq-lite ) it
> is said in [26.4] that C++ guarantees a byte must always have at least
> 8 bits. Reading through the ISO/IEC 14882 standard, I don't find an
> explicit reference to this. What I do find is that a byte is defined
> to be able to contain any member of the basic execution character set
> (1.7), which I interpret as containing 100 characters, and hence would
> require at least 7 bits.
> Is there a definition that states 8 bits somewhere that I have missed?
> One guess I have is that maybe the requirement comes from eg UCHAR_MAX
> being defined that way?
> But I don't see where UCHAR_MAX is defined, is this from the C
> standard?

It seems that there is indeed no direct requirement for the minimum of 8
bits for a byte. This requirement follows from the requirements that
are inherited from C that a byte contains CHAR_BIT bits and that
CHAR_BIT is at least 8.

>
> Also, is char and byte equivalent?
>
As far as the C and C++ standards are concerned, yes, even if char is
more than 8 bits wide.

However, in common usage, the term byte is often used to refer to an
octet (an entity with exactly 8 bits).

Bart v Ingen Schenau
--
a.c.l.l.c-c++ FAQ: http://www.comeaucomputing.com/learn/faq
c.l.c FAQ: http://c-faq.com/
c.l.c++ FAQ: http://www.parashift.com/c++-faq-lite/






Author: litb <Schaub-Johannes@web.de>
Date: Tue, 17 Mar 2009 15:36:04 CST
On 17 Mar., 00:47, ejstans <j.l.ols...@gmail.com> wrote:

> One guess I have is that maybe the requirement comes from eg UCHAR_MAX
> being defined that way?
> But I don't see where UCHAR_MAX is defined, is this from the C
> standard?

Yes, C89 documents it, saying in 2.2.4.2 that UCHAR_MAX is at least
255 (according to the draft at http://flash-gordon.me.uk/ansi.c.txt).
That of course requires at least 8 bits.

> Also, is char and byte equivalent?
>
The size of 1 char is 1 byte, which is the smallest addressable unit
in C++. See 5.3.3 [expr.sizeof]






Author: James Kanze <james.kanze@gmail.com>
Date: Tue, 17 Mar 2009 15:35:56 CST
On Mar 17, 12:47 am, ejstans <j.l.ols...@gmail.com> wrote:

> Reading the C++ FAQ lite
> (http://www.parashift.com/c++-faq-lite) it is said in [26.4]
> that C++ guarantees a byte must always have at least 8 bits.
> Reading through the ISO/IEC 14882 standard, I don't find an
> explicit reference to this. What I do find is that a byte is
> defined to be able to contain any member of the basic
> execution character set (1.7), which I interpret as containing
> 100 characters, and hence would require at least 7 bits.  Is
> there a definition that states 8 bits somewhere that I have
> missed?  One guess I have is that maybe the requirement comes
> from eg UCHAR_MAX being defined that way?  But I don't see
> where UCHAR_MAX is defined, is this from the C standard?

You've more or less got it.  The contents of <climits> are in
fact included from the C standard, where UCHAR_MAX is required
to be at least 255.  In addition, both the C and the C++
standard have text which guarantees that integral types
(including character types) use a pure binary representation, so
you can't have a base 3 implementation, with each bit (or would
that be trit?) having three states, chars having 6 bits, and
UCHAR_MAX equal to 728.
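
The arithmetic behind that example can be sketched as follows. This is an illustration only, assuming a C++11 compiler; ipow is a hypothetical helper, not a standard function.

```cpp
#include <climits>  // CHAR_BIT
#include <limits>   // std::numeric_limits

// Hypothetical helper: base raised to exp, evaluated at compile time.
constexpr unsigned long ipow(unsigned long base, unsigned exp) {
    return exp == 0 ? 1UL : base * ipow(base, exp - 1);
}

// Six three-state digits give 3^6 = 729 values, so the largest would be
// 728. That satisfies UCHAR_MAX >= 255 but is not a pure binary layout.
static_assert(ipow(3, 6) - 1 == 728, "six trits reach 728");

// Eight two-state digits give 2^8 = 256 values, so the largest is 255.
static_assert(ipow(2, 8) - 1 == 255, "eight bits reach 255");

// With pure binary and no padding bits in unsigned char, every bit of
// the byte is a value bit.
static_assert(std::numeric_limits<unsigned char>::digits == CHAR_BIT,
              "unsigned char uses all CHAR_BIT bits for its value");
```

Combined with the binary requirement, UCHAR_MAX being at least 255 is what forces at least 8 bits.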

> Also, is char and byte equivalent?

Sort of.  The standard uses the word byte exclusively as a
measure of size, but there is a guarantee that sizeof( char ) is
one.

This is in the standard, of course.  In its more generally
accepted meaning, the first "bytes" were 6 bits, the DEC 10
traditionally put five 7 bit bytes in a 36 bit word (although at
the hardware level, the byte size was programmable), and a lot
of machines didn't have bytes at all.

--
James Kanze (GABI Software)             email:james.kanze@gmail.com
Conseils en informatique orientée objet/
                  Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34







Author: ejstans <j.l.olsson@gmail.com>
Date: Fri, 20 Mar 2009 02:08:35 CST
Thanks for the clarifications, I understand now! I didn't have access
to a C standard so I was fumbling in the dark...
