Topic: Usage of sizeof(char) == 1
Author: dwalker07@snet.net.invalid (Daryle Walker)
Date: 2000/01/03
Don't the C and C++ standards define sizeof(char) to be 1, with all
other types having a multiple of this size? Don't they also expect that
the char type be used as the normal character set? After a while,
especially after seeing the first point mentioned a few times, I have a
problem with this.
Couldn't the most basic type and the normal character type be _mutually
exclusive_?!
In fact, you don't have to go far to find a case that breaks C/C++'s
assumption. The Java Virtual Machine has this mixed case. (I'm
assuming that a C or C++ compiler, and not Java, is being used to
compile a program to the JVM.) The most basic type, byte, is of a size
of one octet. The normal character size, char, takes two octets. How
is this solved? Do you just consider the byte type to be 'char,' even
though the machine doesn't consider it such?
A related problem is that of types that have sizes that aren't direct
multiples of another. What if a machine has its smallest type be 3
octets, and its next type be 5 octets in size. How can you set up the
basic sizeof(char) for this one, since you probably shouldn't have
fractional sizes? Do you set up a char pseudo-type that is the greatest
common factor of all the real types' sizes?
--
Daryle Walker
Video Game, Mac, and Internet Junkie
dwalker07 AT snet DOT net
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html ]
Author: Steve Clamage <stephen.clamage@sun.com>
Date: 2000/01/03
Daryle Walker wrote:
>
> Don't the C and C++ standards define sizeof(char) to be 1, with all
> other types having a multiple of this size? Don't they also expect that
> the char type be used as the normal character set?
Yes to both. It has always been the case in C, and is continued (for
C compatibility) in C++. C and C++ also support a "wide character"
type (wchar_t) for character sets that won't fit in a char.
>
> Couldn't the most basic type and the normal character type be _mutually
> exclusive_?!
Making the types independent is certainly a valid language design
possibility, but it is not the case in C or C++.
>
> In fact, you don't have to go far to find a case that breaks C/C++'s
> assumption. The Java Virtual Machine has this mixed case. (I'm
> assuming that a C or C++ compiler, and not Java, is being used to
> compile a program to the JVM.) The most basic type, byte, is of a size
> of one octet. The normal character size, char, takes two octets. How
> is this solved? Do you just consider the byte type to be 'char,' even
> though the machine doesn't consider it such?
Perhaps you are under the impression that there is some necessary
correspondence between Java and C or C++. That is not the case.
Java is a different language with some superficial similarities to
C and C++.
If you want to mix C or C++ with Java, you have to write C or C++
versions of the Java declarations (and Java versions of C and C++
declarations). Some constructs in one language won't have any exact
correspondence in other languages. You have to avoid those things
in the interfaces. (All of this is generally true when mixing source
languages in one program.)
If the C/C++ type wchar_t corresponds to the Java character type,
you can use that. (The exact definition of wchar_t depends on the
implementation.) Otherwise, you need to declare some suitable type.
>
> A related problem is that of types that have sizes that aren't direct
> multiples of another. What if a machine has its smallest type be 3
> octets, and its next type be 5 octets in size. How can you set up the
> basic sizeof(char) for this one, since you probably shouldn't have
> fractional sizes? Do you set up a char pseudo-type that is the greatest
> common factor of all the real types' sizes?
The only requirements in C or C++ on sizes are
- sizeof(char) == 1 (by definition),
- sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long),
- the sizes be big enough to represent the required ranges,
- the signed and unsigned versions of types have the same size.
There is no requirement that (for example) sizeof(long) be
a multiple of sizeof(short). With a 1-byte char, you could have
3-byte ints and 5-byte longs if you want.
If you use a 16-bit char (as opposed to wchar_t) on a machine with
8-bit storage units, all the other basic types must occupy an even
number of octets. On your machine with 2-octet chars and 3-octet
registers, type "int" would need to have 2, 4, or 6 octets. Type
"long" would need to have 4 or 6 octets. The implementer of C/C++
would make the choice, balancing convenience and efficiency
considerations.
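[ An editorial aside, not from the thread: the guarantees above can ]
[ be checked directly. A minimal C++ sketch, assuming a C++11 ]
[ compiler for static_assert: ]

    #include <climits>   // CHAR_BIT
    #include <iostream>

    int main() {
        // The one identity the standard fixes: sizeof(char) == 1.
        static_assert(sizeof(char) == 1, "true by definition");

        // Everything else is the implementation's choice, subject to
        // the minimum ranges; report what this implementation chose.
        std::cout << "bits per byte: " << CHAR_BIT << '\n'
                  << "sizeof(short): " << sizeof(short) << '\n'
                  << "sizeof(int):   " << sizeof(int) << '\n'
                  << "sizeof(long):  " << sizeof(long) << '\n';
    }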
--
Steve Clamage, stephen.clamage@sun.com
Author: "John E. Gwyn" <JEGwyn@compuserve.com>
Date: 2000/01/03
Daryle Walker wrote:
> Don't the C and C++ standards define sizeof(char) to be 1, with all
> other types having a multiple of this size?
Yes.
> Don't they also expect that the char type be used as the normal
> character set?
No. "char" for characters is now a historical ("legacy") artifact.
Use wchar_t for the most general character handling.
> Couldn't the most basic type and the normal character type be
> _mutually exclusive_?!
Yes.
Until C89, there was still a chance for C to clearly distinguish
between bytes and characters (I did propose a way to do so while
we were attacking the "internationalization" issue), but
X3J11/WG14 missed that opportunity, not wanting to break existing
code that relied on sizeof(char)==1 (which was not specified in
the base document, but was widely assumed, and supported by a
comment from Dennis Ritchie). If C had had, e.g., "short char"
meaning "byte", then "char" could have continued to be used for
general character encoding. But as it transpired, we had to
introduce a new type(def) "wchar_t" for general character
encoding. The plain "char" character support in Standard C is
there to support legacy (pre-wchar_t) code and should not be used
for character coding in new portable programs. Alas.
> [Java's] most basic type, byte, is of a size of one octet.
> The normal character size, char, takes two octets. How
> is this solved?
What's to "solve"? Java did what C should have done in this
regard, except that allotting just 16 bits for a universal
character encoding was stupid.
Author: Holger Eitzenberger <eitz@weh.rwth-aachen.de>
Date: 2000/01/03
In comp.lang.c Daryle Walker <dwalker07@snet.net.invalid> wrote:
> Don't the C and C++ standards define sizeof(char) to be 1, with all
> other types having a multiple of this size? Don't they also expect that
> the char type be used as the normal character set? After a while,
> especially after seeing the first point mentioned a few times, I have a
> problem with this.
According to the FAQ sizeof(char) is _by definition_ 1. And _no_,
there may be implementations where character constants are stored in
an int by default (so sizeof('a') > 1).
Holger
--
+ PGP || GnuPG key -> finger eitz@jonathan.weh.rwth-aachen.de +
+++ Debian/GNU Linux <octavian@debian.org> +++ ICQ: 2882018 +++
Author: "Jim Fischer" <jfischer@polymail.cpunix.calpoly.edu>
Date: 2000/01/03
Daryle Walker <dwalker07@snet.net.invalid> wrote in message
news:1e3t5pg.15igveh120d6psN%dwalker07@snet.net.invalid...
> Don't the C and C++ standards define sizeof(char) to be 1
Yes [ref: ANSI/ISO/IEC 14882:1998, 5.3.3(1)].
> with all other types having a multiple of this size?
Essentially, yes.
> Don't they also expect that the char type be used as the normal
> character set?
No. The ANSI/ISO C++ language defines two basic data types for use
with character data: 'char' (along with its signed and unsigned
variants) and 'wchar_t'. The wchar_t type provides support for wide
character sets. ANSI/ISO/IEC 14882:1998 states, "'wchar_t' is a
distinct type whose values can represent distinct codes for members of
the largest extended character set specified among the supported
locales(22.1.1). Type wchar_t shall have the same size, signedness,
and alignment requirements (3.9) as one of the other integral types,
called its underlying type" [3.9.1(5)]. Also, sizeof(wchar_t) is
implementation-defined [5.3.3(1)].
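[ An editorial aside: the implementation-defined choice is easy to ]
[ inspect. Common results are 2 on Windows and 4 on many Unix ]
[ systems, but the standard fixes neither: ]

    #include <iostream>

    int main() {
        // Only sizeof(char) == 1 is guaranteed; this may print any value >= 1.
        std::cout << "sizeof(wchar_t) = " << sizeof(wchar_t) << '\n';
    }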
> After a while, especially after seeing the first point mentioned
> a few times, I have a problem with this.
>
> Couldn't the most basic type and the normal character type be
> _mutually exclusive_?!
No -- at least, not on the vast majority of computer systems that
exist today (i.e., some oddball machine in a research lab somewhere
might have properties that cannot physically/logically realize this
definition).
> In fact, you don't have to go far to find a case that breaks C/C++'s
> assumption. The Java Virtual Machine has this mixed case. (I'm
> assuming that a C or C++ compiler, and not Java, is being used to
> compile a program to the JVM.) The most basic type, byte, is of a
> size of one octet. The normal character size, char, takes two octets.
> How is this solved? Do you just consider the byte type to be 'char,'
> even though the machine doesn't consider it such?
You'll have to take this up with the Java folks...
> A related problem is that of types that have sizes that aren't direct
> multiples of another. What if a machine has its smallest type be 3
> octets, and its next type be 5 octets in size. How can you set up the
> basic sizeof(char) for this one, since you probably shouldn't have
> fractional sizes? Do you set up a char pseudo-type that is the
> greatest common factor of all the real types' sizes?
On such a machine, a 'char' type object is (apparently) an octet. If
it's not, you're dealing with some oddball machine, and a compiler
vendor for that machine will need to define some implementation
specific features to support that oddball architecture.
Jim
Author: kanze@gabi-soft.de
Date: 2000/01/03
dwalker07@snet.net.invalid (Daryle Walker) writes:
|> Don't the C and C++ standards define sizeof(char) to be 1, with all
|> other types having a multiple of this size?
Yes.
|> Don't they also expect that
|> the char type be used as the normal character set?
The basic character set. One could argue that today, the "normal"
character set uses wchar_t. (In fact, of course, there is no such thing
as "normal" in this context. I generally only use char, because to
date, I've only needed portability to western Europe.)
|> After a while,
|> especially after seeing the first point mentioned a few times, I have a
|> problem with this.
|> Couldn't the most basic type and the normal character type be _mutually
|> exclusive_?!
It would depend on your definition of the normal character type. char is
the most basic type, by definition.
|> In fact, you don't have to go far to find a case that breaks C/C++'s
|> assumption. The Java Virtual Machine has this mixed case. (I'm
|> assuming that a C or C++ compiler, and not Java, is being used to
|> compile a program to the JVM.) The most basic type, byte, is of a size
|> of one octet. The normal character size, char, takes two octets. How
|> is this solved? Do you just consider the byte type to be 'char,' even
|> though the machine doesn't consider it such?
Neither Java nor C nor C++ has a character type, so I don't see where
the problem lies. Java has a somewhat un-orthogonal situation in which
there is an unsigned integral type corresponding to short, but not to
any other type. It is also different from C or C++ in that it only has
one string type, whereas C has both char* and wchar_t*, and C++ string
and wstring.
In deciding how to map the C++ types to the JVM types, I'd have to
consider the goals of the port. For some uses, mapping char to byte
would be a reasonable solution, but to what do you map unsigned char?
|> A related problem is that of types that have sizes that aren't
|> direct multiples of another. What if a machine has its smallest
|> type be 3 octets, and its next type be 5 octets in size. How can
|> you set up the basic sizeof(char) for this one, since you probably
|> shouldn't have fractional sizes? Do you set up a char pseudo-type
|> that is the greatest common factor of all the real types' sizes?
A conforming implementation would probably be impossible on such
machines. Do you know of any? My experience is that all of the
machines that use anything other than 8 bit bytes, and integral types of
a power of two bytes, have been word-addressed machines -- a typical
C implementation for such machines will only use two distinct sizes:
char for single "bytes", and short, int and long all of word size. (If
necessary, long may be simulated by software with two words.)
--
James Kanze mailto:James.Kanze@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in Objekt orientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627
Author: Ron Natalie <ron@sensor.com>
Date: 2000/01/03
Holger Eitzenberger wrote:
> And _no_,
> there may be implementations where character constants are stored in
> an int by default (so sizeof('a') > 1).
>
Not in C++.
Author: Mark McIntyre <mark@garthorn.demon.co.uk>
Date: 2000/01/03
On 3 Jan 2000 17:35:50 GMT, dwalker07@snet.net.invalid (Daryle Walker)
wrote:
>
>Don't the C and C++ standards define sizeof(char) to be 1, with all
>other types having a multiple of this size? Don't they also expect that
>the char type be used as the normal character set? After a while,
>especially after seeing the first point mentioned a few times, I have a
>problem with this.
>
>Couldn't the most basic type and the normal character type be _mutually
>exclusive_?!
AFAIR C gets round this by defining "char" by stating
a) sizeof(char) == 1
b) it's AT LEAST large enough to hold the basic character set.
c) it's uniquely addressable in the machine architecture.
For all ANSI C cares, a char could be 11 bits, 256 bits, or 1GB. The
basic character set is also defined, I think, and has fewer than 127
characters.
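[ An editorial sketch of the same definition in code; note that the ]
[ standard's actual floor for CHAR_BIT is 8 (assumes C++11): ]

    #include <climits>

    // a) sizeof(char) == 1, by definition.
    static_assert(sizeof(char) == 1, "a byte is one char");

    // b) at least large enough for the basic character set: a char
    //    must have at least 8 bits; there is no upper bound.
    static_assert(CHAR_BIT >= 8, "standard minimum");

    int main() {}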
>In fact, you don't have to go far to find a case that breaks C/C++'s
>assumption. The Java Virtual Machine has this mixed case. (I'm
>assuming that a C or C++ compiler, and not Java, is being used to
>compile a program to the JVM.) The most basic type, byte, is of a size
>of one octet. The normal character size, char, takes two octets. How
>is this solved?
C defines char as above, and wchar as a two-byte type.
>Do you just consider the byte type to be 'char,' even
>though the machine doesn't consider it such?
If "byte" can be addressed, and can hold the character set, then it
can be used as "char".
>A related problem is that of types that have sizes that aren't direct
>multiples of another. What if a machine has its smallest type be 3
>octets, and its next type be 5 octets in size.
I suspect that such an architecture would be impossible to engineer.
>How can you set up the
>basic sizeof(char) for this one, since you probably shouldn't have
>fractional sizes? Do you set up a char pseudo-type that is the greatest
>common factor of all the real types' sizes?
*shrug* one solution is to define char as having 3 octets and ignore
the 5-octet size. You then define int as say 6 octets, long as 12,
double as 12, etc. Whatever you like. If 3 octets is addressable,
then it's acceptable to use.
Mark McIntyre
C- FAQ: http://www.eskimo.com/~scs/C-faq/top.html
Author: James Kuyper <kuyper@wizard.net>
Date: 2000/01/04
Daryle Walker wrote:
>
> Don't the C and C++ standards define sizeof(char) to be 1, with all
> other types having a multiple of this size? Don't they also expect that
Yes.
> the char type be used as the normal character set? After a while,
Not necessarily. Proponents of internationalization would say that
wchar_t is the normal character set.
....
> A related problem is that of types that have sizes that aren't direct
> multiples of another. What if a machine has its smallest type be 3
> octets, and its next type be 5 octets in size. How can you set up the
> basic sizeof(char) for this one, since you probably shouldn't have
> fractional sizes? Do you set up a char pseudo-type that is the greatest
> common factor of all the real types' sizes?
The implementor decides what size to use for char. All other types must
have sizes that are integer multiples of that size. That might require
that they contain padding. The implementation must decide for each of
the other types whether or not to pad it even further, out to a multiple
of the addressable size. If an implementation chooses to make 'char'
smaller than the smallest addressable unit of memory, then pointers to
types which are not exact multiples of the addressable unit will have to
contain both an address and a byte offset. This is not a mere
theoretical construct; it's been used for 'char *' pointers on real
machines where the addressable unit is 32 bits, and char was chosen to
be 8 bits.
Your 3/5 machine is already an extremely peculiar machine; if the
addressable unit doesn't divide both 3 octets and 5 octets evenly, then
it would be an even more bizarre machine. 1 octet is the largest size
that fits evenly in both 3 octets and 5 octets, so I'll assume that it
is addressable on octet boundaries. In that case, if the 5-octet type is
the largest type with native support, I'd make the following choices:
char: 8 bits
short: 24 bits
int: 24 bits
long: 40 bits
long long: 80 bits, implemented internally as long[2]
Another half-way plausible alternative: the addressable unit is 40 bits,
but the machine has support for 24 bit registers. In that case, I'd do
everything the same except that 'int' would be 40 bits, rather than 24.
This means that 'char *', 'void *', and 'short *' would all have more
complicated implementations than 'int *'.
Note: these are just my choices; the standard allows infinitely many
other ways of implementing it. In fact, while I was revising this
message I tried out three different ones.
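[ An editorial sketch: the same survey as a program that reports how ]
[ many bits each type spans. On Kuyper's hypothetical machine it ]
[ would print 8/24/24/40/80; on yours it prints whatever your ]
[ implementation chose (assumes C++11 for long long): ]

    #include <climits>
    #include <iostream>

    // sizeof counts chars; CHAR_BIT converts that count to bits.
    template <typename T>
    void report(const char* name) {
        std::cout << name << ": " << sizeof(T) * CHAR_BIT << " bits\n";
    }

    int main() {
        report<char>("char");
        report<short>("short");
        report<int>("int");
        report<long>("long");
        report<long long>("long long");
    }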
Author: kanze@gabi-soft.de
Date: 2000/01/04
Mark McIntyre <mark@garthorn.demon.co.uk> writes:
|> >In fact, you don't have to go far to find a case that breaks
|> >C/C++'s assumption. The Java Virtual Machine has this mixed case.
|> >(I'm assuming that a C or C++ compiler, and not Java, is being used
|> >to compile a program to the JVM.) The most basic type, byte, is of
|> >a size of one octet. The normal character size, char, takes two
|> >octets. How is this solved?
|> C defines char as above, and wchar as a two-byte type.
On my machine, wchar_t is four bytes (which rather surprised me -- I
expected two). C doesn't say anything about wchar_t, except that it
must be a typedef to an integral type; it *can* legally be a typedef to
char.
|> >Do you just consider the byte type to be 'char,' even
|> >though the machine doesn't consider it such?
|> If "byte" can be addressed, and can hold the character set, then it
|> can be used as "char".
For all practical purposes, byte is as good as char, except that it is
signed. (But most C implementations make plain char signed anyway.)
|> >A related problem is that of types that have sizes that aren't direct
|> >multiples of another. What if a machine has its smallest type be 3
|> >octets, and its next type be 5 octets in size.
|> I suspect that such an architecture would be impossible to engineer.
|> >How can you set up the
|> >basic sizeof(char) for this one, since you probably shouldn't have
|> >fractional sizes? Do you set up a char pseudo-type that is the greatest
|> >common factor of all the real types' sizes?
|> *shrug* one solution is to define char as having 3 octets and ignore
|> the 5-octet size. You then define int as say 6 octets, long as 12,
|> double as 12, etc. Whatever you like. If 3 octets is addressable,
|> then it's acceptable to use.
As Steve Clamage pointed out, there is no requirement that sizeof( int )
be an integral multiple of sizeof( short ). The *only* requirement is
that sizeof( char ) == 1.
--
James Kanze mailto:James.Kanze@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in Objekt orientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627
Author: Ron Natalie <ron@sensor.com>
Date: 2000/01/05
Daryle Walker wrote:
>
> Don't the C and C++ standards define sizeof(char) to be 1, with all
> other types having a multiple of this size?
True.
> Don't they also expect that
> the char type be used as the normal character set?
The term the standards use is "basic character set."
> Couldn't the most basic type and the normal character type be _mutually
> exclusive_?!
Nope.
> In fact, you don't have to go far to find a case that breaks C/C++'s
> assumption. The Java Virtual Machine has this mixed case. (I'm
> assuming that a C or C++ compiler, and not Java, is being used to
> compile a program to the JVM.) The most basic type, byte, is of a size
> of one octet. The normal character size, char, takes two octets. How
> is this solved? Do you just consider the byte type to be 'char,' even
> though the machine doesn't consider it such?
I don't know what Java has to do with C++ here. There really isn't
any requirement on the 'basic character set.' It doesn't need to hold
the range of symbols that the implementation supports; there are
extended (multibyte/wide) character types for this.
> A related problem is that of types that have sizes that aren't direct
> multiples of another. What if a machine has its smallest type be 3
> octets, and its next type be 5 octets in size. How can you set up the
> basic sizeof(char) for this one, since you probably shouldn't have
> fractional sizes? Do you set up a char pseudo-type that is the greatest
> common factor of all the real types' sizes?
That's a pretty far stretch. Most machines that I've seen that have
goofy-sized things have implementations that simulate octet-sized chars
within the implementation. A good example of this is a CRAY. It has
24-bit addresses and 64-bit words, but just about everything else needs
to be simulated in software.
Author: Jack Klein <jackklein@att.net>
Date: 2000/01/05
On 3 Jan 2000 17:35:50 GMT, dwalker07@snet.net.invalid (Daryle Walker)
wrote in comp.lang.c:
>
> Don't the C and C++ standards define sizeof(char) to be 1, with all
> other types having a multiple of this size? Don't they also expect that
> the char type be used as the normal character set? After a while,
> especially after seeing the first point mentioned a few times, I have a
> problem with this.
>
> Couldn't the most basic type and the normal character type be _mutually
> exclusive_?!
Not in C they can't. A case in point is common 32 bit DSPs, such as
those from Texas Instruments and Analog Devices. All of the C90
integral data types are 32 bits in size, and have identical minimum
and maximum ranges.
> In fact, you don't have to go far to find a case that breaks C/C++'s
> assumption. The Java Virtual Machine has this mixed case. (I'm
> assuming that a C or C++ compiler, and not Java, is being used to
> compile a program to the JVM.) The most basic type, byte, is of a size
> of one octet. The normal character size, char, takes two octets. How
> is this solved? Do you just consider the byte type to be 'char,' even
> though the machine doesn't consider it such?
Why would you assume that a C or C++ compiler is being used to compile
byte code for a JVM? In any case, it makes no difference. Just
because the hardware architecture can address single octets does not
mean that C must. There is no reason that a C compiler can't be made
where CHAR_BIT is 16, even if the processor can directly address 8-bit
octets.
> A related problem is that of types that have sizes that aren't direct
> multiples of another. What if a machine has its smallest type be 3
> octets, and its next type be 5 octets in size. How can you set up the
> basic sizeof(char) for this one, since you probably shouldn't have
> fractional sizes? Do you set up a char pseudo-type that is the greatest
> common factor of all the real types' sizes?
This seems an unlikely situation in any processor architecture, but
there is no problem there. If CHAR_BIT is 24 and char contains three
octets, the integral data type using five octets (call it long,
because a long can't fit in 24 bits) would occupy two bytes (48 bits,
6 octets) in memory. 40 bits would contain value and sign in the
signed version of the type, the remaining 8 would be padding which
would not contribute to the value. sizeof (long) would be 2.
After all, this has been working for more than 25 years now.
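[ An editorial sketch of Klein's padding arithmetic, using only what ]
[ the library exposes. On his hypothetical CHAR_BIT == 24 machine it ]
[ would report 48 storage bits and 8 padding bits for long: ]

    #include <climits>
    #include <iostream>
    #include <limits>

    int main() {
        // Storage bits versus value+sign bits; the difference is padding.
        int storage = sizeof(long) * CHAR_BIT;
        int value_and_sign = std::numeric_limits<long>::digits + 1; // +1 sign bit
        std::cout << "long: " << storage << " storage bits, "
                  << storage - value_and_sign << " padding bits\n";
    }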
Jack Klein
--
Home: http://jackklein.home.att.net
Author: "Eric Petroelje" <petroele@csis.gvsu.edu>
Date: 2000/01/05
Daryle Walker <dwalker07@snet.net.invalid> wrote in message
news:1e3t5pg.15igveh120d6psN%dwalker07@snet.net.invalid...
>
> Don't the C and C++ standards define sizeof(char) to be 1, with all
> other types having a multiple of this size? Don't they also expect that
> the char type be used as the normal character set? After a while,
> especially after seeing the first point mentioned a few times, I have a
> problem with this.
>
> Couldn't the most basic type and the normal character type be _mutually
> exclusive_?!
I suppose they could, but who says that char is the most basic type?
It is simply coincidence that computers count things in bytes, and a
char is one byte long. They are in no way dependent on one another.
>
> In fact, you don't have to go far to find a case that breaks C/C++'s
> assumption. The Java Virtual Machine has this mixed case. (I'm
> assuming that a C or C++ compiler, and not Java, is being used to
> compile a program to the JVM.) The most basic type, byte, is of a size
> of one octet. The normal character size, char, takes two octets. How
> is this solved? Do you just consider the byte type to be 'char,' even
> though the machine doesn't consider it such?
I don't think it matters. Java probably uses Unicode characters, which
is a 16-bit encoding scheme, and corresponds to the C/C++ wchar_t type.
>
> A related problem is that of types that have sizes that aren't direct
> multiples of another. What if a machine has its smallest type be 3
> octets, and its next type be 5 octets in size. How can you set up the
> basic sizeof(char) for this one, since you probably shouldn't have
> fractional sizes? Do you set up a char pseudo-type that is the greatest
> common factor of all the real types' sizes?
>
A char is still only one byte in size, and why would a machine's
smallest type be 3 bytes long? All that sizeof() means is "how many
bytes of memory does this type take up?"... Maybe I'm misunderstanding
the question.
Author: kanze@gabi-soft.de
Date: 2000/01/05
"John E. Gwyn" <JEGwyn@compuserve.com> writes:
|> > [Java's] most basic type, byte, is of a size of one octet.
|> > The normal character size, char, takes two octets. How
|> > is this solved?
|> What's to "solve"? Java did what C should have done in this
|> regard, except that allotting just 16 bits for a universal
|> character encoding was stupid.
What did Java do that C/C++ didn't? Neither has a character type.
Both require storing characters in small integers. About all that Java
did was to provide the routines for handling wchar_t's from the start,
and only those routines -- Java's library support for
internationalization is better than that of C or C++ (but only
marginally), but I don't see any fundamental differences in this regard
in the language.
--
James Kanze mailto:James.Kanze@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in Objekt orientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627
Author: kanze@gabi-soft.de
Date: 2000/01/05
Holger Eitzenberger <eitz@weh.rwth-aachen.de> writes:
|> In comp.lang.c Daryle Walker <dwalker07@snet.net.invalid> wrote:
|> > Don't the C and C++ standards define sizeof(char) to be 1, with all
|> > other types having a multiple of this size? Don't they also expect that
|> > the char type be used as the normal character set? After a while,
|> > especially after seeing the first point mentioned a few times, I have a
|> > problem with this.
|> According to the FAQ sizeof(char) is _by definition_ 1. And _no_,
|> there may be implementations where character constants are stored in
|> an int by default (so sizeof('a') > 1).
In C, 'a' has type int, and so sizeof('a') == sizeof(int) in all
conforming implementations. In C++, 'a' has type char, and so
sizeof('a') == sizeof(char) == 1 in all conforming implementations.
In general, of course, if you are programming for an international
market, you will use L'a', whose type is wchar_t, instead of 'a'. In
C++, this guarantees that sizeof('a') <= sizeof(L'a'). In C, there is
nothing you can say about the relationship; I would not be surprised by
an implementation of C where sizeof('a') > sizeof(L'a').
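[ An editorial check of the C++ side of this, assuming a C++11 ]
[ compiler for static_assert; in C, sizeof('a') would instead equal ]
[ sizeof(int): ]

    int main() {
        // In C++, 'a' has type char, so this holds by definition.
        static_assert(sizeof('a') == 1, "char literal is a char in C++");

        // L'a' has type wchar_t; since sizeof(char) == 1, this holds in
        // C++ (but, as Kanze notes, not necessarily in C).
        static_assert(sizeof('a') <= sizeof(L'a'), "always true in C++");
    }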
--
James Kanze mailto:James.Kanze@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in Objekt orientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627
Author: James Kuyper <kuyper@wizard.net>
Date: 2000/01/05
Mark McIntyre wrote:
....
> AFAIR C gets round this by defining "char" by stating
> a) sizeof(char) == 1
> b) it's AT LEAST large enough to hold the basic character set.
> c) it's uniquely addressable in the machine architecture.
Not quite: it only needs to be uniquely addressable within C. The
addressable piece of memory could be multiple bytes long. In that case,
a C pointer would consist of a machine address and a byte offset.
In fact, an implementation could completely ignore the hardware's
natural efficiencies. It could define char as 10 bits, and store 4
10-bit characters in 5 octets on a machine where the addressable unit
was the octet. It could even store each of the 10 bits in a different
octet. A pointer would have to contain the information needed to know
where those bits were stored; the only requirement is that a multi-byte
object would have to be made up of the same bits as those pointed to by
(char *)&object, 1+(char *)&object, ...
(sizeof(object)-1)+(char*)&object.
Such an implementation would be highly inefficient, but perfectly legal.
(sig_atomic_t would be difficult to implement correctly, but that's not
unusual).
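[ An editorial example: the requirement Kuyper cites is what makes ]
[ the usual byte-dump idiom valid on any conforming implementation: ]

    #include <cstddef>
    #include <cstdio>

    int main() {
        int object = 0x01020304;
        // Any object may be read as sizeof(object) chars; which bits
        // land in which byte is the implementation's business.
        const unsigned char* p =
            reinterpret_cast<const unsigned char*>(&object);
        for (std::size_t i = 0; i < sizeof object; ++i)
            std::printf("byte %zu: %02x\n", i, static_cast<unsigned>(p[i]));
    }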
Author: "John E. Gwyn" <JEGwyn@compuserve.com>
Date: 2000/01/05
Mark McIntyre wrote:
> C defines char as above, and wchar as a two-byte type.
No. In a conforming implementation, wchar_t must be at least one
"byte" (measured in chars), but is not necessarily two bytes wide.
In the minimalist implementation I contributed to Jutta's C site,
wchar_t is a single byte, but more usually (from compiler vendors
who need to support international users) it ranges from 16 bits
to 32 bits. - Douglas
Author: David Schwartz <davids@webmaster.com>
Date: 2000/01/05
Daryle Walker wrote:
> A related problem is that of types that have sizes that aren't direct
> multiples of another. What if a machine has its smallest type be 3
> octets, and its next type be 5 octets in size. How can you set up the
> basic sizeof(char) for this one, since you probably shouldn't have
> fractional sizes? Do you set up a char pseudo-type that is the greatest
> common factor of all the real types' sizes?
The size of every type has to be expressible as a multiple of
something; otherwise, 'sizeof' could not exist. Whatever that thing is
that there are multiples of is what C calls a 'char'. If that's not
convenient for a character set, don't use it for characters. In fact,
most C++ string code does just that.
DS
Author: Paul Jarc <prj@po.cwru.edu>
Date: 2000/01/05
Holger Eitzenberger <eitz@weh.rwth-aachen.de> writes:
> there may be implementations where character constants are stored in
> an int by default (so sizeof('a') > 1).
In C, character constants are always of type int. (But I would shy
away from saying they are "stored" in anything at all, to avoid
confusion.)
paul
Author: Paul Jarc <prj@po.cwru.edu>
Date: 2000/01/05
Mark McIntyre <mark@garthorn.demon.co.uk> writes:
> C defines char as above, and wchar as a two-byte type.
wchar_t need not be two bytes. AFAIK, there is no specified minimum
range for wchar_t, so it could be char - even 8-bit char - if that's
big enough to hold the largest character set supported by that
implementation. And, of course, it can also be larger than two bytes.
paul
Author: corbett@lupa.Sun.COM (Robert Corbett)
Date: 2000/01/05
In article <86u2kt60cw.fsf@gabi-soft.de>, <kanze@gabi-soft.de> wrote:
>
>On my machine, wchar_t is four bytes (which rather surprised me -- I
>expected two). C doesn't say anything about wchar_t, except that it
>must be a typedef to an integral type; it *can* legally be a typedef to
>char.
UCS-4 requires 31 bits. Since at least one amendment to ISO/IEC 10646-1
goes outside the BMP, representing all characters in ISO/IEC 10646 in
16 bits requires horrible kludges such as UTF-16.
Sincerely,
Bob Corbett
Author: Paul Jarc <prj@po.cwru.edu>
Date: 2000/01/05
dwalker07@snet.net.invalid (Daryle Walker) writes:
> What if a machine has its smallest type be 3 octets, and its next
> type be 5 octets in size. How can you set up the basic sizeof(char)
> for this one, since you probably shouldn't have fractional sizes?
It would depend on the addressability and alignment constraints of the
hardware. Perhaps the most likely outcome is that C would not be
implemented on such a platform.
paul
Author: Paul Jarc <prj@po.cwru.edu>
Date: 2000/01/05
dwalker07@snet.net.invalid (Daryle Walker) writes:
> Don't the C and C++ standards define sizeof(char) to be 1, with all
> other types having a multiple of this size?
Yes, but that doesn't mean char is 8 bits. Interpret that definition
as providing the meaning of 1, rather than the value of sizeof(char).
paul
Author: kanze@gabi-soft.de
Date: 2000/01/05
corbett@lupa.Sun.COM (Robert Corbett) writes:
|> In article <86u2kt60cw.fsf@gabi-soft.de>, <kanze@gabi-soft.de> wrote:
|> >On my machine, wchar_t is four bytes (which rather surprised me -- I
|> >expected two). C doesn't say anything about wchar_t, except that it
|> >must be a typedef to an integral type; it *can* legally be a typedef to
|> >char.
|> UCS-4 requires 31 bits.
But neither C nor C++ require UCS-4.
--
James Kanze mailto:James.Kanze@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in Objekt orientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627
Author: kanze@gabi-soft.de
Date: 2000/01/05
"Eric Petroelje" <petroele@csis.gvsu.edu> writes:
|> I don't think it matters. Java probably uses Unicode characters,
|> which is a 16-bit encoding scheme, and corresponds to the C/C++
|> wchar_t type.
The C standard (and the C++ standard as well, I think) makes absolutely
no guarantee as to what a wchar_t holds. On my machine (Sun Sparc,
Solaris 7.0), it is typedef'ed to either a long or an int -- both 32
bits. If I remember correctly, under Windows and VC++, it was
typedef'ed to a short (or an unsigned short) -- 16 bits. And a
conforming implementation could legally typedef it to a char.
--
James Kanze mailto:James.Kanze@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in Objekt orientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627
Author: corbett@lupa.Sun.COM (Robert Corbett)
Date: 2000/01/06
In article <86vh58gsj1.fsf@gabi-soft.de>, <kanze@gabi-soft.de> wrote:
>
>corbett@lupa.Sun.COM (Robert Corbett) writes:
>
>|> In article <86u2kt60cw.fsf@gabi-soft.de>, <kanze@gabi-soft.de> wrote:
>
>|> >On my machine, wchar_t is four bytes (which rather surprised me -- I
>|> >expected two). C doesn't say anything about wchar_t, except that it
>|> >must be a typedef to an integral type; it *can* legally be a typedef to
>|> >char.
>
>|> UCS-4 requires 31 bits.
>
>But neither C nor C++ require UCS-4.
And your point is?
Sincerely,
Bob Corbett
Author: Matt Austern <austern@sgi.com>
Date: 2000/01/06
kanze@gabi-soft.de writes:
> "Eric Petroelje" <petroele@csis.gvsu.edu> writes:
>
> |> I don't think it matters. Java probably uses Unicode characters,
> |> which is a 16-bit encoding scheme, and corresponds to the C/C++
> |> wchar_t type.
>
> The C standard (and the C++ standard as well, I think) makes absolutely
> no guarantee as to what a wchar_t holds. On my machine (Sun Sparc,
> Solaris 7.0), it is typedef'ed to either a long or an int -- both 32
> bits. If I remember correctly, under Windows and VC++, it was
> typedef'ed to a short (or an unsigned short) -- 16 bits. And a
> conforming implementation could legally typedef it to a char.
A conforming C implementation yes, but not a conforming C++
implementation. (I see that this is crossposted to both groups.) In
C++ wchar_t is a distinct type in its own right, not a typedef for
some other integer type. See 3.9.1/5.
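[ An editorial demonstration of the distinct-type rule: overloads on ]
[ wchar_t and on plausible underlying types can coexist, and a wide ]
[ literal always selects the wchar_t overload: ]

    #include <iostream>

    void f(wchar_t)        { std::cout << "wchar_t\n"; }
    void f(unsigned short) { std::cout << "unsigned short\n"; }
    void f(int)            { std::cout << "int\n"; }

    int main() {
        f(L'a');  // prints "wchar_t" -- never the underlying type
    }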
Author: Paul Jarc <prj@po.cwru.edu>
Date: 2000/01/06
"Eric Petroelje" <petroele@csis.gvsu.edu> writes:
> Daryle Walker <dwalker07@snet.net.invalid> wrote in message
> news:1e3t5pg.15igveh120d6psN%dwalker07@snet.net.invalid...
> > Couldn't the most basic type and the normal character type be _mutually
> > exclusive_?!
>
> I suppose they could, but who says that char is the most basic type?
> It is simply coincidence that computers count things in bytes, and a
> char is one byte long. They are in no way dependent on one another.
In the C standard, the word "byte" refers to the amount of storage
needed to hold an object of type char, however much that may be. It
need not be 8 bits. C bytes must be at least 8 bits, but can be 9 or
12 or 16 or 333667 bits if the implementor chooses.
> > What if a machine has its smallest type be 3 octets, and its next
> > type be 5 octets in size.
>
> A char is still only one byte in size, and why would a machine's
> smallest type be 3 bytes long?
He didn't say 3 bytes, he said 3 octets. There are indeed some
machines whose smallest addressable amount of storage is not 8 bits.
> All that sizeof() means is "how many bytes of memory does this type
> take up?"...
Correct, if you mean "how many times the storage required for 1 char"
and not "how many octets".
> Maybe I'm misunderstanding the question.
You seem to be misunderstanding "byte", but I'm not sure.
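[ An editorial sketch of the distinction: sizeof counts C bytes ]
[ (chars), and CHAR_BIT converts those to bits; they are octets only ]
[ when CHAR_BIT == 8: ]

    #include <climits>
    #include <iostream>

    int main() {
        std::cout << "a byte here is " << CHAR_BIT << " bits\n"
                  << "double: " << sizeof(double) << " bytes = "
                  << sizeof(double) * CHAR_BIT << " bits\n";
    }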
paul
Author: Steve Clamage <stephen.clamage@sun.com>
Date: 2000/01/06
Robert Corbett wrote:
>
> In article <86vh58gsj1.fsf@gabi-soft.de>, <kanze@gabi-soft.de> wrote:
> >
> >corbett@lupa.Sun.COM (Robert Corbett) writes:
> >
> >|> In article <86u2kt60cw.fsf@gabi-soft.de>, <kanze@gabi-soft.de> wrote:
> >
> >|> >On my machine, wchar_t is four bytes (which rather surprised me -- I
> >|> >expected two). C doesn't say anything about wchar_t, except that it
> >|> >must be a typedef to an integral type; it *can* legally be a typedef to
> >|> >char.
> >
> >|> UCS-4 requires 31 bits.
> >
> >But neither C nor C++ require UCS-4.
>
> And your point is?
I think the point is that in this thread some people have been mixing
together requirements of the C or C++ standards with requirements of
other systems.
The C and C++ standards leave it up to the implementer to pick a size
and representation for wide characters. One valid choice (according to
those standards) is for wchar_t to be 8 bits and identical to type char.
If you had a separate requirement to use (for example) UCS-4, you would
not be able to use that C or C++ implementation. But that's a separate
subject; the implementation can still conform to the C or C++ standard.
--
Steve Clamage, stephen.clamage@sun.com
Author: James Kuyper <kuyper@wizard.net>
Date: 2000/01/06
Robert Corbett wrote:
>
> In article <86vh58gsj1.fsf@gabi-soft.de>, <kanze@gabi-soft.de> wrote:
> >
> >corbett@lupa.Sun.COM (Robert Corbett) writes:
> >
> >|> In article <86u2kt60cw.fsf@gabi-soft.de>, <kanze@gabi-soft.de> wrote:
> >
> >|> >On my machine, wchar_t is four bytes (which rather surprised me -- I
> >|> >expected two). C doesn't say anything about wchar_t, except that it
> >|> >must be a typedef to an integral type; it *can* legally be a typedef to
> >|> >char.
> >
> >|> UCS-4 requires 31 bits.
> >
> >But neither C nor C++ require UCS-4.
>
> And your point is?
Follow the thread backwards - you've quoted the relevant text: the point
was that "C doesn't say anything about [the size of] wchar_t". That
comment was made in reply to the incorrect claim that "C defines ...
wchar as two bytes".
Author: kanze@gabi-soft.de
Date: 2000/01/07
Matt Austern <austern@sgi.com> writes:
|> kanze@gabi-soft.de writes:
|> > "Eric Petroelje" <petroele@csis.gvsu.edu> writes:
|> > |> I don't think it matters. Java probably uses Unicode characters,
|> > |> which is a 16-bit encoding scheme, and corresponds to the C/C++
|> > |> wchar_t type.
|> > The C standard (and the C++ standard as well, I think) makes absolutely
|> > no guarantee as to what a wchar_t holds. On my machine (Sun Sparc,
|> > Solaris 7.0), it is typedef'ed to either a long or an int -- both 32
|> > bits. If I remember correctly, under Windows and VC++, it was
|> > typedef'ed to a short (or an unsigned short) -- 16 bits. And a
|> > conforming implementation could legally typedef it to a char.
|> A conforming C implementation yes, but not a conforming C++
|> implementation. (I see that this is crossposted to both groups.) In
|> C++ wchar_t is a distinct type in its own right, not a typedef for
|> some other integer type. See 3.9.1/5.
It's not a typedef, but it's not totally independent either: "Type
wchar_t shall have the same size, signedness, and alignment requirements
as one of the other integral types, called its underlying type." And
although the C++ standard requires that its "values can represent
distinct codes for all members of the largest extended character set
specified among the supported locales", the only required locale is "C",
which only requires the basic character set (of roughly 100 characters),
so wchar_t can be identical to a char.
--
James Kanze mailto:James.Kanze@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in Objekt orientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627