Topic: B. Stroustrup, sizeof(int) and sizeof(char)


Author: Allan_W@my-dejanews.com (Allan W)
Date: Wed, 15 May 2002 17:48:50 GMT
James Kanze <kanze@gabi-soft.de> wrote
> ...DWORD is an assembler concept...
which has always been twice the size of a WORD. The earliest Windows
programs used computers with an 8-bit BYTE and a 16-bit WORD.

In C++, an int used to be the same size as a short int on all Windows
platforms. When int moved to 32 bits, an unusual burst of clarity
convinced someone to keep short at 16 bits.

> ...I don't see what something like DWORD is doing
> in a C++, or even a C, interface. Somebody must have seriously muffed
> the design, a long time ago.

The earliest Windows programs were written in C and assembly language.
For communication purposes, the two need to understand the same data
structures in the same way... if assembly says that something is a
DWORD, it makes sense to do the same in C (and later in C++).
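
For illustration, here is a minimal sketch (assumed declarations in the
style of the Windows headers, not the actual <windows.h> text) of how such
fixed-width aliases are conventionally spelled in C and C++, with
period-appropriate compile-time checks of the size assumptions:

    /* Hypothetical aliases for a 32-bit ILP32 platform; the underlying
       types are an assumption for illustration only. */
    typedef unsigned char  BYTE;   /* 8 bits  */
    typedef unsigned short WORD;   /* 16 bits */
    typedef unsigned long  DWORD;  /* 32 bits: a "double word" */

    /* 1990s-style compile-time assertions: an array of negative size
       is diagnosed at compile time if an assumption is violated. */
    typedef int assert_word_size [sizeof(WORD)  == 2 ? 1 : -1];
    typedef int assert_dword_size[sizeof(DWORD) == 2 * sizeof(WORD) ? 1 : -1];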


======================================= MODERATOR'S COMMENT:
 This is drifting off-topic for comp.std.c++.
Further debate over whether DWORD is a sensible name
for a 32-bit quantity on a 32-bit computer should be
taken to a more suitable forum.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "Anthony Williams"<anthwil@nortelnetworks.com>
Date: Mon, 13 May 2002 17:21:48 GMT
"Gabriel Dos_Reis" <gdosreis@sophia.inria.fr> wrote in message
news:xaj1yckgg7p.fsf@perceval.inria.fr...
> "Anthony Williams"<anthwil@nortelnetworks.com> writes:
>
> [...]
>
> | > | > We're not discussing a signed integer type in general.  We're
> | > | > discussing 'signed char' and the possibilities allowed by the
> | > | > standard text.
> |
> | > | Those possibilities include the 8-bit version of the above example.
> |
> | > On what basis can you say that a bit pattern is illegal for a 'signed
> | > char'? You certainly can't rely on the standard text, since it says
> | > that no matter what bit pattern you put in a 'signed char', accessing
> | > the object won't trap, therefore it gives you some value of that
> | > type.
> |
> | It is illegal in the sense that you cannot perform an operation (other
> | than assignment from such a value) on a char that will yield such a value.
>
> You can read from it.

Hence "other than assignment from such a value" --- assignment requires
reading the source and writing the destination. If the source and
destination are different types, then overflow/underflow is allowed. It is
only where they are the same type (plain 'char') that there is an issue.

> | The only way to come by such a value is by accessing the representation
> | of a non-char POD type through an lvalue of char type,
>
> More specifically, you can put any bit pattern in an 'unsigned char',
> then copy it into a 'char', and read the value of the resulting char-object.

... if you cast the 'unsigned char' lvalue to a 'char' lvalue before
reading.

> [...]
>
> | If the platform I was using had such illegal values for 'char', then I
> | would expect any operation on such an illegal value to yield undefined
> | behaviour,
>
> Accessing a char object cannot produce undefined behaviour.

I should have added "with the exception of assignment", but I thought that
was implied from my comment above.

> | in much the same way that overflow and underflow on signed integer types
> | yield undefined behaviour (5p5).
>
> The point here is that that analogy doesn't hold.

I was just pointing out that operations on signed integer types can yield
undefined behaviour. I believe that other than being able to read
such a value from one 'char' object, and write it to another 'char' object,
there are no operations you are guaranteed to be able to perform with
defined results on an 'illegal' value of 'char' --- the standard only
guarantees the copying (so you can copy the value of another POD type as a
char array), not that you are allowed to even compare such objects.

Of course this only applies where 'char' is signed, since 'signed char' is
just a 'signed integer type', so should you prove to be right, it just means
that 'char' must be unsigned on platforms that use a signed integer
representation that has 'illegal' bit patterns.

Anthony







Author: Gabriel Dos_Reis <gdosreis@sophia.inria.fr>
Date: Fri, 10 May 2002 15:54:10 GMT
"James Kuyper Jr." <kuyper@wizard.net> writes:

| Gabriel Dos Reis wrote:
| >
| > "James Kuyper Jr." <kuyper@wizard.net> writes:
| >
| > | Gabriel Dos Reis wrote:
| > | > "Anthony Williams"<anthwil@nortelnetworks.com> writes:
| ....
| > | > | "All bits participate" does not meen all bit patterns are legal. e.g. (3
| > | > | bits)
| > | >
| > | > The point here is that no matter what bit pattern you put in a
| > | > character object you'll get out a character value and the standard
| > | > guarantees that all bits will participate in the resulting value.
| > | > Therefore my sentence.
| > |
| > | Yes. That is what your sentence says. It's not what the standard's
| > | sentence says, however.
| >
| > That is what you get by straight inference from general provisions
| > in the standard.
|
| Yes, but not by correct inference from the standard.

The inference was correct, since it relies on specific provisions of
the standard.

| We've reached the point when we're in perfect agreement about what the
| relevant words in the standard are, and in unreconcilable disagreement
| about the correct reading of those words.

The salient point is that there are not-so-minor differences between the
C and C++ wordings.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr






Author: Gabriel Dos_Reis <gdosreis@sophia.inria.fr>
Date: Fri, 10 May 2002 17:18:27 GMT
"Anthony Williams"<anthwil@nortelnetworks.com> writes:

[...]

| > | > We're not discussing a signed integer type in general.  We're
| > | > discussing 'signed char' and the possibilities allowed by the standard
| > | > text.
|
| > | Those possibilities include the 8-bit version of the above example.
|
| > On what basis can you say that a bit pattern is illegal for a 'signed
| > char'? You certainly can't rely on the standard text, since it says
| > that no matter what bit pattern you put in a 'signed char', accessing
| > the object won't trap, therefore it gives you some value of that
| > type.
|
| It is illegal in the sense that you cannot perform an operation (other than
| assignment from such a value) on a char that will yield such a value.

You can read from it.

| The only way to come by such a value is by accessing the representation of a
| non-char POD type through an lvalue of char type,

More specifically, you can put any bit pattern in an 'unsigned char',
then copy it into a 'char', and read the value of the resulting char-object.

[...]

| If the platform I was using had such illegal values for 'char', then I would
| expect any operation on such an illegal value to yield undefined behaviour,

Accessing a char object cannot produce undefined behaviour.

| in much the same way that overflow and underflow on signed integer types
| yields undefined behaviour (5p5).

The point here is that that analogy doesn't hold.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr






Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Thu, 2 May 2002 20:04:51 GMT
Gabriel Dos Reis wrote:
>
> "James Kuyper Jr." <kuyper@wizard.net> writes:
>
> [...]
>
> | There is a requirement (3.9.1p1) that "for character types, all bits of
> | the object representation participate in the value representation."
> | However, the requirement that "all possible bit patterns of the value
> | representation represent numbers" applies only "For unsigned character
> | types". 'char' is allowed to be a signed character type, and is
> | therefore allowed to have bit patterns that don't represent numbers.
>
> Huh? What can they represent then?

Nothing. They're invalid bit patterns for that type. This shouldn't seem
odd; it's commonplace for pointers and floating point types. While it's
less common for integer types, it's perfectly legal.
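
To illustrate the asymmetry under discussion, here is a minimal sketch
(nothing platform-specific is assumed): the bytes of any object may be
inspected through unsigned char, for which every bit pattern is a valid
value, while the reverse direction carries no such guarantee:

    #include <cstdio>
    #include <cstring>

    int main()
    {
        double d = 1.0;

        /* Reading an object's bytes through unsigned char is always
           well-defined: unsigned char has no invalid bit patterns. */
        unsigned char bytes[sizeof d];
        std::memcpy(bytes, &d, sizeof d);
        for (std::size_t i = 0; i != sizeof d; ++i)
            std::printf("%02x ", bytes[i]);
        std::printf("\n");

        /* The reverse is not guaranteed: an arbitrary byte pattern
           copied into a double need not represent any double value,
           so using the result as a double may misbehave. */
        return 0;
    }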






Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Thu, 2 May 2002 20:04:55 GMT
Gabriel Dos Reis wrote:
>
> "James Kuyper Jr." <kuyper@wizard.net> writes:
....
> | She didn't quote any text which guarantees that all possible bit
>
> [ I know you're using what is now known as a post-modern style, where
>   "she" is used for everything.  But I'll appreciate if you could
>   refrain from using "she" when you're referring to me (Gabriel Dos
>   Reis) ]

No, I wasn't using post-modern style. I use gender-specific pronouns
only when I (think that I) know the applicable gender (and usually not
even then). My apologies - for some reason, every time I see your name,
I think "Gabriela", even though what I'm actually reading is "Gabriel".
I've been making this mistake for years now, and I first noticed the
mistake years ago, but I still keep on making it. Sorry!

....
> Note that the standard explicitly says:
>
>   For character types, all bits of the object representation participate
>   in the value representation.
>
> so by that sentence all bit patterns in a character representation
> represent a *character*.  So you've got your quote. In particular,
> there is no trap representation for "char"s.

A bit participates in the representation so long as there exist two
valid bit patterns, differing only in that bit, which represent
different values. You can't conclude from the participation requirement
that all bit patterns are legal.

> The next sentence is
>
>   For unsigned character types, all possible bit patterns of the value
>   representation represent numbers.
>
> The precise question here is whether all bit patterns in a character
> representation represent a *number*.  The standard is making a
> distinction between a character and a number.
>
> | The only such requirement that I'm aware of is the one that is
> | restricted to unsigned char.
>
> I'm under the impression that you're confusing numbers and
> characters.  See above.

Only in the same sense that the standard does. 'a' is a number, as far
as the C/C++ standards are concerned.






Author: "Anthony Williams"<anthwil@nortelnetworks.com>
Date: Thu, 2 May 2002 20:05:28 GMT
"Gabriel Dos Reis" <dosreis@cmla.ens-cachan.fr> wrote in message
news:fl4rhritmu.fsf@jambon.cmla.ens-cachan.fr...
> Note that the standard explicitly says:
>
>   For character types, all bits of the object representation participate
>   in the value representation.
>
> so by that sentence all bit patterns in a character representation
> represent a *character*.  So you've got your quote. In particular,
> there is no trap representation for "char"s.

"All bits participate" does not meen all bit patterns are legal. e.g. (3
bits)

000 == 0
001 == 1
010 == 2
011 == 3
100 == illegal, trap representation
101 == -1
110 == -2
111 == -3

All 3 bits participate (otherwise you couldn't get the negative numbers),
but 100 is not a valid bit pattern.

> The next sentence is
>
>   For unsigned character types, all possible bit patterns of the value
>   representation represent numbers.
>
> The precise question here is whether all bit patterns in a character
> representation represent a *number*.  The standard is making a
> distinction between a character and a number.

I think the standard is distinguishing between unsigned types (for which all
8 bit patterns of my 3-bit type above would have to be legal), and signed
types (which can have invalid bit patterns, such as 100 above).

Anthony
--
Anthony Williams
Software Engineer, Nortel Networks Optical Components Ltd
The opinions expressed in this message are not necessarily those of my
employer







Author: MINTIspamblock@YAHOO.COM (Minti)
Date: Thu, 2 May 2002 20:06:29 GMT
"James Kuyper Jr." <kuyper@wizard.net> wrote in message news:<3CD09229.FBAEBD4D@wizard.net>...
> Gabriel Dos Reis wrote:
> >
> > "James Kuyper Jr." <kuyper@wizard.net> writes:
> >
> > | James Kanze wrote:
> > | ....
> > | > I'm not sure I understand this.  How can a bit pattern not represent a
> > | > number? ...
> > |
> > | C99 has a name for such bit patterns: trap representations.
> >
> > There is no trap representation for 'char'.
>
> The C99 standard does not say that 'char' can't have trap
> representations.

char can't have trap representations.

http://groups.google.com/groups?q=g:thl826152737d&dq=&hl=en&selm=20020214.0011.47833snz%40genesis.demon.co.uk

> It says that if that value of an object is accessed
> through an lvalue for which that object's bit pattern is  a trap
> repsentation, the behavior is undefined, except when that lvalue has a
> character type.

As has been raised elsethread, on a 1's complement machine ~0 can be a
trap representation. So a trap representation does not have to be a
pointer ('locator') value.

> But it doesn't say that character types don't have trap
> representations.

Read above.

> The defining characteristic of trap representations is
> that they are bit patterns an object can possess, which don't represent
> values of that object's type.

>They aren't actually required to 'trap'.

Make up your mind: trap or no trap.

> The key thing about them is that they aren't included in the range of
> valid values; as such they don't get included in the range CHAR_MIN,
> CHAR_MAX.

Values greater than CHAR_MAX may be valid. Consider a 10-bit
representation of char with 2 padding bits: it could have 0x3FF as a
legal value.

>
> In any event, C++ doesn't define the term "trap representation", and
> none of the corresponding specifications, so none of the complex issues
> raised above matter. It does, however, have the concept that the term
> refers to:  bit patterns that don't represent valid values of the
> objects type. It has that concept, by way of expressing a guarantee that
> 'unsigned char' doesn't have any such bit patterns. However, that's the
> only type the guarantee applies to.

P.S. As for the matter of the types, even the Windows "calc.exe"
considers a Word to be 16 bits.






Author: Alex Dicks <mjvww498as001@sneakemail.com>
Date: Thu, 2 May 2002 18:24:46 CST
"Anthony Williams"<anthwil@nortelnetworks.com> writes:

> "All bits participate" does not meen all bit patterns are legal. e.g. (3
> bits)
>
> 000 == 0
> 001 == 1
> 010 == 2
> 011 == 3
> 100 == illegal, trap representation
> 101 == -1
> 110 == -2
> 111 == -3
>
> All 3 bits participate (otherwise you couldn't get the negative numbers),
> but 100 is not a valid bit pattern.
>
> I think the standard is distinguishing between unsigned types (for which all
> 8 bit patterns of my 3-bit type above would have to be legal), and signed
> types (which can have invalid bit patterns, such as 100 above)

I'd just like to add that this example is true on a signed magnitude
system.  There are three ways of doing signed integers:

- signed magnitude: the MSB is 0 for positive, 1 for negative, and the other
  bits are the distance from 0; 100 is an illegal alternative representation of 0

- one's complement:
    011 = 3
    010 = 2
    001 = 1
    000 = 0
    111 = illegal
    110 = -1
    101 = -2
    100 = -3

- two's complement:
    011 = 3
    010 = 2
    001 = 1
    000 = 0
    111 = -1
    110 = -2
    101 = -3
    100 = -4

The standard doesn't mandate any of these; it allows some bit
patterns to be illegal precisely so that the first two techniques can be
used.  However, most modern machines do use two's complement representation,
as it allows for slightly simpler addition and subtraction circuitry
that simply ignores overflow.
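
To make the three schemes concrete, here is a small sketch (the 3-bit
width and the function names are inventions for illustration) that
decodes a pattern under each representation and flags the one pattern a
scheme may treat as illegal:

    #include <cstdio>

    /* Decode a 3-bit pattern (0..7) under each signed representation.
       Returns false for the pattern the scheme may treat as a trap. */
    bool sign_magnitude(unsigned bits, int& v)
    {
        if (bits == 4) return false;                  /* 100: negative zero */
        v = (bits & 4) ? -int(bits & 3) : int(bits);
        return true;
    }

    bool ones_complement(unsigned bits, int& v)
    {
        if (bits == 7) return false;                  /* 111: negative zero */
        v = (bits & 4) ? -int(~bits & 3) : int(bits);
        return true;
    }

    void twos_complement(unsigned bits, int& v)
    {
        v = (bits & 4) ? int(bits) - 8 : int(bits);   /* every pattern valid */
    }

    int main()
    {
        for (unsigned b = 0; b != 8; ++b) {
            int v;
            std::printf("%u%u%u:", (b >> 2) & 1, (b >> 1) & 1, b & 1);
            if (sign_magnitude(b, v))  std::printf("  sm=%2d", v);
            else                       std::printf("  sm=trap");
            if (ones_complement(b, v)) std::printf("  oc=%2d", v);
            else                       std::printf("  oc=trap");
            twos_complement(b, v);
            std::printf("  tc=%2d\n", v);
        }
        return 0;
    }

Its output reproduces the three tables above, with sm and oc each
reporting one trap pattern and tc reporting none.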

--
Alex






Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Fri, 3 May 2002 17:32:08 GMT
Alex Dicks <mjvww498as001@sneakemail.com> writes:

| "Anthony Williams"<anthwil@nortelnetworks.com> writes:
|
| > "All bits participate" does not meen all bit patterns are legal. e.g. (3
| > bits)
| >
| > 000 == 0
| > 001 == 1
| > 010 == 2
| > 011 == 3
| > 100 == illegal, trap representation
| > 101 == -1
| > 110 == -2
| > 111 == -3
| >
| > All 3 bits participate (otherwise you couldn't get the negative numbers),
| > but 100 is not a valid bit pattern.
| >
| > I think the standard is distinguishing between unsigned types (for which all
| > 8 bit patterns of my 3-bit type above would have to be legal), and signed
| > types (which can have invalid bit patterns, such as 100 above)
|
| I'd just like to add that this example is true on a signed magnitude
| system.

We're not discussing a signed integer type in general.  We're
discussing 'signed char' and the possibilities allowed by the standard
text.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr






Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Fri, 3 May 2002 20:02:31 GMT
"Anthony Williams"<anthwil@nortelnetworks.com> writes:

| "Gabriel Dos Reis" <dosreis@cmla.ens-cachan.fr> wrote in message
| news:fl4rhritmu.fsf@jambon.cmla.ens-cachan.fr...
| > Note that the standard explicitly says:
| >
| >   For character types, all bits of the object representation participate
| >   in the value representation.
| >
| > so by that sentence all bit patterns in a character representation
| > represent a *character*.  So you've got your quote. In particular,
| > there is no trap representation for "char"s.
|
| "All bits participate" does not meen all bit patterns are legal. e.g. (3
| bits)

The point here is that no matter what bit pattern you put in a
character object you'll get out a character value and the standard
guarantees that all bits will participate in the resulting value.
Therefore my sentence.

| 000 == 0
| 001 == 1
| 010 == 2
| 011 == 3
| 100 == illegal, trap representation

That doesn't hold for a character type.  See the standard text.  For
an unsigned char, there is no dispute.  For a signed char, the above
can't trap, therefore it represents some value.  That might be subtle,
but it is a logical consequence of the wording in the standard text.

[...]

| > The next sentence is
| >
| >   For unsigned character types, all possible bit patterns of the value
| >   representation represent numbers.
| >
| > The precise question here is whether all bit patterns in a character
| > representation represent a *number*.  The standard is making a
| > distinction between a character and a number.
|
| I think the standard is distinguishing between unsigned types (for which all
| 8 bit patterns of my 3-bit type above would have to be legal), and signed
| types (which can have invalid bit patterns, such as 100 above)

The distinction is primarily about character types.  While your trap
representation may be valid for a non-character integer type, it can't
hold for a character type.  There is a definite distinction.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr






Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Fri, 3 May 2002 20:47:36 GMT
"James Kuyper Jr." <kuyper@wizard.net> writes:

| Gabriel Dos Reis wrote:
| >
| > "James Kuyper Jr." <kuyper@wizard.net> writes:
| >
| > [...]
| >
| > | There is a requirement (3.9.1p1) that "for character types, all bits of
| > | the object representation participate in the value representation."
| > | However, the requirement that "all possible bit patterns of the value
| > | representation represent numbers" applies only "For unsigned character
| > | types". 'char' is allowed to be a signed character type, and is
| > | therefore allowed to have bit patterns that don't represent numbers.
| >
| > Huh? What can they represent then?
|
| Nothing. They're invalid bit patterns for that type. This shouldn't seem
| odd; it's commonplace for pointers and floating point types.

As has been said, "proof by analogy is fraud".

I have no problem with trap representations for pointers or any
non-character type.  But by the standard's words, there is no meaningful
way of saying that "they represent nothing".

You can access any bit pattern stored in a char object.  The standard
guarantees that accessing that object with an lvalue of character type
won't trap, therefore it gives you a value of the character type used
to access the object. Therefore that bit pattern *does* represent
some value of the character type.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr






Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Fri, 3 May 2002 20:52:37 GMT
Minti wrote:
>
> "James Kuyper Jr." <kuyper@wizard.net> wrote in message news:<3CD09229.FBAEBD4D@wizard.net>...
....
> > The C99 standard does not say that 'char' can't have trap
> > representations.
>
> char can't have trap representations.
>
> http://groups.google.com/groups?q=g:thl826152737d&dq=&hl=en&selm=20020214.0011.47833snz%40genesis.demon.co.uk

That citation simply contains the bald assertion that the C standard
says char can't have trap representations; it contains no citations
backing up that claim. I've looked at every single sentence in that
standard which contains the word 'trap', none of them say any such
thing. I may have misinterpreted what I read, but if so, you'll need to
provide relevant citations to convince me.

....
> > The defining characteristic of trap representations is
> > that they are bit patterns an object can possess, which don't represent
> > values of that object's type.
>
> >They aren't actually required to 'trap'.
>
> Make your mind trap or no trap.

My mind's made up: in C99, "trap" representations are defined by
6.2.6.1p5 as ones that don't represent valid values of the specified
type, they aren't actually required to "trap" when read. This is no
odder than the fact that, in C++, null pointer constants cannot have a
pointer type. It's just a matter of jargon. Trap representations may
trap, they're just not required to do so; while non-trap representations
are not allowed to trap. Therefore, it's actually not quite as bad as
the NPC mis-naming.

> > The key thing about them is that they aren't included in the range of
> > valid values; as such they don't get included in the range CHAR_MIN,
> > CHAR_MAX.
>
> Values greater than CHAR_MAX may be valid. Consider a 10-bit
> representation of char with 2 padding bits: it could have 0x3FF as a
> legal value.

If 0x3FF is a valid value for a char, then CHAR_MAX must be >= 0x3FF.
That's part of what CHAR_MAX means.
The value bits are the bits that a type uses to represent valid values.
If char has 10 bits, of which 2 are padding bits, then it can only have
8 value bits. That's not compatible with 0x3FF being a valid value.






Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Sat, 4 May 2002 00:35:44 GMT
Gabriel Dos Reis wrote:
>
> Alex Dicks <mjvww498as001@sneakemail.com> writes:
>
> | "Anthony Williams"<anthwil@nortelnetworks.com> writes:
> |
> | > "All bits participate" does not meen all bit patterns are legal. e.g. (3
> | > bits)
> | >
> | > 000 == 0
> | > 001 == 1
> | > 010 == 2
> | > 011 == 3
> | > 100 == illegal, trap representation
> | > 101 == -1
> | > 110 == -2
> | > 111 == -3
> | >
> | > All 3 bits participate (otherwise you couldn't get the negative numbers),
> | > but 100 is not a valid bit pattern.
> | >
> | > I think the standard is distinguishing between unsigned types (for which all
> | > 8 bit patterns of my 3-bit type above would have to be legal), and signed
> | > types (which can have invalid bit patterns, such as 100 above)
> |
> | I'd just like to add that this example is true on a signed magnitude
> | system.
>
> We're not discussing a signed integer type in general.  We're
> discussing 'signed char' and the possibilities allowed by the standard
> text.

Those possibilities include the 8-bit version of the above example.






Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Sat, 4 May 2002 06:54:55 CST
Gabriel Dos Reis wrote:
>
> "Anthony Williams"<anthwil@nortelnetworks.com> writes:
>
> | "Gabriel Dos Reis" <dosreis@cmla.ens-cachan.fr> wrote in message
> | news:fl4rhritmu.fsf@jambon.cmla.ens-cachan.fr...
> | > Note that the standard explicitly says:
> | >
> | >   For character types, all bits of the object representation participate
> | >   in the value representation.
> | >
> | > so by that sentence all bit patterns in a character representation
> | > represent a *character*.  So you've got your quote. In particular,
> | > there is no trap representation for "char"s.
> |
> | "All bits participate" does not meen all bit patterns are legal. e.g. (3
> | bits)
>
> The point here is that no matter what bit pattern you put in a
> character object you'll get out a character value and the standard
> guarantees that all bits will participate in the resulting value.
> Therefore my sentence.

Yes. That is what your sentence says. It's not what the standard's
sentence says, however.






Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Sat, 4 May 2002 11:55:52 GMT
"James Kuyper Jr." <kuyper@wizard.net> writes:

| Gabriel Dos Reis wrote:
| >
| > Alex Dicks <mjvww498as001@sneakemail.com> writes:
| >
| > | "Anthony Williams"<anthwil@nortelnetworks.com> writes:
| > |
| > | > "All bits participate" does not meen all bit patterns are legal. e.g. (3
| > | > bits)
| > | >
| > | > 000 == 0
| > | > 001 == 1
| > | > 010 == 2
| > | > 011 == 3
| > | > 100 == illegal, trap representation
| > | > 101 == -1
| > | > 110 == -2
| > | > 111 == -3
| > | >
| > | > All 3 bits participate (otherwise you couldn't get the negative numbers),
| > | > but 100 is not a valid bit pattern.
| > | >
| > | > I think the standard is distinguishing between unsigned types (for which all
| > | > 8 bit patterns of my 3-bit type above would have to be legal), and signed
| > | > types (which can have invalid bit patterns, such as 100 above)
| > |
| > | I'd just like to add that this example is true on a signed magnitude
| > | system.
| >
| > We're not discussing a signed integer type in general.  We're
| > discussing 'signed char' and the possibilities allowed by the standard
| > text.
|
| Those possibilities include the 8-bit version of the above example.

On what basis can you say that a bit pattern is illegal for a 'signed
char'? You certainly can't rely on the standard text, since it says
that no matter what bit pattern you put in a 'signed char', accessing
the object won't trap, therefore it gives you some value of that
type.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr






Author: "David Thompson" <david.thompson1@worldnet.att.net>
Date: Wed, 1 May 2002 12:16:03 GMT
James Kanze <kanze@gabi-soft.de> wrote :
(and for no apparent reason OE won't quote properly, sorry)
>>>
Witless <witless@attbi.com> writes:

|>  > In the meantime, the "natural" size of an int has grown to a
|>  > 32-bit DWORD on most machines,

|>  No it hasn't.  Most machines do not have DWORDs.  32-bit machines
|>  often have words and half words.

IBM 360's (the prototypical 32 bit machine) certainly have DWORDS.  A
DWORD is an 8 byte quantity, often initialized with 16 BCD digits.  (The
IBM 360 had machine instructions for all four operations on such
quantities, as well as instructions for 4 bit left and right shifts over
DWORDs.  Very useful for Cobol, or other languages that used decimal
arithmetic.)
<<<

You've got that mixed up slightly.  S/360 et seq doubleword is 8
bytes, yes, but never to my knowledge called 'dword', and as such
(a D item in DC) is initialized by up to _19_ decimal digits plus
optional sign with 16 not particularly likely.  The 4+1 arithmetic
(and 3+1 boolean) operations are available on word registers from
S/360, and doublewords in the latest generation, z/Architecture.
Shifts (only) were available on a word pair = doubleword all the way
back to S/360, and of any number of bits (up to 63).

Like other >8bit items, a storage doubleword can be accessed
as bytes (and the word=4byte or now doubleword=8byte GPRs can
to a limited extent), and any 8 bytes (even unaligned, except special
cases like CDS) can be accessed as a doubleword.  _This_ would
include character, packed decimal (with 8 bytes storing _15_ digits
plus optional sign, initialized by up to 15 digits, but often 14 or 15
since fewer would have allowed use of a smaller field), and hex data
(up to, and often exactly, 16 hex digits, similarly).  Packed _is_ BCD
and has the 4 binary arithmetic operations (plus a few halves<G>)
and digitwise shift, which is probably what you are remembering.
(And yes, was used for COBOL, and PL/1, and undoubtedly more.)

(I concur with the rest)

--
- David.Thompson 1 now at worldnet.att.net










Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Wed, 1 May 2002 17:13:15 GMT
James Kanze wrote:
>
> "James Kuyper Jr." <kuyper@wizard.net> writes:
>
> |>  James Kanze wrote:
....
> |>  As long as 'int' is a 2's complement type with the same number of
> |>  bits as unsigned char, it's big enough to map each unsigned char
> |>  to a unique value.
>
> As long as we stick with the conventional architectures, we don't have
> a problem, because sizeof(int) != 1 :-).  And if we consider less

I wasn't trying to stick with conventional architectures, I was just
trying to describe the precise validity requirements for my statement.
Which don't include sizeof(int)==1.

....
> |>  > The question is whether this way is conforming.  Gabriel dos
> |>  > Reis pointed out a sentence which very strongly suggested that
> |>  > *all* possible values of char_type must be considered legal
> |>  > characters.  I'd
>
> |>  There's a requirement to that effect for unsigned char, but not
> |>  for char. Note in particular, footnote 217, which says "if eof()
> |>  can be held in char_type then some iostreams operations may give
> |>  surprising results". That seems to imply that having eof() return
> |>  a value that can be held in char_type is legal.
>
> As you probably know, in standardization committees, the right hand
> doesn't always know what the left hand is doing.  The text that Gaby
> quoted concerns "character", not char or unsigned char, and
> specifically says that it concerns any type which provides the
> definitions specified in chapters 12, 22 and 27.

She didn't quote any text which guarantees that all possible bit
patterns of any character type must represent legal values for that
type. The only such requirement that I'm aware of is the one that is
restricted to unsigned char.
17.1.2 only talks about "... any value that can be represented by a type
that ..."; it says nothing about bit patterns that don't represent
values, because they're invalid bit patterns. Only unsigned char is
guaranteed to have no such bit patterns.






Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Wed, 1 May 2002 21:07:41 GMT
"James Kuyper Jr." <kuyper@wizard.net> writes:

[...]

| There is a requirement (3.9.1p1) that "for character types, all bits of
| the object representation participate in the value representation."
| However, the requirement that "all possible bit patterns of the value
| representation represent numbers" applies only "For unsigned character
| types". 'char' is allowed to be a signed character type, and is
| therefore allowed to have bit patterns that don't represent numbers.

Huh? What can they represent then?

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr






Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Wed, 1 May 2002 21:07:44 GMT
"James Kuyper Jr." <kuyper@wizard.net> writes:

| James Kanze wrote:
| ....
| > I'm not sure I understand this.  How can a bit pattern not represent a
| > number? ...
|
| C99 has a name for such bit patterns: trap representations.

There is no trap representation for 'char'.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr






Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Wed, 1 May 2002 21:08:06 GMT
"James Kuyper Jr." <kuyper@wizard.net> writes:

[...]

| > |>  There's a requirement to that effect for unsigned char, but not
| > |>  for char. Note in particular, footnote 217, which says "if eof()
| > |>  can be held in char_type then some iostreams operations may give
| > |>  surprising results". That seems to imply that having eof() return
| > |>  a value that can be held in char_type is legal.
| >
| > As you probably know, in standardization committees, the right hand
| > doesn't always know what the left hand is doing.  The text that Gaby
| > quoted concerns "character", not char or unsigned char, and
| > specifically says that it concerns any type which provides the
| > definitions specified in chapters 12, 22 and 27.
|
| She didn't quote any text which guarantees that all possible bit

[ I know you're using what is now known as a post-modern style, where
  "she" is used for everything.  But I'd appreciate it if you could
  refrain from using "she" when you're referring to me (Gabriel Dos
  Reis) ]

| patterns of any character type must represent legal values for that
| type.

Here is what the standard says:

3.9.1/1

  Objects declared as characters (char) shall be large enough to store
  any member of the implementation's basic character set. If a
  character from this set is stored in a character object, the
  integral value of that character object is equal to the value of
  the single character literal form of that character. It is
  implementation-defined whether a char object can hold negative
  values. Characters can be explicitly declared unsigned or
  signed. Plain char, signed char, and unsigned char are three
  distinct types. A char, a signed char, and an unsigned char occupy
  the same amount of storage and have the same alignment requirements
  (3.9); that is, they have the same object representation. For
  character types, all bits of the object representation participate
  in the value representation. For unsigned character types, all
  possible bit patterns of the value representation represent
  numbers. These requirements do not hold for other types. In any
  particular implementation, a plain char object can take on either
  the same values as a signed char or an unsigned char; which one is
  implementation-defined.

Note that the standard explicitly says:

  For character types, all bits of the object representation participate
  in the value representation.

so by that sentence all bit patterns in a character representation
represent a *character*.  So you've got your quote. In particular,
there is no trap representation for "char"s.

The next sentence is

  For unsigned character types, all possible bit patterns of the value
  representation represent numbers.

The precise question here is whether all bit patterns in a character
representation represent a *number*.  The standard is making a
distinction between a character and a number.


| The only such requirement that I'm aware of is the one that is
| restricted to unsigned char.

I'm under the impression that you're confusing numbers and
characters.  See above.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr






Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Thu, 2 May 2002 08:42:11 GMT
Gabriel Dos Reis wrote:
>
> "James Kuyper Jr." <kuyper@wizard.net> writes:
>
> | James Kanze wrote:
> | ....
> | > I'm not sure I understand this.  How can a bit pattern not represent a
> | > number? ...
> |
> | C99 has a name for such bit patterns: trap representations.
>
> There is no trap representation for 'char'.

The C99 standard does not say that 'char' can't have trap
representations. It says that if the value of an object is accessed
through an lvalue for which that object's bit pattern is a trap
representation, the behavior is undefined, except when that lvalue has a
character type. But it doesn't say that character types don't have trap
representations. The defining characteristic of trap representations is
that they are bit patterns an object can possess which don't represent
values of that object's type. They aren't actually required to 'trap'.
The key thing about them is that they aren't included in the range of
valid values; as such, they don't get included in the range [CHAR_MIN,
CHAR_MAX].

In any event, C++ doesn't define the term "trap representation", nor
any of the corresponding specifications, so none of the complex issues
raised above matter. It does, however, have the concept that the term
refers to: bit patterns that don't represent valid values of the
object's type. It has that concept by way of expressing a guarantee that
'unsigned char' doesn't have any such bit patterns. However, that's the
only type the guarantee applies to.






Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Tue, 7 May 2002 15:34:33 GMT
"James Kuyper Jr." <kuyper@wizard.net> writes:

| Gabriel Dos Reis wrote:
| >
| > "Anthony Williams"<anthwil@nortelnetworks.com> writes:
| >
| > | "Gabriel Dos Reis" <dosreis@cmla.ens-cachan.fr> wrote in message
| > | news:fl4rhritmu.fsf@jambon.cmla.ens-cachan.fr...
| > | > Note that the standard explicitly says:
| > | >
| > | >   For character types, all bits of the object representation participate
| > | >   in the value representation.
| > | >
| > | > so by that sentence all bit patterns in a character representation
| > | > represent a *character*.  So you've got your quote. In particular,
| > | > there is no trap representation for "char"s.
| > |
| > | "All bits participate" does not meen all bit patterns are legal. e.g. (3
| > | bits)
| >
| > The point here is that no matter what bit pattern you put in a
| > character object you'll get out a character value and the standard
| > guarantees that all bits will participate in the resulting value.
| > Therefore my sentence.
|
| Yes. That is what your sentence says. It's not what the standard's
| sentence says, however.

That is what you get by straight inference from general provisions
in the standard.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr






Author: "Anthony Williams"<anthwil@nortelnetworks.com>
Date: Wed, 8 May 2002 02:24:00 GMT
"Gabriel Dos Reis" <dosreis@cmla.ens-cachan.fr> wrote in message
news:floffw8go5.fsf@jambon.cmla.ens-cachan.fr...
> "James Kuyper Jr." <kuyper@wizard.net> writes:
> | Gabriel Dos Reis wrote:
> | > Alex Dicks <mjvww498as001@sneakemail.com> writes:
> | > | "Anthony Williams"<anthwil@nortelnetworks.com> writes:
> | > | > "All bits participate" does not meen all bit patterns are legal.
e.g. (3
> | > | > bits)
> | > | >
> | > | > 000 == 0
> | > | > 001 == 1
> | > | > 010 == 2
> | > | > 011 == 3
> | > | > 100 == illegal, trap representation
> | > | > 101 == -1
> | > | > 110 == -2
> | > | > 111 == -3
> | > | >
> | > | > | All 3 bits participate (otherwise you couldn't get the negative
> | > | > | numbers), but 100 is not a valid bit pattern.

> | > | I'd just like to add that this example is true on a signed magnitude
> | > | system.

> | > We're not discussing a signed integer type in general.  We're
> | > discussing 'signed char' and the possibilities allowed by the standard
> | > text.

> | Those possibilities include the 8-bit version of the above example.

> On what basis can you say that a bit pattern is illegal for a 'signed
> char'? You certainly can't rely on the standard text, since it says
> that no matter what bit pattern you put in a 'signed char', accessing
> the object won't trap, therefore it gives you some value of that
> type.

It is illegal in the sense that you cannot perform an operation (other than
assignment from such a value) on a char that will yield such a value.

The only way to come by such a value is by accessing the representation of a
non-char POD type through an lvalue of char type, when 'char' is the same as
'signed char' (3.9p2 and 3.10p15 don't allow for access through 'signed
char' types). Given that the value representation of e.g. float is entirely
unspecified, I cannot see any benefit to actually performing operations
other than assignment on a char value obtained through such means --- you
cannot even rely on the value obtained from the representation of an
'unsigned char' if the original value was outside the positive range of
'signed char'.

In fact, given such a representation for 'signed char', if 'char' is
'unsigned char', then there is no _defined_ way in which you can create a
'signed char' with an illegal bit combination --- copying a bit combination
using a 'char' or 'unsigned char' lvalue will only produce a defined result
if the original was copied from another 'signed char', which cannot have an
illegal value (just as copying a bit pattern into a double isn't guaranteed
to produce a defined result unless that bit pattern was copied from another
double).

If the platform I was using had such illegal values for 'char', then I would
expect any operation on such an illegal value to yield undefined behaviour,
in much the same way that overflow and underflow on signed integer types
yield undefined behaviour (5p5). I would expect the "undefined behaviour"
would in most cases just yield some unspecified value.

It is also guaranteed that assignment works, if 'char' has such a
representation, so that 'char' values can be used to store the value
representation of other POD types, and fulfil 3.9p2.
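
A minimal sketch of the one guaranteed use (per 3.9p2): copying an
object's bytes out into a char array and back again, performing nothing
but assignment on the char values. The POD type here is an invented
example:

    #include <cstddef>

    struct Pod { int x; float y; };   /* an arbitrary POD type */

    int main()
    {
        Pod a = { 42, 1.5f };
        char buf[sizeof(Pod)];

        /* Copy the object representation out, one char at a time.
           Assignment of the (possibly 'illegal') char values is the
           only operation performed on them. */
        const char* src = reinterpret_cast<const char*>(&a);
        for (std::size_t i = 0; i != sizeof(Pod); ++i)
            buf[i] = src[i];

        /* Copy it back into another object of the same type; 3.9p2
           guarantees that b then holds a's value. */
        Pod b;
        char* dst = reinterpret_cast<char*>(&b);
        for (std::size_t i = 0; i != sizeof(Pod); ++i)
            dst[i] = buf[i];

        return b.x == 42 ? 0 : 1;
    }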

Anthony
--
Anthony Williams
Software Engineer, Nortel Networks Optical Components Ltd
The opinions expressed in this message are not necessarily those of my
employer









Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Wed, 8 May 2002 08:25:17 GMT
Gabriel Dos Reis wrote:
>
> "James Kuyper Jr." <kuyper@wizard.net> writes:
>
> | Gabriel Dos Reis wrote:
> | > "Anthony Williams"<anthwil@nortelnetworks.com> writes:
....
> | > | "All bits participate" does not meen all bit patterns are legal. e.g. (3
> | > | bits)
> | >
> | > The point here is that no matter what bit pattern you put in a
> | > character object you'll get out a character value and the standard
> | > guarantees that all bits will participate in the resulting value.
> | > Therefore my sentence.
> |
> | Yes. That is what your sentence says. It's not what the standard's
> | sentence says, however.
>
> That is what you get by straight inference from general provisions
> in the standard.

Yes, but not by correct inference from the standard.

We've reached the point where we're in perfect agreement about what the
relevant words in the standard are, and in irreconcilable disagreement
about the correct reading of those words. In the absence of a new
argument (and I certainly don't have anything new), I don't think
there's any point in discussing this any further.






Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Mon, 29 Apr 2002 18:28:45 GMT
James Kanze wrote:
>
> "James Kuyper Jr." <kuyper@wizard.net> writes:
....
> |>  I'd hate to have convinced you, when I've become convinced of the
> |>  opposite. The C++ requirement is:
> |>  " X::eof() yields: a value e such that
> |>  X::eq_int_type(e,X::to_int_type(c)) is false for all values c." I
> |>  translated the relevant requirement from C++ terms into C terms as
> |>  "EOF!=(int)c for all values of c", solely for the purpose of
> |>  pointing out that C has no corresponding requirement.
>
> Is this formally recognized (that C has no corresponding requirement).
> The last I'd heard, it seemed to be an open point.  C has no formal
> statement of this sort, but it does require that fgetc return a value
> in the range 0...UCHAR_MAX or EOF; at least some people have
> interpreted this to mean that the return value of fgetc must support
> all of the values in the range 0...UCHAR_MAX, which wouldn't be the
> case if sizeof(int) were 1.  (I don't totally accept this argument
> myself, but it has been put forward.  And I'm not the person who
> decides.)

The requirements on fgetc() are not described in terms of UCHAR_MAX.
Section 7.19.7.1p2 says that it "obtains that character as an unsigned
char converted to an int". If INT_MAX<UCHAR_MAX, that will produce some
results that many people would consider unexpected, but those results
wouldn't violate any actual requirements of the C standard. Now, when
you write data to a binary stream, and read it back in using the same
implementation of C, it must be unchanged. This implies that each
distinct unsigned char value read in must be mapped to a distinct 'int'
value (something the standard doesn't say directly). As long as 'int' is
a 2's complement type with the same number of bits as unsigned char,
it's big enough to map each unsigned char to a unique value.

However, the C standard doesn't say anything requiring EOF to be
different from the value that would be read in for some character. That
was obviously the intent, otherwise there's no point to using EOF as an
error/end-of-file indicator. However, the C standard doesn't actually
say that's the case; this is particularly telling, since it does have
such wording for WEOF (7.24.1p3). You'd think that if the intent was to
have parallel requirements, the EOF section (7.19.1p3) would have been
updated when they added the section describing WEOF; but they didn't.
And, since the feof() and ferror() macros exist, having EOF as a
distinct value isn't a practical necessity, either. There is, however,
one key problem if (int)c == EOF for some character c: you can't push
it back with ungetc()!
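
A sketch of the usual disambiguation (plain stdio, no invented APIs):
when fgetc() returns EOF on such an implementation, feof() and ferror()
distinguish a genuine end-of-file from a real character that merely
converts to the same int:

    #include <cstdio>

    /* Count the bytes in a stream. On an implementation where some
       character converts to the same int as EOF, the return value of
       fgetc() alone is ambiguous; feof()/ferror() resolve it. */
    long count_bytes(std::FILE* f)
    {
        long n = 0;
        for (;;) {
            int c = std::fgetc(f);
            if (c == EOF && (std::feof(f) || std::ferror(f)))
                break;      /* genuine end-of-file or error */
            ++n;            /* a real character, even if c == EOF */
        }
        return n;
    }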

Incidentally, the C standard explicitly allows wchar_t and wint_t to be
the same type. So any argument that says char and int must be different
for C, has to break down when applied to wide character types.

....
> |>  The translated requirement is incompatible with sizeof(int)==1;
> |>  but the untranslated requirement is not, as you've shown. That's
> |>  because eq_int_type() and to_int_type() are member functions, and
> |>  are therefore not strictly equivalent to "==" and "(int)",
> |>  respectively. I think you've established that they can be defined
> |>  in a way that makes sizeof(int)==1 legal.
>
> The question is whether this way is conforming.  Gabriel dos Reis
> pointed out a sentence which very strongly suggested that *all*
> possible values of char_type must be considered legal characters.  I'd

There's a requirement to that effect for unsigned char, but not for
char. Note in particular, footnote 217, which says "if eof() can be held
in char_type then some iostreams operations may give surprising
results". That seems to imply that having eof() return a value that can
be held in char_type is legal.
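
To make that concrete, here is a hedged sketch (all names invented; this
is not any library's actual char_traits) of a traits class whose
int_type carries an out-of-band flag, so that eof() compares unequal to
every converted character even if no spare in-band value were available:

    /* An int_type pairing a character code with an out-of-band flag. */
    struct wrapped_int {
        unsigned char value;
        bool is_eof;
    };

    struct my_traits {     /* only the relevant pieces are sketched */
        typedef char        char_type;
        typedef wrapped_int int_type;

        static int_type to_int_type(char_type c)
            { int_type r = { static_cast<unsigned char>(c), false }; return r; }

        static char_type to_char_type(int_type i)
            { return static_cast<char_type>(i.value); }

        static bool eq_int_type(int_type a, int_type b)
            { return a.is_eof == b.is_eof && (a.is_eof || a.value == b.value); }

        static int_type eof()
            { int_type r = { 0, true }; return r; }
    };

With such a definition, eq_int_type(eof(), to_int_type(c)) is false for
every c, however many distinct values char_type has.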






Author: Alexander Terekhov <terekhov@web.de>
Date: Mon, 29 Apr 2002 18:29:29 GMT
James Kanze wrote:
[...]
> |>  You can wrap uint_32 in a class, and then specialize it.
                   ^^^^^^^

Is this actually meant to be something coming from <stdint.h>?

He he. Consider:

http://groups.google.com/groups?selm=3CC6D78C.CF4E562F%40web.de

"....
 The restriction that a byte is now exactly eight
 bits was a conscious decision by the standard
 developers. It came about due to a combination
 of factors, primarily the use of the type
 int8_t within the networking functions and the
 alignment with the ISO/IEC 9899:1999 standard,
 where the intN_t types are now defined.

 According to the ISO/IEC 9899:1999 standard:

 The [u]intN_t types must be two's complement
 with no padding bits and no illegal values.

 All types (apart from bit fields, which are not
 relevant here) must occupy an integral number of
 bytes.

 If a type with width W occupies B bytes with C
 bits per byte (C is the value of {CHAR_BIT}),
 then it has P padding bits, where P + W = B*C.

 Therefore, for int8_t P=0, W=8. Since B>=1, C>=8,
 the only solution is B=1, C=8.
 ...."

regards,
alexander.






Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Mon, 29 Apr 2002 18:30:15 GMT
James Kanze wrote:
....
> I'm not sure I understand this.  How can a bit pattern not represent a
> number? ...

C99 has a name for such bit patterns: trap representations. Attempting
to read the value of an object using an lvalue of a type for which that
object's bit pattern is a trap representation, has undefined behavior.
C++ does not name those bit patterns, and isn't as explicit as C99 about
what they might mean, but I can't see any way to interpret 3.9.1p1 that
doesn't allow numeric types (other than  unsigned character types) to
have invalid value representations. Otherwise, why restrict the
guarantee that "all possible bit patterns of the value representation
represent numbers", by saying that it's only "For unsigned character
types,"?

A classic example is a 1's complement integer type, for which the bit
pattern that would normally represent negative zero is instead treated
as an invalid value, causing the abort() of any program that attempts to
read such a value. Using 1's complement signed types at all would be
rather odd nowadays. Using a 1's complement type as 'char', and a 2's
complement 'int' would be even odder. Making them both the same size
would be extremely odd. But it would appear to be legal.

> ... And what does "participate in the value representation" mean
> if some bit patterns do not represent numbers?

If there are some valid, distinct numbers which differ only in the
setting of a single bit, then that bit participates in the
representation. Every single bit of 'char' in the implementation I
described is used to distinguish valid values from each other.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Tue, 30 Apr 2002 06:03:38 GMT
Raw View
Alexander Terekhov wrote:
>
> James Kanze wrote:
> [...]
> > |>  You can wrap uint_32 in a class, and then specialize it.
>                    ^^^^^^^
>
> Is this actually meant to be something coming from <stdint.h>?

I doubt it. There's lots of existing typedefs that serve the same
purpose as the new C99 typedefs. For instance, I use a library named HDF
on a daily basis, and it uses typedefs of the form "uint32" extensively.

> He he. Consider:
>
> http://groups.google.com/groups?selm=3CC6D78C.CF4E562F%40web.de
>
> "....
>  The restriction that a byte is now exactly eight
>  bits was a conscious decision by the standard
>  developers. It came about due to a combination

Note that the standard being referred to here is NOT the C99 standard
(nor the C++ one). And, judging from the statements you cite, it's a
standard being produced by people who don't quite understand what the
C99 standard says about the exact sized types.

>  of factors, primarily the use of the type
>  int8_t within the networking functions and the
>  alignment with the ISO/IEC 9899:1999 standard,
>  where the intN_t types are now defined.
>
>  According to the ISO/IEC 9899:1999 standard:
>
>  The [u]intN_t types must be two's complement
>  with no padding bits and no illegal values.
>
>  All types (apart from bit fields, which are not
>  relevant here) must occupy an integral number of
>  bytes.
>
>  If a type with width W occupies B bytes with C
>  bits per byte (C is the value of {CHAR_BIT}),
>  then it has P padding bits, where P + W = B * C.
>
>  Therefore, for int8_t P=0, W=8. Since B>=1, C>=8,
>  the only solution is B=1, C=8.
>  ...."

Keep in mind that int8_t is not a mandatory type under C99; nor are any
other of the exact-sized types. Therefore, this conclusion doesn't hold
up. It's certainly a legal implementation, and the OpenGroup is free to
require it as part of their standard. However, they're mistaken if
they're claiming, as they appear to be, that this is required by the C99
standard.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: James Kanze <kanze@alex.gabi-soft.de>
Date: Tue, 30 Apr 2002 18:44:47 GMT
Raw View
"James Kuyper Jr." <kuyper@wizard.net> writes:

|>  Alexander Terekhov wrote:

|>  > James Kanze wrote:
|>  > [...]
|>  > > |>  You can wrap uint_32 in a class, and then specialize it.
|>  >                    ^^^^^^^

|>  > Is this actually meant to be something coming from <stdint.h>?

|>  I doubt it. There's lots of existing typedefs that serve the same
|>  purpose as the new C99 typedefs. For instance, I use a library
|>  named HDF on a daily basis, and it uses typedefs of the form
|>  "uint32" extensively.

My last project used them extensively as well.  I don't know which
header actually contained them, but it was something in /usr/include
under Solaris (probably /usr/include/sys/types.h).  And definitely
something non standard.

But I wasn't concerned about the name.  I was just being lazy, and I
assumed that everyone would interpret uint_32 to mean "an unsigned
integral type with 32 bits" (which you'll have to admit, is a lot
longer to write).

|>  > He he. Consider:

|>  > http://groups.google.com/groups?selm=3CC6D78C.CF4E562F%40web.de

|>  > "....
|>  >  The restriction that a byte is now exactly eight
|>  >  bits was a conscious decision by the standard
|>  >  developers. It came about due to a combination

|>  Note that the standard being referred to here is NOT the C99
|>  standard (nor the C++ one). And, judging from the statements you
|>  cite, it's a standard being produced by people who don't quite
|>  understand what the C99 standard says about the exact sized types.

Note that Alexander Terekhov is quoting out of context another netnews
article which quotes out of context the Open Group specifications.

The people who actually wrote the original text do understand the C99
standard very well.  When all of the missing context is reestablished,
it is very clear that they are intentionally making additional
restrictions compared to the standard.  (From the original: "The
definition of byte from the ISO C standard is broader than the above
and might accommodate hardware architectures with different sized
addressable units than octets.")

    [...]

|>  Keep in mind that int8_t is not a mandatory type under C99; nor
|>  are any other of the exact-sized types. Therefore, this conclusion
|>  doesn't hold up. It's certainly a legal implementation, and the
|>  OpenGroup is free to require it as part of their
|>  standard. However, they're mistaken if they're claiming, as they
|>  appear to be, that this is required by the C99 standard.

It's the lost context.  All the Open Group claims is that if you want
to use the trademark Unix (which they own) in the name of your system,
its implementation of C must use 8 bit bytes and provide these types.

The "standard" in the quoted passages is "The Open Group Base
Specifications Issue 6", which is IEEE Std 1008.1-2001.  (Which is, in
itself, interesting.  Are Posix and Unix becoming the same thing?
Will Linux suddenly wake up one day and find that it is Unix, after
all, because it is Posix conform?)  The text actually quoted is
largely part of a rationale as to why this standard is deviating from
the C standard in this regard.

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: terekhov@web.de (Alexander Terekhov)
Date: Tue, 30 Apr 2002 18:46:09 GMT
Raw View
"James Kuyper Jr." <kuyper@wizard.net> wrote in message news:<3CCDEE5C.6D986CEA@wizard.net>...
[...]
> > http://groups.google.com/groups?selm=3CC6D78C.CF4E562F%40web.de
> >
> > "....
> >  The restriction that a byte is now exactly eight
> >  bits was a conscious decision by the standard
> >  developers. It came about due to a combination
>
> Note that the standard being referred to here is NOT the C99 standard
> (nor the C++ one).

Yep. The standard being referred to here is the IEEE/The Open
Group *and* ISO/IEC JTC1/SC22-WG15 *POSIX* standard (C is
WG14 and C++ is WG21, according to:

http://std.dkuug.dk/JTC1/SC22 )

BTW, how about a) merging WG14 and WG15 first and adding
WG21 to them sometime later; in the next phase of the
"fusion-process" providing consolidated standards (threads,
C AND C++ bindings for all options, etc) in the NOT so
distant future (C++0X would be a good candidate/timeframe
for the first step, I guess)? ;-) ;-))

> And, judging from the statements you cite, it's a
> standard being produced by people who don't quite understand what the
> C99 standard says about the exact sized types.

Well, I'm NOT sure. I guess, "the problem" is that they
decided to rely on non-mandatory (in C99) exact-width
integer type int8_t ("within the networking functions")
instead of mandatory minimum-width integer type(s)
int_least8_t/uint_least8_t/etc. or perhaps even some
"fastest minimum-width integer types" int_fastN_t/
uint_fastN_t (N:8,16,32,64 mandatory as well) ;-).

> >  of factors, primarily the use of the type
> >  int8_t within the networking functions and the
> >  alignment with the ISO/IEC 9899:1999 standard,
> >  where the intN_t types are now defined.
> >
> >  According to the ISO/IEC 9899:1999 standard:
> >
> >  The [u]intN_t types must be two's complement
> >  with no padding bits and no illegal values.
> >
> >  All types (apart from bit fields, which are not
> >  relevant here) must occupy an integral number of
> >  bytes.
> >
> >  If a type with width W occupies B bytes with C
> >  bits per byte (C is the value of {CHAR_BIT}),
> >  then it has P padding bits, where P + W = B * C.
> >
> >  Therefore, for int8_t P=0, W=8. Since B>=1, C>=8,
> >  the only solution is B=1, C=8.
> >  ...."
>
> Keep in mind that int8_t is not a mandatory type under C99; nor are any
> other of the exact-sized types. Therefore, this conclusion doesn't hold
> up. It's certainly a legal implementation, and the OpenGroup

and ISO/IEC:WG15 http://std.dkuug.dk/JTC1/SC22/WG15 ;-)

> is free to require it as part of their standard.

Yep, I guess.

> However, they're mistaken if they're claiming,
> as they appear to be, that this is required by
> the C99 standard.

Well, to me, they are merely claiming that on any implementation
with int8_t defined, "the only solution is B=1, C=8". Are they
really mistaken?! It does NOT seem so to me. Or am I just missing
something?

regards,
alexander.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: James Kanze <kanze@alex.gabi-soft.de>
Date: Tue, 30 Apr 2002 18:51:49 GMT
Raw View
"James Kuyper Jr." <kuyper@wizard.net> writes:

|>  James Kanze wrote:

|>  > "James Kuyper Jr." <kuyper@wizard.net> writes:
|>  ....
|>  > |>  I'd hate to have convinced you, when I've become convinced of the
|>  > |>  opposite. The C++ requirement is:
|>  > |>  " X::eof() yields: a value e such that
|>  > |>  X::eq_int_type(e,X::to_int_type(c)) is false for all values
|>  > |>  c." I translated the relevant requirement from C++ terms
|>  > |>  into C terms as "EOF!=(int)c for all values of c", solely
|>  > |>  for the purpose of pointing out that C has no corresponding
|>  > |>  requirement.

|>  > Is this formally recognized (that C has no corresponding
|>  > requirement)?  The last I'd heard, it seemed to be an open
|>  > point.  C has no formal statement of this sort, but it does
|>  > require that fgetc return a value in the range 0...UCHAR_MAX or
|>  > EOF; at least some people have interpreted this to mean that the
|>  > return value of fgetc must support all of the values in the
|>  > range 0...UCHAR_MAX, which wouldn't be the case if sizeof(int)
|>  > were 1.  (I don't totally accept this argument myself, but it
|>  > has been put forward.  And I'm not the person who decides.)

|>  The requirements on fgetc() are not described in terms of
|>  UCHAR_MAX.  Section 7.19.7.1p2 says that it "obtains that
|>  character as an unsigned char converted to an int".

I can see my memory is failing me.  This text is unchanged from C90,
too, so I can't even say I was thinking of the old version.

This raises an interesting sidelight.  The conversion of an unsigned
char to an int is allowed to raise a signal (6.3.1.3/3).

|>  If INT_MAX<UCHAR_MAX, that will produce some results that many
|>  people would consider unexpected, but those results wouldn't
|>  violate any actual requirements of the C standard. Now, when you
|>  write data to a binary stream, and read it back in using the same
implementation of C, it must be unchanged. This implies that each
|>  distinct unsigned char value read in must be mapped to a distinct
|>  'int' value (something the standard doesn't say directly).

Does it?  I would understand the reverse: a distinct int value must
map to a distinct unsigned char.  Which is more or less guaranteed
anyway if sizeof( int ) == 1.  And what about 7.19.7.3/3: "If a write
error occurs, the error indicator for the stream is set, and fputc
returns EOF."  The standard doesn't say much about what can
potentially be a write error -- presumably, an attempt to write a
"character" which cannot be written may be considered a write error.

Note too that I'm not quite sure about the guarantee that data written
can be read.  I know that the standard says so explicitly in 7.19.2/3,
but there is always the question of what is meant by "data written".
Obviously, it cannot be the int passed to fputc, since it is converted
to an unsigned char, and on most implementations, this conversion is
lossy -- on my Sparc and on my Linux PC (and on every other machine
I've actually used), fputc( -1, file ) will return 255 (as an int)
when read.  And -1 certainly doesn't compare equal to 255.  If it is
the unsigned char after conversion which must compare equal, then it
must compare equal to the unsigned char which is read, before
conversion.  Which is a pretty weak guarantee, since the conversion
may be lossy, or even raise a signal.
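
For concreteness, a minimal sketch of the round trip in question,
assuming an ordinary 8-bit implementation (the temporary file is just
scaffolding):

    #include <cstdio>

    int roundtrip()
    {
        std::FILE* f = std::tmpfile();
        if (f == 0)
            return EOF;
        std::fputc(-1, f);       // stored as (unsigned char)-1, i.e. 255
        std::rewind(f);
        int c = std::fgetc(f);   // the unsigned char read back, converted
        std::fclose(f);          // to int: 255 on the machines above
        return c;                // ... which does not compare equal to -1
    }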

|>  As long as 'int' is a 2's complement type with the same number of
|>  bits as unsigned char, it's big enough to map each unsigned char
|>  to a unique value.

As long as we stick with the conventional architectures, we don't have
a problem, because sizeof(int) != 1 :-).  And if we consider less
conventional architectures, we have to consider 1's complement, signed
magnitude, and conversions which throw.

My own interpretation would be that the comparison referred to in
7.19.2/3 takes place on the unsigned char actually read and written.
And that what happens during the conversion happens; fgetc can raise
an implementation defined signal if it reads strange data.  Including
data that it can write, since the conversion int to unsigned char is
well defined for all possible values.

|>  However, the C standard doesn't say anything requiring EOF to be
|>  different from the value that would be read in for some
|>  character. That was obviously the intent, otherwise there's no
|>  point to using EOF as an error/end-of-file indicator. However, the
|>  C standard doesn't actually say that's the case; this is
|>  particularly telling, since it does have such wording for WEOF
|>  (7.24.1p3). You'd think that if the intent was to have parallel
|>  requirements, the EOF section (7.19.1p3) would have been updated
|>  when they added the section describing WEOF; but they didn't.
|>  And, since the feof() and ferror() macros exist, having EOF as a
|>  distinct value isn't a practical necessity, either. There is,
|>  however, one key problem if (int)c == EOF for some character c: you
|>  can't unget(c)!

Again, this has always been (more or less) my position.  But I think
it is open to interpretation, and I would feel better if there were a
formal ruling about it.

|>  Incidentally, the C standard explicitly allows wchar_t and wint_t
|>  to be the same type. So any argument that says char and int must
|>  be different for C, has to break down when applied to wide
|>  character types.

Well, as I've said before, narrow character streams and wide character
streams have different semantics.  So there is really no reason to
expect the same rules to apply to both.  (And using a common template
to implement both is probably a case of very poor design.)

|>  ....
|>  > |>  The translated requirement is incompatible with
|>  > |>  sizeof(int)==1; but the untranslated requirement is not, as
|>  > |>  you've shown. That's because eq_int_type() and to_int_type()
|>  > |>  are member functions, and are therefore not strictly
|>  > |>  equivalent to "==" and "(int)", respectively. I think you've
|>  > |>  established that they can be defined in a way that makes
|>  > |>  sizeof(int)==1 legal.

|>  > The question is whether this way is conforming.  Gabriel dos
|>  > Reis pointed out a sentence which very strongly suggested that
|>  > *all* possible values of char_type must be considered legal
|>  > characters.  I'd

|>  There's a requirement to that effect for unsigned char, but not
|>  for char. Note in particular, footnote 217, which says "if eof()
|>  can be held in char_type then some iostreams operations may give
|>  surprising results". That seems to imply that having eof() return
|>  a value that can be held in char_type is legal.

As you probably know, in standardization committees, the right hand
doesn't always know what the left hand is doing.  The text that Gaby
quoted concerns "character", not char or unsigned char, and
specifically says that it concerns any type which provides the
definitions specified in clauses 21, 22 and 27.

I suspect that there is a defect report lurking in here somewhere, but
I'm not at all sure how to formulate it.

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: James Kanze <kanze@alex.gabi-soft.de>
Date: Fri, 26 Apr 2002 19:31:30 GMT
Raw View
"James Kuyper Jr." <kuyper@wizard.net> writes:

|>  James Kanze wrote:

|>  > "James Kuyper Jr." <kuyper@wizard.net> writes:

|>  ....
|>  >   - And now, I cannot use the standard streams because I'm not allowed
|>  >     to provide a specialization for char_traits<uint_32>.

|>  You can wrap uint_32 in a class, and then specialize it.

And pay an unacceptable performance penalty with most compilers, since
none of the compilers I use will pass or return a class, no matter how
simple, in a register, whereas on a Sparc, at least, 32 bit basic
types are always passed in registers.
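
For reference, a minimal sketch of the wrapping under discussion.  The
names are invented here, only a handful of the Table 37 members are
shown (a usable specialization needs the full set), and eof() follows
the 32-bit proposal discussed elsewhere in this thread:

    #include <string>   // primary std::char_traits template

    struct U32Char { unsigned long v; };   // hypothetical 32-bit code unit

    namespace std {
        template<>
        struct char_traits<U32Char> {
            typedef U32Char       char_type;
            typedef unsigned long int_type;

            static void assign(char_type& a, const char_type& b) { a = b; }
            static bool eq(char_type a, char_type b) { return a.v == b.v; }
            static bool lt(char_type a, char_type b) { return a.v <  b.v; }
            static int_type  to_int_type(char_type c) { return c.v; }
            static char_type to_char_type(int_type i)
                { char_type c; c.v = i; return c; }
            static bool eq_int_type(int_type a, int_type b)
                { return a == b; }
            // A char-representable value reserved as a non-character,
            // exactly the point debated in this thread:
            static int_type eof() { return 0xFFFFFFFFul; }
            // compare, length, find, move, copy, not_eof, and the
            // off/pos/state typedefs are omitted from this sketch.
        };
    }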

I think we have a serious problem here, because in practice, if there
is any possible use of the fact that iostreams and string are
templates, it involves specialising them on fundamental types.
Practically speaking, in fact, I cannot conceive of any use other than
with an integral type.

Maybe we need a special rule for char_traits?  (I really don't know
what the best solution is.  I'm just guessing.)

|>  ....
|>  > The second type is a stream of narrow, possibly multi-byte or
|>  > state encoded characters.  A stream which is read with no
|>  > translation on input and output.  IMHO, this type is really only
|>  > of interest in a limited number of cases (Americas, Europe and
|>  > black Africa, when internationalization is not an issue), and
|>  > the number is decreasing.

|>  I've spent my entire career within that "limited" area, so "only"
|>  seems a little inappropriate to me. I believe in
|>  internationalization, but there are huge markets out that can get
|>  away without it.

Quite frankly, I've spent all of my professional career in such a
limited environment as well.  Internationalization was important, but
only to other western European languages (all using ISO 8859-1).  But
the tendancy is to break out of this mold -- since the opening of
eastern Europe, for example, I find that customers are concerned about
supporting e.g. Czeck.  And to support Czeck and the western European
languages at the same time, the classical 8 bits isn't sufficient.

I'm certainly not suggesting that we drop support for narrow
characters in the next version of the standard:-).  On the other hand,
I suspect that I will probably have to deal with wide characters in
most future C++ projects, at least if human interface is involved.

The only text in my last project was in log files -- in which, of
course, internationalisation is completely irrelevant.  On the other
hand, the latest version of the RADIUS protocol, which we implemented,
requires UTF-8, and there were long term plans to support it.  So even
in a purely technical application -- IP address allocation -- wide
characters are starting to rear their heads.

|>  ....
|>  > (and you've convinced me that a hosted implementation is not
|>  > possible on machines where sizeof(int)==1 -- something that I
|>  > don't think was

|>  I'd hate to have convinced you, when I've become convinced of the
|>  opposite. The C++ requirement is:
|>  " X::eof() yields: a value e such that
|>  X::eq_int_type(e,X::to_int_type(c)) is false for all values c." I
|>  translated the relevant requirement from C++ terms into C terms as
|>  "EOF!=3D(int)c for all values of c", solely for the purpose of
|>  pointing out that C has no corresponding requirement.

Is this formally recognized (that C has no corresponding requirement)?
The last I'd heard, it seemed to be an open point.  C has no formal
statement of this sort, but it does require that fgetc return a value
in the range 0...UCHAR_MAX or EOF; at least some people have
interpreted this to mean that the return value of fgetc must support
all of the values in the range 0...UCHAR_MAX, which wouldn't be the
case if sizeof(int) were 1.  (I don't totally accept this argument
myself, but it has been put forward.  And I'm not the person who
decides.)

|>  The translated requirement is incompatible with sizeof(int)==1;
|>  but the untranslated requirement is not, as you've shown. That's
|>  because eq_int_type() and to_int_type() are member functions, and
|>  are therefore not strictly equivalent to "==" and "(int)",
|>  respectively. I think you've established that they can be defined
|>  in a way that makes sizeof(int)==1 legal.

The question is whether this way is conforming.  Gabriel dos Reis
pointed out a sentence which very strongly suggested that *all*
possible values of char_type must be considered legal characters.  I'd
rather this not be the case, but I'm not really sure what the actual
intent was.

For the moment, I'm still very much experimenting.  Since my current
goal is not to write 100% conforming code, but just to find out where
the problems might be, I will continue to specialize char_traits on a
basic type, even if it is illegal.  And use my posted definitions,
although there seems to be some doubt as to whether they are conforming
or not.  For that matter, since for the moment I've not started
looking into IO issues in detail, I've not even defined pos_type,
off_type and state_type -- the implementation of std::basic_string
that I am using doesn't seem to need them.  As a user, I'm not sure
how to define them: I suppose just taking off_type from
char_traits<char> would do the trick, but logically, state_type should
be locale dependent (and pos_type, of course, depends on state_type).

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Sun, 28 Apr 2002 00:17:39 GMT
Raw View
James Kanze wrote:
....
> The question is whether this way is conforming.  Gabriel dos Reis
> pointed out a sentence which very strongly suggested that *all*
> possible values of char_type must be considered legal characters.  I'd
> rather this not be the case, but I'm not really sure what the actual
> intent was.

I'm not aware of any requirements that would be violated by an
implementation with the following characteristics:

 CHAR_MIN    -32767
 CHAR_MAX    32767
 CHAR_BIT    16
 UCHAR_MAX   65535
 INT_MIN     -32768
 INT_MAX     32767
 sizeof(int) 1
 EOF         -32768

There is a requirement (3.9.1p1) that "for character types, all bits of
the object representation participate in the value representation."
However, the requirement that "all possible bit patterns of the value
representation represent numbers" applies only "For unsigned character
types". 'char' is allowed to be a signed character type, and is
therefore allowed to have bit patterns that don't represent numbers.

std::ungetc() would be impossible for some values returned by
std::fgetc() with such an implementation. However,
std::streambuf::sputbackc() and std::streambuf::sungetc() should work
with any valid char value.
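
A sketch of the distinction, using the narrow-char streambuf interface:
sputbackc() takes its argument as char_type directly, so no char-to-int
round trip is involved (int_type appears only in the return value, to
signal failure):

    #include <sstream>
    #include <string>

    bool putback_demo()
    {
        std::istringstream in("x");
        char c;
        in.get(c);
        // Works for every valid char value, unlike std::ungetc(), whose
        // int parameter could not represent some chars if sizeof(int)==1.
        return in.rdbuf()->sputbackc(c)
               != std::char_traits<char>::eof();
    }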

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: James Kanze <kanze@alex.gabi-soft.de>
Date: Mon, 29 Apr 2002 11:32:20 GMT
Raw View
"James Kuyper Jr." <kuyper@wizard.net> writes:

|>  James Kanze wrote:
|>  ....
|>  > The question is whether this way is conforming.  Gabriel dos
|>  > Reis pointed out a sentence which very strongly suggested that
|>  > *all* possible values of char_type must be considered legal
|>  > characters.  I'd rather this not be the case, but I'm not really
|>  > sure what the actual intent was.

|>  I'm not aware of any requirements that would be violated by an
|>  implementation with the following characteristics:

|>   CHAR_MIN    -32767
|>   CHAR_MAX    32767
|>   CHAR_BIT    16
|>   UCHAR_MAX   65535
|>   INT_MIN     -32768
|>   INT_MAX     32767
|>   sizeof(int) 1
|>   EOF         -32768

|>  There is a requirement (3.9.1p1) that "for character types, all
|>  bits of the object representation participate in the value
|>  representation."  However, the requirement that "all possible bit
|>  patterns of the value representation represent numbers" applies
|>  only "For unsigned character types". 'char' is allowed to be a
|>  signed character type, and is therefore allowed to have bit
|>  patterns that don't represent numbers.

I'm not sure I understand this.  How can a bit pattern not represent a
number?  And what does "participate in the value representation" mean
if some bit patterns do not represent numbers?

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Fri, 19 Apr 2002 17:07:55 GMT
Raw View
Markus Mauhart wrote:
>
> "Gennaro Prota" <gennaro_prota@yahoo.com> wrote ...
> >
> > P.S.: the only thing that leaves me perplexed is the apparent circular
> > definition constituted by 5.3.3 and 3.9p4. Does anybody know if it is
> > resolved in an other part of the standard?
>
> IMO thats worth a defect report; any other opinions ?
>
> 3.9p4:
> -----------
> The object representation of an object of type T is the sequence of
> N unsigned char objects taken up by the object of type T, where N
> equals sizeof(T). The value representation of an object is the set
> of bits that hold the value of type T. For POD types, the value
> representation is a set of bits in the object representation that
> determines a value, which is one discrete element of an
implementation-defined set of values.37)
> 37) The intent is that the memory model of C++ is compatible with
> that of ISO/IEC 9899 Programming Language C.
> -----------
> .... so this definition of "object representation of an object of type T"
> relies on "sizeof(T)".

The comment about sizeof(T) is a side issue; the comment is true, and a
useful thing to know, but does not play a part in defining what the
object representation is, nor how big it is. It might be better to make
that comment non-normative text, since it's redundant with 5.3.3, and as
currently written might give the mistaken impression that N is
determined by sizeof(), rather than simply being reported by it.

> 5.3.3
> -----------
> The sizeof operator yields the number of bytes in the object representation
> of its operand. .....
> -----------
> .... and "sizeof(T)" seems to rely on T's "object representation"

There's no circularity in the actual meaning. The size of a byte is
implementation-defined. The representation of an object takes up an
implementation-defined amount of memory space, which must be a positive
integral number of bytes. sizeof(object) reports how many bytes that is.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Fri, 19 Apr 2002 17:10:34 GMT
Raw View
NotFound wrote:
>
> > > What is the correct use? In other contexts a word is the minimum
> > > addressable unit, then a word on all the x86 family will be an octet.
> > The definition I'm familiar with can be paraphrased by saying that if
> > it's correctly described as a 32-bit machine, then the word size is 32
> > bits.
>
> Then win32 is not correctly described as a 32-bit machine?

Or it's not correctly described as having a word size other than 32
bits. Take your choice.

> > > ISO has no authority to define the universal meaning of a word. No more
> > No one has that authority. However, ISO does have the authority to
> > define the usage within ISO documents, and the usage by anyone who cares
> > about ISO standards. Which includes me.
>
> In the context of this newsgroup the relevant standard does not define
> WORD.

The C++ standard makes no use of the term with this meaning, and
therefore doesn't need to define it; its meaning is thus indeed
off-topic for this newsgroup. It came up only because of a quotation
from Stroustrup that used the term. However, I'm sure there are other
ISO standards that do define it. Hopefully, all ISO standards that
define the term give it mutually compatible definitions, but I wouldn't
be surprised to hear otherwise.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "Markus Mauhart" <markus.mauhart@chello.at>
Date: Fri, 19 Apr 2002 19:16:15 GMT
Raw View
"James Kuyper Jr." <kuyper@wizard.net> wrote ...
>
> Markus Mauhart wrote:
> >
> > "Gennaro Prota" <gennaro_prota@yahoo.com> wrote ...
> > >
> > > P.S.: the only thing that leaves me perplexed is the apparent circular
> > > definition constituted by 5.3.3 and 3.9p4. Does anybody know if it is
> > > resolved in an other part of the standard?
> >
> > IMO thats worth a defect report; any other opinions ?
> >
> > 3.9p4:
> > -----------
> > The object representation of an object of type T is the sequence of
> > N unsigned char objects taken up by the object of type T, where N
> > equals sizeof(T). The value representation of an object is the set
> > of bits that hold the value of type T. For POD types, the value
> > representation is a set of bits in the object representation that
> > determines a value, which is one discrete element of an
> > implementation-defined set of values.37)
> > 37) The intent is that the memory model of C++ is compatible with
> > that of ISO/IEC 9899 Programming Language C.
> > -----------
> > .... so this definition of "object representation of an object of type T"
> > relies on "sizeof(T)".
>
> The comment about sizeof(T) is a side issue; the comment is true, and a
> useful thing to know, but does not play a part in defining what the
> object representation is, nor how big it is. It might be better to make
> that comment non-normative text, since it's redundant with 5.3.3, and as
> currently written might give the mistaken impression that N is
> determined by sizeof(), rather than simply being reported by it.

James, now I too understand the intent of this paragraph; your explanation
was crystal clear!
So one could say ..
"The object representation of an object of type T is the (finite:-) set
 of all unsigned char objects taken up by that object of type T".

Some years ago, when I came to 5.3.3 and 3.9p4 after a long round
trip through the document -- trying to solve a problem, with a stack
of related cross-references almost blasting my head -- I realized
this 'circularity' (not really one, now I know) and aborted my
undertaking with a lot of unkind words leaving my mouth ...

So I would appreciate it if the next version would make this
paragraph clearer in the sense you have explained it.

> > 5.3.3
> > -----------
> > The sizeof operator yields the number of bytes in the object representation
> > of its operand. .....
> > -----------
> > .... and "sizeof(T)" seems to rely on T's "object representation"
>
> There's no circularity in the actual meaning. The size of a byte is
> implementation-defined. The representation of an object takes up an
> implementation-defined amount of memory space, which must be a positive
> integral number of bytes. sizeof(object) reports how many bytes that is.

yes to all


Thanks,
Markus.



---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Fri, 19 Apr 2002 19:28:55 GMT
Raw View
James Kanze <kanze@gabi-soft.de> writes:

| Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr> writes:
|
| |>  Huh?!?  The C++ standard requires that all bits in a char
| |>  participate in a char value representation.  And EOF is not a
| |>  character.
|
| However, as far as I can see, it doesn't place any constraints with
| regards as to what a character can be (except that the characters in the
| basic character set must have positive values, even if char is signed).

Surely, the standard does define what "character" means.
Have a look at 17.1.2.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Sat, 20 Apr 2002 10:11:50 GMT
Raw View
James Kanze <kanze@gabi-soft.de> writes:

[...]

| |>  it suffices to look at the first two pages.
|
| |>  21.1.2/2
| |>    For a certain character container type char_type, a related
| |>    container type INT_T shall be a type or class which can represent
| |>    all of the valid characters converted from the corresponding
| |>    char_type values, as well as an end-of-file value, eof(). The type
| |>    int_type represents a character container type which can hold
| |>    end-of-file to be used as a return type of the iostream class member
| |>    functions.
|
| OK.  Consider the case of 32 bit char, int and long, using ISO 10646 as
| a code set.  And read the text *very* carefully.

I did.

[...]

| |>  The standard also says that any bit pattern for char represents a
| |>  valid char value, therefore eof() can't be in the values-set of
| |>  char.
|
| Table 37 talks of characters, not valid char values.  Not all valid char
| values need be valid characters.

Sure, they do.

Let's look at the definitions given at the beginning of the library (17.1)

  17.1.2 character
  in clauses 21, 22, and 27, means any object which, when treated
  sequentially, can represent text. The term does *not only mean char
  and wchar_t objects*, but *any value* that can be represented by a type
  that provides the definitions specified in these clauses.

(Emphasis is mine).

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Gennaro Prota <gennaro_prota@yahoo.com>
Date: Sat, 20 Apr 2002 10:12:41 GMT
Raw View
On Fri, 19 Apr 2002 17:07:55 GMT, "James Kuyper Jr."
<kuyper@wizard.net> wrote:

>Markus Mauhart wrote:
>>
>> "Gennaro Prota" <gennaro_prota@yahoo.com> wrote ...
>> >
>> > P.S.: the only thing that leaves me perplexed is the apparent circular
>> > definition constituted by 5.3.3 and 3.9p4. Does anybody know if it is
>> > resolved in an other part of the standard?
>>
>> IMO thats worth a defect report; any other opinions ?
>>
>> 3.9p4:
>> -----------
>> The object representation of an object of type T is the sequence of
>> N unsigned char objects taken up by the object of type T, where N
>> equals sizeof(T).

[...]

>The comment about sizeof(T) is a side issue; the comment is true, and a
>useful thing to know, but does not play a part in defining what the
>object representation is, nor how big it is.

Gulp! This is because you probably know the intended wording, but it's
not what is written there :)

Anyhow, it seems to me that moving the comment in a non-normative part
wouldn't solve another problem: objects of the same type can have
different sizes. Example:

class A {};
class B : public A { public: int i;};

A a;
B b;

The A sub-object in b can occupy 0 bytes, while the complete object a
cannot (1.8p5). Now how do you apply the text from 3.9p4 quoted above?

Moreover you cannot say that sizeof "yields the number of bytes in the
object representation of its operand", since AFAIK what is defined by
the standard is the object representation of an object, not that of a
(parenthesis-enclosed name of a) type.



>There's no circularity in the actual meaning. The size of a byte is
>implementation-defined.

You meant "the size of an object", I suppose.

> The representation of an object takes up an
>implementation-defined amount of memory space, which must be a positive
>integral number of bytes. sizeof(object) reports how many bytes that is.


Genny.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Sat, 20 Apr 2002 12:21:09 GMT
Raw View
Gennaro Prota wrote:
>
> On Fri, 19 Apr 2002 17:07:55 GMT, "James Kuyper Jr."
> <kuyper@wizard.net> wrote:
>
> >Markus Mauhart wrote:
....
> >> 3.9p4:
> >> -----------
> >> The object representation of an object of type T is the sequence of
> >> N unsigned char objects taken up by the object of type T, where N
> >> equals sizeof(T).
>
> [...]
>
> >The comment about sizeof(T) is a side issue; the comment is true, and a
> >useful thing to know, but does not play a part in defining what the
> >object representation is, nor how big it is.
>
> Gulp! This is because you probably know the intended wording, but it's
> not what is written there :)

I'm afraid that I don't see that. I'm not a fan of the "mind-reading"
school for interpreting the standard. I can see how this wording is
misleading, but not how it's incorrect. 3.9p4 says that N==sizeof(T);
that's perfectly true. It doesn't say that sizeof(T) determines what the
value of N is. It doesn't actually say what it is that determines the
value of N, it just describes some facts involving N. The standard does
not determine what the value of N is; that's up to the implementation.

> Anyhow, it seems to me that moving the comment in a non-normative part
> wouldn't solve another problem: objects of the same type can have
> different sizes. Example:
>
> class A {};
> class B : public A { public: int i;};
>
> A a;
> B b;
>
> The A sub-object in b can occupy 0 bytes, while the complete object a
> cannot (1.8p5). Now how do you apply the text from 3.9p4 quoted above?

Good point; I don't know. I'd recommend filing a DR on that issue.

> Moreover you cannot say that sizeof "yields the number of bytes in the
> object representation of its operand", since AFAIK what is defined by
> the standard is the object representation of an object, not that of a
> (parenthesis enclosed name of a) type.
>
> >There's no circularity in the actual meaning. The size of a byte is
> >implementation-defined.
>
> You meant "the size of an object", I suppose.

No, I meant "the size of a byte". See 1.7p1: "A byte is ... composed of
... bits, the number of which is implementation-defined."

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: James Kanze <kanze@gabi-soft.de>
Date: Mon, 22 Apr 2002 19:27:28 GMT
Raw View
Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr> writes:

|>  James Kanze <kanze@gabi-soft.de> writes:

|>  | Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr> writes:

|>  | |>  Huh?!?  The C++ standard requires that all bits in a char
|>  | |>  participate in a char value representation.  And EOF is not a
|>  | |>  character.

|>  | However, as far as I can see, it doesn't place any constraints
|>  | with regards as to what a character can be (except that the
|>  | characters in the basic character set must have positive values,
|>  | even if char is signed).

|>  Surely, the standard does define what "character" means.

No.  We all know what "character" means.  And that it has nothing to do
with char, wchar_t, etc.  (A "character" is not a numerical value, for
example.)

|>  Have a look at 17.1.2.

It is an interesting definition.  In particular the "[...] any object
which, when treated sequentially, can represent text" part.  I'm not too
sure what that is supposed to mean -- whether an object represents text
or not depends on how it is interpreted, and a char[] doesn't
necessarily represent text, whereas in specific contexts, a double[]
may.  (APL basically uses the equivalent of float[] to represent text.
So if I write an APL interpreter in C++...)

About the only way to make sense of it is to suppose that the word
"object" was meant to be taken very literally -- the use of "object"
instead of "type" is intentional, and of course, a 32 bit wchar_t which
contains the value 0x5a5a5a5a is not a character, because, taking it
sequentially (whatever that is supposed to mean -- I suppose it is an
attempt to cover multi-byte characters), there is no way in which it
can be taken to represent text.

Given the wording, I wouldn't read too much into this definition.  And I
think some sort of clarification is necessary.  I have proposed an
implementation with 32 bit characters, where int_type and char_type are
identical.  I think that the standard can be interpreted two ways, one
of which forbids this implementation, and another which allows it.  I
would like to know which interpretation is correct.

From a practical point of view, the implementation seems more than
reasonable, and exceedingly useful.  So I would like to see it allowed.
But there may be reasons of which I am unaware which argue for
forbidding it.

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Daniel James <internet@nospam.demon.co.uk>
Date: Mon, 22 Apr 2002 19:32:43 GMT
Raw View
In article <81erbu44lttk47kcdm0abnq3j4km2mnppk@4ax.com>, Tom Plunket wrote:
> Merriam-Webster currently defines a byte to be "a group of
> eight binary digits...",

IMHO Merriam-Webster is a /terrible/ dictionary, so I'm now convinced that
/whatever/ "byte" means it cannot be that, or not only that.

FWIW the Shorter Oxford (which I would rate as  "fairly decent" dictionary)
defines "byte" as: "Computing. A group of binary digits (usu. eight) operated
on as a unit."

That seems to be pretty-much on the mark.

Cheers,
 Daniel.


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Tue, 23 Apr 2002 08:27:53 GMT
Raw View
James Kanze wrote:
>
> Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr> writes:
....
> |>  Surely, the standard does define what "character" means.
>
> No.  We all know what "character" means.  And that it has nothing to do
> with char, wchar_t, etc.  (A "character" is not a numerical value, for
> example.)

Whether or not we all "know" what a character is, the standard does
define the term. And the standard's definition (regardless of any
defects it might contain) is the one that applies in all discussions
of what the standard means.

> |>  Have a look at 17.1.2.
>
> It is an interesting definition.  In particular the "[...] any object
> which, when treated sequentially, can represent text" part.  I'm not to
> sure what that is supposed to mean -- whether an object represents text
> or not depends on how it is interpreted, and a char[] doesn't
> necessarily represent text, whereas in specific contexts, a double[]
> may.  (APL basically uses the equivalent of float[] to represent text.
> So if I write an APL interpreter in C++...)

Sure, you can use a double to represent a character, so long as you make
sure that it provides all of the functionality that is called for of a
character type in sections 21, 22, and 27, as specified by 17.1.2. That
implies, among other things, specialization of std::char_traits for that
type. The standard does not require that an implementation provide a
general implementation of std::char_traits<>, only specializations for
char and wchar_t. Therefore, you'll need to provide one yourself. You
can specialize templates in namespace std, only if one of the template
arguments of the specialization is a user-defined type. Therefore,
portable code can use double as a character type only by wrapping it in
a POD class, and specializing std::char_traits<> for that class. Similar
specializations are required for any of the templates defined in
<locale> that you make use of with that type.
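
A minimal sketch of that rule (the wrapper name is hypothetical):

    #include <string>   // primary std::char_traits template

    struct DoubleChar { double v; };        // user-defined POD wrapper

    namespace std {
        // OK: the specialization involves a user-defined type.
        template<> struct char_traits<DoubleChar>;

        // Not allowed for portable code: no user-defined type involved.
        // template<> struct char_traits<double>;
    }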

> About the only way to make sense of it is suppose that the word "object"
> was meant to be taken very literally -- the use of "object" instead of
> "type" is intentional, and of course, a 32 bit wchar_t which contains
> the value 0x5a5a5a5a is not a character, because there is no way which,
> taking it sequentially (whatever that is supposed to mean -- I suppose
> it is an attempt to cover multi-byte characters), ...


No, "taking it sequentially" is meant to cover the building of character
strings out of individual characters.

> ... there is no way which
> it can be taken to represent text.

I don't see that. An implementation can choose to define 0x5a5a5a5a as
being the encoding it uses for 'e'. And yes, I do mean 'e', not just
L'e'.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "Mike Wahler" <mkwahler@ix.netcom.com>
Date: Thu, 18 Apr 2002 00:15:37 GMT
Raw View
Hillel Y. Sims <usenet@phatbasset.com> wrote in message
news:9BMu8.78898$8E3.20778405@news02.optonline.net...
> Here's what I'm taking away from all of this:
>
>    8-bits <= 1 C++ byte <= 'natural-word'-bits
> (where # of bits in a "natural-word" and actual # of bits used for the
> "byte" are platform-specific)
> ("C++ byte" not necessarily equivalent to platform-specific "byte")
>
> It could theoretically be 8, 9, 10, 11, 12, ... 16, ... 32, ... 64, or
> even maybe 128 bits on some current graphics processors (guessing), or
> anything inbetween too (theoretically). It makes sense even; there are
> some machines (DSPs) where "char" (as in character, as in human-readable
> text) is not a very heavily used concept vs efficient 32-bit numerical
> processing, so they just define 'char' (1 byte!) to refer to the full
> 32-bits of machine storage for efficiency (otherwise they'd probably
> have to do all sorts of bit masking arithmetic).

No need for such a complex description.
Various platforms implement a 'byte' with the number
of bits deemed most 'appropriate'.

A C or C++ program requires the host platform to provide
(or perhaps 'emulate') a byte with at least eight bits.
The C and C++ data type representing this 'byte' is
type 'char'.  So regardless of the number of bits therein,
sizeof(char) == one byte.  Period.  Forever and ever, amen. :-)
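
Or, as a compile-time restatement (illustrative only):

    // sizeof(char) is 1 by definition, whatever CHAR_BIT is:
    typedef char sizeof_char_is_one[sizeof(char) == 1 ? 1 : -1];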

-Mike



---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Edwin Robert Tisdale <E.Robert.Tisdale@jpl.nasa.gov>
Date: Thu, 18 Apr 2002 07:18:15 GMT
Raw View
NotFound wrote:

> > Not everyone uses the term correctly, not even (apparently) Intel. I'll
>
> What is the correct use? In other contexts a word is the minimum
> addressable unit, then a word on all the x86 family will be an octet.
>
> > consider an ISO specification more authoritative than a company
> > specification any day (though it could still be wrong).
>
> ISO has no authority to define the universal meaning of a word.
> No more than Intel; that is, they can define the meaning it has in your
> documents.

I believe the term in question is "machine word".
This is not a data type any more than "byte" is a data type.
It is simply the width of the typical [integral] "data path"
in a given computer architecture --
the width of a data bus, register, ALU, etc.

The meaning of the term "word" is typically "overloaded"
by computer architects to describe data types which may be interpreted
as characters, integers, fixed or floating point numbers, etc.
These type definitions are only meaningful
within the context of a particular computer architecture.
Intel's word, doubleword, quadword and double quadword types
are all based upon the original 8086 architecture's 16 bit machine word
and have remained fixed as the actual machine word size increased
to 32 then 64 bits with the introduction of new architectures
in the same family.




---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Tom Plunket <tomas@fancy.org>
Date: Thu, 18 Apr 2002 07:19:23 GMT
Raw View
James Kanze wrote:


> |>  A machine word is as wide as the integer data path through the
> |>  Arithmetic and Logic Unit (ALU).
>
> Or as wide as the memory bus?
>
> I'm not sure that there is a real definition of "word".

I think it's one of those terms...  ;)

The Sony PlayStation2 has 128-bit registers and has 128-bit
busses to the memory and other systems, but then the instruction
set typically works on the low 64 bits of those registers, even
though "int" is 32-bits for some bizzaro reason, and all of the
docs call the 128-bit values "quadwords".

-tom!

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Thu, 18 Apr 2002 07:19:51 GMT
Raw View
Gabriel Dos Reis wrote:
>
> "James Kuyper Jr." <kuyper@wizard.net> writes:
>
> | Gabriel Dos Reis wrote:
> | >
> | > "James Kuyper Jr." <kuyper@wizard.net> writes:
> | ....
> | > Now, look at the table 37 (Traits requirements)
> | >
> | >   X::eof() yields: a value e such that X::eq_int_type(e,X::to_int_type(c))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> | >   is false for all values c.
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
> | >
> | > | I don't see how that requirement would be violated by an implementation
> | > | which had CHAR_MIN==INT_MIN, and CHAR_MAX==INT_MAX.
> | >
> | > See above.
> |
> | In itself, that would merely mean that 'int' can't be the int_type for
> | 'char'. The clincher is that 21.1.3.1 explicitly specifies that int_type
> | for char_traits<char> must be 'int'. Therefore, I concede your point.
> |
> | Someone was keeping a list of C/C++ differences - this should be added
> | to that list; C makes no such guarantee.
>
> Which "no such guarantee"?

The guarantee that you cited from table 37, which (translated into C
terms) says EOF!=(int)c for all values of c.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: James Kanze <kanze@gabi-soft.de>
Date: Thu, 18 Apr 2002 07:20:55 GMT
Raw View
Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr> writes:

|>  James Kanze <kanze@gabi-soft.de> writes:

|>  [...]

|>  | The open issue is, I think, whether fgetc is required to be able
|>  | to return *all* values in the range of 0...UCHAR.  For actual
|>  | characters, this is not a problem -- if we have 32 bit char's, it
|>  | is certain that some of the values will not be used as a
|>  | character.

|>  That won't be conforming, since the standard says that any bit
|>  pattern represent a valid value.

But it doesn't require that fgetc actually be able to return all valid
values.

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: James Kanze <kanze@gabi-soft.de>
Date: Thu, 18 Apr 2002 07:21:14 GMT
Raw View
Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr> writes:

|>  "James Kuyper Jr." <kuyper@wizard.net> writes:

|>  | Gabriel Dos Reis wrote:
|>  | ....
|>  | > Or you may just look at the requirements imposed by the standard
|>  | > std::string class in clause 21.

|>  | Clause 21 is very large and complicated;

|>  Not that complicated;

I'll admit that there are worse.  I was just looking for an excuse for
my laziness.

|>  it suffices to look at the first two pages.

|>  21.1.2/2
|>    For a certain character container type char_type, a related
|>    container type INT_T shall be a type or class which can represent
|>    all of the valid characters converted from the corresponding
|>    char_type values, as well as an end-of-file value, eof(). The type
|>    int_type represents a character container type which can hold
|>    end-of-file to be used as a return type of the iostream class member
|>    functions.

OK.  Consider the case of 32 bit char, int and long, using ISO 10646 as
a code set.  And read the text *very* carefully.  It doesn't say that
INT_T must be able to represent all valid values which can be put in a
char_type.  It says that it must be able to represent all valid
*characters* -- in this case, all values in the range 0...0x10FFFF --
plus a singular value for eof (say, 0xFFFFFFFF).

Other constraints mean that such an implementation would have to use
some somewhat particular definitions for some of the other functions,
but I think that such an implementation would be legal.  I would feel
better about it if it were more clearly stated somewhere that
"character" doesn't necessarily mean all possible values that can be
stored in a "char_type", but if this isn't what is meant, why use the
word character?

|>  The case of interest is when char_type == char and int_type == int.

|>  Now, look at the table 37 (Traits requirements)

|>    X::eof() yields: a value e such that X::eq_int_type(e,X::to_int_type(c))
|>    is false for all values c.

|>  (by 21.1.1/1, c is of type char).

X::eof() yields 0xFFFFFFFF.

X::to_int_type( char_type c ) is constrained to always yield a value
less than 0x110000, e.g.:

    to_int_type( char_type c )
    {
        return c > 0 && c < 0x110000 ? c : 0 ;
    }

X::eq_int_type simply uses ==.

Where is the error in this implementation?

|>  The standard also says that any bit pattern for char represents a
|>  valid char value, therefore eof() can't be in the values-set of
|>  char.

Table 37 talks of characters, not valid char values.  Not all valid char
values need be valid characters.
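
To pull the pieces together, a minimal sketch of the implementation
being defended -- assuming 32 bit char and int, ISO 10646 as the code
set, and showing only the three members under discussion (X stands for
the traits class):

    struct X
    {
        typedef char char_type ;
        typedef int  int_type ;

        static int_type eof()
        {
            return int_type( 0xFFFFFFFF ) ;     // the 0xFFFFFFFF bit
        }                                       // pattern; not a character

        static int_type to_int_type( char_type c )
        {
            return c > 0 && c < 0x110000 ? c : 0 ;  // always < 0x110000
        }

        static bool eq_int_type( int_type a, int_type b )
        {
            return a == b ;                     // simply uses ==
        }
    } ;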

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Tom Plunket <tomas@fancy.org>
Date: Thu, 18 Apr 2002 07:24:55 GMT
Raw View
James Kanze wrote:

> The word "byte" has never meant eight bits.  Historically...

Ironically, words mean whatever they get used to mean, and as
long as a word has a definition that is understood by the
involved parties, that definition is valid regardless of what is
"official".

Indeed, dictionaries are developed to track the changes in our
language.  "Irregardless", for instance, doesn't make any
etymological sense, and yet it is used so often and always has
the same definition every time it's used that it has made its way
into many dictionaries.

Popular usage of the word "byte" does mean "eight bits" or
"octet", regardless of what ISO says and regardless of what IBM
once did 40 or 50 years ago.

Merriam-Webster currently defines a byte to be "a group of eight
binary digits...", and since dictionaries get definitions from
popular usage, we can assume that this definition is what most
people use as their definition of "byte".  This does not mean
that the ISO is wrong, of course, it just means that they are
defining byte to be something other than the popular usage.

As an example, a "nice" girl referred to a prostitute in
Victorian England.  The meaning of "nice" has morphed over the
years; the only thing defining it was popular usage and
understanding of what the word meant.

> The fact that machines with bytes of other than 8 bits have
> become rare doesn't negate the fact that when you do talk of
> them, the word "byte" doesn't mean 8 bits.  And the distinction
> is still relevant.  -- look at any of the RFC's, for example, and
> you'll find that when 8 bits is important, the word used is
> octet, and not byte.

Yes; the distinction is still relevant in that they need to
define these words to something other than the popular
definition.  This doesn't make the standards and RFCs wrong, just
anachronistic.  ;)

-tom!

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Thu, 18 Apr 2002 16:14:20 GMT
Raw View
David Schwartz wrote:
>
> "James Kuyper Jr." wrote:
>
> > >  4.1. FUNDAMENTAL DATA TYPES
>
> > >  The fundamental data types of the IA-32 architecture are bytes,
> > >  words, doublewords, quadwords, and double quadwords (see Figure
> > >  4-1). A byte is eight bits, a word is 2 bytes (16 bits), a
> > >  doubleword is 4 bytes (32 bits), a quadword is 8 bytes (64 bits),
> > >  and a double quadword is 16 bytes (128 bits). "
>
> > Not everyone uses the term correctly, not even (apparently) Intel.

"the term" I was referring to  was "word".

>         What is wrong with Intel's usage? If a byte means "an 8-bit quantity",
> then they're right. If a byte means "the smallest addressable unit of
> storage on a particular architecture", then they are still right. What
> definition of "byte" makes Intel's usage incorrect?

If it's properly described as a 32-bit architecture, then "word" should
indicate a 32-bit unit of memory.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: James Kanze <kanze@gabi-soft.de>
Date: Thu, 18 Apr 2002 17:29:00 GMT
Raw View
Tom Plunket <tomas@fancy.org> writes:

|>  James Kanze wrote:

|>  > The word "byte" has never meant eight bits.  Historically...

|>  Ironically, words mean whatever they get used to mean, and as long
|>  as a word has a definition that is understood by the involved
|>  parties, that definition is valid regardless of what is "official".

True, but words are used within distinct communities.  Here, we are
talking of a specialized technical community; how the man on the street
uses the word (or if he has even heard of it) is irrelevant: when we use
the word stack, or loop, in this forum, it generally also has a meaning
quite different from that used by the man on the street.

    [...]
|>  Popular usage of the word "byte" does mean "eight bits" or "octet",
|>  regardless of what ISO says and regardless of what IBM once did 40
|>  or 50 years ago.

I'm not sure that there is a popular usage of the word "byte".  If so,
it is very recent, and probably is 8 bits.  But that is separate from
the technical usage, just as the use of stack or loop with regards to
programming is different from other uses.

|>  Merriam-Webster currently defines a byte to be "a group of eight
|>  binary digits...", and since dictionaries get definitions from
|>  popular usage, we can assume that this definition is what most
|>  people use as their definition of "byte".  This does not mean that
|>  the ISO is wrong, of course, it just means that they are defining
|>  byte to be something other than the popular usage.

And that Merriam-Webster is giving a general definition, and not a
technical one.  IMHO, if they don't mention its use with a meaning
other than 8 bits, they are wrong; the two uses are related, and
presenting one without the other is highly misleading, since the
definition they do give "sounds" technical.  They might, of course,
label my usage as "technical", or give some other indication that it is
not the everyday usage.

With regards to the technical meaning, it is significant to note that
technical documents in which the unit must be 8 bits (descriptions of
network protocols, etc.) do NOT use the word byte, but octet.

|>  As an example, a "nice" girl referred to a prostitute in Victorian
|>  England.  The meaning of "nice" has morphed over the years; the only
|>  thing defining it was popular usage and understanding of what the
|>  word meant.

A good dictionary will still give this meaning, indicating, of course,
that it is archaic.

I would agree that we are in a situation where the word byte is changing
meaning, and 50 years from now, it probably will mean 8 bits.  For the
moment, even if many people assume 8 bits, the word is still
occasionally used for other sizes, and still retains to some degree its
older meaning.  (This is, of course, *why* it isn't used in protocol
descriptions.)

|>  > The fact that machines with bytes of other than 8 bits have become
|>  > rare doesn't negate the fact that when you do talk of them, the
|>  > word "byte" doesn't mean 8 bits.  And the distinction is still
|>  > relevant.  -- look at any of the RFC's, for example, and you'll
|>  > find that when 8 bits is important, the word used is octet, and
|>  > not byte.

|>  Yes; the distinction is still relevant in that they need to define
|>  these words to something other than the popular definition.  This
|>  doesn't make the standards and RFCs wrong, just anachronistic.  ;)

Not even anachronistic.  Just more precise and more technical than
everyday usage.

In the case of C/C++, the use is a bit special, even with regards to
the older meaning.  I'd actually favor a different word here, but I
don't have any suggestions.

And what about the use in the library section, where there is a question
of multi-byte characters -- I've never heard anyone use anything else
but "multi-byte characters" when referring to the combining codes in 16
bit Unicode, for example.  So at least in this compound word, byte has
retained a more general meaning.

In the case of the RFC's and the various standards for the OSI
protocols, I see no reason to switch from "octet" to "byte".  The word
"octet" is well established, and is precise, and makes it 100% clear
that exactly 8 bits are involved.  Even if "byte" is generally
understood to be 8 bits, why choose the less precise word?

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "Markus Mauhart" <markus.mauhart@chello.at>
Date: Thu, 18 Apr 2002 21:45:48 GMT
Raw View
"Gennaro Prota" <gennaro_prota@yahoo.com> wrote ...
>
> P.S.: the only thing that leaves me perplexed is the apparent circular
> definition constituted by 5.3.3 and 3.9p4. Does anybody know if it is
> resolved in an other part of the standard?

IMO that's worth a defect report; any other opinions?


3.9p4:
-----------
The object representation of an object of type T is the sequence of
N unsigned char objects taken up by the object of type T, where N
equals sizeof(T). The value representation of an object is the set
of bits that hold the value of type T. For POD types, the value
representation is a set of bits in the object representation that
determines a value, which is one discrete element of an
implementation-defined set of values.37)
37) The intent is that the memory model of C++ is compatible with
that of ISO/IEC 9899 Programming Language C.
-----------
.... so this definition of "object representation of an object of type T"
relies on "sizeof(T)".


5.3.3
-----------
The sizeof operator yields the number of bytes in the object representation
of its operand. .....
-----------
.... and "sizeof(T)" seems to rely on T's "object representation"
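
Whatever the resolution of the circularity, the practical effect of
3.9p4 is easy to demonstrate; a small sketch, using nothing beyond
standard C++:

    #include <cstddef>
    #include <cstdio>

    int main()
    {
        // 3.9p4: the object representation of any object can be read
        // as sizeof(T) unsigned char objects; 5.3.3: sizeof counts bytes.
        long x = 0x01020304L;
        const unsigned char* p =
            reinterpret_cast<const unsigned char*>(&x);
        for (std::size_t i = 0; i != sizeof x; ++i)
            std::printf("byte %u: %02x\n",
                        static_cast<unsigned>(i),
                        static_cast<unsigned>(p[i]));
        return 0;
    }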



---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: NotFound <correo@tengo.no>
Date: Fri, 19 Apr 2002 01:05:17 GMT
Raw View
> > What is the correct use? In other contexts a word is the minimum
> > addressable unit, then a word on all the x86 family will be an octet.
> The definition I'm familiar with can be paraphrased by saying that if
> it's correctly described as a 32-bit machine, then the word size is 32
> bits.

Then Win32 is not correctly described as a 32-bit machine?

> > ISO has no authority to define the universal meaning of a word. No more
> No one has that authority. However, ISO does have the authority to
> define the usage within ISO documents, and the usage by anyone who cares
> about ISO standards. Which includes me.

In the context of this newsgroup the relevant standard does not define
WORD.

Regards.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Wed, 24 Apr 2002 00:45:19 GMT
Raw View
James Kanze <kanze@gabi-soft.de> writes:

| Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr> writes:
|
| |>  James Kanze <kanze@gabi-soft.de> writes:
|
| |>  | Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr> writes:
|
| |>  | |>  Huh?!?  The C++ standard requires that all bits in a char
| |>  | |>  participate in a char value representation.  And EOF is not a
| |>  | |>  character.
|
| |>  | However, as far as I can see, it doesn't place any constraints
| |>  | with regards as to what a character can be (except that the
| |>  | characters in the basic character set must have positive values,
| |>  | even if char is signed).
|
| |>  Surely, the standard does define what "character" means.
|
| No.

Huh?!?
Would you deny evidence and references given in the standard text?

|  We all know what "character" means.

And apparently it doesn't mean the same thing for everybody.  Which is
why we're having this whole thread.

|  And that it has nothing to do
| with char, wchar_t, etc.  (A "character" is not a numerical value, for
| example.)

As far as C++ is concerned (aren't we in a C++ group?), what a
"character" means is what the standard text says it is.

| |>  Have a look at 17.1.2.
|
| It is an interesting definition.  In particular the "[...] any object
| which, when treated sequentially, can represent text" part.  I'm not to
| sure what that is supposed to mean -- whether an object represents text
| or not depends on how it is interpreted, and a char[] doesn't
| necessarily represent text, whereas in specific contexts, a double[]
| may.  (APL basically uses the equivalent of float[] to represent text.
| So if I write an APL interpreter in C++...)
|
| About the only way to make sense of it is suppose that the word "object"
| was meant to be taken very literally -- the use of "object" instead of
| "type" is intentional,

But the thing is that you cannot meaningfully talk about an object
without talking about its type.  And the standard does relate a
"character" to "type":

17.1.2
  in clauses 21, 22, and 27, means any object which, when treated
  sequentially, can represent text. The term does not only mean char and
  wchar_t objects, but any value that can be represented by a type that
  provides the definitions specified in these clauses.

The last sentence makes it very clear that *any* char (or unsigned
char) value is a character.  Whether you could use double[] to
represent text is irrelevant to our discussion.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: James Kanze <kanze@alex.gabi-soft.de>
Date: Wed, 24 Apr 2002 17:23:46 GMT
Raw View
"James Kuyper Jr." <kuyper@wizard.net> writes:

|>  James Kanze wrote:

|>  > Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr> writes:
|>  ....
|>  > |>  Surely, the standard does define what "character" means.

|>  > No.  We all know what "character" means.  And that it has
|>  > nothing to do with char, wchar_t, etc.  (A "character" is not a
|>  > numerical value, for example.)

|>  Whether or not we all "know" what a character is, the standard does
|>  define the term. And the standard's definition (regardless of
|>  any defects that definition might contain) is the one that applies
|>  in all discussions of what the standard means.

Quite.  I'll admit that I only looked for a definition in 1.3.  Given
that this section defines things like multi-byte-character, I would
have expected any definition there.  The definitions in 17.1, of
course, only concern the library, but since that is all that interests
me for the moment...

Also, I would argue that unless it is inconsistent, the definition in
itself cannot, by its nature, contain a defect.  The nature of a
standard is such that it must use words in a very restrictive, and
often special, way.  I have no objections to this.  On the other hand,
I'm still not very sure as to what the intention is, globally, with
regards to my problem.  But I will address this further in the
following.  And I hope Gaby will take this as a belated answer to his
comments as well -- belated, because I really needed to give the issue
some careful thought, and even experiment some.

|>  > |>  Have a look at 17.1.2.

|>  > It is an interesting definition.  In particular the "[...] any
|>  > object which, when treated sequentially, can represent text"
|>  > part.  I'm not too sure what that is supposed to mean -- whether
|>  > an object represents text or not depends on how it is
|>  > interpreted, and a char[] doesn't necessarily represent text,
|>  > whereas in specific contexts, a double[] may.  (APL basically
|>  > uses the equivalent of float[] to represent text.  So if I write
|>  > an APL interpreter in C++...)

|>  Sure, you can use a double to represent a character, so long as
|>  you make sure that it provides all of the functionality that is
|>  called for of a character type in sections 21, 22, and 27, as
|>  specified by 17.1.2. That implies, among other things,
|>  specialization of std::char_traits for that type. The standard
|>  does not require that an implementation provide a general
|>  implementation of std::char_traits<>, only specializations for
|>  char and wchar_t.

I am aware of this.  In fact, while several implementations DO provide
general versions, these versions are not compatible.

I gather this less from my own experience, than from a question in
fr.comp.lang.c++, where someone tried to use basic_istream<unsigned
char>, and got radically different results on different platforms.
IMHO, it would be better for an implementation not to provide these
specializations, so that any attempt at using them would result in a
compiler error, rather than some unknown behavior.  (On the other
hand, I won't make any reproaches.  I'm sure that the implementors
were trying to be helpful.  And it is very difficult to know what is
really helpful without some concrete experience.)

|>  Therefore, you'll need to provide one yourself. You can specialize
|>  templates in namespace std, only if one of the template arguments
|>  of the specialization is a user-defined type. Therefore, portable
|>  code can use double as a character type only by wrapping it in a
|>  POD class, and specializing std::char_traits<> for that
|>  class. Similar specializations are required for any of the
|>  templates defined in <locale> that you make use of with that type.

And this raises a very real question for me: what on earth was the
purpose of making the streams templates?  The two existing
specializations (char and wchar_t) have differing semantics; because
of the requirement in 22.2.1.5/3 that codecvt<char,char,mbstate_t>
implement a degenerate conversion, basic_streambuf<char> really
doesn't even need to use the locale member.  And since I cannot
instantiate a stream over any other built-in type, most, if not all of
the utility is lost for the user.

I am very curious about this.  I cannot believe that anyone would have
been foolish enough to propose templating the streams without some
concrete experience with the results.  How did the templates work in
actual practice?  What types were used, other than char and wchar_t?
Because given the current situation, I can only conclude that
templating the streams was a major mistake, making them more complex
for the implementor, with more overhead (at least in practice) for the
user, and with no corresponding gain.  I suspect, however, that this
is more a case of the right hand not knowing what the left hand is
doing (something which inevitably occurs in a document as complex as
the C++ standard), and that in fact, in the existing practice, people
did specialize char_traits over built in types.

Actually, I'm beginning to wonder in general about the existing
practice.  I have just started trying to use wide characters, and with
my very limited experience, have come up with the following
conclusions:

  - I cannot use wchar_t, because in practice, I *need* ISO 10646, or
    something equivalent, and wchar_t doesn't guarantee this.  The
    result is that I am trying to use an implementation dependent
    uint_32 (a typedef).

  - I cannot use the actual stream classes for formatted input,
    because the semantics of num_get are useless in an international
    environment.  It is extremely naïve to think that "0123456789" are
    the only digits, something which is immediately apparent to anyone
    having to deal with, say, Arabic.  (It was, in fact, my desire to
    support Arabic which led to my experimentation.)

  - And now, I cannot use the standard streams because I'm not allowed
    to provide a specialization for char_traits<uint_32>.

I have just started experimenting with this, so there are doubtlessly
a lot of points I have missed.  For this reason, I literally beg the
proponents of locale and of the templated streams to tell me a bit
about their experience in using them, and how they answered such
questions.

|>  > About the only way to make sense of it is suppose that the word
|>  > "object" was meant to be taken very literally -- the use of
|>  > "object" instead of "type" is intentional, and of course, a 32
|>  > bit wchar_t which contains the value 0x5a5a5a5a is not a
|>  > character, because there is no way which, taking it sequentially
|>  > (whatever that is supposed to mean -- I suppose it is an attempt
|>  > to cover multi-byte characters), ...

|>  No, "taking it sequentially" is meant to cover the building of
|>  character strings out of individual characters.

|>  > ... there is no way which it can be taken to represent text.

|>  I don't see that. An implementation can choose to define
|>  0x5a5a5a5a as being the encoding it uses for 'e'. And yes, I do
|>  mean 'e', not just L'e'.

I'm not talking about what an implementation can do.  An
implementation is free to do a lot of things which would make it
useless -- inventing a totally new character encoding would probably
qualify (preferably one in which the most common character, ' ', is
encoded 0xffffffff), although it is certainly legal.

As a result of recent discussions and experiments, I have come to the
conclusion that we have three fundamentally different types of streams
(from the user point of view), which are not being kept as distinct as
they should be.  Part of the problem is definitely inherited from C,
and is due to the fact that on many systems, the distinctions are
irrelevant.  But the problems have been increased by our attempts at a
useless genericity; useless because the types really are different,
and because we do not allow enough freedom for the user to define
their own types.

The first type of "stream" is just a pure stream of binary bytes.  It
is what basic_streambuf<char> returns, at least in intent, although
logically, one would expect the results to be unsigned char.

The second type is a stream of narrow, possibly multi-byte or state
encoded characters.  A stream which is read with no translation on
input and output.  IMHO, this type is really only of interest in a
limited number of cases (Americas, Europe and black Africa, when
internationalization is not an issue), and the number is decreasing.
(In Germany for example, I can make do with the ISO 8859-1 character
set, which conveniently fits in an 8 bit byte.  Except that people in
Germany sometimes come from other countries, and it is certainly not
very polite to spell a name like Dvorak without putting the hacek on
the r.  As the quality of computer output improves, so do the
expectations.)  Generally, using multi-byte or state encoded characters
internally is a sure way to add programming complexity, and since
valid alternatives exist, should be avoided.

Historically, these two types of streams have been confounded,
probably because they are very similar, at least on Unix.  But the
differences do raise their heads from time to time.  Thus, for
example, fgetc returns an int in the range of 0...UCHAR_MAX when it
returns a character -- interestingly enough, you cannot reliably
assign this value to a char, according to the standard.  (In practice,
any implementation which doesn't allow such a common idiom is doomed
to fail on the market, and I have no hesitation about using it in
practice.)  In C++, we go one step further: istream::read writes to a
char array (instead of using a void* like fread).  Logically, I would
expect to use instantiations of unsigned char (and not char) for the
first type of stream, but this is not legal -- I cannot assume that
the implementation provides a version of char_traits<unsigned char>
which works, and I cannot provide one myself.
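
For reference, here is the idiom in question; "data.bin" is just a
placeholder name:

    #include <cstdio>

    int main()
    {
        std::FILE* f = std::fopen( "data.bin", "rb" ) ;
        if ( f == 0 )
            return 1 ;
        int ci ;
        while ( (ci = std::fgetc( f )) != EOF ) {
            // fgetc returned a value in 0...UCHAR_MAX; this narrowing
            // is what the standard doesn't strictly guarantee.
            char c = static_cast< char >( ci ) ;
            (void)c ;
        }
        std::fclose( f ) ;
        return 0 ;
    }

Practically every C primer teaches some variant of this loop.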

The confusion becomes serious once we leave the domain of "typical"
machines, with 8 or 9 bit bytes, and int's of two or more bytes.  If I
have a machine with 32 bit bytes, for example, it is still reasonable
that character input be 8 bit ASCII -- regardless of my internal
representations, I still have to interface with the outside world.  So
a version of fgetc which returns values in the range 0...255 or -1
(for EOF) is quite useful, whether legal or not.  On the other hand,
when reading binary data, I probably want to fill all of the bits.  I
don't think a conforming implementation can fulfill both expectations
(and you've convinced me that a hosted implementation is not possible
on machines where sizeof(int)==1 -- something that I don't think was
originally intended).  (While I'm at it, I might also ask why there is
a state field in fpos for such streams.)

When we extend our analysis to wide character streams, we find a
completely different case.  First, they imply active code translation
on input and output; we are still reading and writing bytes.  But
normally, multi-byte characters or state encoded characters are not
expected, at least as I understand the intent of the standard -- from
a user's point of view, this is the raison d'être for such streams.  So
the semantics of such streams are different from those of narrow
character streams, and the relationship via templates that they have
in C++ is misleading at best (and probably makes a quality
implementation impossible at worst).  Several points to consider:

  - The type and the semantics of the state attribute in positioning
    depend on the character encoding, and thus, in C++, on the
    embedded locale.  So once again, why is it in fpos, which depends
    on the character type, and not the locale?  (One obvious answer is
    "because it is in fpos_t in C".  As I said, part of the problem is
    inherited.)

    This probably isn't too much of a problem in practice.  I can
    think of no encoding scheme where an int would not be sufficient.
    So the type can be defined in fpos, and the semantics in codecvt.
    Of course, if we want users to be able to provide additional
    codecvt, for external codes not foreseen by the implementor, it
    would be better to impose this (or at least an integral type);
    otherwise, user code is at the mercy of an implementation defined
    type.

  - In real life, the most widely used code is 21 bits (ISO 10646,
    alias Unicode).  For most machines, this means a 32 bit type.  And
    on a 32 bit machine, I see no practical reason to impose a larger
    int_type; a larger int_type will typically have a non-negligible
    effect on run time, for no corresponding benefits.  There is no
    way a streambuf reading UTF-8, for example, can ever return a
    character value greater than 0x10ffff.

    I'm not totally convinced, but accepting the arguments of Gaby
    (which are not unfounded), the current standard makes an efficient
    implementation for ISO 10646 on a 32 bit machine impossible.  I do
    not consider this an acceptable state of affairs.

  - In real life, there are several sets of digits.  The num_get and
    num_put must take this into account.

    There are certain limits here; I certainly wouldn't expect the
    routines to be able to cope with things like \u2153 (vulgar
    fraction one third), and realistically speaking, I would expect to
    wait for some existing practice before requiring correct handling
    of things like \u0BF1 (Tamil number one hundred).  But for
    starters, it isn't too much to require that all characters in
    category Nd be correctly interpreted.  (I've written a small awk
    script which automatically generates an initialized
    std::map<Character,int> for the digit values from the Unicode data
    file, and for output, the decimal digits are always in contiguous
    ascending order, so adding a formatting flag in which the code for
    0 is given should be sufficient; see the sketch after this list.)

  - IMHO, the only possible justification for making the streams
    templates is to allow the user to define his own character types.
    This is particularly useful on machines which, for historical
    reasons or other, do not support Unicode 3.2 with their wchar_t
    (typically because they fixed wchar_t at 16 bits based on Unicode
    3.0).  I would dearly love to be able to simply write a typedef
    for whatever type is needed (uint_32), implement the necessary
    traits, and have everything else work.  (Realistically, I'll have
    to implement some locales as well, which makes the job pretty near
    impossible for all but the most advanced C++ programmer.  Again, I
    have the feeling that a less generic solution would have served us
    better.)
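
As a sketch of the output side remark in the digits item above: the
ten Nd digits of any one script occupy contiguous, ascending code
points, so the code point of the script's zero is all the formatting
flag needs to carry.  (Char32 stands in for whatever 32 bit character
type is used; the zero code points below come from the Unicode data
file.)

    #include <vector>

    typedef unsigned int Char32 ;   // stand-in for a 32 bit char type

    // Formats n with the decimal digits of one script: pass the code
    // point of that script's zero -- 0x0030 for ASCII, 0x0660 for
    // Arabic-Indic, 0x0966 for Devanagari.
    std::vector< Char32 > format( unsigned long n, Char32 zero )
    {
        std::vector< Char32 > digits ;
        do {
            digits.insert( digits.begin(),
                           static_cast< Char32 >( zero + n % 10 ) ) ;
            n /= 10 ;
        } while ( n != 0 ) ;
        return digits ;
    }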

I am currently experimenting with much of this, and will make any code
which I finally get working available for comment on my web site.  But
I am rapidly becoming convinced that in order to implement anything
useful using ISO 10646 aka Unicode 3.2, I'm going to have to throw the
standard out the window, and reimplement all of iostream from scratch;
technically, I suspect that I should also do the same with string,
since I don't know how to define off_type, pos_type and state_type in
char_traits.  But perhaps someone will offer me a solution, and even
if not, I'm betting that std::string itself doesn't actually use these
types, and I'll get away with violating the standard on this point
(and on the fact that I'm specializing std::char_traits on a basic
type).

In the mean time, I think that this is an issue which needs more
thought (perhaps for the next version of the C++ standard).  I don't
ask for an immediate solution, because I'm convinced that in the
absence of any real existing practice, any solution adopted will turn
out wrong in the long run.

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Thu, 25 Apr 2002 16:56:38 GMT
Raw View
James Kanze wrote:
>
> "James Kuyper Jr." <kuyper@wizard.net> writes:
....
> |>  define the term. And the standard's definition (regardless of
> |>  any defects that definition might contain) is the one that applies
> |>  in all discussions of what the standard means.
....
> Also, I would argue that unless it is inconsistent, the definition in
> itself cannot, by its nature, contain a defect.  The nature of a

I'd say that it is a defect for the standard to establish a definition
that is excessively inconsistent with conventional usage. This document
is meant to be read and understood; it's not just an exercise in
figuring out the most compact way of expressing an idea. However, I
don't think filing a DR on this basis is likely to be very effective.

....
>   - And now, I cannot use the standard streams because I'm not allowed
>     to provide a specialization for char_traits<uint_32>.

You can wrap uint_32 in a class, and then specialize it.
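
A sketch of that suggestion -- uint_32 is the poster's typedef, only a
few members are shown (the rest follow the same pattern), and the
choice of int_type and eof() runs into exactly the issue debated
upthread:

    #include <string>   // std::char_traits

    typedef unsigned int uint_32 ;

    struct Char32 {     // POD wrapper: a user-defined type, so the
        uint_32 value ; // specialization below is permitted
    } ;

    namespace std {
        template<>
        struct char_traits< Char32 >
        {
            typedef Char32  char_type ;
            typedef uint_32 int_type ;

            static bool eq( Char32 a, Char32 b )
                { return a.value == b.value ; }
            static bool lt( Char32 a, Char32 b )
                { return a.value <  b.value ; }
            static void assign( Char32& a, const Char32& b )
                { a = b ; }
            static char_type to_char_type( int_type i )
                { Char32 c ; c.value = i ; return c ; }
            static int_type to_int_type( char_type c )
                { return c.value ; }
            static bool eq_int_type( int_type a, int_type b )
                { return a == b ; }
            static int_type eof()
                { return 0xFFFFFFFFu ; }
            // compare, length, copy, move, find, not_eof, and the
            // off_type/pos_type/state_type typedefs go here as well.
        } ;
    }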

....
> The second type is a stream of narrow, possibly multi-byte or state
> encoded characters.  A stream which is read with no translation on
> input and output.  IMHO, this type is really only of interest in a
> limited number of cases (Americas, Europe and black Africa, when
> internationalization is not an issue), and the number is decreasing.

I've spent my entire career within that "limited" area, so "only" seems
a little inappropriate to me. I believe in internationalization, but
there are huge markets out there that can get away without it.

....
> (and you've convinced me that a hosted implementation is not possible
> on machines where sizeof(int)==1 -- something that I don't think was

I'd hate to have convinced you, when I've become convinced of the
opposite. The C++ requirement is:
" X::eof() yields: a value e such that
X::eq_int_type(e,X::to_int_type(c)) is false for all values c." I
translated the relevant requirement from C++ terms into C terms as
"EOF!=(int)c for all values of c", solely for the purpose of pointing
out that C has no corresponding requirement. The translated requirement
is incompatible with sizeof(int)==1; but the untranslated requirement is
not, as you've shown. That's because eq_int_type() and to_int_type() are
member functions, and are therefore not strictly equivalent to "==" and
"(int)", respectively. I think you've established that they can be
defined in a way that makes sizeof(int)==1 legal.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Alexander Terekhov <terekhov@web.de>
Date: Mon, 15 Apr 2002 07:42:13 GMT
Raw View
Chris Wolfe wrote:
>
> "Martin v. Löwis" wrote:
> >
> > James Kanze <kanze@gabi-soft.de> writes:
> >
> > > Excuse me, but on 32 bit machines (at least the ones I've seen), DWORD
> > > is 64 bits.
> >
> > I guess you have not seen Microsoft Windows, then. Just try
> >
> > #include <windows.h>
> > #include <stdio.h>
> >
> > int main()
> > {
> >   printf("%d\n", sizeof(DWORD));
> > }
> >
> > in MSVC++ 6 or so. It prints 4, and it uses 8-bit bytes.
> >
> > Regards,
> > Martin
>
> AFAIK that is for backwards compatibility with 16-bit DOS and Windows
> 3.x. A double word at the assembler level is still 64 bits.
>
> And as we're well off-topic at this point...

Sure, nevertheless:

"IA-32 Intel® Architecture
 Software Developer's
 Manual
 Volume 1:
 Basic Architecture
 ....
 4.1. FUNDAMENTAL DATA TYPES

 The fundamental data types of the IA-32 architecture are bytes,
 words, doublewords, quadwords, and double quadwords (see Figure
 4-1). A byte is eight bits, a word is 2 bytes (16 bits), a
 doubleword is 4 bytes (32 bits), a quadword is 8 bytes (64 bits),
 and a double quadword is 16 bytes (128 bits). "

regards,
alexander.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Jack Klein <jackklein@spamcop.net>
Date: Mon, 15 Apr 2002 12:14:51 GMT
Raw View
On Sun, 14 Apr 2002 06:56:42 GMT, Edwin Robert Tisdale
<E.Robert.Tisdale@jpl.nasa.gov> wrote in comp.lang.c++:

> Bob Hairgrove wrote:
>
> > In Bjarne Stroustrup's 3rd edition of "The C++ Programming Language",
> > there is an interesting passage on page 24:
> >
> >     "A char variable is of the natural size to hold a character
> >      on a given machine (typically a byte) and an int variable
> >      is of the natural size for integer arithmetic
> >      on a given machine (typically a word)."
> >
> > Now the last statement (i.e. sizeof(int) typically == a word)
> > certainly shows the age of the text here.
> > In the meantime, the "natural" size of an int
> > has grown to a 32-bit DWORD on most machines,
> > whereas 64-bit int's are becoming more and more common.
> >
> > But what does this mean for char?
> > I was always under the assumption that sizeof(char)
> > is ALWAYS guaranteed to be exactly 1 byte,
> > especially since there is no C++ "byte" type.
> > As we now have the wchar_t as an intrinsic data type,
> > wouldn't this cement the fact that char is always 1 byte?
> >
> > What does the ANSI standard have to say about this?
>
> A byte is a data size -- not a data type.
> A byte is 8 bits on virtually every modern processor
> and the memories are almost always byte addressable.

This is absurd and totally incorrect.  Just for example, the Analog
Devices SHARC is a very modern processor.  Its byte is 32 bits and
its memory is not octet addressable at all.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Gennaro Prota <gennaro_prota@yahoo.com>
Date: Mon, 15 Apr 2002 12:15:22 GMT
Raw View
On Sun, 14 Apr 2002 21:02:07 GMT, "Carl Daniel" <cpdaniel@pacbell.net>
wrote:


>> Section 5.3.3: "The sizeof operator yields the number of bytes in the
>> object representation of its operand."
>
>I hadn't looked at that section before this morning.  I'm surprised they
>worded it that way, since it's patently false given the most common meaning
>of 'byte' (8 bits).

The standard doesn't rely on the common meaning in fact: it uses the
term as explained in §1.7p1. Note also, to complete the definition,
that what the standard requires is that a byte is uniquely addressable
*within* C++ and not within the hardware architecture: the two "units"
can be different, with the char type either larger or smaller.

As an example of the latter, a machine where the hardware-addressable
unit is 32-bit can still have a C++ compiler with 8-bit chars (the
minimum anyway, remember that CHAR_BIT>=8), even though this requires
that the addresses not multiple of the machine-unit contain both the
actual address and a relative offset.

The long and the short of it is that the compiler can perform all kinds
of magic to make things appear that don't exist at the assembly level:
it is a shell, and we are its inhabitants, at least until we fire up
our favourite disassembler and take a look at the world outside ;)


P.S.: the only thing that leaves me perplexed is the apparent circular
definition constituted by 5.3.3 and 3.9p4. Does anybody know if it is
resolved in an other part of the standard?


Genny.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Pete Becker <petebecker@acm.org>
Date: Mon, 15 Apr 2002 12:16:59 GMT
Raw View
James Kanze wrote:
>
> Although I'll admit that I don't see what something like DWORD is doing
> in a C++, or even a C, interface. Somebody must have seriously muffed
> the design, a long time ago.
>

Welcome to the wonderful world of Windows, where everything is a typedef
or a macro.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: James Kanze <kanze@gabi-soft.de>
Date: Mon, 15 Apr 2002 12:18:19 GMT
Raw View
Jack Klein <jackklein@spamcop.net> writes:

|>  There are now C++ compilers for 32 bit digital signal processors
|>  where char, short, int and long are all 1 byte and share the same
|>  representation.  Each of those bytes contains 32 bits.

A slightly different issue, but I believe that most, if not all of these
are freestanding implementations.  There is some question whether int
and char can be the same size on a hosted implementation, since
functions like fgetc (inherited from C) must return a value in the range
[0...UCHAR_MAX] or EOF, which must be negative.

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: James Kanze <kanze@gabi-soft.de>
Date: Mon, 15 Apr 2002 12:19:34 GMT
Raw View
Witless <witless@attbi.com> writes:

|>  Bob Hairgrove wrote:

|>  > On Sun, 14 Apr 2002 07:01:54 GMT, Witless <witless@attbi.com> wrote:

|>  > >> But what does this mean for char?? I was always under the
|>  > >> assumption that sizeof(char) is ALWAYS guaranteed to be exactly
|>  > >> 1 byte,

|>  > >Your assumption is invalid.

|>  > Check out Mike Wahler's response ... seems that the standard does
|>  > guarantee this (although a byte doesn't have to be 8 bits). That
|>  > is, the guarantee seems to be that sizeof(char)==1 under all
|>  > circumstances.

|>  That's not the issue.  The hidden redefinition of "byte" is the
|>  issue.

What hidden redefinition?  The definition for the word as used in the
standard is in 1.7, which is where all of the definitions are.  As
usual, the standard uses a somewhat stricter definition than the
"normal" definition.  In particular:

  - Not all machines have addressable bytes.  All C/C++ implementations
    must have addressable bytes.  This requirement can be met in one of
    two ways: declaring machine words to be bytes (typical for DSP's),
    or implementing some form of extended addressing, where char* is
    larger than int* (typical for general purpose word addressed
    machines).

  - A byte may be less than 8 bits -- the first use of the word, in
    fact, was for six bit entities.  The C/C++ standard requires bytes
    to have at least eight bits.
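
Both points are observable from portable code; on a word addressed
machine with extended addressing, the two pointer sizes below can
differ:

    #include <cstdio>

    int main()
    {
        // On typical byte-addressed hardware these print the same;
        // where char* carries a word address plus an offset, they don't.
        std::printf( "sizeof(int*)  = %u\n",
                     static_cast< unsigned >( sizeof( int* ) ) ) ;
        std::printf( "sizeof(char*) = %u\n",
                     static_cast< unsigned >( sizeof( char* ) ) ) ;
        return 0 ;
    }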

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: James Kanze <kanze@gabi-soft.de>
Date: Mon, 15 Apr 2002 12:20:11 GMT
Raw View
"Mike Wahler" <mkwahler@ix.netcom.com> writes:

|>  Carl Daniel <cpdaniel@pacbell.net> wrote in message
|>  news:Im5u8.1420$Uf.1278678108@newssvr21.news.prodigy.com...
|>  > "Bob Hairgrove" <rhairgroveNoSpam@Pleasebigfoot.com> wrote in messa=
ge
|>  > news:3cb820f5.7959715@news.ch.kpnqwest.net...
|>  > > But what does this mean for char?? I was always under the
|>  > > assumption that sizeof(char) is ALWAYS guaranteed to be exactly
|>  > > 1 byte, especially since there is no C++ "byte" type. As we now
|>  > > have the wchar_t as an intrinsic data type, wouldn't this cement
|>  > > the fact that char is always 1 byte?

|>  > sizeof(char) is guaranteed to be 1.  1 what though?

|>  One byte.

|>  > 1 memory allocation unit.

|>  No.

I'd say yes.  But the name of that memory allocation unit is "byte".

|>  >  All other types must have sizes which are multiples of
|>  >  sizeof(char).

|>  Right.  'char' and 'byte' are synonymous in C++

Not quite.  A C/C++ program cannot directly access "bytes"; it can only
access "objects", which are sequences of contiguous bytes.  On the other hand,
the standard requires that for char and its signed and unsigned
variants, this sequence is exactly one element long, and that these
types (or at least unsigned char) contain no padding.  So any
distinction between unsigned char and byte is purely formal.

|>  > The standard makes no claim that 1 memory allocation unit == 1
|>  > byte.

|>  It absolutely does.  See my quote of the standard elsethread.

|>  > On a
|>  > system with a 16-bit "natural character",

|>  In this context, 'natural character' == byte.

According to the definition in the standard, at any rate.

|>  >sizeof(char) and sizeof(wchar_t) might both be 1,

|>  sizeof(char) is *required* to be one byte.
|>  sizeof(wchar_t) is usually larger, typically two
|>  (but it's implementation-defined).

The most frequent situation, I think, is 8 bit char's and 32 bit
wchar_t's.  Anything less than about 21 bits for a wchar_t pretty much
makes them relatively useless, since the only widespread code set with
more than 8 bits is ISO 10646/Unicode, which requires 21 bits.  (But of
course, the standard doesn't require wchar_t -- or anything else, for
that matter -- to be useful:-).)

|>  > and sizeof(int), though it's 32 bits, would be 2 not 4.

|>  Absolutely not.  sizeof(int) is implementation-defined, but is still
|>  expressed in bytes (i.e. chars).  A 32-bit int's sizeof will be 32 /
|>  CHAR_BIT.

Which on some DSP is 1.

|>  > Further, there's no guarantee that you have any access to the
|>  > smallest addressable unit of storage,

|>  Yes there is.  The byte is specified as the smallest addressable unit.

Yes and no.  Before making any claims, it is important to state what you
are claiming.  If the claim involves the smallest unit of addressable
storage at the hardware level, there is no guarantee -- the smallest
unit of addressable storage in C/C++ must be at least 8 bits, and there
exist machines with hardware addressable bits.  If the claim refers to
the C/C++ memory model, it is true by definition.
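
A two-line check of all of the above, runnable anywhere:

    #include <climits>
    #include <cstdio>

    int main()
    {
        // sizeof is measured in bytes of CHAR_BIT bits each, so a
        // 32-bit int gives sizeof(int) == 32 / CHAR_BIT: 4 with 8-bit
        // bytes, 1 on a DSP whose byte is 32 bits.
        std::printf( "CHAR_BIT    = %d\n", CHAR_BIT ) ;
        std::printf( "sizeof(int) = %u\n",
                     static_cast< unsigned >( sizeof( int ) ) ) ;
        return 0 ;
    }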

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: James Kanze <kanze@gabi-soft.de>
Date: Mon, 15 Apr 2002 12:21:09 GMT
Raw View
"Carl Daniel" <cpdaniel@pacbell.net> writes:

|>  "James Kuyper Jr." <kuyper@wizard.net> wrote in message
|>  news:3CB9CE84.7F4267B2@wizard.net...

|>  > Carl Daniel wrote:
|>  > ....
|>  > > sizeof(char) is guaranteed to be 1.  1 what though?  1 memory
|>  > > allocation unit.  All other types must have sizes which are
|>  > > multiples of sizeof(char).  The standard makes no claim that 1
|>  > > memory allocation unit == 1 byte.  On a

|>  > Section 5.3.3: "The sizeof operator yields the number of bytes in
|>  > the object representation of its operand."

|>  I hadn't looked at that section before this morning.  I'm surprised
|>  they worded it that way, since it's patently false given the most
|>  common meaning of 'byte' (8 bits).  It would have helped if the
|>  standard actually defined the word byte, or simply not used it at
|>  all.  As is, the section is confusing at best.

The word "byte" has never meant eight bits.  Historically, the word was
invented at IBM to refer to a unit of addressable memory smaller than a
word -- I believe that the first use of the word was for 6 bit units.

The standard, of course, doesn't use byte in this sense -- a word
addressed machine doesn't have bytes, but an implementation of C or C++
on it does.  The standard actually uses the word with two different (but
not incompatible) meanings.

The first definition is given in 1.7.  The identity of these bytes with
char/unsigned char/signed char is not explicitly stated, but follows
from the fact that sizeof on these types must return 1, and that these
types (or at least unsigned char) cannot contain padding or bits which
don't participate in the value.

The second definition is indirectly given in 17.3.2.1.3.1, which
defines "null-terminated byte string".  In this case, a "null-terminated
byte string" is an array of char, signed char or unsigned char, which is
delimited by a 0 sentinal object.  In the given context, the implication
is that the "bytes" are actually characters (or parts of multi-byte
characters), and that the sequence doesn't contain the value 0.

|>  And yes, I realize that in the past 'byte' was used more flexibly,
|>  with 'bytes' being 6, 7, 8, 9, 10, 12, and even 15 bits on various
|>  systems.  Surely today, and as surely in 1998, most readers think "8
|>  bits" when they see the word "byte".

Not just in the past.  The fact that machines with bytes of other than 8
bits have become rare doesn't negate the fact that when you do talk of
them, the word "byte" doesn't mean 8 bits.  And the distinction is still
relevant.  -- look at any of the RFC's, for example, and you'll find
that when 8 bits is important, the word used is octet, and not byte.

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Mon, 15 Apr 2002 15:37:18 GMT
Raw View
James Kanze wrote:
....
> |>  Right.  'char' and 'byte' are synonymous in C++
>
> Not quite.  A C/C++ program cannot directly access "bytes"; it can only
> access "objects", which are sequences of contiguous bytes.  On the other hand,
> the standard requires that for char and its signed and unsigned
> variants, this sequence is exactly one element long, and that these
> types (or at least unsigned char) contain no padding.  So any
> distinction between unsigned char and byte is purely formal.

A simpler way to clarify the distinction is to point out that a byte is
a unit used to measure memory, while char is a data type that is defined
as fitting into one byte. As such, 'char' is a much richer concept than
'byte'.

....
> The most frequent situation, I think, is 8 bit char's and 32 bit
> wchar_t's.  Anything less than about 21 bits for a wchar_t pretty much
> makes them relatively useless, since the only widespread code set with
> more than 8 bits is ISO 10646/Unicode, which requires 21 bits.  (But of
> course, the standard doesn't require wchar_t -- or anything else, for

There's a 16 bit variant of it. While I don't use either version of it
in any of my own programs, from what I'd heard here and on comp.std.c,
I'd gotten the impression that the 16 bit variant was more widely used
than the 32 bit version. Yours is the first mention I've ever seen of a
21 bit version - or are you specifying the number of bits actually used
by the 32 bit version?

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Alexander Terekhov <terekhov@web.de>
Date: Mon, 15 Apr 2002 15:38:11 GMT
Raw View
Gennaro Prota wrote:
[...]
> As an example of the latter, a machine where the hardware-addressable
> unit

Right, this is called "a memory location" or "a memory granule", AFAIK.

> is 32-bit can still have a C++ compiler with 8-bit chars ....

Well, things are getting much more interesting with *threads*
added into play. Just for your information (it is probably
off-topic here; at least in this thread):

http://www.opengroup.org/austin/aardvark/finaltext/xbdbug.txt
(see "Defect in XBD 4.10 Memory Synchronization (rdvk#  26)")

regards,
alexander.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Tue, 16 Apr 2002 01:48:29 GMT
Raw View
Witless wrote:
....
> > But what does this mean for char?? I was always under the assumption
> > that sizeof(char) is ALWAYS guaranteed to be exactly 1 byte,
>
> Your assumption is invalid.

Not with respect to the C++ standard. Section 5.3.3p1 says "The sizeof
operator yields the number of bytes in the object representation of its
operand. ... sizeof(char), sizeof(signed char), and sizeof(unsigned
char) are 1."

> > especially since there is no C++ "byte" type. As we now have the
> > wchar_t as an intrinsic data type, wouldn't this cement the fact that
> > char is always 1 byte?
>
> No.
>
> The type char can be 16 bits like Unicode or even 32 bits like the ISO
> character sets.

Yes, but under the C++ standard, that simply means that a "byte" will
become 16 or 32 bits, respectively. That's what the CHAR_BIT macro is
for.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: loewis@informatik.hu-berlin.de (Martin v. =?iso-8859-1?q?L=F6wis?=)
Date: Tue, 16 Apr 2002 01:48:40 GMT
Raw View
"Carl Daniel" <cpdaniel@pacbell.net> writes:

> sizeof(char) is guaranteed to be 1.  1 what though?  1 memory allocation
> unit.  All other types must have sizes which are multiples of sizeof(char).
> The standard makes no claim that 1 memory allocation unit == 1 byte.

It certainly does: 1.7, [intro.memory]/1:

# The fundamental storage unit in the C++ memory model is the byte. A
# byte is at least large enough to contain any member of the basic
# execution character set and is composed of a contiguous sequence of
# bits, the number of which is implementation-defined.

> On a system with a 16-bit "natural character", sizeof(char) and
> sizeof(wchar_t) might both be 1, and sizeof(int), though it's 32
> bits, would be 2 not 4.

On such a system, a byte would have 16 bits.

Regards,
Martin

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Tue, 16 Apr 2002 01:48:45 GMT
Raw View
"James Kuyper Jr." <kuyper@wizard.net> writes:

| Carl Daniel wrote:
| ....
| > sizeof(char) is guaranteed to be 1.  1 what though?  1 memory allocation
| > unit.  All other types must have sizes which are multiples of sizeof(char).
| > The standard makes no claim that 1 memory allocation unit == 1 byte.  On a
|
| Section 5.3.3: "The sizeof operator yields the number of bytes in the
| object representation of its operand."

Exactly.  The question is what you think "byte" means in the C++
standard's text.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Witless <witless@attbi.com>
Date: Tue, 16 Apr 2002 01:48:56 GMT
Raw View
"Martin v. L=F6wis" wrote:

> James Kanze <kanze@gabi-soft.de> writes:
>
> > Excuse me, but on 32 bit machines (at least the ones I've seen), DWORD
> > is 64 bits.
>
> I guess you have not seen Microsoft Windows, then. Just try

Microsoft(R) Windows(!tm) is not based on 32 bits but on 16 bits.

>
>
> #include <windows.h>
> #include <stdio.h>
>
> int main()
> {
>   printf("%d\n", (int)sizeof(DWORD));
> }
>
> in MSVC++ 6 or so. It prints 4, and it uses 8-bit bytes.

.... which is consistent with its ancestry.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "Mike Wahler" <mkwahler@ix.netcom.com>
Date: Tue, 16 Apr 2002 01:49:14 GMT
Raw View
Carl Daniel <cpdaniel@pacbell.net> wrote in message
news:EUlu8.2504$4S6.1564155140@newssvr21.news.prodigy.com...
> "James Kuyper Jr." <kuyper@wizard.net> wrote in message
> news:3CB9CE84.7F4267B2@wizard.net...
> > Carl Daniel wrote:
> > ....
> > > sizeof(char) is guaranteed to be 1.  1 what though?  1 memory allocation
> > > unit.  All other types must have sizes which are multiples of sizeof(char).
> > > The standard makes no claim that 1 memory allocation unit == 1 byte.  On a
> >
> > Section 5.3.3: "The sizeof operator yields the number of bytes in the
> > object representation of its operand."
>
> I hadn't looked at that section before this morning.  I'm surprised they
> worded it that way, since it's patently false

The standard is what *defines* these issues.  No
way can it be 'false'.

> given the most common meaning
> of 'byte' (8 bits).

Irrelevant.  The size of a byte is only required to be
*at least* eight bits, but is allowed to be larger.


> It would have helped if the standard actually defined
> the word byte,

It does.  The smallest addressable storage unit.
The 1-bit addressable unit of the system you
describe above does not meet the requirement of
at least eight bits.  So from C++'s point of view,
the smallest addressable unit for that machine is
whichever larger unit with at least eight bits is
addressable.  Perhaps a 'word'.

> or simply not used it at all.

Some point of reference has to be defined.
It's the byte.

> As is, the section is
> confusing at best.

Yes, "IOS-ese" takes a while to understand.

>
> And yes, I realize that in the past 'byte' was used more flexibly, with
> 'bytes' being 6, 7, 8, 9, 10, 12, and even 15 bits on various systems.

It still is.

> Surely today, and as surely in 1998, most readers think "8 bits" when they
> see the word "byte".

And they're wrong.  "Eight bits" == "octet".
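
One way to put that in code: the only portable compile-time guarantee
is CHAR_BIT >= 8; whether a byte is also an octet (exactly eight bits)
is a property of the implementation, not of the language.  A sketch,
using an illustrative C++98-era compile-time assertion:

    #include <climits>

    // Fails to compile (array of negative size) if a byte had fewer
    // than eight bits -- which a conforming implementation cannot do.
    typedef char byte_has_at_least_8_bits[CHAR_BIT >= 8 ? 1 : -1];

    // Whether a byte is also an octet is implementation-specific.
    const bool byte_is_an_octet = (CHAR_BIT == 8);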

-Mike




---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: James Kanze <kanze@gabi-soft.de>
Date: Tue, 16 Apr 2002 01:49:27 GMT
Raw View
Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr> writes:

|>  "James Kuyper Jr." <kuyper@wizard.net> writes:

|>  [...]

|>  | Bjarne's statement is technically incorrect, but true to the history of
|>  | C, when he identifies "char" more closely with "character" than with
|>               ^^^^^^^^^^
|>  | "byte".

|>  Firstly, note that B. Stroustrup didn't *identify* "char" with
|>  "character"; rather, I quote (from the original poster):

|>    "A char variable is of the natural size to hold a character on a
|>    given machine (typically a byte)

|>  Secondly, it has been the tradition that 'char', in C++, is the
|>  natural type for holding characters, as exemplified by the standard
|>  type std::string and the standard narrow streams.

For you and me, maybe, but one could argue that we are being
anachronistic.  For most modern applications which deal with text, I
suspect that there is no natural type for holding characters -- wchar_t
comes close, but there are systems where it is not sufficiently large to
hold an ISO 10646 character.  And it is rarely well supported.  The
result is that if I had to write an application dealing with text, I'd
probably end up defining my own character type (which might be a typedef
to wchar_t, if portability weren't a real concern -- all of the
machines I normally deal with define wchar_t as a 32 bit type).

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Tue, 16 Apr 2002 01:49:53 GMT
Raw View
Alexander Terekhov wrote:
....
> "IA-32 Intel=AE Architecture
>  Software Developer?s
>  Manual
>  Volume 1:
>  Basic Architecture
>  ....
>  4.1. FUNDAMENTAL DATA TYPES
>=20
>  The fundamental data types of the IA-32 architecture are bytes,
>  words, doublewords, quadwords, and double quadwords (see Figure
>  4-1). A byte is eight bits, a word is 2 bytes (16 bits), a
>  doubleword is 4 bytes (32 bits), a quadword is 8 bytes (64 bits),
>  and a double quadword is 16 bytes (128 bits). "

Not everyone uses the term correctly, not even (apparently) Intel. I'll
consider an ISO specification more authoritative than a company
specification any day (though it could still be wrong). I'd like to
know; do any of the other ISO standards define the "byte"? If so, which
definition do they use?

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Tue, 16 Apr 2002 01:50:17 GMT
Raw View
James Kanze <kanze@gabi-soft.de> writes:

| Jack Klein <jackklein@spamcop.net> writes:
|
| |>  There are now C++ compilers for 32 bit digital signal processors
| |>  where char, short, int and long are all 1 byte and share the same
| |>  representation.  Each of those bytes contains 32 bits.
|
| A slightly different issue, but I believe that most, if not all of these
| are freestanding implementations.  There is some question whether int
| and char can be the same size on a hosted implementation, since
| functions like fgetc (inherited from C) must return a value in the range
| [0...UCHAR_MAX] or EOF which must be negative.

Or you may just look at the requirements imposed by the standard
std::string class in clause 21.

A conforming hosted implementation cannot have

  values_set(char) == values_set(int)

because every bit in a char representation participates in a value
representation, i.e. all bits in a char are meaningful.


--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Tue, 16 Apr 2002 01:50:28 GMT
Raw View
James Kanze wrote:
>
> Jack Klein <jackklein@spamcop.net> writes:
>
> |>  There are now C++ compilers for 32 bit digital signal processors
> |>  where char, short, int and long are all 1 byte and share the same
> |>  representation.  Each of those bytes contains 32 bits.
>
> A slightly different issue, but I believe that most, if not all of these
> are freestanding implementations.  There is some question whether int
> and char can be the same size on a hosted implementation, since
> functions like fgetc (inherited from C) must return a value in the range
> [0...UCHAR_MAX] or EOF which must be negative.

However, since there are other ways of detecting file errors and end of
file than checking for EOF, that doesn't absolutely require that EOF be
outside the range of char values. In fact, I gather that the consensus
of the C committee has been that it doesn't, though I couldn't find any
currently listed DR on the issue - however, the place I searched only
goes back to DR 201.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "Carl Daniel" <cpdaniel@pacbell.net>
Date: Tue, 16 Apr 2002 01:50:40 GMT
Raw View
"James Kuyper Jr." <kuyper@wizard.net> wrote in message
news:3CBA0DB6.3E95216B@wizard.net...
> Carl Daniel wrote:
> ....
> > of 'byte' (8 bits).  It would have helped if the standard actually defined
> > the word byte, or simply not used it at all.  As is, the section is
>
> It does define it, in section 1.7p1: "The fundamental storage unit in
> the C++ memory model is the _byte_.

I discovered that definition in 1.7 too late - too bad it isn't mentioned in
the index or cross-referenced in 5.3.3.  I still maintain that it was a poor
choice of word, since the definition the standard uses (and gives) is not
the common one (these days).

-cd

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Pete Becker <petebecker@acm.org>
Date: Tue, 16 Apr 2002 01:50:44 GMT
Raw View
"James Kuyper Jr." wrote:
>
> There's a 16 bit variable of it. While I don't use either version of it
> in any of my own programs, from what I'd heard here and on comp.std.c,
> I'd gotten the impression that the 16 bit variant was more widely used
> than the 32 bit version. Yours is the first mention I've ever seen of a
> 21 bit version - or are you specifying the number of bits actually used
> by the 32 bit version?
>

Unicode 2.0 had 40-some-odd thousand characters, so a 16-bit variable
could hold all possible values. Unicode 3.0 has over 90,000 characters,
so a 16-bit variable doesn't work. There's a UTF-16 encoding, but that's
analogous to a multi-byte character string in C and C++: a pain to work
with. Java apologists will tell you that this is no big deal, because
the characters that require two 16-bit values are rarely used. But then,
they're stuck with Java's choice of 16 bits for character types, so
naturally they claim that it doesn't really matter.
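
For illustration, roughly what the UTF-16 encoding step looks like (a
sketch; to_utf16 is a hypothetical helper, and the constants come from
the Unicode specification):

    // Encode one code point (0...0x10FFFF) as UTF-16.  Returns the
    // number of 16-bit units written: 1, or 2 for a surrogate pair.
    int to_utf16(unsigned long cp, unsigned short out[2])
    {
        if (cp <= 0xFFFF) {        // fits in a single 16-bit unit
            out[0] = (unsigned short)cp;
            return 1;
        }
        cp -= 0x10000;             // 20 bits remain
        out[0] = (unsigned short)(0xD800 + (cp >> 10));   // high surrogate
        out[1] = (unsigned short)(0xDC00 + (cp & 0x3FF)); // low surrogate
        return 2;
    }

Every character beyond 0xFFFF costs two units, which is exactly the
multi-unit pain described above.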

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Tue, 16 Apr 2002 01:51:27 GMT
Raw View
James Kanze wrote:
....
> I presume that there are some reasons of backwards compatibility.
> Although I'll admit that I don't see what something like DWORD is doing
> in a C++, or even a C, interface. Somebody must have seriously muffed
> the design, a long time ago.

It is a Microsoft design, after all. :-)


======================================= MODERATOR'S COMMENT:
 I almost bounced this as a flame...

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Mon, 15 Apr 2002 22:14:12 CST
Raw View
Gabriel Dos Reis wrote:
....
> Or you may just look at the requirements imposed by the standard
> std::string class in clause 21.

Clause 21 is very large and complicated; it would help if you could be
more specific about what you're referring to.

> A conforming hosted implementation cannot have
>
>   values_set(char) == values_set(int)

I'm not clear what you're saying. The standard doesn't define anything
called values_set(), so I presume you're just using it as convenient
shorthand for the set of valid values for the type. However, I can't see
any reason why that would be prohibited.

> because every bit in a char representation participates in a value
> representation, i.e. all bits in a char are meaningful

I don't see how that requirement would be violated by an implementation
which had CHAR_MIN==INT_MIN, and CHAR_MAX==INT_MAX.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "Hillel Y. Sims" <usenet@phatbasset.com>
Date: Tue, 16 Apr 2002 07:54:14 GMT
Raw View
The following applies to C++ in the context of potential issues to be aware
of, moving toward adding formal threading capability to the language.

"Alexander Terekhov" <terekhov@web.de> wrote in message
news:3CBAD3A5.99E4BD21@web.de...
>
> Well, things are getting much more interesting with *threads*
> added into play. Just for your information (it is probably
> off-topic here; at least in this thread):
>
> http://www.opengroup.org/austin/aardvark/finaltext/xbdbug.txt
> (see "Defect in XBD 4.10 Memory Synchronization (rdvk#  26)")

==** BEGIN PASTE **==
 Problem:

 Defect code :  3. Clarification required

 dvv@dvv.ru (Dima Volodin) wrote:
 ....
 The standard doesn't provide any definition on memory location [POSIX is
 a C API, so it must be done in C terms?]. Also, as per standard C rules,
 access to one memory location [byte?] shouldn't have any effect on a
 different memory location. POSIX doesn't seem to address this issue, so
 the assumption is that the usual C rules apply to multi-threaded
 programs. On the other hand, the established industry practices are such
 that there is no guarantee of integrity of certain memory locations when
 modification of some "closely residing" memory locations is performed.
 The standard either has to clarify that access to distinct memory
 locations doesn't have to be locked [which, I hope, we all understand,
 is not a feasible solution] or incorporate current practices in its
 wording providing users with means to guarantee data integrity of
 distinct memory locations. "Please advise."

 ---

 http://groups.google.com/groups?hl=en&selm=3B0CEA34.845E7AFF%40compaq.com

 Dave Butenhof (David.Butenhof@compaq.com) wrote:
 ....
 POSIX says you cannot have multiple threads using "a memory location"
 without explicit synchronization. POSIX does not claim to know, nor
 try to specify, what constitutes "a memory location" or access to it,
 across all possible system architectures. On systems that don't use
 atomic byte access instructions, your program is in violation of the
 rules.

==**END PASTE**==

I don't like that answer, as it seems it would be near impossible to write
portable code without some common notion of an atomically updatable memory
location. But isn't this actually what type sig_atomic_t (sizeof >= 1) is
intended for?
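
For reference, the guaranteed use of sig_atomic_t is the
single-threaded, signal-handler case, roughly as in this sketch (the
flag and handler names are just for illustration):

    #include <csignal>

    // The portable guarantee: a static volatile sig_atomic_t may be
    // written by a signal handler and read by the interrupted program.
    // Nothing is promised about visibility between threads.
    static volatile std::sig_atomic_t got_signal = 0;

    extern "C" void on_signal(int)
    {
        got_signal = 1;   // the only side effect the handler performs
    }

    int main()
    {
        std::signal(SIGINT, on_signal);
        while (!got_signal) {
            // do work; poll the flag
        }
        return 0;
    }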

hys

--
Hillel Y. Sims
hsims AT factset.com

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: James Kanze <kanze@gabi-soft.de>
Date: Tue, 16 Apr 2002 11:26:28 GMT
Raw View
"James Kuyper Jr." <kuyper@wizard.net> writes:

|>  James Kanze wrote:

|>  > Jack Klein <jackklein@spamcop.net> writes:

|>  > |>  There are now C++ compilers for 32 bit digital signal
|>  > |>  processors where char, short, int and long are all 1 byte and
|>  > |>  share the same representation.  Each of those bytes contains
|>  > |>  32 bits.

|>  > A slightly different issue, but I believe that most, if not all of
|>  > these are freestanding implementations.  There is some question
|>  > whether int and char can be the same size on a hosted
|>  > implementation, since functions like fgetc (inherited from C) must
|>  > return a value in the range [0...UCHAR_MAX] or EOF which must be
|>  > negative.

|>  However, since there are other ways of detecting file errors and end
|>  of file than checking for EOF, that doesn't absolutely require that
|>  EOF be outside the range of char values. In fact, I gather that the
|>  consensus of the C committee has been that it doesn't, though I
|>  couldn't find any currently listed DR on the issue - however, the
|>  place I searched only goes back to DR 201.

The C standard definitely requires that all characters in the basic
character set be positive, and that EOF be negative.

The open issue is, I think, whether fgetc is required to be able to
return *all* values in the range of 0...UCHAR_MAX.  For actual characters,
this is not a problem -- if we have 32 bit char's, it is certain that
some of the values will not be used as a character.  (ISO 10646, for
example, only uses values in the range 0...0x10FFFF.)  But fgetc can
also be used to read raw "bytes"; what happens then?

What I suspect is that on an implementation using 32 bit char's, fgetc
in fact will return something in the range 0...255, or -1 for EOF.
IMHO, this should be a legal implementation, however, I don't think that
the current C standard is unambiguously clear that this is the case.
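
Whatever the implementation does, the consequence for user code is the
same: the result of fgetc must be held in an int and compared with EOF
before being narrowed to char.  A sketch (count_bytes is a
hypothetical helper):

    #include <cstdio>

    long count_bytes(std::FILE* fp)
    {
        long n = 0;
        int c;                        // int, not char: must hold EOF too
        while ((c = std::fgetc(fp)) != EOF)
            ++n;                      // here c is in [0...UCHAR_MAX]
        return n;
    }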

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: James Kanze <kanze@gabi-soft.de>
Date: Tue, 16 Apr 2002 11:26:38 GMT
Raw View
Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr> writes:

|>  James Kanze <kanze@gabi-soft.de> writes:

|>  | Jack Klein <jackklein@spamcop.net> writes:

|>  | |>  There are now C++ compilers for 32 bit digital signal
|>  | |>  processors where char, short, int and long are all 1 byte and
|>  | |>  share the same representation.  Each of those bytes contains
|>  | |>  32 bits.

|>  | A slightly different issue, but I believe that most, if not all of
|>  | these are freestanding implementations.  There is some question
|>  | whether int and char can be the same size on a hosted
|>  | implementation, since functions like fgetc (inherited from C) must
|>  | return a value in the range [0...UCHAR_MAX] or EOF which must be
|>  | negative.

|>  Or you may just look at the requirements imposed by the standard
|>  std::string class in clause 21.

The advantage of basing the argument on fgetc is that it becomes a C
problem as well, and not something specific to C++.

|>  A conforming hosted implementation cannot have

|>    values_set(char) == values_set(int)

|>  because every bit in a char representation participates in a value
|>  representation, i.e. all bits in a char are meaningful.

Where does it say this?  (Section 21 is large.)

What are the implications for an implementation which wants to support
ISO 10646 on a 32 bit machine?  The smallest type it can declare which
supports ISO 10646 is 32 bits.

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Tue, 16 Apr 2002 12:02:36 GMT
Raw View
"James Kuyper Jr." <kuyper@wizard.net> writes:

| Gabriel Dos Reis wrote:
| ....
| > Or you may just look at the requirements imposed by the standard
| > std::string class in clause 21.
|
| Clause 21 is very large and complicated;

Not that complicated; it suffices to look at the first two pages.

21.1.2/2
  For a certain character container type char_type, a related
  container type INT_T shall be a type or class which can represent
  all of the valid characters converted from the corresponding
  char_type values, as well as an end-of-file value, eof(). The type
  int_type represents a character container type which can hold
  end-of-file to be used as a return type of the iostream class member
  functions.

The case of interest is when char_type == char and int_type == int.

Now, look at the table 37 (Traits requirements)


  X::eof() yields: a value e such that X::eq_int_type(e,X::to_int_type(c))
  is false for all values c.

(by 21.1.1/1, c is of type char).

The standard also says that any bit pattern for char represents a
valid char value, therefore eof() can't be in the values-set of char.
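
The requirement can be checked mechanically; a sketch, iterating over
every char bit pattern via unsigned char:

    #include <string>    // std::char_traits
    #include <climits>   // UCHAR_MAX
    #include <cassert>

    int main()
    {
        typedef std::char_traits<char> traits;
        traits::int_type e = traits::eof();
        // Table 37: eq_int_type(eof(), to_int_type(c)) must be false
        // for every c, so int needs at least one value to which no
        // char bit pattern maps.
        for (unsigned long u = 0; u <= UCHAR_MAX; ++u) {
            char c = (char)(unsigned char)u;
            assert(!traits::eq_int_type(e, traits::to_int_type(c)));
        }
        return 0;
    }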

[...]

| I don't see how that requirement would be violated by an implementation
| which had CHAR_MIN==INT_MIN, and CHAR_MAX==INT_MAX.

See above.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Tue, 16 Apr 2002 12:07:37 GMT
Raw View
James Kanze <kanze@gabi-soft.de> writes:


[...]

| The open issue is, I think, whether fgetc is required to be able to
| return *all* values in the range of 0...UCHAR_MAX.  For actual characters,
| this is not a problem -- if we have 32 bit char's, it is certain that
| some of the values will not be used as a character.

That won't be conforming, since the standard says that any bit pattern
represents a valid value.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Tue, 16 Apr 2002 13:11:37 GMT
Raw View
Gabriel Dos Reis wrote:
>
> "James Kuyper Jr." <kuyper@wizard.net> writes:
....
> Now, look at the table 37 (Traits requirements)
>
>   X::eof() yields: a value e such that X::eq_int_type(e,X::to_int_type(c))
>   is false for all values c.
>
> | I don't see how that requirement would be violated by an implementation
> | which had CHAR_MIN==INT_MIN, and CHAR_MAX==INT_MAX.
>
> See above.

In itself, that would merely mean that 'int' can't be the int_type for
'char'. The clincher is that 21.1.3.1 explicitly specifies that int_type
for char_traits<char> must be 'int'. Therefore, I concede your point.

Someone was keeping a list of C/C++ differences - this should be added
to that list; C makes no such guarantee.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Tue, 16 Apr 2002 13:11:42 GMT
Raw View
Gabriel Dos Reis wrote:
>
> James Kanze <kanze@gabi-soft.de> writes:
>
> [...]
>
> | The open issue is, I think, whether fgetc is required to be able to
> | return *all* values in the range of 0...UCHAR_MAX.  For actual characters,
> | this is not a problem -- if we have 32 bit char's, it is certain that
> | some of the values will not be used as a character.
>
> That won't be conforming, since the standard says that any bit pattern
> represents a valid value.

It explicitly restricts that guarantee to unsigned char; char is allowed
to be signed.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Ron Natalie <ron@sensor.com>
Date: Tue, 16 Apr 2002 14:44:16 GMT
Raw View

Witless wrote:
>
> "Martin v. Löwis" wrote:
>
> > James Kanze <kanze@gabi-soft.de> writes:
> >
> > > Excuse me, but on 32 bit machines (at least the ones I've seen), DWORD
> > > is 64 bits.
> >
> > I guess you have not seen Microsoft Windows, then. Just try
>
> Microsoft(R) Windows(!tm) is not based on 32 bits but on 16 bits.
>

The 32 bit Windows versions (95 and later, NT and later) are 32 bit
environments as far as the app is concerned.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: James Kanze <kanze@gabi-soft.de>
Date: Tue, 16 Apr 2002 14:55:45 GMT
Raw View
"James Kuyper Jr." <kuyper@wizard.net> writes:

|>  > The most frequent situation, I think, is 8 bit char's and 32 bit
|>  > wchar_t's.  Anything less than about 21 bits for a wchar_t pretty
|>  > much makes them relatively useless, since the only widespread code
|>  > set with more than 8 bits is ISO 10646/Unicode, which requires 21
|>  > bits.  (But of course, the standard doesn't require wchar_t -- or
|>  > anything else, for

|>  There's a 16 bit variable of it. While I don't use either version of
|>  it in any of my own programs, from what I'd heard here and on
|>  comp.std.c, I'd gotten the impression that the 16 bit variant was
|>  more widely used than the 32 bit version. Yours is the first mention
|>  I've ever seen of a 21 bit version - or are you specifying the
|>  number of bits actually used by the 32 bit version?

The code set occupies values in the range 0...0x10FFFF.  That requires
21 bits.

The standard specifies several ways to represent the code set.  The most
natural way (the only one which doesn't involve multi-something
encodings) uses 32 bit values, with the code on the lower bits, and the
upper bits 0.  There are also variants with 8 and 16 bit units; these do
NOT represent every character in a single unit.
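
The arithmetic behind the 21 bits, as a sketch:

    #include <cstdio>

    int main()
    {
        // 2^20 = 0x100000 <= 0x10FFFF < 0x200000 = 2^21, so the
        // largest code point needs exactly 21 bits.
        int bits = 0;
        for (unsigned long v = 0x10FFFFUL; v != 0; v >>= 1)
            ++bits;
        std::printf("bits needed for 0x10FFFF: %d\n", bits);  // prints 21
        return 0;
    }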

Of the machines I've seen (and can remember), wchar_t is most often 32
bits.  In fact, the only exception seems to be Windows; all of the
Unixes I can remember (Linux, Solaris, AIX -- and I think HP/UX, but my
memory is a bit weak there) have 32 bits.

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Ron Natalie <ron@sensor.com>
Date: Tue, 16 Apr 2002 15:15:16 GMT
Raw View

"James Kuyper Jr." wrote:
>
>
> Yes, but under the C++ standard, that simply means that a "byte" will
> become 16 or 32 bits, respectively. That's what the CHAR_BIT macro is
> for.
>

The problem is that just isn't practical in most cases.  Remember that
char plays double duty both as the native character type and the fundamental
memory unit.  Yes, you could have 16 bit chars, but you lose the ability
to address 8 bit sized memory for all practical purposes.  You're more
or less doomed (as Windows NT does) to use wchar_t's.  The only sad thing
is that C++ doesn't define wchar_t interfaces to everything.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Tue, 16 Apr 2002 11:24:46 CST
Raw View
"James Kuyper Jr." <kuyper@wizard.net> writes:

| Gabriel Dos Reis wrote:
| >
| > "James Kuyper Jr." <kuyper@wizard.net> writes:
| ....
| > Now, look at the table 37 (Traits requirements)
| >
| >   X::eof() yields: a value e such that X::eq_int_type(e,X::to_int_type(c))
| >   is false for all values c.
| >
| > | I don't see how that requirement would be violated by an implementation
| > | which had CHAR_MIN==INT_MIN, and CHAR_MAX==INT_MAX.
| >
| > See above.
|
| In itself, that would merely mean that 'int' can't be the int_type for
| 'char'. The clincher is that 21.1.3.1 explicitly specifies that int_type
| for char_traits<char> must be 'int'. Therefore, I concede your point.
|
| Someone was keeping a list of C/C++ differences - this should be added
| to that list; C makes no such guarantee.

Which "no such guarantee"?

-- Gaby

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Alexander Terekhov <terekhov@web.de>
Date: Tue, 16 Apr 2002 16:25:32 GMT
Raw View
"Hillel Y. Sims" wrote:
>
> The following applies to C++ in the context of potential issues to be aware
> of, moving toward adding formal threading capability to the language.
[...]
> I don't like that answer, as it seems it would be near impossible to write
> portable code without some common notion of atomically updatable memory
> location.

Well, the real "problem" here is also known as "word-tearing"
(and there is also the somewhat similar/related performance problem
of "false-sharing").

There was even comp.std.c thread on this in the past:

http://groups.google.com/groups?threadm=3B54AB12.7F555834%40dvv.org
(with GRANULARIZE(X) macros, etc ;-))

Personally, I just love this topic! ;-) My view/opinion on this:

http://groups.google.com/groups?as_umsgid=3C3F0C77.CFF9CADC%40web.de
http://groups.google.com/groups?as_umsgid=3C428BC0.1D5F2D90%40web.de

> But isn't this actually what type sig_atomic_t (sizeof >= 1) is
> intended for?

AFAICT, "Nope":
http://groups.google.com/groups?as_umsgid=3B02A7A4.C6FEDC23%40dvv.org

*static volatile sig_atomic_t* vars could only help/work for *single-
threaded* asynchrony w.r.t access to volatile sig_atomic_t STATIC
object(s) in the thread itself and its interrupt/async.signal
handler(s);

BTW, I've "collected" some C/C++ "volatile"/sig_atomic_t stuff
in the following article and the "Hardware port" thread:

http://groups.google.com/groups?as_umsgid=3CB1EE1D.5671E923%40web.de
http://groups.google.com/groups?threadm=a8hgtr%24euj%241%40news.hccnet.nl

Also, FYI w.r.t. C/C++ volatiles and threads (I mean "atomicity" and
"visibility", etc):

http://groups.google.com/groups?as_umsgid=L9JR7.478%24BK1.14104%40news.cpqcorp.net

And, finally, FYI on memory "granularity":

http://www.tru64unix.compaq.com/docs/base_doc/DOCUMENTATION/V51_HTML/ARH9RBTE/DOCU0007.HTM#gran_sec

regards,
alexander.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: David Schwartz <davids@webmaster.com>
Date: Tue, 16 Apr 2002 18:07:14 GMT
Raw View
"James Kuyper Jr." wrote:

> >  4.1. FUNDAMENTAL DATA TYPES

> >  The fundamental data types of the IA-32 architecture are bytes,
> >  words, doublewords, quadwords, and double quadwords (see Figure
> >  4-1). A byte is eight bits, a word is 2 bytes (16 bits), a
> >  doubleword is 4 bytes (32 bits), a quadword is 8 bytes (64 bits),
> >  and a double quadword is 16 bytes (128 bits). "

> Not everyone uses the term correctly, not even (apparantly) Intel.

 What is wrong with Intel's usage? If a byte means "an 8-bit quantity",
then they're right. If a byte means "the smallest addressable unit of
storage on a particular architecture", then they are still right. What
definition of "byte" makes Intel's usage incorrect?

 DS

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "Hillel Y. Sims" <usenet@phatbasset.com>
Date: Tue, 16 Apr 2002 18:07:42 GMT
Raw View
Here's what I'm taking away from all of this:

   8-bits <= 1 C++ byte <= 'natural-word'-bits
(where # of bits in a "natural-word" and actual # of bits used for the
"byte" are platform-specific)
("C++ byte" not necessarily equivalent to platform-specific "byte")

It could theoretically be 8, 9, 10, 11, 12, ... 16, ... 32, ... 64, or even
maybe 128 bits on some current graphics processors (guessing), or anything
in between too (theoretically). It even makes sense; there are some machines
(DSPs) where "char" (as in character, as in human-readable text) is not a
very heavily used concept vs efficient 32-bit numerical processing, so they
just define 'char' (1 byte!) to refer to the full 32-bits of machine storage
for efficiency (otherwise they'd probably have to do all sorts of bit
masking arithmetic).

"Mike Wahler" <mkwahler@ix.netcom.com> wrote in message
news:a9cvlo$lj5$1@slb2.atl.mindspring.net...
> It does.  The smallest addressable storage unit.
> The 1-bit addressable unit of the system you
> describe above does not meet the requirement of
> at least eight bits.  So from C++'s point of view,
> the smallest addressable unit for that machine is
> whichever larger unit with at least eight bits is
> addressable.  Perhaps a 'word'.
>
> > or simply not used it at all.
>
> Some point of reference has to be defined.
> It's the byte.
>

>
> > Surely today, and as surely in 1998, most readers think "8 bits" when they
> > see the word "byte".
>

Well not anymore! 1 C++ Byte >= 8-bits! :-)

hys

--
Hillel Y. Sims
hsims AT factset.com

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Tue, 16 Apr 2002 18:07:59 GMT
Raw View
Pete Becker wrote:
>
> "James Kuyper Jr." wrote:
> >
> > There's a 16 bit variable of it. While I don't use either version of it

Sorry, that got garbled due to lack of sleep. I meant "16 bit version";
"16 bit encoding" would have been even better, but I didn't even think
of that wording.

> > in any of my own programs, from what I'd heard here and on comp.std.c,
> > I'd gotten the impression that the 16 bit variant was more widely used
> > than the 32 bit version. Yours is the first mention I've ever seen of a
> > 21 bit version - or are you specifying the number of bits actually used
> > by the 32 bit version?
> >
>
> Unicode 2.0 had 40-some-odd thousand characters, so a 16-bit variable
> could hold all possible values. Unicode 3.0 has over 90,000 characters,
> so a 16-bit variable doesn't work. There's a UTF-16 encoding, but that's
> analogous to a multi-byte character string in C and C++: a pain to work
> with. Java apologists will tell you that this is no big deal, because
> the characters that require two 16-bit values are rarely used.

I suspect they're correct. A fairly large portion of the C++ world can
get away with 8-bit characters; an even larger portion will never need
to go beyond 16 bits. Of course, the people who need the larger
characters need them all the time.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Tue, 16 Apr 2002 18:08:41 GMT
Raw View
James Kanze <kanze@gabi-soft.de> writes:

| Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr> writes:
|
| |>  James Kanze <kanze@gabi-soft.de> writes:
|
| |>  | Jack Klein <jackklein@spamcop.net> writes:
|
| |>  | |>  There are now C++ compilers for 32 bit digital signal
| |>  | |>  processors where char, short, int and long are all 1 byte and
| |>  | |>  share the same representation.  Each of those bytes contains
| |>  | |>  32 bits.
|
| |>  | A slightly different issue, but I believe that most, if not all of
| |>  | these are freestanding implementations.  There is some question
| |>  | whether int and char can be the same size on a hosted
| |>  | implementation, since functions like fgetc (inherited from C) must
| |>  | return a value in the range [0...UCHAR] or EOF which must be
| |>  | negative.
|
| |>  Or you may just look at the requirements imposed by the standard
| |>  std::string class in clause 21.
|
| The advantage of basing the argument on fgetc is that it becomes a C
| problem as well, and not something specific to C++.

Yeah, a clever way of getting rid of the problem ;-)
By the wording concerning some functions in <ctype.h>, I gather that
EOF cannot be a valid 'unsigned char' converted to int.

| |>  A conforming hosted implementation cannot have
|
| |>    values_set(char) == values_set(int)
|
| |>  because every bit in a char representation participates in a value
| |>  representation, i.e. all bits in a char are meaningful.
|
| Where does it say this.  (Section 21 is large.)

Table 37 says that char_traits<char>::eof() -- identical to EOF --
should return a value not equal to any char value (converted to int).

| What are the implications for an implementation which wants to support
| ISO 10646 on a 32 bit machine?  The smallest type it can declare which
| supports ISO 10646 is 32 bits.

Then it must make sure values-set(char) is a strict subset of
values-set(int) (for example having a 64-bit int).  Or it doesn't ;-)

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Tue, 16 Apr 2002 18:08:03 GMT
Raw View
Gabriel Dos Reis wrote:
>
> "James Kuyper Jr." <kuyper@wizard.net> writes:
>
> | Carl Daniel wrote:
> | ....
> | > sizeof(char) is guaranteed to be 1.  1 what though?  1 memory allocation
> | > unit.  All other types must have sizes which are multiples of sizeof(char).
> | > The standard makes no claim that 1 memory allocation unit == 1 byte.  On a
> |
> | Section 5.3.3: "The sizeof operator yields the number of bytes in the
> | object representation of its operand."
>
> Exactly.  The question is what you think "byte" means in the C++
> standard's text.

See 1.7p1.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Tue, 16 Apr 2002 18:08:21 GMT
Raw View
"James Kuyper Jr." <kuyper@wizard.net> writes:

| James Kanze wrote:
| >
| > Jack Klein <jackklein@spamcop.net> writes:
| >
| > |>  There are now C++ compilers for 32 bit digital signal processors
| > |>  where char, short, int and long are all 1 byte and share the same
| > |>  representation.  Each of those bytes contains 32 bits.
| >
| > A slightly different issue, but I believe that most, if not all of these
| > are freestanding implementations.  There is some question whether int
| > and char can be the same size on a hosted implementation, since
| > functions like fgetc (inherited from C) must return a value in the range
| > [0...UCHAR_MAX] or EOF which must be negative.
|
| However, since there are other ways of detecting file errors and end of
| file than checking for EOF, that doesn't absolutely require that EOF be
| outside the range of char values.

Huh?!?  The C++ standard requires that all bits in a char participate
in a char value representation.  And EOF is not a character.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Ray Lischner <dontspamme@spam.you>
Date: Tue, 16 Apr 2002 18:09:36 GMT
Raw View
On Monday 15 April 2002 09:14 pm, James Kuyper Jr. wrote:

> Clause 21 is very large and complicated; it would help if you could be
> more specific about what you're referring to.

He probably means 21.1.2 [lib.char.traits.typedefs], which states that
character traits must have a type or class (int_type) that can represent
all the valid characters converted from the character type, plus an
end-of-file value. It does not state, however, that this type must be
"int".
--
Ray Lischner, author of C++ in a Nutshell (forthcoming, Q4 2002)
http://www.tempest-sw.com/cpp/

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Pete Becker <petebecker@acm.org>
Date: Tue, 16 Apr 2002 18:51:23 GMT
Raw View
"James Kuyper Jr." wrote:
>
> Pete Becker wrote:
> >
> > Unicode 2.0 had 40-some-odd thousand characters, so a 16-bit variable
> > could hold all possible values. Unicode 3.0 has over 90,000 characters,
> > so a 16-bit variable doesn't work. There's a UTF-16 encoding, but that's
> > analogous to a multi-byte character string in C and C++: a pain to work
> > with. Java apologists will tell you that this is no big deal, because
> > the characters that require two 16-bit values are rarely used.
>
> I suspect they're correct. A fairly large portion of the C++ world can
> get away with 8-bit characters; an even larger portion will never need
> to go beyond 16 bits. Of course, the people who need the larger
> characters need them all the time.
>

If you need 32-bit characters and the language you're using doesn't
support them you've got a problem, regardless of the rationalizing that
language designers may engage in.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Tue, 16 Apr 2002 18:51:40 GMT
Raw View
"James Kuyper Jr." <kuyper@wizard.net> writes:

| Gabriel Dos Reis wrote:
| >
| > "James Kuyper Jr." <kuyper@wizard.net> writes:
| >
| > | Carl Daniel wrote:
| > | ....
| > | > sizeof(char) is guaranteed to be 1.  1 what though?  1 memory allocation
| > | > unit.  All other types must have sizes which are multiples of sizeof(char).
| > | > The standard makes no claim that 1 memory allocation unit == 1 byte.  On a
| > |
| > | Section 5.3.3: "The sizeof operator yields the number of bytes in the
| > | object representation of its operand."
| >
| > Exactly.  The question is what you think "byte" means in the C++
| > standard's text.
|
| See 1.7p1.

I know that paragraph very well.  Thanks.  But from your assertions,
it wasn't clear that you knew of that paragraph.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: NotFound <correo@tengo.no>
Date: Tue, 16 Apr 2002 18:57:26 GMT
Raw View
> Not everyone uses the term correctly, not even (apparently) Intel. I'll

What is the correct use? In other contexts a word is the minimum
addressable unit; by that definition a word on all of the x86 family
would be an octet.

> consider an ISO specification more authoritative than a company
> specification any day (though it could still be wrong).

ISO has no authority to define the universal meaning of a word. No more
than Intel has; that is, each can only define the meaning the term has
in their own documents.

Regards.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Tue, 16 Apr 2002 18:59:39 GMT
Raw View
Ray Lischner <dontspamme@spam.you> writes:

| On Monday 15 April 2002 09:14 pm, James Kuyper Jr. wrote:
|
| > Clause 21 is very large and complicated; it would help if you could be
| > more specific about what you're referring to.
|
| He probably means 21.1.2 [lib.char.traits.typedefs], which states that
| character traits must have a type or class (int_type) that can represent
| all the valid characters converted from the character type, plus an
| end-of-file value. It does not state, however, that this type must be
| "int".

Thanks.

I would add that the standard says that when char_type is char then
int_type must be int.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Gennaro Prota <gennaro_prota@yahoo.com>
Date: Wed, 17 Apr 2002 16:59:25 GMT
Raw View
On Tue, 16 Apr 2002 01:49:53 GMT, "James Kuyper Jr."
<kuyper@wizard.net> wrote:

>Alexander Terekhov wrote:
>....
>> "IA-32 Intel=AE Architecture
>>  Software Developer?s
>>  Manual
>>  Volume 1:
>>  Basic Architecture
>>  ....
>>  4.1. FUNDAMENTAL DATA TYPES
>>=20
>>  The fundamental data types of the IA-32 architecture are bytes,
>>  words, doublewords, quadwords, and double quadwords (see Figure
>>  4-1). A byte is eight bits, a word is 2 bytes (16 bits), a
>>  doubleword is 4 bytes (32 bits), a quadword is 8 bytes (64 bits),
>>  and a double quadword is 16 bytes (128 bits). "
>
>Not everyone uses the term correctly, not even (apparently) Intel. I'll
>consider an ISO specification more authoritative than a company
>specification any day (though it could still be wrong).

But if such a document doesn't exist, who uses the term "correctly"?
There are many meanings of the same term, and each one is "correct" as
long as it is consistent. A "byte" can be an IA-32 data type, an IDL
type, a C/C++ storage unit, and many other things (yes, different
things, not only different sizes, since I can't identify a data type
with a storage unit).

The above quote would be wrong if it claimed to be a general
definition, but it's ok if (as I believe) it is intended as if it read
"Within this specification, the term byte refers to...". In this
respect it's in no way different from what the C++ standard does.

Note that I'm not saying that this de facto overloading of the term
(as well as of the terms "word", "dword" and others) isn't annoying.
It is! :)


Genny.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Wed, 17 Apr 2002 11:59:30 CST
Raw View
Ron Natalie wrote:
>
> "James Kuyper Jr." wrote:
> >
> >
> > Yes, but under the C++ standard, that simply means that a "byte" will
> > become 16 or 32 bits, respectively. That's what the CHAR_BIT macro is
> > for.
> >
>
> The problem is that just isn't practical in most cases.  Remember that

Practicality is an issue for the implementation to worry about. As long
as the standard allows each implementation enough freedom to choose
practical values for those type sizes, it's done its job. If one
implementor decides that the most practical thing for their market is a
16-bit char, that's permitted. If another decides that the most
practical thing for their market is an 8-bit char and a 16-bit wchar_t,
that's permitted. The two implmentations might be targeting different
markets, or one of them might be mistaken, but the C++ standard has been
designed to let each of them be conforming, and code that needs to be
portable will be designed to work correctly in either case (which can be
a highly non-trivial excercise in many cases).

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Wed, 17 Apr 2002 16:59:37 GMT
Raw View
NotFound wrote:
>
> > Not everyone uses the term correctly, not even (apparantly) Intel. I'll
>
> What is the correct use? In other contexts a word is the minimum
> addressable unit; by that definition a word on all of the x86 family
> would be an octet.

The definition I'm familiar with can be paraphrased by saying that if
it's correctly described as a 32-bit machine, then the word size is 32
bits.

> > consider an ISO specification more authoritative than a company
> > specification any day (though it could still be wrong).
>
> ISO has no authority to define the universal meaning of a word. No more

No one has that authority. However, ISO does have the authority to
define the usage within ISO documents, and the usage by anyone who cares
about ISO standards. Which includes me.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Wed, 17 Apr 2002 16:59:45 GMT
Raw View
Gabriel Dos Reis wrote:
>
> "James Kuyper Jr." <kuyper@wizard.net> writes:
>
> | Gabriel Dos Reis wrote:
> | >
> | > "James Kuyper Jr." <kuyper@wizard.net> writes:
> | >
> | > | Carl Daniel wrote:
> | > | ....
> | > | > sizeof(char) is guaranteed to be 1.  1 what though?  1 memory allocation
> | > | > unit.  All other types must have sizes which are multiples of sizeof(char).
> | > | > The standard makes no claim that 1 memory allocation unit == 1 byte.  On a
> | > |
> | > | Section 5.3.3: "The sizeof operator yields the number of bytes in the
> | > | object representation of its operand."
> | >
> | > Exactly.  The question is what you think "byte" means in the C++
> | > standard's text.
> |
> | See 1.7p1.
>
> I know that paragraph very well.  Thanks.  But from your assertions,
> it wasn't clear that you knew of that paragraph.

I'm not sure why. Could you be a little less laconic, and explain what
your point is?

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Wed, 17 Apr 2002 16:59:53 GMT
Raw View
Pete Becker wrote:
>
> "James Kuyper Jr." wrote:
> >
> > Pete Becker wrote:
> > >
> > > Unicode 2.0 had 40-some-odd thousand characters, so a 16-bit variable
> > > could hold all possible values. Unicode 3.0 has over 90,000 characters,
> > > so a 16-bit variable doesn't work. There's a UTF-16 encoding, but that's
> > > analogous to a multi-byte character string in C and C++: a pain to work
> > > with. Java apologists will tell you that this is no big deal, because
> > > the characters that require two 16-bit values are rarely used.
> >
> > I suspect they're correct. A fairly large portion of the C++ world can
> > get away with 8-bit characters, an even larger portion will never need
> > to go beyond 16-bits. Of course, the people who need the larger
> > characters, need them all the time.
> >
>
> If you need 32-bit characters and the language you're using doesn't
> support them you've got a problem, regardless of the rationalizing that
> language designers may engage in.

I'm not very familiar with Java; I got the impression from what you said
earlier that they did support the larger range of characters; they just
supported them inconveniently, using a multi-byte encoding.






Author: James Kanze <kanze@gabi-soft.de>
Date: Wed, 17 Apr 2002 17:00:10 GMT
Raw View
Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr> writes:

|>  James Kanze <kanze@gabi-soft.de> writes:

|>  | Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr> writes:

|>  | |>  James Kanze <kanze@gabi-soft.de> writes:

|>  | |>  | Jack Klein <jackklein@spamcop.net> writes:

|>  | |>  | |>  There are now C++ compilers for 32 bit digital signal
|>  | |>  | |>  processors where char, short, int and long are all 1
|>  | |>  | |>  byte and share the same representation.  Each of those
|>  | |>  | |>  bytes contains 32 bits.

|>  | |>  | A slightly different issue, but I believe that most, if not
|>  | |>  | all of these are freestanding implementations.  There is
|>  | |>  | some question whether int and char can be the same size on a
|>  | |>  | hosted implementation, since functions like fgetc (inherited
|>  | |>  | from C) must return a value in the range [0...UCHAR_MAX] or EOF,
|>  | |>  | which must be negative.

|>  | |>  Or you may just look at the requirements imposed by the
|>  | |>  standard std::string class in clause 21.

|>  | The advantage of basing the argument on fgetc is that it becomes a
|>  | C problem as well, and not something specific to C++.

|>  Yeah, a clever way of getting rid of the problem ;-)

I almost added something to the effect of letting the C committee do the
work:-).

|>  By the wording concerning some functions in <ctype.h>, I gather that
|>  EOF cannot be a valid 'unsigned char' converted to int.

I'm not sure.  The wording says that the functions must work for all
values in the range 0...UCHAR_MAX and EOF.  *IF* one of the values in
the range 0...UCHAR_MAX results in EOF when converted to int, I don't
think that it is a problem as long as that value isn't alpha, numeric,
etc., i.e. as long as all functions return 0.

If we suppose that char has 32 bits, and uses ISO 10646, this isn't a
problem, since all of the values greater than 0x10FFFF are invalid
characters, and should return false.  (EOF must be negative, which would
mean an unsigned char value of 0x80000000 or greater.  Supposing typical
implementations.)

|>  | |>  A conforming hosted implementation cannot have

|>  | |>    values_set(char) == values_set(int)

|>  | |>  because every bit in a char representation participates in the
|>  | |>  value representation, i.e. all bits in a char are meaningful.

|>  | Where does it say this?  (Section 21 is large.)

|>  Table 37 says that char_traits<char>::eof() -- identical to EOF --
|>  should return a value not equal to any char value (converted to
|>  int).

Not quite.  Table 37 says that char_traits<charT>::eof() must return a
value e for which eq_int_type(e,to_int_type(c)) is false for all c.

For 32 bit ISO 10646, I use char_type == int_type == unsigned int (on a
32 bit machine), with:

    int_type to_int_type( char_type c )
    {
        return c < 0x110000 ? c : 0 ;
    }

I'm not sure, but I believe that this is legal.  (At any rate, it seems
the most useful solution.)
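
Filled out slightly, the surrounding traits members might look like
this (only a sketch; uc_traits is an invented name, and this is not a
complete char_traits specialization):

    // Assumes a 32 bit machine with char_type == int_type == unsigned int.
    struct uc_traits
    {
        typedef unsigned int char_type ;
        typedef unsigned int int_type ;

        static int_type to_int_type( char_type c )
        {
            return c < 0x110000 ? c : 0 ;  // invalid code points collapse to 0
        }
        static bool eq_int_type( int_type a, int_type b )
        {
            return a == b ;
        }
        static int_type eof()
        {
            // 0xFFFFFFFF is never a valid code point, so
            // eq_int_type(eof(), to_int_type(c)) is false for every c,
            // which is all that Table 37 asks for.
            return 0xFFFFFFFFu ;
        }
    };

Whether collapsing every invalid code point onto 0 is strictly
conforming is precisely the open question; the sketch only shows that
the Table 37 requirement itself can be met without an int_type wider
than char_type.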

|>  | What are the implications for an implementation which wants to
|>  | support ISO 10646 on a 32 bit machine?  The smallest type it can
|>  | declare which supports ISO 10646 is 32 bits.

|>  Then it must make sure values-set(char) is a strict subset of
|>  values-set(int) (for example having a 64-bit int).  Or it doesn't
|>  ;-)

That is the crux of my question.  On the two machines I use (a 32 bit
Sparc under Solaris 2.7 and a PC under Linux), wchar_t is a 32 bit
quantity, and there are no integral data types larger than 32 bits.  I
don't want wchar_t to be any smaller, since it must be at least 21 bits
for ISO 10646.  This means that *if* int_type must be larger than
char_type, I have to define a class type for it.  But in practice, I
don't need it to be larger, since in fact all legal characters are in
the range 0...0x10FFFF.

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481






Author: James Kanze <kanze@gabi-soft.de>
Date: Wed, 17 Apr 2002 17:00:34 GMT
Raw View
NotFound <correo@tengo.no> writes:

|>  > Not everyone uses the term correctly, not even (apparently)
|>  > Intel. I'll

|>  What is the correct use? In other contexts a word is the minimum
|>  addressable unit; a word on the whole x86 family would then be an
|>  octet.

A word is normally the bus width in the ALU; it is often larger than the
minimum addressable unit.  The correct term for the minimal addressable
unit is byte, although this is normally only used if this unit is
smaller than a word (as it is on most modern processors).

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481






Author: James Kanze <kanze@gabi-soft.de>
Date: Wed, 17 Apr 2002 17:03:16 GMT
Raw View
Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr> writes:

|>  Huh?!?  The C++ standard requires that all bits in a char
|>  participate in a char value representation.  And EOF is not a
|>  character.

However, as far as I can see, it doesn't place any constraints with
regard to what a character can be (except that the characters in the
basic character set must have positive values, even if char is signed).
The requirement that EOF not be a character doesn't mean that it cannot
be a legal char value.

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481






Author: Pete Becker <petebecker@acm.org>
Date: Wed, 17 Apr 2002 17:26:17 GMT
Raw View
"James Kuyper Jr." wrote:
>
> I'm not very familiar with Java; I got the impression from what you said
> earlier that they did support the larger range of characters; they just
> supported them inconveniently, using a multi-byte encoding.
>

As does every programming language, I suppose. The point of wide
characters is to not have to deal with multi-byte encodings. The Java
libraries have a bunch of code that assumes that a single character is
not part of a multi-character sequence.
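
For readers who have not met them, the "two 16-bit values" in question
are UTF-16 surrogate pairs. The encoding itself is mechanical (a sketch
of the standard algorithm; to_utf16 is an invented name):

    // Encode one ISO 10646 code point (cp <= 0x10FFFF) as UTF-16.
    // Returns the number of 16 bit units written: 1, or 2 for a
    // surrogate pair.
    int to_utf16(unsigned long cp, unsigned short out[2])
    {
        if (cp < 0x10000UL) {       // fits in a single 16 bit unit
            out[0] = (unsigned short)cp;
            return 1;
        }
        cp -= 0x10000UL;            // 20 significant bits remain
        out[0] = (unsigned short)(0xD800 + (cp >> 10));    // high surrogate
        out[1] = (unsigned short)(0xDC00 + (cp & 0x3FF));  // low surrogate
        return 2;
    }

The pain Pete describes is that every piece of string-handling code has
to remember that a single 16 bit unit may be only half a character.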

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)






Author: rhairgroveNoSpam@Pleasebigfoot.com (Bob Hairgrove)
Date: Sat, 13 Apr 2002 23:42:39 GMT
Raw View
In Bjarne Stroustrup's 3rd edition of "The C++ Programming Language",
there is an interesting passage on page 24:

"A char variable is of the natural size to hold a character on a given
machine (typically a byte), and an int variable is of the natural size
for integer arithmetic on a given machine (typically a word)."

Now the last statement (i.e. sizeof(int) typically == a word)
certainly shows the age of the text here. In the meantime, the
"natural" size of an int has grown to a 32-bit DWORD on most machines,
whereas 64-bit int's are becoming more and more common.

But what does this mean for char?? I was always under the assumption
that sizeof(char) is ALWAYS guaranteed to be exactly 1 byte,
especially since there is no C++ "byte" type. As we now have the
wchar_t as an intrinsic data type, wouldn't this cement the fact that
char is always 1 byte?

What does the ANSI standard have to say about this?

Bob Hairgrove
rhairgroveNoSpam@Pleasebigfoot.com






Author: Edwin Robert Tisdale <E.Robert.Tisdale@jpl.nasa.gov>
Date: Sun, 14 Apr 2002 06:56:42 GMT
Raw View
Bob Hairgrove wrote:

> In Bjarne Stroustrup's 3rd edition of "The C++ Programming Language",
> there is an interesting passage on page 24:
>
>     "A char variable is of the natural size to hold a character
>      on a given machine (typically a byte) and an int variable
>      is of the natural size for integer arithmetic
>      on a given machine (typically a word)."
>
> Now the last statement (i.e. sizeof(int) typically == a word)
> certainly shows the age of the text here.
> In the meantime, the "natural" size of an int
> has grown to a 32-bit DWORD on most machines,
> whereas 64-bit int's are becoming more and more common.
>
> But what does this mean for char?
> I was always under the assumption that sizeof(char)
> is ALWAYS guaranteed to be exactly 1 byte,
> especially since there is no C++ "byte" type.
> As we now have the wchar_t as an intrinsic data type,
> wouldn't this cement the fact that char is always 1 byte?
>
> What does the ANSI standard have to say about this?

A byte is a data size -- not a data type.
A byte is 8 bits on virtually every modern processor
and the memories are almost always byte addressable.
A machine word is as wide as the integer data path
through the Arithmetic and Logic Unit (ALU).

The old Control Data Corporation (CDC) computers
had 60 bit words and were word addressable.
Characters were represented by 60 bit words
or were packed into a word 10 at a time,
which means that the CDC character code set
had just 64 distinct codes, each represented by a 6 bit byte.










Author: Witless <witless@attbi.com>
Date: Sun, 14 Apr 2002 07:01:54 GMT
Raw View
Bob Hairgrove wrote:

> In Bjarne Stroustrup's 3rd edition of "The C++ Programming Language",
> there is an interesting passage on page 24:
>
> "A char variable is of the natural size to hold a character on a given
> machine (typically a byte), and an int variable is of the natural size
> for integer arithmetic on a given machine (typically a word)."
>
> Now the last statement (i.e. sizeof(int) typically == a word)
> certainly shows the age of the text here.

No.  You are applying a corruption of the term "word".  It does not mean 16
bits.  It means the natural size for the machine, typically the register
size.  On a 128-bit machine it is 128 bits.  On an 8-bit machine it is 8
bits.

> In the meantime, the
> "natural" size of an int has grown to a 32-bit DWORD on most machines,

No it hasn't.  Most machines do not have DWORDs.  32-bit machines often have
words and half words.

4-bit machines that became 8-bit machines that became 16-bit machines that
became 32-bit machines have DWORDs.  Nobody else has anything half as silly.

>
> whereas 64-bit int's are becoming more and more common.

64-bit registers are becoming more common.  Many people who believe in DWORDs
object to 64-bit ints because their religion says that ints are 32 bits.

>
>
> But what does this mean for char?? I was always under the assumption
> that sizeof(char) is ALWAYS guaranteed to be exactly 1 byte,

Your assumption is invalid.

> especially since there is no C++ "byte" type. As we now have the
> wchar_t as an intrinsic data type, wouldn't this cement the fact that
> char is always 1 byte?

No.

The type char can be 16 bits like Unicode or even 32 bits like the ISO
character sets.

>
>
> What does the ANSI standard have to say about this?

Have you read it?






Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Sun, 14 Apr 2002 07:01:02 GMT
Raw View
Bob Hairgrove wrote:
>
> In Bjarne Stroustrup's 3rd edition of "The C++ Programming Language",
> there is an interesting passage on page 24:
>
> "A char variable is of the natural size to hold a character on a given
> machine (typically a byte), and an int variable is of the natural size
> for integer arithmetic on a given machine (typically a word)."
>
> Now the last statement (i.e. sizeof(int) typically == a word)
> certainly shows the age of the text here. In the meantime, the
> "natural" size of an int has grown to a 32-bit DWORD on most machines,
> whereas 64-bit int's are becoming more and more common.
>
> But what does this mean for char?? I was always under the assumption
> that sizeof(char) is ALWAYS guaranteed to be exactly 1 byte,
> especially since there is no C++ "byte" type. As we now have the
> wchar_t as an intrinsic data type, wouldn't this cement the fact that
> char is always 1 byte?
>
> What does the ANSI standard have to say about this?

The standard mandates sizeof(char)==1. The only requirements on the size
of an 'int' are those implied by the requirements that INT_MIN<=-32767,
and INT_MAX>=32767 (these limits are incorporated by reference from the
C standard, rather than being specified in the C++ standard itself).

Bjarne's statement is technically incorrect, but true to the history of
C, when he identifies "char" more closely with "character" than with
"byte". His statement about "words" is actually more accurate;
traditionally a "word" of memory wasn't a fixed amount of memory, but
varied from machine to machine. On a 32-bit machine, a "word" should
properly be a 32-bit chunk of memory. However, when people are used to
programming only for a limited range of architectures, all of which
share the same word size, they tend to assume that "word" means the
same amount of memory on all machines that it refers to on the machines
they're used to. If enough people do this, the term may even end up
being redefined, confusing people who still remember the original
definition.
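
A short program makes the guaranteed and the implementation-defined
parts visible side by side (the output varies by platform, except where
noted):

    #include <climits>
    #include <cstdio>

    int main()
    {
        std::printf("sizeof(char) = %u  (1 by definition)\n",
                    (unsigned)sizeof(char));
        std::printf("CHAR_BIT     = %d  (at least 8)\n", CHAR_BIT);
        std::printf("sizeof(int)  = %u  (implementation-defined)\n",
                    (unsigned)sizeof(int));
        std::printf("INT_MAX      = %d  (at least 32767)\n", INT_MAX);
        return 0;
    }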






Author: James Kanze <kanze@gabi-soft.de>
Date: Sun, 14 Apr 2002 15:22:56 GMT
Raw View
Edwin Robert Tisdale <E.Robert.Tisdale@jpl.nasa.gov> writes:

|>  Bob Hairgrove wrote:

|>  > In Bjarne Stroustrup's 3rd edition of "The C++ Programming
|>  > Language", there is an interesting passage on page 24:

|>  >     "A char variable is of the natural size to hold a character
|>  >      on a given machine (typically a byte) and an int variable
|>  >      is of the natural size for integer arithmetic
|>  >      on a given machine (typically a word)."

|>  > Now the last statement (i.e. sizeof(int) typically == a word)
|>  > certainly shows the age of the text here.  In the meantime, the
|>  > "natural" size of an int has grown to a 32-bit DWORD on most
|>  > machines, whereas 64-bit int's are becoming more and more common.

Excuse me, but on 32 bit machines (at least the ones I've seen), DWORD
is 64 bits.  The "traditional" widths (from IBM, since the 360) are:

    BYTE                 8 bits
    HWORD               16 bits
    WORD                32 bits
    DWORD               64 bits

The only place I've seen otherwise is on 16 bit machines, where a word
is 16 bits.  Or, of course, on 36 bit machines, with 36 bit words, or 48
bit machines, with 48 bit words.

|>  > But what does this mean for char?  I was always under the
|>  > assumption that sizeof(char) is ALWAYS guaranteed to be exactly 1
|>  > byte, especially since there is no C++ "byte" type.  As we now
|>  > have the wchar_t as an intrinsic data type, wouldn't this cement
|>  > the fact that char is always 1 byte?

The standard defines the results of sizeof as the size in bytes.  And
guarantees that sizeof(char) == 1.  So by definition, the size of a char
is one byte, even if that char has 32 bits.

|>  > What does the ANSI standard have to say about this?
|>
|>  A byte is a data size -- not a data type.
|>  A byte is 8 bits on virtually every modern processor and the
|>  memories are almost always byte addressable.

I'm not so sure.  From what I've heard, more than a few DSPs use 32 bit
chars.

|>  A machine word is as wide as the integer data path throught the
|>  Arithmetic and Logic Unit (ALU).

Or as wide as the memory bus?

I'm not sure that there is a real definition of "word".  I've used
machines (Interdata 32/7) where the ALU was 16 bits wide, but the native
instruction set favored 32 bits (through judicious microcode), and if I
remember correctly, the memory bus was 32 bits wide (but it has been a
long time, and I could be mistaken).

|>  The old Control Data Corporation (CDC) computers had 60 bit words
|>  and were word addressable.  Characters were represented by 60 bit
|>  words or were packed into a word 10 at a time which means that the
|>  CDC character code set had just 64 distinct codes represented by a 6
|>  bit byte.

This wouldn't be legal in C/C++, since UCHAR_MAX must be at least 255.
A C/C++ implementation on this machine would probably use six 10 bit
bytes to the word.  (This is specific to C/C++.  The original use of
byte was for a 6 bit chunk of data.)

There have definitely been C implementations on 36 bit machines, normally
with 9 bit bytes, and there are implementations today (for DSPs) with 32
bit bytes.  There probably are, and have been, others as well.

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481






Author: "Mike Wahler" <mkwahler@ix.netcom.com>
Date: Sun, 14 Apr 2002 15:21:52 GMT
Raw View
Witless <witless@attbi.com> wrote in message
news:3CB8E3C9.E547980@attbi.com...
> Bob Hairgrove wrote:
>
> > In Bjarne Stroustrup's 3rd edition of "The C++ Programming Language",
> > there is an interesting passage on page 24:
> >
> > "A char variable is of the natural size to hold a character on a given
> > machine (typically a byte), and an int variable is of the natural size
> > for integer arithmetic on a given machine (typically a word)."
> >
> > Now the last statement (i.e. sizeof(int) typically == a word)
> > certainly shows the age of the text here.
>
> No.  You are applying a corruption of the term "word".  It does not
> mean 16 bits.  It means the natural size for the machine, typically the
> register size.  On a 128-bit machine it is 128 bits.  On an 8-bit
> machine it is 8 bits.
>
> > In the meantime, the
> > "natural" size of an int has grown to a 32-bit DWORD on most machines,
>
> No it hasn't.  Most machines do not have DWORDs.  32-bit machines
> often have words and half words.
>
> 4-bit machines that became 8-bit machines that became 16-bit machines
> that became 32-bit machines have DWORDs.  Nobody else has anything half
> as silly.
>
> >
> > whereas 64-bit int's are becoming more and more common.
>
> 64-bit registers are becoming more common.  Many people who believe in
> DWORDs object to 64-bit ints because their religion says that ints are
> 32 bits.
>
> >
> >
> > But what does this mean for char?? I was always under the assumption
> > that sizeof(char) is ALWAYS guaranteed to be exactly 1 byte,
>
> Your assumption is invalid.

No, it's not invalid, it is precisely correct.
sizeof (char) is required to be one byte.  Note
that byte size can and does vary among platforms,
and that a char (byte) is required by the C standard
to have at least eight bits, but is not prevented
from having more.

> > especially since there is no C++ "byte" type. As we now have the
> > wchar_t as an intrinsic data type, wouldn't this cement the fact that
> > char is always 1 byte?
>
> No.

A char is indeed always one byte, but the definition
of type 'wchar_t' has no influence upon this.
"sizeof(char) == one byte" is mandated by the standard.

>
> The type char can be 16 bits like Unicode or even 32 bits like the ISO
> character sets.

Yes, on machines with 16-bit or 32-bit *bytes*.
On a machine with e.g. 8-bit bytes, type 'char'
cannot represent every Unicode character.  Thus
'wchar_t' was invented.

>
> >
> >
> > What does the ANSI standard have to say about this?
>
> Have you read it?

Have *you*? :-)

-Mike








Author: Gennaro Prota <gennaro_prota@yahoo.com>
Date: Sun, 14 Apr 2002 15:21:57 GMT
Raw View
On Sun, 14 Apr 2002 07:01:02 GMT, "James Kuyper Jr."
<kuyper@wizard.net> wrote:

> when people are used to
> programming only for a limited range of architectures, all of which
> share the same word size, they tend to assume that "word" means the
> same amount of memory on all machines that it refers to on the machines
> they're used to. If enough people do this, the term may even end up
> being redefined, confusing people who still remember the original
> definition.

Yes, and a similar confusion already exists for "byte", which many
people incorrectly assume to mean "8-bit byte".


Genny






Author: James Kanze <kanze@gabi-soft.de>
Date: Sun, 14 Apr 2002 15:24:08 GMT
Raw View
Witless <witless@attbi.com> writes:

|>  > In the meantime, the "natural" size of an int has grown to a
|>  > 32-bit DWORD on most machines,

|>  No it hasn't.  Most machines do not have DWORDs.  32-bit machines
|>  often have words and half words.

IBM 360's (the prototypical 32 bit machine) certainly have DWORDs.  A
DWORD is an 8 byte quantity, often initialized with 16 BCD digits.  (The
IBM 360 had machine instructions for all four operations on such
quantities, as well as instructions for 4 bit left and right shifts over
DWORDs.  Very useful for Cobol, or other languages that used decimal
arithmetic.  We once converted the BCD arithmetic routines in a Basic
interpreter from C to assembler -- something like 150 lines of C became
10 lines of assembler, and ran four or five orders of magnitude faster.)

|>  4-bit machines that became 8-bit machines that became 16-bit
|>  machines that became 32-bit machines have DWORDs.  Nobody else has
|>  anything half as silly.

That's because nobody else has been around half as long:-)?  Seriously,
historical reasons lead to all kinds of silliness, where the normal
registers are called extended, and the non-extended registers need a
special instruction prefix to access them.

In the meantime, there are 64 bit machines out there where int is only
32 bits, and you need long to get 64 bits.  That sounds pretty silly,
too, until you realize that the vendors have a lot of customers who were
stupid enough to write code which depended on int being exactly 32 bits.
And making your customer feel like an idiot has never been a
particularly successful commercial policy, even if it is sometimes the
truth.

In the good old days (pre-360), of course, no one worried about
compatibility, so a WORD in IBM's assembler could change from one
machine to the next.  We didn't get such silliness.  But we did have to
rewrite all of our code every time we upgraded the processor.

|>  > whereas 64-bit int's are becoming more and more common.

|>  64-bit registers are becoming more common.  Many people who believe
|>  in DWORDs object to 64-bit ints because their religion says that
|>  ints are 32 bits.

|>  > But what does this mean for char?? I was always under the
|>  > assumption that sizeof(char) is ALWAYS guaranteed to be exactly 1
|>  > byte,

|>  Your assumption is invalid.

I think you misread something.  He said that his assumption was that
sizeof(char) is guaranteed to be exactly one byte.  Which is exactly
what the standard says.

|>  > especially since there is no C++ "byte" type. As we now have the
|>  > wchar_t as an intrinsic data type, wouldn't this cement the fact
|>  > that char is always 1 byte?

|>  No.

Yes.  ISO 14882, 5.3.3 and ISO 9899 6.5.3.4.

|>  The type char can be 16 bits like Unicode or even 32 bits like the
|>  ISO character sets.

The type char can be 16 bits, or 32 bits.  In the past, it has often
been 9 bits, and I think that there have also been 10 bit
implementations.

But the size of char in bytes is always 1.

|>  > What does the ANSI standard have to say about this?

|>  Have you read it?

Have you?

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481






Author: "Carl Daniel" <cpdaniel@pacbell.net>
Date: Sun, 14 Apr 2002 15:30:42 GMT
Raw View
"Bob Hairgrove" <rhairgroveNoSpam@Pleasebigfoot.com> wrote in message
news:3cb820f5.7959715@news.ch.kpnqwest.net...
> But what does this mean for char?? I was always under the assumption
> that sizeof(char) is ALWAYS guaranteed to be exactly 1 byte,
> especially since there is no C++ "byte" type. As we now have the
> wchar_t as an intrinsic data type, wouldn't this cement the fact that
> char is always 1 byte?

sizeof(char) is guaranteed to be 1.  1 what though?  1 memory allocation
unit.  All other types must have sizes which are multiples of sizeof(char).
The standard makes no claim that 1 memory allocation unit == 1 byte.  On a
system with a 16-bit "natural character", sizeof(char) and sizeof(wchar_t)
might both be 1, and sizeof(int), though it's 32 bits, would be 2 not 4.

Further, there's no guarantee that you have any access to the smallest
addressable unit of storage, only to storage which is allocated in multiples
of char.  For example, on an 8051, the smallest addressable unit is 1 bit,
but char is still 8 bits on 8051 C compilers - those addressable bits are
simply outside the C/C++ memory model on such a system (of course, an 8051
compiler will provide a way to access them, but it will do so by an
extension - nothing in the standard makes it possible).

HTH

-cd






Author: Gabriel Dos Reis <dosreis@cmla.ens-cachan.fr>
Date: Sun, 14 Apr 2002 15:32:07 GMT
Raw View
"James Kuyper Jr." <kuyper@wizard.net> writes:

[...]

| Bjarne's statement is technically incorrect, but true to the history of
| C, when he identifies "char" more closely with "character" than with
             ^^^^^^^^^^
| "byte".

Firstly, note that B. Stroustrup didn't *identify* "char" with
"character"; rather, I quote (from the original poster):

  "A char variable is of the natural size to hold a character on a given
   machine (typically a byte)"

Secondly, it has been the tradition that 'char', in C++, is the
natural type for holding characters, as exemplified by the standard
type std::string and the standard narrow streams.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr






Author: rhairgroveNoSpam@Pleasebigfoot.com (Bob Hairgrove)
Date: Sun, 14 Apr 2002 15:35:26 GMT
Raw View
On Sun, 14 Apr 2002 07:01:54 GMT, Witless <witless@attbi.com> wrote:

>> But what does this mean for char?? I was always under the assumption
>> that sizeof(char) is ALWAYS guaranteed to be exactly 1 byte,
>
>Your assumption is invalid.
>

Check out Mike Wahler's response ... seems that the standard does
guarantee this (although a byte doesn't have to be 8 bits). That is,
the guarantee seems to be that sizeof(char)==1 under all
circumstances.

>> What does the ANSI standard have to say about this?
>
>Have you read it?

Hmm ... I thought it was more expensive than it is ... Now that I have
gone to www.ansi.org, I was delighted to discover that it is only $18.
I'm sure this will be well worth buying.


Bob Hairgrove
rhairgroveNoSpam@Pleasebigfoot.com






Author: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: Sun, 14 Apr 2002 18:33:09 GMT
Raw View
James Kanze <kanze@gabi-soft.de> writes:

> Excuse me, but on 32 bit machines (at least the ones I've seen), DWORD
> is 64 bits.

I guess you have not seen Microsoft Windows, then. Just try

#include <windows.h>
#include <stdio.h>

int main()
{
  printf("%d\n", sizeof(DWORD));
}

in MSVC++ 6 or so. It prints 4, and it uses 8-bit bytes.

Regards,
Martin






Author: Chris Wolfe <cwolfe@globetrotter.qc.ca>
Date: Sun, 14 Apr 2002 20:04:13 GMT
Raw View
"Martin v. L=F6wis" wrote:
>=20
> James Kanze <kanze@gabi-soft.de> writes:
>=20
> > Excuse me, but on 32 bit machines (at least the ones I've seen), DWOR=
D
> > is 64 bits.
>=20
> I guess you have not seen Microsoft Windows, then. Just try
>=20
> #include <windows.h>
> #include <stdio.h>
>=20
> int main()
> {
>   printf("%d\n", sizeof(DWORD));
> }
>=20
> in MSVC++ 6 or so. It prints 4, and it uses 8-bit bytes.
>=20
> Regards,
> Martin

AFAIK that is for backwards compatibility with 16-bit DOS and Windows
3.x. A double word at the assembler level is still 64 bits.

And as we're well off-topic at this point...

Cheers,
Chris






Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Sun, 14 Apr 2002 20:03:56 GMT
Raw View
Carl Daniel wrote:
....
> sizeof(char) is guaranteed to be 1.  1 what though?  1 memory allocation
> unit.  All other types must have sizes which are multiples of sizeof(char).
> The standard makes no claim that 1 memory allocation unit == 1 byte.  On a

Section 5.3.3: "The sizeof operator yields the number of bytes in the
object representation of its operand."

> system with a 16-bit "natural character", sizeof(char) and sizeof(wchar_t)
> might both be 1, and sizeof(int), though it's 32 bits, would be 2 not 4.


Correct. For instance, that means that on such a system, 'int' is two
16-bit bytes long.






Author: "Carl Daniel" <cpdaniel@pacbell.net>
Date: Sun, 14 Apr 2002 21:02:07 GMT
Raw View
"James Kuyper Jr." <kuyper@wizard.net> wrote in message
news:3CB9CE84.7F4267B2@wizard.net...
> Carl Daniel wrote:
> ....
> > sizeof(char) is guaranteed to be 1.  1 what though?  1 memory allocation
> > unit.  All other types must have sizes which are multiples of
> > sizeof(char).  The standard makes no claim that 1 memory allocation
> > unit == 1 byte.  On a
>
> Section 5.3.3: "The sizeof operator yields the number of bytes in the
> object representation of its operand."

I hadn't looked at that section before this morning.  I'm surprised they
worded it that way, since it's patently false given the most common meaning
of 'byte' (8 bits).  It would have helped if the standard actually defined
the word byte, or simply not used it at all.  As is, the section is
confusing at best.

And yes, I realize that in the past 'byte' was used more flexibly, with
'bytes' being 6, 7, 8, 9, 10, 12, and even 15 bits on various systems.
Surely today, and as surely in 1998, most readers think "8 bits" when they
see the word "byte".

-cd







Author: Witless <witless@attbi.com>
Date: Sun, 14 Apr 2002 21:41:04 GMT
Raw View
Bob Hairgrove wrote:

> On Sun, 14 Apr 2002 07:01:54 GMT, Witless <witless@attbi.com> wrote:
>
> >> But what does this mean for char?? I was always under the assumption
> >> that sizeof(char) is ALWAYS guaranteed to be exactly 1 byte,
> >
> >Your assumption is invalid.
> >
>
> Check out Mike Wahler's response ... seems that the standard does
> guarantee this (although a byte doesn't have to be 8 bits). That is,
> the guarantee seems to be that sizeof(char)==1 under all
> circumstances.

That's not the issue.  The hidden redefinition of "byte" is the issue.

{OT} This sleight of hand is similar to the IRS definition of income.

>
>
> >> What does the ANSI standard have to say about this?
> >
> >Have you read it?
>
> Hmm ... I thought it was more expensive than it is ... Now that I have
> gone to www.ansi.org, I was delighted to discover that it is only $18.
> I'm sure this will be well worth buying.

I wish you good luck with it.






Author: "Mike Wahler" <mkwahler@ix.netcom.com>
Date: Mon, 15 Apr 2002 01:13:58 GMT
Raw View
Witless <witless@attbi.com> wrote in message
news:3CB9F78D.A211190D@attbi.com...
> Bob Hairgrove wrote:
>
> > On Sun, 14 Apr 2002 07:01:54 GMT, Witless <witless@attbi.com> wrote:
> >
> > >> But what does this mean for char?? I was always under the assumption
> > >> that sizeof(char) is ALWAYS guaranteed to be exactly 1 byte,
> > >
> > >Your assumption is invalid.
> > >
> >
> > Check out Mike Wahler's response ... seems that the standard does
> > guarantee this (although a byte doesn't have to be 8 bits). That is,
> > the guarantee seems to be that sizeof(char)==1 under all
> > circumstances.
>
> That's not the issue.  The hidden redefinition
> of "byte" is the issue.

It's not 'redefined', it's defined.  And it's not hidden.

>
> {OT} This sleight of hand is similar to the IRS definition of income.

Sleight of hand?  I agree with the IRS part, but not that
it applies to the standard.


-Mike








Author: "Mike Wahler" <mkwahler@ix.netcom.com>
Date: Mon, 15 Apr 2002 01:14:37 GMT
Raw View
Carl Daniel <cpdaniel@pacbell.net> wrote in message
news:Im5u8.1420$Uf.1278678108@newssvr21.news.prodigy.com...
> "Bob Hairgrove" <rhairgroveNoSpam@Pleasebigfoot.com> wrote in message
> news:3cb820f5.7959715@news.ch.kpnqwest.net...
> > But what does this mean for char?? I was always under the assumption
> > that sizeof(char) is ALWAYS guaranteed to be exactly 1 byte,
> > especially since there is no C++ "byte" type. As we now have the
> > wchar_t as an intrinsic data type, wouldn't this cement the fact that
> > char is always 1 byte?
>
> sizeof(char) is guaranteed to be 1.  1 what though?


One byte.

> 1 memory allocation
> unit.

No.

>  All other types must have sizes which are multiples of sizeof(char).

Right.  'char' and 'byte' are synonymous in C++.

> The standard makes no claim that 1 memory allocation unit == 1 byte.

It absolutely does.  See my quote of the standard elsethread.

> On a
> system with a 16-bit "natural character",


In this context, 'natural character' == byte.

>sizeof(char) and sizeof(wchar_t)
> might both be 1,

sizeof(char) is *required* to be one byte.
sizeof(wchar_t) is usually larger, typically two
(but it's implementation-defined).

> and sizeof(int), though it's 32 bits, would be 2 not 4.

Absolutely not.  sizeof(int) is implementation-defined,
but is still expressed in bytes (i.e. chars).  A 32-bit
int's sizeof will be 32 / CHAR_BIT.
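
The relationship is mechanical enough to be worth writing down once (a
sketch; bits_in is an invented name):

    #include <climits>

    // sizeof yields a count of bytes (chars), and each byte holds
    // CHAR_BIT bits.
    template <typename T>
    unsigned bits_in() { return (unsigned)(sizeof(T) * CHAR_BIT); }

    // A "32-bit int" is one for which bits_in<int>() == 32, whatever
    // sizeof(int) itself is: 4 with 8 bit bytes, 2 with 16 bit bytes,
    // 1 with 32 bit bytes.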

>
> Further, there's no guarantee that you have any access to the smallest
> addressable unit of storage,

Yes there is.  The byte is specified as the smallest addressable unit.

>only to storage which is allocated in multiples
> of char.

Right.  'char' == 'byte'

>For example, on an 8051, the smallest addressable unit is 1 bit,

But not from C++.

> but char is still 8 bits on 8051 C compilers

Which means the smallest addressable unit (from C++) is an eight-bit byte.

>- those addressable bits are
> simply outside the C/C++ memory model on such a system

Exactly.

> (of course, an 8051
> compiler will provide a way to access them, but it will do so by an
> extension - nothing in the standard makes it possible).

Right.

-Mike








Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Mon, 15 Apr 2002 01:15:08 GMT
Raw View
Carl Daniel wrote:
....
> of 'byte' (8 bits).  It would have helped if the standard actually defined
> the word byte, or simply not used it at all.  As is, the section is

It does define it, in section 1.7p1: "The fundamental storage unit in
the C++ memory model is the _byte_. A byte is at least large enough to
contain any member of the basic execution character set and is composed
of a contiguous sequence of bits, the number of which is
implementation-defined." The fact that "byte" is italicized, indicates
that this clause should be taken as defining that term. As far as
standardese goes (which isn't very far) you can't get much clearer than
that. In particular, pay special attention the the very last part of
that definition.






Author: Jack Klein <jackklein@spamcop.net>
Date: Mon, 15 Apr 2002 01:16:17 GMT
Raw View
On Sat, 13 Apr 2002 23:42:39 GMT, rhairgroveNoSpam@Pleasebigfoot.com
(Bob Hairgrove) wrote in comp.lang.c++:

> In Bjarne Stroustrup's 3rd edition of "The C++ Programming Language",
> there is an interesting passage on page 24:
>
> "A char variable is of the natural size to hold a character on a given
> machine (typically a byte), and an int variable is of the natural size
> for integer arithmetic on a given machine (typically a word)."
>
> Now the last statement (i.e. sizeof(int) typically == a word)
> certainly shows the age of the text here. In the meantime, the
> "natural" size of an int has grown to a 32-bit DWORD on most machines,
> whereas 64-bit int's are becoming more and more common.

Who's "DWORD"?  On a PowerPC, a DWORD is 64 bits, a WORD is 32 bits.
Neither Microsoft nor Intel define C++.

> But what does this mean for char?? I was always under the assumption
> that sizeof(char) is ALWAYS guaranteed to be exactly 1 byte,
> especially since there is no C++ "byte" type. As we now have the
> wchar_t as an intrinsic data type, wouldn't this cement the fact that
> char is always 1 byte?
>
> What does the ANSI standard have to say about this?

sizeof(char) is 1 by definition, always has been in C and C++, and
almost certainly always will be.  Changing it would break far too much
existing, properly working, conforming code.  So a char is 1 byte,
which contains at least 8 bits or possibly more.

There are now C++ compilers for 32 bit digital signal processors where
char, short, int and long are all 1 byte and share the same
representation.  Each of those bytes contains 32 bits.
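
On such a DSP the following prints "1 1 1 1"; on a typical desktop
implementation of 2002 it prints something like "1 2 4 4" (the output
is implementation-defined):

    #include <cstdio>

    int main()
    {
        std::printf("%u %u %u %u\n",
                    (unsigned)sizeof(char), (unsigned)sizeof(short),
                    (unsigned)sizeof(int),  (unsigned)sizeof(long));
        return 0;
    }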

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq






Author: James Kanze <kanze@gabi-soft.de>
Date: Mon, 15 Apr 2002 07:41:43 GMT
Raw View
loewis@informatik.hu-berlin.de (Martin v. Löwis) writes:

|>  James Kanze <kanze@gabi-soft.de> writes:

|>  > Excuse me, but on 32 bit machines (at least the ones I've seen),
|>  > DWORD is 64 bits.

|>  I guess you have not seen Microsoft Windows, then. Just try

Not directly.  I've written a few programs for Windows, but we always
used Java/Corba for the GUI parts, and I wrote the code in pretty much
standard C++.  A priori, however, DWORD is an assembler concept, and not
something I'd expect to see in C/C++.

|>  #include <windows.h>
|>  #include <stdio.h>

|>  int main()
|>  {
|>    printf("%d\n", (int)sizeof(DWORD));
|>  }

|>  in MSVC++ 6 or so. It prints 4, and it uses 8-bit bytes.

I presume that there are backwards compatibility reasons.
Although I'll admit that I don't see what something like DWORD is doing
in a C++, or even a C, interface. Somebody must have seriously muffed
the design, a long time ago.

--
James Kanze                                mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
