Thread

Topic: C and C++ Standard incompatibility for signed character type

Author: "Douglas A. Gwyn" <gwyn@arl.army.mil>
Date: Thu, 2 Nov 2000 18:48:27 GMT

Francis Glassborow wrote:
> >    CHAR_BIT == 10
> >    UCHAR_MAX == 1023
> >    CHAR_MAX == 255
> >    CHAR_MIN == -256
> True, but does anyone know of such a system, or a reason for doing that
> other than the Standard allows it.

I think the point is that the standard should not allow it.
An argument can be given for exempting the wider integer types
from any requirement that signed and unsigned varieties have
the same width, e.g. there is at least one implementation
where multi-word signed arithmetic cannot be efficiently
executed if the low-order word's sign bit is not omitted from
the representation, but unsigned operations work okay if all
bits are used.
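Code that wants to rely on the rule argued for here (signed char using every bit of the byte, so that its width matches unsigned char) can at least refuse to build where that rule fails. A minimal sketch, assuming a C11 compiler for static_assert; schar_has_full_width is a hypothetical helper name. With no padding bits, UCHAR_MAX is 2^CHAR_BIT - 1 and SCHAR_MAX is 2^(CHAR_BIT-1) - 1, which is UCHAR_MAX / 2 in integer division, so the CHAR_BIT == 10, CHAR_MAX == 255 configuration quoted above would be rejected.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>

/* SCHAR_MAX is always 2^M - 1, where M is the number of value bits in
   signed char.  It equals UCHAR_MAX / 2 exactly when M == CHAR_BIT - 1,
   i.e. when signed char has no padding bits. */
static_assert(SCHAR_MAX == UCHAR_MAX / 2,
              "signed char is narrower than unsigned char");

/* Runtime form of the same check, e.g. for a configuration report. */
int schar_has_full_width(void)
{
    return SCHAR_MAX == UCHAR_MAX / 2;
}
```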

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]
[ Note that the FAQ URL has changed!  Please update your bookmarks.     ]

Author: James.Kanze@dresdner-bank.com
Date: Wed, 1 Nov 2000 16:44:48 GMT

In article <c5vCyUAGAq$5EwFI@ntlworld.com>,
  Francis Glassborow <francisG@robinton.demon.co.uk> wrote:
> In article <slrn8vs450.igr.Team-Rocket@nightrunner.nm.dnsalias.net>,
> Niklas Matthies <Team-Rocket@gmx.net> writes
> >On Mon, 30 Oct 2000 23:51:03 GMT, Francis Glassborow
> ><francis.glassborow@ntlworld.com> wrote:
> >> In article
> >> <slrn8vrl3c.uu1.Team-Rocket@nightrunner.nm.dnsalias.net>,
> >> Niklas Matthies <Team-Rocket@gmx.net> writes
> >> >It's not (in C). The uninitialized chars may have trap values
> >> >(for the type char)

> >> Please quote text to support this claim

> >It's the other way round. The standard does not specify such a
> >guarantee (unless I missed something), therefore there is none.

> The C standard requires that unsigned char has no trap values (that
> can be deduced from the rules for type punning etc.) It also
> requires that char can be exactly accessed as an unsigned char
> though the negative values will be mapped to positive values beyond
> the maximum value for char (assuming it is a signed version). I do
> not see how these requirements can allow for the existence of trap
> values.

And I don't see what is to prevent them.  Again, take Clive's example:

    CHAR_BIT == 10
    UCHAR_MAX == 1023
    CHAR_MAX == 255
    CHAR_MIN == -256

This obviously allows trapping values for char.  It also easily meets
the requirements you give.
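Where the room for trap values comes from can be seen by turning the limits into a bit count. A minimal sketch, assuming char is signed here (so CHAR_MAX is the signed maximum) and that the maximum has the form 2^M - 1, as C99 6.2.6.2 implies; char_layout, layout_for and bits_in are hypothetical names. For Clive's figures it reports 8 value bits, 1 sign bit and 1 padding bit, and that unused padding bit is what leaves bit patterns available to be trap representations.

```c
#include <assert.h>

/* Number of bits in max, assuming max == 2^k - 1 (as every *_MAX is). */
int bits_in(unsigned long max)
{
    int n = 0;
    while (max != 0) {
        max >>= 1;
        ++n;
    }
    return n;
}

/* How a byte of char_bit bits is divided for a signed character type
   (hypothetical helper for illustration only). */
struct char_layout {
    int value_bits;   /* precision: bits that carry the magnitude */
    int sign_bits;    /* exactly one for a signed type */
    int padding_bits; /* whatever is left over */
};

struct char_layout layout_for(int char_bit, unsigned long smax)
{
    struct char_layout l;
    l.value_bits = bits_in(smax);
    l.sign_bits = 1;
    l.padding_bits = char_bit - l.value_bits - l.sign_bits;
    assert(l.padding_bits >= 0);   /* limits inconsistent with char_bit */
    return l;
}
```

On a typical 8-bit implementation, layout_for(8, 127) gives 7 value bits, 1 sign bit and no padding.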

--
James Kanze                               mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627


Sent via Deja.com http://www.deja.com/
Before you buy.

Author: Team-Rocket@gmx.net (Niklas Matthies)
Date: Wed, 1 Nov 2000 16:44:59 GMT

On Wed,  1 Nov 2000 01:08:09 GMT, Niklas Matthies <Team-Rocket@gmx.net> wrote:
[···]
>    char c, c2;
>    unsigned char uc;
>    uc = *(unsigned char *) &c;   /* ok */
>    c2 = c;                       /* could constitute behavior */
                                                      ^
                                                  undefined

-- Niklas

Author: Team-Rocket@gmx.net (Niklas Matthies)
Date: Wed, 1 Nov 2000 18:34:20 GMT

On Wed,  1 Nov 2000 15:47:39 GMT, James.Kanze@dresdner-bank.com <James.Kanze@dresdner-bank.com> wrote:
> In article <slrn8vrl3c.uu1.Team-Rocket@nightrunner.nm.dnsalias.net>,
>   Team-Rocket@gmx.net (Niklas Matthies) wrote:
> > On Mon, 30 Oct 2000 18:47:33 GMT, kanze@gabi-soft.de
> > <kanze@gabi-soft.de> wrote:
[···]
> > > There is an interesting case, however.  Consider the following code:
>
> > >     struct X { char buf[ 16 ] ; } ;
>
> > >     struct X src ;
> > >     strcpy( src.buf, "a" ) ;
> > >     struct X dst ;
> > >     dst = src ;
>
> > > Is this guaranteed to work?
>=20
> > It's not (in C). The uninitialized chars may have trap values (for
> > the type char), and structure assignment is performed by value (with
> > the respective type).
>
> I know that uninitialized char's may have trap values, but where does
> the C standard specify memberwise copy?

It doesn't. Neither does it specify copying the underlying object
representation. A struct is not any more special than the scalar types,
which needn't preserve their object representation when copied.
Conceptually, values are read, and values are written. The object
representation gets lost in between.

> I would have imagined that the results would be the same as copying
> the struct with memcpy, and that isn't allowed to trap.

In the C standard, memcpy() is specified to "copy characters". It is not
specified in terms of copying the object representation (i.e. in terms
of unsigned char). This wording would allow copying of char values
without necessarily copying any padding bits char might have. This is
probably a DR candidate.

Nevertheless, why should assignment share the semantics of memcpy()?
The standard certainly doesn't say so.

> All I see in C99 is that the value of the right operand "replaces" the
> left, however, and that isn't really clear on this point.

Yes, it doesn't properly define what the "value" of a struct should be.
But it seems pretty straightforward that it should be the tuple of
values of its members.

> > Another consequence is that if the char values stored in buf[] are
> > valid and have padding bits, the padding bits need not remain the
> > same when the values are copied. Similarly, whether
>
> >    int f() {
> >       char c1 = 'a';
> >       char c2 = c1;
> >       return memcmp(&c1, &c2, 1);
> >    }
>
> > returns 0, a positive or a negative value is implementation-defined,
> > and may differ for different invocations of f().
>=20
> I'm aware of this.

Well, then why should

   struct cs { char c; } cs1, cs2;
   ...

   cs1 = cs2;

work differently from / guarantee more than

   cs1.c =3D cs2.c;

?

-- Niklas

Author: Francis Glassborow <francis.glassborow@ntlworld.com>
Date: Wed, 1 Nov 2000 19:13:56 GMT

In article <slrn8vu927.46g.Team-Rocket@nightrunner.nm.dnsalias.net>,
Niklas Matthies <Team-Rocket@gmx.net> writes
>Trap values are a property of the type. A trap value of type T only
>causes undefined behavior when accessed through an lvalue of type T. If
>an object ("object" in the sense that the C standard uses) of type T1 is
>accessed through an lvalue of type T2, then it doesn't matter whether
>the bit pattern of the object constitutes a trap representation of type
>T1, it only matters whether it constitutes a trap representation of type
>T2 (the type through which it is accessed). So, since unsigned char does
>not have trap values, it is of course absolutely valid to access chars
>as unsigned chars.

There is a requirement that the range of values for char shall either be
that of signed char or that for unsigned char.  Any trap value could not
be in that range. So the question is 'Can a signed char include trap
values?'  I guess the answer to that may be 'yes' but I am uncertain.
CHAR_BIT gives the number of bits in the smallest object that is not a
bit-field. It would be a very unusual implementation in which unsigned
char had more bits than CHAR_BIT, indeed I am not convinced that that is
allowed.


Francis Glassborow      Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA          +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

Author: Team-Rocket@gmx.net (Niklas Matthies)
Date: Wed, 1 Nov 2000 20:28:45 GMT

On Wed,  1 Nov 2000 19:13:56 GMT, Francis Glassborow <francis.glassborow@ntlworld.com> wrote:
> In article <slrn8vu927.46g.Team-Rocket@nightrunner.nm.dnsalias.net>,
> Niklas Matthies <Team-Rocket@gmx.net> writes
> >Trap values are a property of the type. A trap value of type T only
> >causes undefined behavior when accessed through an lvalue of type T. If
> >an object ("object" in the sense that the C standard uses) of type T1 is
> >accessed through an lvalue of type T2, then it doesn't matter whether
> >the bit pattern of the object constitutes a trap representation of type
> >T1, it only matters whether it constitutes a trap representation of type
> >T2 (the type through which it is accessed). So, since unsigned char does
> >not have trap values, it is of course absolutely valid to access chars
> >as unsigned chars.
>
> There is a requirement that the range of values for char shall either be
> that of signed char or that for unsigned char.  Any trap value could not
> be in that range. So the question is 'Can a signed char include trap
> values?'  I guess the answer to that may be 'yes' but I am uncertain.
> CHAR_BIT gives the number of bits in the smallest object that is not a
> bit-field. It would be a very unusual implementation in which unsigned
> char had more bits than CHAR_BIT, indeed I am not convinced that that is
> allowed.

A char object consists of value bits, possibly a sign bit (if it is
signed), and possibly padding bits (if it is signed). CHAR_BIT counts all
those bits, including the padding bits.

Quoting some relevant bits from [C99:6.2.6.1] and [C99:6.2.6.2]:

   Values stored in unsigned bit-fields and objects of type unsigned
   char shall be represented using a pure binary notation. Values stored
   in non-bit-field objects of any other object type consist of /n/ ×
   CHAR_BIT bits, where /n/ is the size of an object of that type, in
   bytes. The value may be copied into an object of type unsigned char
   [/n/] (e.g., by memcpy); the resulting set of bytes is called the
   /object representation/ of the value. [···] Two values (other than
   NaNs) with the same object representation compare equal, but values
   that compare equal may have different object representations.

   [···]

   For signed integer types, the bits of the object representation shall
   be divided into three groups: value bits, padding bits, and the sign
   bit. There need not be any padding bits; there shall be exactly one
   sign bit. Each bit that is a value bit shall have the same value as
   the same bit in the object representation of the corresponding
   unsigned type (if there are /M/ value bits in the signed type and /N/
   in the unsigned type, then /M/ <= /N/). If the sign bit is zero, it
   shall not affect the resulting value. If the sign bit is one, the
   value shall be modified in one of the following ways:

   - the corresponding value with sign bit 0 is negated
     (/sign and magnitude/);

   - the sign bit has the value -(2^/N/) (/two's complement/);

   - the sign bit has the value -(2^/N/ - 1) (/one's complement/).

   Which of these applies is implementation-defined, as is whether the
   value with sign bit 1 and all value bits zero (for the first two), or
   with sign bit and all value bits 1 (for one's complement), is a trap
   representation or a normal value. In the case of sign and magnitude
   and one's complement, if this representation is a normal value it is
   called a negative zero.

   The values of any padding bits are unspecified. A valid (non-trap)
   object representation of a signed integer type where the sign bit is
   zero is a valid object representation of the corresponding unsigned
   type, and shall represent the same value. The precision of an integer
   type is the number of bits it uses to represent values, excluding any
   sign and padding bits. The width of an integer type is the same but
   including any sign bit; thus for unsigned integer types the two
   values are the same, while for signed integer types the width is one
   greater than the precision.
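The three permitted representations can be made concrete with a small decoder. A sketch under stated assumptions: decode_signed and sign_scheme are hypothetical names, padding bits are assumed to be stripped off already, and trap representations are not modelled (a negative zero simply decodes to 0). The sign bit's weight is written in terms of m, the number of value bits in the signed type itself; that is the reading that yields the familiar results, e.g. -128 for the pattern 1000 0000 in an 8-bit two's complement signed char with m = 7.

```c
#include <assert.h>

enum sign_scheme { SIGN_MAGNITUDE, TWOS_COMPLEMENT, ONES_COMPLEMENT };

/* Interpret bits 0..m-1 of `bits` as value bits and bit m as the sign
   bit, under one of the three schemes listed in C99 6.2.6.2. */
long decode_signed(unsigned long bits, int m, enum sign_scheme s)
{
    assert(m > 0 && m < 31);              /* keep 1L << m within long */
    long magnitude = (long)(bits & ((1UL << m) - 1));
    int sign = (int)((bits >> m) & 1UL);

    if (!sign)
        return magnitude;                 /* sign bit 0: plain binary */
    switch (s) {
    case SIGN_MAGNITUDE:                  /* value is negated (may be -0) */
        return -magnitude;
    case TWOS_COMPLEMENT:                 /* sign bit weighs -(2^m) */
        return magnitude - (1L << m);
    case ONES_COMPLEMENT:                 /* sign bit weighs -(2^m - 1) */
        return magnitude - ((1L << m) - 1);
    }
    return 0;                             /* not reached */
}
```

For the pattern 1111 1111 with m = 7 this gives -127, -1 and 0 (a negative zero) under the three schemes.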

-- Niklas

Author: Francis Glassborow <francis.glassborow@ntlworld.com>
Date: Wed, 1 Nov 2000 20:46:56 GMT

In article <8tov45$l1n$1@nnrp1.deja.com>, James.Kanze@dresdner-bank.com
writes
>And I don't see what is to prevent them.  Again, take Clive's example:
>
>    CHAR_BIT == 10
>    UCHAR_MAX == 1023
>    CHAR_MAX == 255
>    CHAR_MIN == -256
>
>This obviously allows trapping values for char.  It also easily meets
>the requirements you give.

True, but does anyone know of such a system, or a reason for doing that
other than the Standard allows it.


Author: jthill@telus.net (Jim Hill)
Date: Wed, 1 Nov 2000 22:32:13 GMT

Francis Glassborow <francis.glassborow@ntlworld.com> wrote:

> True, but does anyone know of such a system, or a reason for doing that
> other than the Standard allows it.

That anyone's still selling? I don't, but there's a well-respected
example that could reasonably be implemented that way: on the DEC-10

# define CHAR_BIT  9
# define UCHAR_MAX 511
# define CHAR_MAX  127
# define CHAR_MIN -128

would make perfect sense -- I don't see any need to support 9-bit chars
here beyond the dubious proposition that broken code should work anyway.

There is a requirement in C99 that unsigned char[N*sizeof(T)] be large
enough to hold any T[N] value. Is there any requirement that N be the
_smallest_ multiple of sizeof(T) large enough to hold a T[N]? I don't
see it on a cursory search, nor why there should be one.

Jim

Author: Francis Glassborow <francis.glassborow@ntlworld.com>
Date: Tue, 31 Oct 2000 18:02:10 GMT

In article <slrn8vs450.igr.Team-Rocket@nightrunner.nm.dnsalias.net>,
Niklas Matthies <Team-Rocket@gmx.net> writes
>On Mon, 30 Oct 2000 23:51:03 GMT, Francis Glassborow <francis.glassborow@ntlworld.com> wrote:
>> In article <slrn8vrl3c.uu1.Team-Rocket@nightrunner.nm.dnsalias.net>,
>> Niklas Matthies <Team-Rocket@gmx.net> writes
>> >It's not (in C). The uninitialized chars may have trap values (for the
>> >type char)
>>
>> Please quote text to support this claim
>
>It's the other way round. The standard does not specify such a guarantee
>(unless I missed something), therefore there is none.

The C standard requires that unsigned char has no trap values (that can
be deduced from the rules for type punning etc.) It also requires that
char can be exactly accessed as an unsigned char though the negative
values will be mapped to positive values beyond the maximum value for
char (assuming it is a signed version). I do not see how these
requirements can allow for the existence of trap values.


Author: Team-Rocket@gmx.net (Niklas Matthies)
Date: Wed, 1 Nov 2000 01:08:09 GMT

On Tue, 31 Oct 2000 18:02:10 GMT, Francis Glassborow <francis.glassborow@ntlworld.com> wrote:
> In article <slrn8vs450.igr.Team-Rocket@nightrunner.nm.dnsalias.net>,
> Niklas Matthies <Team-Rocket@gmx.net> writes
> >On Mon, 30 Oct 2000 23:51:03 GMT, Francis Glassborow <francis.glassborow@ntlworld.com> wrote:
> >> In article <slrn8vrl3c.uu1.Team-Rocket@nightrunner.nm.dnsalias.net>,
> >> Niklas Matthies <Team-Rocket@gmx.net> writes
> >> >It's not (in C). The uninitialized chars may have trap values (for the
> >> >type char)
> >>
> >> Please quote text to support this claim
> >
> >It's the other way round. The standard does not specify such a guarantee
> >(unless I missed something), therefore there is none.
>
> The C standard requires that unsigned char has no trap values (that can
> be deduced from the rules for type punning etc.) It also requires that
> char can be exactly accessed as an unsigned char though the negative
> values will be mapped to positive values beyond the maximum value for
> char (assuming it is a signed version). I do not see how these
> requirements can allow for the existence of trap values.

Trap values are a property of the type. A trap value of type T only
causes undefined behavior when accessed through an lvalue of type T. If
an object ("object" in the sense that the C standard uses) of type T1 is
accessed through an lvalue of type T2, then it doesn't matter whether
the bit pattern of the object constitutes a trap representation of type
T1, it only matters whether it constitutes a trap representation of type
T2 (the type through which it is accessed). So, since unsigned char does
not have trap values, it is of course absolutely valid to access chars
as unsigned chars. But:

   char c, c2;
   unsigned char uc;
   uc = *(unsigned char *) &c;   /* ok */
   c2 = c;                       /* could constitute undefined behavior */

The question is whether a structure consisting of an array of chars may
be copied by accesses through lvalues of type char. I believe this is
valid. And if it is, and char can have trap values, and the array
contains such trap values, then copying that structure will constitute
undefined behavior.

-- Niklas

Author: James.Kanze@dresdner-bank.com
Date: Wed, 1 Nov 2000 15:47:39 GMT

In article <slrn8vrl3c.uu1.Team-Rocket@nightrunner.nm.dnsalias.net>,
  Team-Rocket@gmx.net (Niklas Matthies) wrote:

> On Mon, 30 Oct 2000 18:47:33 GMT, kanze@gabi-soft.de
> <kanze@gabi-soft.de> wrote:

> [...]
> > I don't think that there was ever a vote, or anything, to exclude
> > signed char.  It just so happens that it was never included.  If I
> > understand Clive correctly, a legal implementation with ten bit
> > bytes could define:

> >     UCHAR_MAX == 1023
> >     SCHAR_MIN == -256
> >     SCHAR_MAX == 255

> > I certainly don't see any words in the C standard, either C90 or
> > C99, which would forbid this.  For that matter, I don't think it
> > would be illegal in C++, as long as all copy operations on signed
> > char's copied all ten bits, without trapping.

> > In both languages (and both versions of C), of course, the actual
> > number of bits of the two types must be identical.  And in both
> > languages, memcpy may not fault -- in practice, it must work as if
> > the copy took place through an unsigned copy.

> > There is an interesting case, however.  Consider the following code:

> >     struct X { char buf[ 16 ] ; } ;

> >     struct X src ;
> >     strcpy( src.buf, "a" ) ;
> >     struct X dst ;
> >     dst = src ;

> > Is this guaranteed to work?

> It's not (in C). The uninitialized chars may have trap values (for
> the type char), and structure assignment is performed by value (with
> the respective type).

I know that uninitialized char's may have trap values, but where does
the C standard specify memberwise copy?  I would have imagined that
the results would be the same as copying the struct with memcpy, and
that isn't allowed to trap.  All I see in C99 is that the value of
the right operand "replaces" the left, however, and that isn't really
clear on this point.

> Another consequence is that if the char values
> stored in buf[] are valid and have padding bits, the padding bits
> need not remain the same when the values are copied. Similarly,
> whether

>    int f() {
>       char c1 = 'a';
>       char c2 = c1;
>       return memcmp(&c1, &c2, 1);
>    }

> returns 0, a positive or a negative value is implementation-defined,
> and may differ for different invocations of f().

I'm aware of this.


Author: Team-Rocket@gmx.net (Niklas Matthies)
Date: Mon, 30 Oct 2000 22:23:56 GMT

On Mon, 30 Oct 2000 18:47:33 GMT, kanze@gabi-soft.de <kanze@gabi-soft.de> wrote:
[···]
> I don't think that there was ever a vote, or anything, to exclude signed
> char.  It just so happens that it was never included.  If I understand
> Clive correctly, a legal implementation with ten bit bytes could define:
>
>     UCHAR_MAX == 1023
>     SCHAR_MIN == -256
>     SCHAR_MAX == 255
>
> I certainly don't see any words in the C standard, either C90 or C99,
> which would forbid this.  For that matter, I don't think it would be
> illegal in C++, as long as all copy operations on signed char's copied
> all ten bits, without trapping.
>
> In both languages (and both versions of C), of course, the actual number
> of bits of the two types must be identical.  And in both languages,
> memcpy may not fault -- in practice, it must work as if the copy took
> place through an unsigned copy.
>
> There is an interesting case, however.  Consider the following code:
>
>     struct X { char buf[ 16 ] ; } ;
>
>     struct X src ;
>     strcpy( src.buf, "a" ) ;
>     struct X dst ;
>     dst = src ;
>
> Is this guaranteed to work?

It's not (in C). The uninitialized chars may have trap values (for the
type char), and structure assignment is performed by value (with the
respective type). Another consequence is that if the char values stored
in buf[] are valid and have padding bits, the padding bits need not
remain the same when the values are copied. Similarly, whether

   int f() {
      char c1 = 'a';
      char c2 = c1;
      return memcmp(&c1, &c2, 1);
   }

returns 0, a positive or a negative value is implementation-defined, and
may differ for different invocations of f().
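If the representation itself has to survive a copy, the usual way out is to copy as unsigned char, which C99 6.2.6 requires to use pure binary notation with no padding bits, so equal values cannot have different representations. A sketch with hypothetical function names; the char version mirrors the f() above and is only guaranteed to return 1 where char has no padding bits, as on common implementations.

```c
#include <assert.h>
#include <string.h>

/* f() from above, reporting whether the copy kept the representation.
   Assignment copies the value; padding bits, if char had any, need not
   be copied, so a result of 1 is not guaranteed in general. */
int char_copy_keeps_bytes(char x)
{
    char y = x;
    return memcmp(&x, &y, 1) == 0;
}

/* The same with unsigned char: every bit is a value bit, so a copy of
   the value is necessarily a copy of the representation. */
int uchar_copy_keeps_bytes(unsigned char x)
{
    unsigned char y = x;
    return memcmp(&x, &y, 1) == 0;
}
```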

-- Niklas

Author: Francis Glassborow <francis.glassborow@ntlworld.com>
Date: Mon, 30 Oct 2000 23:51:03 GMT

In article <slrn8vrl3c.uu1.Team-Rocket@nightrunner.nm.dnsalias.net>,
Niklas Matthies <Team-Rocket@gmx.net> writes
>It's not (in C). The uninitialized chars may have trap values (for the
>type char)

Please quote text to support this claim


Author: brahms@mindspring.com (Stan Brown)
Date: Mon, 30 Oct 2000 23:51:18 GMT

kanze@gabi-soft.de <kanze@gabi-soft.de> wrote in comp.std.c++:
>    UCHAR_MAX =3D=3D 1023
>    SCHAR_MIN =3D=3D -256
>    SCHAR_MAX =3D=3D 255

Quoted-printable encoding in your news-posting software has a bad
effect on the text of program constructs, like the above.

I tried to mail you privately about this, but the mail bounced.

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
                                  http://oakroadsystems.com
C++ FAQ Lite: http://www.parashift.com/c++-faq-lite/
the C++ standard: http://webstore.ansi.org/
reserved C++ identifiers: http://oakroadsystems.com/tech/cppredef.htm
more FAQs: http://oakroadsystems.com/tech/faqget.htm

Author: Team-Rocket@gmx.net (Niklas Matthies)
Date: Tue, 31 Oct 2000 00:25:12 GMT

On Mon, 30 Oct 2000 23:51:03 GMT, Francis Glassborow <francis.glassborow@ntlworld.com> wrote:
> In article <slrn8vrl3c.uu1.Team-Rocket@nightrunner.nm.dnsalias.net>,
> Niklas Matthies <Team-Rocket@gmx.net> writes
> >It's not (in C). The uninitialized chars may have trap values (for the
> >type char)
>
> Please quote text to support this claim

It's the other way round. The standard does not specify such a guarantee
(unless I missed something), therefore there is none.

-- Niklas

Author: kanze@gabi-soft.de
Date: Mon, 30 Oct 2000 18:47:16 GMT

wmm@fastdial.net writes:

|>  The main reason for specifying the character types that way was to
|>  allow something like strcpy to be written in well-defined C++, with
|>  no magic required.  You need a type that can both access all bits of
|>  memory and not trap on any bit patterns.  We could have invented a
|>  "byte" type for that purpose, but we felt (and were assured by J11
|>  people participating in J16) that the C standard intended that
|>  unsigned char have the needed properties.

That is what I have always understood with regards to the C standard --
it guarantees that an *unsigned char* can be used to access raw memory,
including uninitialized memory.  The C++ standard guarantees that any
character type (char, unsigned char or, I think, signed char) can be
used.  This is a minor difference.
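The idea that unsigned char lets a copying routine be written in plain C, with no magic, can be sketched as follows. copy_bytes is a hypothetical name, not a library function. Each byte is read and written through an unsigned char lvalue, a type with no trap representations and no padding bits, so the whole object representation is transferred rather than just the values of the members.

```c
#include <assert.h>
#include <stddef.h>

/* memcpy-like copy written without any library support: every access
   goes through unsigned char, so no byte pattern can trap. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));
    while (n-- > 0)
        *d++ = *s++;
    return dst;
}
```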

Author: kanze@gabi-soft.de
Date: Mon, 30 Oct 2000 18:47:33 GMT

"Douglas A. Gwyn" <DAGwyn@null.net> writes:

|>  "Clive D.W. Feather" wrote:
|>  > >C99 6.2.6.2 paragraph 2:
|>  > >There is specifically no contrary wording for signed char, so under
|>  > >this standard signed char can have both padding bits and trap
|>  > >representations.
|>  > Correct. This means that WG14 and WG21 made different decisions.
|>  > In particular, a system with (say) 10 bit bytes might want signed char
|>  > to be [-256,255].

|>  I frankly was surprised to discover that such a decision had been
|>  made.  Maybe it occurred at one of the overseas meetings (which I
|>  couldn't attend).  Anyway, I think it is a mistake to allow an
|>  operational width (sign + value bits) for signed char that differs
|>  from the width (value bits) for unsigned char.  I was pretty sure we
|>  had agreed that these widths must match for every corresponding
|>  (signed/unsigned) pair of integer types.

I don't think that there was ever a vote, or anything, to exclude signed
char.  It just so happens that it was never included.  If I understand
Clive correctly, a legal implementation with ten bit bytes could define:

    UCHAR_MAX == 1023
    SCHAR_MIN == -256
    SCHAR_MAX == 255

I certainly don't see any words in the C standard, either C90 or C99,
which would forbid this.  For that matter, I don't think it would be
illegal in C++, as long as all copy operations on signed char's copied
all ten bits, without trapping.

In both languages (and both versions of C), of course, the actual number
of bits of the two types must be identical.  And in both languages,
memcpy may not fault -- in practice, it must work as if the copy took
place through an unsigned copy.

There is an interesting case, however.  Consider the following code:

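    #include <string.h>

    struct X { char buf[ 16 ] ; } ;

    void f( void )
    {
        struct X src ;
        strcpy( src.buf, "a" ) ;    /* buf[ 2 ] .. buf[ 15 ] are never set */
        struct X dst ;
        dst = src ;                 /* copies all of buf, set or not */
    }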

Is this guaranteed to work?  If so, why, since it potentially involves
copying uninitialized chars.  (It *is* guaranteed in C++, but you can
easily create the same problem in C++ by using wchar_t instead of char.)
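Kanze's question stays open here, but portable code can sidestep it by giving every element of the array a value before the structure is copied. A minimal sketch under that assumption; `make_x` is a hypothetical helper, and it deliberately says nothing about any padding the implementation might add to struct X itself.

```c
#include <string.h>

struct X { char buf[ 16 ] ; } ;

/* Build an X in which every element of buf is determinate: the
   initializer sets all of buf to 0 before the string is copied in,
   so a later "dst = src" never copies uninitialized chars. */
struct X make_x( const char *s )
{
    struct X x = { { 0 } } ;
    strncpy( x.buf, s, sizeof x.buf - 1 ) ;   /* buf[ 15 ] stays 0 */
    return x ;
}
```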

--
James Kanze                               mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

Author: "Douglas A. Gwyn" <DAGwyn@null.net>
Date: 2000/10/22

"Clive D.W. Feather" wrote:
> >C99 6.2.6.2 paragraph 2:
> >There is specifically no contrary wording for signed char, so under
> >this standard signed char can have both padding bits and trap
> >representations.
> Correct. This means that WG14 and WG21 made different decisions.
> In particular, a system with (say) 10 bit bytes might want signed char
> to be [-256,255].

I frankly was surprised to discover that such a decision had been
made.  Maybe it occurred at one of the overseas meetings (which I
couldn't attend).  Anyway, I think it is a mistake to allow an
operational width (sign + value bits) for signed char that differs
from the width (value bits) for unsigned char.  I was pretty sure
we had agreed that these widths must match for every corresponding
(signed/unsigned) pair of integer types.

Author: Jack Klein <jackklein@spamcop.net>
Date: 2000/10/16

Both the C99 (ISO 9899:1999) and C++ (ISO 14882:1998) are claimed to
be compatible with the original ISO C standard in terms of the
original integral types (char, short, int, long), although these
claims are not in the standards and not normative.

Both purport that descriptions of padding bits and trap
representations were allowed, just not documented, in ISO C 90.

But there is one discrepancy:

C++ 1998 3.9.1 paragraph 1:

=====
For character types, all bits of the object representation participate
in the value representation. For unsigned character types, all
possible bit patterns of the value representation represent numbers.
These requirements do not hold for other types.
=====

This rules out padding bits in signed char, but permits trap
representations.

C99 6.2.6.2 paragraph 2:

=====
       [#2]  For  signed  integer  types,  the  bits  of the object
       representation shall be divided  into  three  groups:  value
       bits, padding bits, and the sign bit.  There need not be any
       padding bits; there shall be exactly one sign bit.
=====

There is specifically no contrary wording for signed char, so under
this standard signed char can have both padding bits and trap
representations.

The issue is:

If both standards are intended to be compatible with C90, one of them must
be incorrect.  Signed char cannot both have and not have padding bits.

The questions are:

Are the two standards intentionally, or inadvertently, incompatible?

If they are intentionally incompatible, is this a difference in a
fundamental type that we truly want between C and C++?

If they are inadvertently incompatible and such a difference is not
wanted, which is incorrect?

Should signed char:

   1.  Be different between C and C++?

   2.  Allow padding bits in both languages?

   3.  Disallow padding bits in both languages?

Jack Klein
--
Home: http://jackklein.home.att.net

Author: wmm@fastdial.net
Date: 2000/10/16

In article <mDjqOSLqT7Z8L1bgEhHCTMYSNiXB@4ax.com>,
  Jack Klein <jackklein@spamcop.net> wrote:
> Both the C99 (ISO 9899:1999) and C++ (ISO 14882:1998) are claimed to
> be compatible with the original ISO C standard in terms of the
> original integral types (char, short, int, long), although these
> claims are not in the standards and not normative.
>
> Both purport that descriptions of padding bits and trap
> representations were allowed, just not documented, in ISO C 90.
>
> But there is one discrepancy:
>
> C++ 1998 3.9.1 paragraph 1:
>
> =====
> For character types, all bits of the object representation participate
> in the value representation. For unsigned character types, all
> possible bit patterns of the value representation represent numbers.
> These requirements do not hold for other types.
> =====
>
> This rules out padding bits in signed char, but permits trap
> representations.
>
> C99 6.2.6.2 paragraph 2:
>
> =====
>        [#2]  For  signed  integer  types,  the  bits  of the object
>        representation shall be divided  into  three  groups:  value
>        bits, padding bits, and the sign bit.  There need not be any
>        padding bits; there shall be exactly one sign bit.
> =====
>
> There is specifically no contrary wording for signed char, so under
> this standard signed char can have both padding bits and trap
> representations.
>
> The issue is:
>
> If both standards are intended to be compatible with C90, one of them must
> be incorrect.  Signed char cannot both have and not have padding bits.
>
> The questions are:
>
> Are the two standards intentionally, or inadvertently, incompatible?
>
> If they are intentionally incompatible, is this a difference in a
> fundamental type that we truly want between C and C++?
>
> If they are inadvertently incompatible and such a difference is not
> wanted, which is incorrect?
>
> Should signed char:
>
>    1.  Be different between C and C++?
>
>    2.  Allow padding bits in both languages?
>
>    3.  Disallow padding bits in both languages?

I can't speak definitively for either version of C.  However, I
do know the background of that specification in C++, and it was
our understanding that we were simply making explicit what was
intended and understood to be the specification in C89.

The main reason for specifying the character types that way was
to allow something like strcpy to be written in well-defined C++,
with no magic required.  You need a type that can both access
all bits of memory and not trap on any bit patterns.  We could
have invented a "byte" type for that purpose, but we felt (and
were assured by J11 people participating in J16) that the C
standard intended that unsigned char have the needed properties.

I won't speculate as to either the implications or the rationale
of the C99 specification in this regard.

--
William M. Miller, wmm@fastdial.net
Vignette Corporation (www.vignette.com)


Author: Team-Rocket@gmx.net (Niklas Matthies)
Date: 2000/10/16

On Mon, 16 Oct 2000 15:48:56 GMT, wmm@fastdial.net <wmm@fastdial.net> wrote:
> In article <mDjqOSLqT7Z8L1bgEhHCTMYSNiXB@4ax.com>,
>   Jack Klein <jackklein@spamcop.net> wrote:
[···]
> > The questions are:
> >
> > Are the two standards intentionally, or inadvertently, incompatible?
> >
> > If they are intentionally incompatible, is this a difference in a
> > fundamental type that we truly want between C and C++?
> >
> > If they are inadvertently incompatible and such a difference is not
> > wanted, which is incorrect?
> >
> > Should signed char:
> >
> >    1.  Be different between C and C++?
> >
> >    2.  Allow padding bits in both languages?
> >
> >    3.  Disallow padding bits in both languages?
[···]
> The main reason for specifying the character types that way was
> to allow something like strcpy to be written in well-defined C++,
> with no magic required.  You need a type that can both access
> all bits of memory and not trap on any bit patterns.

There doesn't seem to be the need to apply strcpy() on a memory chunk
whose contents weren't created by write accesses through lvalues of type
char (or by equivalent means), i.e. that doesn't contain trap values.
So this rationale doesn't seem valid.

This set aside, the C99 standard is utterly unclear on whether a
strcpy() implementation that traps on strings that contain trap
representations of type char is conforming or not.
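For illustration, here is a strcpy-like loop that inspects each character as unsigned char, so it never reads a plain char value that could be a trap representation. `ustrcpy` is a hypothetical name; whether the real strcpy is required to behave this way is exactly the point Niklas says C99 leaves unclear.

```c
/* Copies the string at src, including its terminating null, to dst.
   Each character is read and written through unsigned char, so every
   bit pattern encountered is a valid value. */
char *ustrcpy(char *dst, const char *src)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    while ((*d++ = *s++) != 0)
        ;
    return dst;
}
```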

-- Niklas

Author: wmm@fastdial.net
Date: 2000/10/16

In article <slrn8umag9.d76.Team-Rocket@nightrunner.nm.dnsalias.net>,
  Team-Rocket@gmx.net (Niklas Matthies) wrote:
> There doesn't seem to be the need to apply strcpy() on a memory chunk
> whose contents wasn't created by write accesses through lvalues of type
> char (or by equivalent means), i.e. that doesn't contain trap values.
> So this rationale doesn't seem valid.

What we were concerned about was "holes" in the object
representation, either in native types or as the result of
alignment padding in class or array objects.  We felt that
it was unreasonable to require implementations to initialize
the holes to prevent trapping while copying objects whose
value representation was well-defined.
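The "holes" described above are easy to produce. In this sketch `struct S` is a hypothetical example; how much padding it contains is implementation-defined, though with a 4-byte int aligned on 4 bytes it would typically be 3 bytes. Assigning to c and i never writes those bytes, yet a byte-wise copy of the whole object reads them, which is why it matters that reading them through unsigned char cannot trap.

```c
#include <stddef.h>

/* On most implementations int needs stricter alignment than char, so
   unnamed padding bytes ("holes") are placed between c and i. */
struct S {
    char c;
    int  i;
};

/* Number of bytes in S that belong to no member: the interior hole
   plus any trailing padding. */
size_t s_padding_bytes(void)
{
    return sizeof(struct S) - sizeof(char) - sizeof(int);
}
```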

--
William M. Miller, wmm@fastdial.net
Vignette Corporation (www.vignette.com)

Author: Team-Rocket@gmx.net (Niklas Matthies)
Date: 2000/10/16

On Mon, 16 Oct 2000 19:29:02 GMT, wmm@fastdial.net <wmm@fastdial.net> wrote:
> In article <slrn8umag9.d76.Team-Rocket@nightrunner.nm.dnsalias.net>,
> Team-Rocket@gmx.net (Niklas Matthies) wrote:
> > There doesn't seem to be the need to apply strcpy() on a memory
> > chunk whose contents wasn't created by write accesses through
> > lvalues of type char (or by equivalent means), i.e. that doesn't
> > contain trap values. So this rationale doesn't seem valid.
>
> What we were concerned about was "holes" in the object representation,
> either in native types or as the result of alignment padding in class
> or array objects.  We felt that it was unreasonable to require
> implementations to initialize the holes to prevent trapping while
> copying objects whose value representation was well-defined.

I don't understand how this is related to strcpy().
It seems that what you are talking about would involve memcpy(), which
doesn't have to check for '\0' chars and therefore could use unsigned
char for copying instead of plain char. And unsigned char cannot have
trap representations, as opposed to plain char (in C).

-- Niklas

Author: wmm@fastdial.net
Date: Tue, 17 Oct 2000 00:41:42 GMT

In article <slrn8umn6p.kfn.Team-Rocket@nightrunner.nm.dnsalias.net>,
  Team-Rocket@gmx.net (Niklas Matthies) wrote:
> On Mon, 16 Oct 2000 19:29:02 GMT, wmm@fastdial.net <wmm@fastdial.net> wrote:
> > In article <slrn8umag9.d76.Team-Rocket@nightrunner.nm.dnsalias.net>,
> > Team-Rocket@gmx.net (Niklas Matthies) wrote:
> > > There doesn't seem to be the need to apply strcpy() on a memory
> > > chunk whose contents wasn't created by write accesses through
> > > lvalues of type char (or by equivalent means), i.e. that doesn't
> > > contain trap values. So this rationale doesn't seem valid.
> >
> > What we were concerned about was "holes" in the object representation,
> > either in native types or as the result of alignment padding in class
> > or array objects.  We felt that it was unreasonable to require
> > implementations to initialize the holes to prevent trapping while
> > copying objects whose value representation was well-defined.
>
> I don't understand how this is related to strcpy().
> It seems that what you are talking about would involve memcpy(), which
> doesn't have to check for '\0' chars and therefore could use unsigned
> char for copying instead of plain char. And unsigned char cannot have
> trap representations, as opposed to plain char (in C).

Oops, yes, I intended to write "memcpy" and somehow my fingers
typed "strcpy".  Sorry for the confusion.

And the difference between char and unsigned char in C++ is as
you indicate for C: that's the reason for the specification
that all possible bit patterns in an unsigned char represent
numbers, which is not a requirement for the other character
types.
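That guarantee is what makes it well-defined to inspect the object representation of any object, padding included, through unsigned char. A sketch that formats the bytes as hex; the name `hex_bytes` and its buffer convention are assumptions for illustration, and two hex digits per byte assumes CHAR_BIT == 8.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes the n bytes at p as hex digits into out (outsz bytes long).
   Reading through unsigned char is safe for any object, because every
   bit pattern of unsigned char represents a number. Output stops early,
   still null-terminated, if out is too small. */
void hex_bytes(const void *p, size_t n, char *out, size_t outsz)
{
    const unsigned char *b = p;
    size_t used = 0;

    if (outsz == 0)
        return;
    out[0] = '\0';
    for (size_t k = 0; k < n; k++) {
        int w = snprintf(out + used, outsz - used, "%02x", (unsigned)b[k]);
        if (w < 0 || (size_t)w >= outsz - used)
            break;              /* out is full */
        used += (size_t)w;
    }
}
```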

--
William M. Miller, wmm@fastdial.net
Vignette Corporation (www.vignette.com)


Author: "Clive D.W. Feather" <clive@on-the-train.demon.co.uk>
Date: 2000/10/17

In article <mDjqOSLqT7Z8L1bgEhHCTMYSNiXB@4ax.com>, Jack Klein
<jackklein@spamcop.net> writes
>Both the C99 (ISO 9899:1999) and C++ (ISO 14882:1998) are claimed to
>be compatible with the original ISO C standard in terms of the
>original integral types (char, short, int, long),

I'm not sure what that is supposed to mean.

>Both purport that descriptions of padding bits and trap
>representations were allowed, just not documented, in ISO C 90.

WG14, when asked detailed questions about C90, gave a set of answers.
The meaning of these answers was included in the C99 review.

>C++ 1998 3.9.1 paragraph 1:
[...]
>This rules out padding bits in signed char, but permits trap
>representations.
>
>C99 6.2.6.2 paragraph 2:
[...]
>There is specifically no contrary wording for signed char, so under
>this standard signed char can have both padding bits and trap
>representations.

Correct. This means that WG14 and WG21 made different decisions.

In particular, a system with (say) 10 bit bytes might want signed char
to be [-256,255].

--
Clive D.W. Feather    | Internet Expert      | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet       | Home: <clive@davros.org>
Fax: +44 20 8371 1037 | Thus plc             | Web:  <http://www.davros.org>
Written on my laptop; please observe the Reply-To address
