Topic: CWG Defect 912


Author: seth.cantrell@gmail.com
Date: Wed, 5 Jun 2013 09:38:37 -0700 (PDT)
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#912

I'd like to ask about the rationale for the proposed resolution to this issue. In addition to not quite resolving the issue, I think the proposed solution isn't the right thing to specify.

First, the proposed resolution adds 'representable in the execution character set'; however, this doesn't cover UCNs that are representable in the execution character set but have a multi-byte representation. The addition should probably read 'representable as a single char in the execution character set.'

Second, the resolution proposes, as a conditionally-supported feature, treating UCNs that cannot be represented the same way as multicharacter literals: that is, giving them type int and an implementation-defined value.

What is the value in allowing this new kind of multicharacter literal? Isn't it safer and more portable to ensure that, for example, a literal '\u00E9', which is clearly a single c-char, is either a char in the execution character set or is disallowed?

For reference, GCC happens to implement the proposal as is, and it does so by setting the multicharacter literal's value to the concatenation of the bytes of the UTF-8 encoding. That is, '\u00E9' is an int with the value 0xC3A9 because the UTF-8 representation of U+00E9 is the byte sequence 0xC3 0xA9. The behavior is the same whether a UCN is used or the character is written literally in the source. I believe this behavior was implemented not because there's any value in it, but simply because UTF-8 source happened to behave this way before any deliberate decision was made.

On the other hand, clang rejects such a literal:

error: character too large for enclosing character literal type
   '\u00E9';
   ^

Clang's behavior is likewise the same whether a UCN is used or the character is written literally. (That is, clang recognizes the UTF-8 sequence in the source and treats it as a single c-char.)

The proposal as is does not improve portability, since it leaves it to the implementation to define both the value of the literal and which characters produce this behavior. This is no better than a compiler extension. Additionally, specifying such a feature would seem to encourage its implementation, when for the sake of safety and portability a literal with a single c-char should be a char if possible and ill-formed otherwise.

I believe a better proposal would be for [lex.ccon] paragraph 1 to read something like:

> [...] A character literal that does not begin with u, U, or L is an ordinary character literal, also referred to as a narrow-character literal. An ordinary character literal that contains a single c-char __representable as a single char in the execution character set__ has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set. __An ordinary character literal that contains a single c-char not representable as a single char in the execution character set is ill-formed.__ [...]

You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.