Topic: Universal-character-names and escape sequences. (in C++0x)
Author: coppro <rideau3@gmail.com>
Date: Sun, 9 Sep 2007 21:54:55 CST
Given the following code:
#include <iostream>
int main() {
std::cout<<"\é"; // backslash followed by e with accent, Unicode 00E9
return 0;
}
According to the standard, should this give "\u00e9" or some similar text
as output, or should it complain about the invalidity of the string
literal? In 2.1 ([lex.phases]), it says that any non-basic character
should be replaced with the appropriate universal-character-name. In
this case, that means that the string literal is now parsed
differently. If the compiler instead treats it as a single character,
then the backslash followed by that character does not legally parse as
an escape-sequence, so the whole thing fails to parse as a
string-literal. Which is correct?
Also, are universal-character-names standard in the current version of
C++?
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]
Author: Martin Vejnár <avakar@volny.cz>
Date: Thu, 13 Sep 2007 09:06:54 CST
coppro wrote:
> Given the following code:
>
> #include <iostream>
> int main() {
> std::cout<<"\é"; // backslash followed by e with accent, Unicode 00E9
> return 0;
> }
>
> According to the standard, should this give "\u00e9" or some similar text
> as output, or should it complain about the invalidity of the string
> literal? [...]
This has already been discussed; see <http://preview.tinyurl.com/3xnabf>.
There is also a corresponding defect report, number 578
<http://preview.tinyurl.com/3a8zvo>.
Basically, during translation phase 1, all occurrences of a character
outside the basic source character set are replaced by the corresponding
universal-character-name, i.e.
std::cout << "\é"; // U+00E9
is changed to
std::cout << "\\u00e9";
or similar; the program should output a backslash followed by "u00e9".
I guess this is not the intended behavior (hence the defect report), and
implementations tend not to obey the standard in this respect.
> Also, are universal-character-names standard in the current version of
> C++?
I'm not sure what you mean. The universal-character-name is a non-terminal
defined by the standard in section [lex.charset] (2.2/2 in n2369).
--
Martin
---