Topic: Universal-character-names and escape sequences. (in C++0x)


Author: coppro <rideau3@gmail.com>
Date: Sun, 9 Sep 2007 21:54:55 CST
Given the following code:

#include <iostream>
int main() {
std::cout<<"\é"; // backslash followed by e with accent, Unicode 00E9
return 0;
}

According to the standard, should this give "\u00e9" or some similar text
as output, or should it complain that the string literal is invalid?
In 2.1 ([lex.phases]), it says that any non-basic character should be
replaced with the appropriate universal-character-name. In this case,
that means the string literal is now parsed differently: the backslash
ends up in front of the backslash of the universal-character-name. If
the implementation instead treats the accented e as a single character,
then a backslash followed by that character doesn't legally parse as an
escape-sequence, so the token fails to parse as a string-literal.
Which is correct?

Also, are universal-character-names standard in the current version of
C++?


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: Martin Vejnár <avakar@volny.cz>
Date: Thu, 13 Sep 2007 09:06:54 CST
coppro wrote:
> Given the following code:
>
> #include <iostream>
> int main() {
> std::cout<<"\é"; // backslash followed by e with accent, Unicode 00E9
> return 0;
> }
>
> According to the standard, should this give "\u00e9" or some similar text
> as output, or should it complain about the invalidity of the string
> literal? [...]

This has already been discussed; see <http://preview.tinyurl.com/3xnabf>.
There is also the corresponding defect report 578
<http://preview.tinyurl.com/3a8zvo>.

Basically, during translation phase 1, all occurrences of a character
outside the basic source character set are replaced by the corresponding
universal-character-name, i.e.

std::cout << "\é";  // U+00E9

is changed to

std::cout << "\\u00e9";

or similar; the program should output a backslash followed by "u00e9".
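
A minimal sketch of the replacement described above, assuming a compiler
with C++0x/C++11 library support for std::u32string. The helper name
to_ucn_spelling is made up for illustration, and treating every code point
above 0x7F as "outside the basic source character set" is a simplifying
assumption (the real basic set is smaller than ASCII). It is a toy model,
not a description of how any actual compiler implements phase 1.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: spell a sequence of code points the way the
// phase-1 rule would, replacing each non-ASCII code point with a
// universal-character-name. ASCII is used as a stand-in for the basic
// source character set (an assumption made to keep the sketch short).
std::string to_ucn_spelling(const std::u32string& src) {
    std::string out;
    for (char32_t c : src) {
        if (c < 0x80) {
            // Treated as a basic character: copied through unchanged.
            out += static_cast<char>(c);
        } else {
            char buf[16];
            if (c <= 0xFFFF) {
                // Short form: \u followed by four hex digits.
                std::snprintf(buf, sizeof buf, "\\u%04x",
                              static_cast<unsigned>(c));
            } else {
                // Long form: \U followed by eight hex digits.
                std::snprintf(buf, sizeof buf, "\\U%08x",
                              static_cast<unsigned>(c));
            }
            out += buf;
        }
    }
    return out;
}
```

Passing the two code points backslash and U+00E9 gives back the spelling
\\u00e9, which the later phases read as an escaped backslash followed by
the ordinary characters u00e9.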

I guess this is not the intended behavior (hence the defect report), and
implementations tend not to obey the standard in this respect.

> Also, are universal-character-names standard in the current version of
> C++?

I'm not sure what you mean. The universal-character-name non-terminal is
defined by the standard in section [lex.charset] (2.2/2 in n2369).
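
For illustration, a short sketch of universal-character-names written
directly in source. It assumes a compiler that accepts the char32_t
literals from the C++0x drafts; the function names are made up for this
example.

```cpp
// Both forms of universal-character-name, \u with four hex digits and
// \U with eight, name the same character here: U+00E9.
// (Function names are hypothetical, chosen for this example only.)
char32_t e_acute_short() { return U'\u00e9'; }
char32_t e_acute_long()  { return U'\U000000e9'; }
```

Both functions return the code point 0xE9.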
--
Martin
