Thread

Topic: UCNs in phase 1

Author: stephan.bergmann@sun.com (Stephan Bergmann)
Date: Mon, 24 Jan 2005 18:17:48 GMT

msalters wrote:
> neil@daikokuya.co.uk wrote:
>
>>In view of lex.phases paragraph 1, given a translation unit
>>
>>#define str(x) #x
>>char array[] = str($);
>>
>>what characters should array[] contain?
>
>
> $ isn't in the set of supported characters, so anything would be legal
> (2.2/1, the basic source character set). An error would be legal as
> well; in fact, I think a diagnostic is required.

Oh, yes, you are probably right:  If $ is translated in phase 1 to
\u0024 or \U00000024, then it is translated in phase 3 to an identifier
preprocessing-token, which is malformed, as "Each
universal-character-name in an identifier shall designate a character
whose encoding in ISO 10646 falls into one of the ranges specified in
Annex E." [2.10/1]

However, if $ is translated in phase 1 to \u00c0...

-Stephan

> Regards,
> Michiel Salters

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "msalters" <Michiel.Salters@logicacmg.com>
Date: Mon, 31 Jan 2005 12:22:27 CST

Stephan Bergmann wrote:
> msalters wrote:
> > neil@daikokuya.co.uk wrote:
> >
> >>In view of lex.phases paragraph 1, given a translation unit
> >>
> >>#define str(x) #x
> >>char array[] = str($);
> >>
> >>what characters should array[] contain?
> >
> >
> > $ isn't in the set of supported characters, so anything would be legal
> > (2.2/1, the basic source character set). An error would be legal as
> > well; in fact, I think a diagnostic is required.
>
> Oh, yes, you are probably right:  If $ is translated in phase 1 to
> \u0024 or \U00000024, then it is translated in phase 3 to an identifier
> preprocessing-token, which is malformed, as "Each
> universal-character-name in an identifier shall designate a character
> whose encoding in ISO 10646 falls into one of the ranges specified in
> Annex E." [2.10/1]
>
> However, if $ is translated in phase 1 to \u00c0...

then it must be treated as if you wrote (2.2/1.1)
#define str(x) #x
char array[] = str(\u00c0);

\u00c0 is a UCN, and a pp-token which cannot be a phase 7 token.
However, in phase 4 the # operator is applied which turns the \u00c0
pp-token into a quoted form ("\\u00c0"). This quoted pp-token will
turn into a string literal in phase 7.

Regards,
Michiel Salters


Author: stephan.bergmann@sun.com (Stephan Bergmann)
Date: Thu, 3 Feb 2005 18:23:46 GMT

msalters wrote:
> Stephan Bergmann wrote:
>> msalters wrote:
>>> neil@daikokuya.co.uk wrote:
>>>> In view of lex.phases paragraph 1, given a translation unit
>>>>
>>>> #define str(x) #x
>>>> char array[] = str($);
>>>>
>>>> what characters should array[] contain?
>>>
>>> $ isn't in the set of supported characters, so anything would be legal
>>> (2.2/1, the basic source character set). An error would be legal as
>>> well; in fact, I think a diagnostic is required.
>>
>> Oh, yes, you are probably right:  If $ is translated in phase 1 to
>> \u0024 or \U00000024, then it is translated in phase 3 to an identifier
>> preprocessing-token, which is malformed, as "Each
>> universal-character-name in an identifier shall designate a character
>> whose encoding in ISO 10646 falls into one of the ranges specified in
>> Annex E." [2.10/1]
>>
>> However, if $ is translated in phase 1 to \u00c0...
>
> then it must be treated as if you wrote (2.2/1.1)
> #define str(x) #x
> char array[] = str(\u00c0);
>
> \u00c0 is a UCN, and a pp-token which cannot be a phase 7 token.

I think \u00c0 is an identifier token ("00c0" is listed in Annex E).

> However, in phase 4 the # operator is applied which turns the \u00c0
> pp-token into a quoted form ("\\u00c0"). This quoted pp-token will
> turn into a string literal in phase 7.

I think the # operator turns it into "\u00c0", not "\\u00c0".  See
16.3.2/2: "[...] the original spelling of each preprocessing token in
the argument is retained in the character string literal, except for
special handling for producing the spelling of string literals and
character literals [...]"  No special handling of identifiers.

-Stephan

> Regards,
> Michiel Salters


Author: neil@daikokuya.co.uk
Date: Thu, 20 Jan 2005 15:56:26 CST

In view of lex.phases paragraph 1, given a translation unit

#define str(x) #x
char array[] = str($);

what characters should array[] contain?


Author: neil@daikokuya.co.uk
Date: Fri, 21 Jan 2005 11:23:44 CST

neil@daikokuya.co.uk wrote:
> In view of lex.phases paragraph 1, given a translation unit
>
> #define str(x) #x
> char array[] = str($);
>
> what characters should array[] contain?

Sorry, I meant

char array[] = str("$");

A literal reading of the standard would seem to require this become

char array[] = "\"\\u0024\"";

whereby the $ is completely lost.  Is this correct?  Could it use the
\U 8-digit form?  If the hexadecimal form has the letters a-f in it,
should they be upper or lower case?  etc.
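
One way to see what a particular implementation does with this case is a small probe like the one below. It is only a sketch: it assumes the compiler accepts $ inside a string literal in the first place, and the function names keeps_dollar and yields_ucn_text are invented for the illustration. It checks the stringized result against the two candidates discussed here, the original spelling and the \u0024 spelling.

```cpp
#include <cassert>
#include <cstring>

#define str(x) #x

// True if stringizing "$" keeps the dollar sign: the result is the three
// characters quote, dollar, quote.
inline bool keeps_dollar() {
    return std::strcmp(str("$"), "\"$\"") == 0;
}

// True if stringizing "$" yields the UCN spelling as plain text: quote,
// backslash, u, 0, 0, 2, 4, quote.
inline bool yields_ucn_text() {
    return std::strcmp(str("$"), "\"\\u0024\"") == 0;
}
```

At most one of the two functions can return true, since the candidate strings differ in length. Which one does is a property of the implementation being tested.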

Neil.


Author: stephan.bergmann@sun.com (Stephan Bergmann)
Date: Fri, 21 Jan 2005 17:23:04 GMT

neil@daikokuya.co.uk wrote:
> In view of lex.phases paragraph 1, given a translation unit
>
> #define str(x) #x
> char array[] = str($);
>
> what characters should array[] contain?

My understanding is that str($) results in a string literal of either of
the two forms

   "\uXXXX"  "\UXXXXXXXX"

(where each "X" is a hex-digit).  Note that it does *not* result in a
string literal of either of the two forms

   "\\uXXXX"  "\\UXXXXXXXX"

At least that is how I interpret 16.3.2/2.
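
To make the difference between the two forms concrete, here is a small sketch with U+0024 filled in for the X digits. The function names ucn_form and escaped_form are made up for the example, and the single-character result assumes an ASCII-compatible execution character set.

```cpp
#include <cassert>
#include <cstring>

// A string literal holding one universal-character-name for U+0024.
// On an ASCII-compatible execution character set this is one character.
inline const char* ucn_form() { return "\u0024"; }

// An escaped backslash followed by the ordinary characters u, 0, 0, 2, 4.
// No UCN is involved: the literal holds six characters.
inline const char* escaped_form() { return "\\u0024"; }
```

So std::strlen(ucn_form()) is 1, while std::strlen(escaped_form()) is 6 and its first character is a backslash.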

On a related note, what puzzles *me* is what the string literal "\$"
shall denote.

-Stephan


Author: "msalters" <Michiel.Salters@logicacmg.com>
Date: Fri, 21 Jan 2005 11:23:40 CST

neil@daikokuya.co.uk wrote:
> In view of lex.phases paragraph 1, given a translation unit
>
> #define str(x) #x
> char array[] = str($);
>
> what characters should array[] contain?

$ isn't in the set of supported characters, so anything would be legal
(2.2/1, the basic source character set). An error would be legal as
well; in fact, I think a diagnostic is required.

Regards,
Michiel Salters
