Topic: Splicing/Concatenation and Undefined Behavior


Author: kuyper@wizard.net (James Kuyper)
Date: Sun, 4 Jul 2004 07:22:43 +0000 (UTC)
greg.hickman@lmco.com (Greg Hickman) wrote in message news:<cc4enu$lcq3@cui1.lmms.lmco.com>...
> "James Kuyper" <kuyper@wizard.net> wrote in message
> news:8b42afac.0407020400.300d52d8@posting.google.com...
>
>> // Splice occurs as described in 2.1p2:
>> int hello\u03\
>> 88 = 0;
>>
>> // Splice occurs as described in 2.1p4:
>> #define STR(a,b) a##b
>> #define STRING(a,b) STR(a,b)
>> int STRING(world\u03,89) = 1;
>>
>> Note: in contrast, the following code would be no problem:
>>
>> int hello\u0388 = 0;
>> int world\u0389 = 1;
>>
>> To avoid the problem, just be careful about using token concatenation
>> and escaped new-lines.
>
>
>
> I thought these might be the kinds of scenarios being described in the
> standard, but wanted to be sure.  It isn't readily apparent to me why they
> can lead to undefined behavior.


Giving such splices undefined behavior relieves the implementation of
any responsibility for checking whether the result of a splice
contains a UCN. On an implementation that takes advantage of that
freedom, the most likely form of undefined behavior would be a failure
to notice that the spliced-together identifier should have been
treated as a match for one that never needed splicing. Thus, it
wouldn't recognize the following two lines as containing the same
identifier:

int hello\u0388;
hello\u03\
88 = 1;
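
A rough model of that failure mode, as a sketch only: the function
names and the <U+XXXX> marker below are made up for illustration and
are not taken from any real compiler. One hypothetical implementation
decodes complete UCNs before it splices lines, the other splices
first. Only the four-digit \u form is handled, and doubled backslashes
are ignored, to keep the sketch short.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Model of line splicing: delete each backslash that is immediately
// followed by a new-line, together with that new-line.
std::string splice_lines(const std::string& src) {
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\\' && i + 1 < src.size() && src[i + 1] == '\n') {
            ++i;  // skip the new-line as well
        } else {
            out += src[i];
        }
    }
    return out;
}

// Hypothetical UCN pass: rewrite each complete \uXXXX as the marker
// <U+XXXX>. Anything that is not a complete UCN is copied unchanged.
std::string decode_ucns(const std::string& src) {
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        bool is_ucn =
            i + 6 <= src.size() && src[i] == '\\' && src[i + 1] == 'u';
        for (std::size_t k = 2; is_ucn && k < 6; ++k) {
            is_ucn = std::isxdigit(static_cast<unsigned char>(src[i + k])) != 0;
        }
        if (is_ucn) {
            out += "<U+" + src.substr(i + 2, 4) + ">";
            i += 6;
        } else {
            out += src[i++];
        }
    }
    return out;
}

// Assumed implementation A: decodes UCNs first, then splices lines.
std::string decode_then_splice(const std::string& src) {
    return splice_lines(decode_ucns(src));
}

// Assumed implementation B: splices lines first, then decodes UCNs.
std::string splice_then_decode(const std::string& src) {
    return decode_ucns(splice_lines(src));
}
```

Under implementation A, the unspliced spelling becomes hello<U+0388>
while the spliced spelling is left with a literal backslash sequence,
so the two names no longer compare equal. Under implementation B both
spellings yield the same name. Leaving the behavior undefined permits
either order.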

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: do-not-spam-benh@bwsint.com (Ben Hutchings)
Date: Mon, 5 Jul 2004 22:03:45 +0000 (UTC)
Greg Hickman wrote:
> Why do 2.1(2) and 2.1(4) say that undefined behavior occurs if a character
> sequence that matches the syntax of a universal character name results from
> splicing physical source lines or token concatenation?

The justification I see is that this allows implementations to
interpret universal-character-names at whatever stage is most
convenient.  This may not be the actual reason for the decision.

> Does this mean it's possible to construct a well-formed program that
> unwittingly contains such undefined behavior?  If so, what might it
> look like and what can we do to prevent it?

It is possible to contrive a program with undefined behaviour due to
2.1(2):

int \u\
0100 = 0;

As for 2.1(4), I must admit I have found it useful to construct UCNs
by token concatenation in a macro in order to work with an
implementation that does not support UCNs (VC++ 6), for which the
macro was defined differently.  If you only need to support more
standard compilers then I do not see that this would be necessary.

I think it should only take a little self-discipline to ensure that
whenever you type \u you also type a full 4 hex digits after it.  A
search through your source files with the (Perl-style) regexp:

(^|[^\\])(\\\\)*\\(\\\nu|u.{0,3}[^0-9A-Fa-f])

should reveal any deviation from this rule, unless you use trigraphs,
in which case you would need a somewhat more complex and rather longer
regex.
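
As a sketch of how such a search might be automated, the same pattern
can be handed to std::regex. This assumes the default ECMAScript
grammar treats the pattern the same way Perl does for these inputs;
the function name has_incomplete_ucn is invented for the example.
Like the pattern itself, it looks only at \u, not at the eight-digit
\U form.

```cpp
#include <regex>
#include <string>

// Returns true if `source` contains a \u that is not followed directly
// by four hexadecimal digits, or whose backslash is separated from the
// 'u' by a backslash-new-line splice.
bool has_incomplete_ucn(const std::string& source) {
    // The pattern from the post above, unchanged, in a raw string
    // literal so that no extra escaping is needed.
    static const std::regex pattern(
        R"re((^|[^\\])(\\\\)*\\(\\\nu|u.{0,3}[^0-9A-Fa-f]))re");
    return std::regex_search(source, pattern);
}
```

Reading each source file into a std::string and passing it to this
function would flag the contrived example above, while a complete UCN
such as \u0100 written on one line is not reported.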






Author: Greg Hickman <greg.hickman@lmco.com>
Date: 2 Jul 2004 05:40:01 GMT
Why do 2.1(2) and 2.1(4) say that undefined behavior occurs if a character
sequence that matches the syntax of a universal character name results from
splicing physical source lines or token concatenation?  Does this mean it's
possible to construct a well-formed program that unwittingly contains such
undefined behavior?  If so, what might it look like and what can we do to
prevent it?

Thanks,
Greg








Author: kuyper@wizard.net (James Kuyper)
Date: Fri, 2 Jul 2004 19:47:49 +0000 (UTC)
Greg Hickman <greg.hickman@lmco.com> wrote in message news:<cc1lhj$mt810@cui1.lmms.lmco.com>...
> Why do 2.1(2) and 2.1(4) say that undefined behavior occurs if a character
> sequence that matches the syntax of a universal character name results from
> splicing physical source lines or token concatenation?  Does this mean it's
> possible to construct a well-formed program that unwittingly contains such
> undefined behavior?  If so, what might it look like and what can we do to
> prevent it?

// Splice occurs as described in 2.1p2:
int hello\u03\
88 = 0;

// Splice occurs as described in 2.1p4:
#define STR(a,b) a##b
#define STRING(a,b) STR(a,b)
int STRING(world\u03,89) = 1;

Note: in contrast, the following code would be no problem:

int hello\u0388 = 0;
int world\u0389 = 1;

To avoid the problem, just be careful about using token concatenation
and escaped new-lines.






Author: greg.hickman@lmco.com (Greg Hickman)
Date: Fri, 2 Jul 2004 20:46:31 +0000 (UTC)
"James Kuyper" <kuyper@wizard.net> wrote in message
news:8b42afac.0407020400.300d52d8@posting.google.com...
>
> // Splice occurs as described in 2.1p2:
> int hello\u03\
> 88 = 0;
>
> // Splice occurs as described in 2.1p4:
> #define STR(a,b) a##b
> #define STRING(a,b) STR(a,b)
> int STRING(world\u03,89) = 1;
>
> Note: in contrast, the following code would be no problem:
>
> int hello\u0388 = 0;
> int world\u0389 = 1;
>
> To avoid the problem, just be careful about using token concatenation
> and escaped new-lines.

I thought these might be the kinds of scenarios being described in the
standard, but wanted to be sure.  It isn't readily apparent to me why they
can lead to undefined behavior.

Thanks,
Greg


