Topic: Is a single quotation mark at the end of a translation unit a partial pp-token?
Author: kristov@arcor.de ("Christoph Schulz")
Date: Wed, 16 Jul 2003 16:54:17 +0000 (UTC)
Hello,

I have a question regarding the recognition of preprocessing tokens.
Consider the following translation unit:

  "

(The translation unit consists of only one character, the double quote,
and the implicit newline character that follows.)

The question is whether this translation unit requires a conforming
implementation to emit a diagnostic message, or whether the program
engenders undefined behaviour.

Paragraph 2.4/2 in the standard says:
> The categories of preprocessing tokens are: [...], character literals,
> string literals, [...], and single non-white-space characters that do
> not lexically match the other preprocessing token categories. If a '
> or a " character matches the last category, the behaviour is
> undefined.

Paragraph 2.1/1/3 says:
> [...] A source file shall not end in a partial preprocessing token
> [...].

In my opinion, these two paragraphs contradict each other.

First of all, the standard does not define what is meant by the term
"partial preprocessing token". Preprocessing tokens are described by
lexical production rules. So either a sequence of characters matches a
specific production rule for a preprocessing token (then a preprocessing
token has been recognized), or it does not match any rule (then no
preprocessing token can be formed from the available characters).

But intuitively, one (at least I) can imagine that a partial
preprocessing token is something that has been recognized as the
beginning of a potential preprocessing token. Footnote 14 on page 9
describes an example ("a header name that is missing the closing " or
>"). If we apply this definition to string and character literals, each
character sequence that starts with a " or a ' character and does not
end with the matching quotation mark is a partial preprocessing token.
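The intuitive reading above can be written down as a small predicate.
This is only a sketch of that interpretation, not a definition taken
from the standard; is_partial_literal is a hypothetical name, and escape
sequences (such as a backslash before the final quote) are deliberately
ignored.

```cpp
#include <string>

// Hypothetical helper, not from the standard: returns true if `seq`
// starts with a quote character and does not end with the matching
// closing quote, i.e. it is a "partial" literal under the intuitive
// reading. Escape sequences are deliberately ignored in this sketch.
bool is_partial_literal(const std::string& seq) {
    if (seq.empty()) return false;
    const char open = seq.front();
    if (open != '\'' && open != '"') return false;  // not a literal start
    // The opening quote cannot also serve as the closing quote,
    // so a lone quote is always partial.
    if (seq.size() < 2) return true;
    return seq.back() != open;
}
```

Under this predicate ", ' and "x are all partial, while '' is not,
because it ends with the matching quote even though it is not a valid
character literal.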

Under this interpretation the contradiction becomes visible: 2.1/1/3
indirectly says that a conforming implementation shall issue a
diagnostic message for the input presented above, whereas 2.4/2
classifies the described case as undefined behaviour, because there is
indeed a single non-white-space character (") that does not lexically
match the other preprocessing token categories. So who is right?

The problem is made even worse by the fact that it seems clear that a
translation unit

  "x

(where x is a single character) always causes a diagnostic message to be
generated, because only 2.1/1/3 could apply in that case.

Note that a " can match the description of a "single non-white-space
character" only at the end of a translation unit.
This is different for single quotes ('), because the construction

  ''

(two consecutive single quotes with nothing between them) anywhere in a
translation unit does not match the character literal production in
2.13.2, so both characters are classified as "single non-white-space
characters". This makes the problem even more interesting, because now
we find that

  ''

engenders undefined behaviour, whereas

  '

could be interpreted as a partial preprocessing token (namely a partial
character literal) that requires a diagnostic message to be generated.
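To make the point about '' concrete, here is a minimal sketch of the
two literal productions as a matcher. It is a simplification, not any
compiler's actual lexer: match_literal is a hypothetical name, escape
sequences are approximated as a backslash followed by any character
other than new-line, and the L prefix and universal-character-names are
ignored.

```cpp
#include <cstddef>
#include <string>

// Simplified sketch of the literal productions (2.13.2, 2.13.4):
//   character-literal: ' c-char-sequence '      (at least one c-char)
//   string-literal:    " s-char-sequence-opt "  (may be empty)
// A c-char/s-char is any character except the delimiting quote, a
// backslash or a new-line, or an escape sequence (approximated here as
// a backslash followed by any character other than new-line).
//
// Returns the length of the literal starting at src[pos], or 0 if no
// literal production matches there; the quote would then fall into the
// "single non-white-space character" category of 2.4/2.
std::size_t match_literal(const std::string& src, std::size_t pos) {
    if (pos >= src.size()) return 0;
    const char quote = src[pos];
    if (quote != '\'' && quote != '"') return 0;
    std::size_t i = pos + 1;
    std::size_t chars = 0;  // number of c-chars / s-chars consumed
    while (i < src.size()) {
        const char c = src[i];
        if (c == quote) {
            // A character literal needs a non-empty c-char-sequence.
            if (quote == '\'' && chars == 0) return 0;
            return i + 1 - pos;
        }
        if (c == '\n') return 0;  // literals cannot span lines
        if (c == '\\') {
            if (i + 1 >= src.size() || src[i + 1] == '\n') return 0;
            i += 2;  // simplified escape sequence
        } else {
            i += 1;
        }
        ++chars;
    }
    return 0;  // end of input reached without a closing quote
}
```

With this sketch, match_literal("''\n", 0) returns 0 because the
character literal production requires at least one c-char, while
match_literal("\"\"\n", 0) returns 2 because an empty string literal is
valid. A lone ' or " followed by the final new-line also yields 0.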

Could anybody shed light on this topic?


Best regards,
  Christoph


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]