Topic: header-names and tokenization (DR needed?)
Author: Gennaro Prota <gennaro_prota@yahoo.com>
Date: Tue, 4 Jun 2002 12:59:23 GMT
Hi everybody,
this is IMHO a problem in both the C99 and the C++ standards.
I have pointed it out at least twice on this group, but no satisfying
answer came up. After that, I also discovered that the C committee
dealt with the first part of it, so I'm pretty sure that it needs a DR
and have prepared the sketch below. *Before* submitting it, I just want
to make one last attempt at being told that I'm wrong. Here's the text.
___ DR sketch ______
PROBLEM: par. 3 of 2.1 [lex.phases] of the C++ standard states: "The
process of dividing a source file's characters into preprocessing
tokens is context-dependent. [Example: see the handling of < within a
#include preprocessing directive. ]". This clearly refers to the lexing
of header-names, but it is not reflected in 2.4 [lex.pptoken], which
describes in general how preprocessing tokens are formed.
1) Here's 2.4p3:
"If the input stream has been parsed into preprocessing tokens up to a
given character, the next preprocessing token is the longest sequence
of characters that could constitute a preprocessing token, even if
that would cause further lexical analysis to fail".
Taken literally, this would mean that
    if (a<3 and b>5)
yields the header-name token <3 and b>, which is of course not the
intent.
This is what I have called above 'the first part of the problem'.
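To make the point concrete, here is a minimal sketch of a header-name matcher that applies the longest-match rule with no regard for context. The function name match_angle_header_name is hypothetical and not taken from any real preprocessor; it only assumes that an h-char is any character other than new-line and '>'.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (illustration only): returns the length of a
// header-name of the form <h-char-sequence> that starts at `pos`,
// or 0 if no such header-name starts there. Context is ignored on
// purpose, which is exactly what a literal reading of 2.4p3 implies.
std::size_t match_angle_header_name(const std::string& line, std::size_t pos)
{
    if (pos >= line.size() || line[pos] != '<') return 0;

    std::size_t end = pos + 1;
    // Consume h-chars: anything except '>' and new-line.
    while (end < line.size() && line[end] != '>' && line[end] != '\n') ++end;

    if (end == line.size() || line[end] != '>') return 0; // no closing '>'
    if (end == pos + 1) return 0;                         // empty sequence
    return end - pos + 1;                                 // includes < and >
}
```

Called on the line above at the position of '<', this sketch reports a nine-character match, i.e. the would-be header-name <3 and b>.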
2) Considering the previous point, one would be tempted to modify
par. 3 by adding the following text: "There's one exception to this
rule: header-name preprocessing tokens are only formed within #include
directives" (which is the solution adopted by C99).
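A rough sketch of what that exception amounts to in a lexer: a predicate that says whether a header-name may be formed at a given position. The names skip_blanks and header_name_allowed_at are hypothetical, and the sketch assumes a single logical line with no comments between the tokens.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: skips spaces and tabs starting at index i.
static std::size_t skip_blanks(const std::string& s, std::size_t i)
{
    while (i < s.size() && (s[i] == ' ' || s[i] == '\t')) ++i;
    return i;
}

// Hypothetical predicate: true only when everything before `pos` on the
// line is "# include" (with optional blanks), i.e. when the lexer is
// about to read the first token after the directive name.
bool header_name_allowed_at(const std::string& line, std::size_t pos)
{
    std::size_t i = skip_blanks(line, 0);
    if (i >= line.size() || line[i] != '#') return false;

    i = skip_blanks(line, i + 1);
    const std::string keyword = "include";
    if (line.compare(i, keyword.size(), keyword) != 0) return false;

    i = skip_blanks(line, i + keyword.size());
    return i == pos;
}
```

With this predicate the '<' in an if statement is no longer a candidate, but neither is the '"' that follows a macro name in a #define line.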
Unfortunately this leaves a big problem. Once that "exception" is
accepted, consider:
    #define NAME "file.h"
    #include NAME
Now, when parsing the first line, I can only lex "file.h" as a
string-literal (I'm in the context of a #define, not of a #include).
Therefore, the line
    #include NAME
will end up, after replacement of NAME, as
    {#} {include} {"file.h"}
Here {"file.h"} is a string-literal, not a header-name, so the
directive doesn't match either of the forms in 16.2p2 and 16.2p3 and
is therefore ill-formed (and not implementation-defined).
This is, again, not the intent, as you can see from the example in
16.2p8.
The way I think we could save this from a logical perspective is to
"ignore" (for the purpose of, and at the moment of, the #include
execution) the fact that what follows the #include itself is a
string-literal, and to go back to its pre-tokenization nature as a
plain char-sequence. This, indeed, seems to me to be the intent (trying
to read something into the note to 16.2p4), but as far as I know it's
not written anywhere :(
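One way to picture that "reconsideration" is the sketch below: it takes the spelling of the token that follows #include and, if it has the shape "q-char-sequence", hands back the raw characters between the quotes without any escape processing. The function name as_q_char_sequence is hypothetical, and nothing here is wording from either standard.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (illustration only): given the spelling of a
// preprocessing token that was lexed as a string-literal, try to read it
// again as "q-char-sequence". On success, `out` receives the characters
// between the quotes exactly as spelled (a backslash stays a backslash).
bool as_q_char_sequence(const std::string& spelling, std::string& out)
{
    // Need an opening quote, at least one q-char, and a closing quote.
    if (spelling.size() < 3) return false;
    if (spelling[0] != '"' || spelling[spelling.size() - 1] != '"') return false;

    const std::string body = spelling.substr(1, spelling.size() - 2);

    // A q-char is any character except new-line and '"'.
    if (body.find('"') != std::string::npos) return false;
    if (body.find('\n') != std::string::npos) return false;

    out = body;
    return true;
}
```

A wide string-literal such as L"file.h" is rejected by this sketch, since its spelling does not start with a quote.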
The situation outlined in 2) is BTW the same as in C99 which, as I
said, has the exception considered above (in 6.4/4). As a practical
matter, the vast majority of compilers have the expected behaviour with
    #include NAME
but I think that conceptually the problem remains.
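For anyone who wants to try the practical behaviour, here is a small self-contained check. It is only a sketch under two assumptions: there is no file called cstdio next to the source file, and the implementation falls back from the "..." search to the <...> search when the first one fails. The names name_as_string and formatted_length are made up for the example.

```cpp
#include <cassert>
#include <string>

// In the context of the #define, "cstdio" can only be a string-literal.
#define NAME "cstdio"

// The pp-tokens form of #include: NAME is macro-replaced first, and the
// result is then expected to be treated like #include "cstdio".
#include NAME

// Shows that NAME really is usable as an ordinary string-literal.
const std::string name_as_string = NAME;

// Uses a function declared in <cstdio>, so this only compiles if the
// #include NAME directive above actually brought the header in.
int formatted_length()
{
    char buf[16];
    return std::snprintf(buf, sizeof buf, "%d", 42);
}
```

If the translation unit compiles, the same token sequence has been used both as a string-literal and as the operand of #include.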
____ end sketch
Genny.
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]