Topic: header-names and tokenization (DR needed?)


Author: Gennaro Prota <gennaro_prota@yahoo.com>
Date: Tue, 4 Jun 2002 12:59:23 GMT
Hi everybody,

this is IMHO a problem in both the C99 and the C++ standards.
I pointed it out at least twice on this group, but no satisfying
answer came up. After that, I also discovered that the C committee
has dealt with the first part of it, so I'm pretty sure that it needs
a DR and have prepared the sketch below. *Before* submitting it, I just
want to make a last attempt at being told that I'm wrong. Here's the text.


___ DR sketch ______


PROBLEM: par. 3 of 2.1 [lex.phases] of the C++ standard states: "The
process of dividing a source file's characters into preprocessing
tokens is context-dependent. [Example: see the handling of < within a
#include preprocessing directive. ]". It clearly refers to the lexing
of header-names, but this is not reflected in 2.4 [lex.pptoken], which
describes in general how preprocessing tokens are formed.

1) Here's 2.4p3:
"If the input stream has been parsed into preprocessing tokens up to a
given character, the next preprocessing token is the longest sequence
of characters that could constitute a preprocessing token, even if
that would cause further lexical analysis to fail".

Taken literally, this would mean that

if (a<3 and b>5)

yields the header-name token <3 and b>, which is of course not the
intent.


This is what I have called above 'the first part of the problem'.
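
To make point 1) concrete, here is a minimal sketch of a toy lexer step.
It is only a model under my own assumptions: the names Kind, Token,
match_header_name and next_token are hypothetical, and everything that is
not a header-name is lumped into a one-character "Other" token. The flag
header_names_allowed stands for "we are inside a #include directive".
With the flag set everywhere (the literal reading of 2.4p3), the longest
match starting at '<' is a header-name.

```cpp
#include <cstddef>
#include <string>

// Toy model only: real preprocessing tokens are far richer than this.
enum class Kind { HeaderName, Other };

struct Token {
    Kind kind;
    std::string spelling;
};

// Try to match  < h-char-sequence >  or  " q-char-sequence "  at pos.
// Returns the length of the match, or 0 if there is none.
std::size_t match_header_name(const std::string& src, std::size_t pos) {
    if (pos >= src.size()) return 0;
    char close;
    if (src[pos] == '<') close = '>';
    else if (src[pos] == '"') close = '"';
    else return 0;

    std::size_t end = pos + 1;
    // h-chars and q-chars may be anything except new-line and the closer.
    while (end < src.size() && src[end] != close && src[end] != '\n') ++end;
    if (end >= src.size() || src[end] != close) return 0;
    if (end == pos + 1) return 0;  // the char-sequence must be non-empty
    return end - pos + 1;
}

// One lexing step. header_names_allowed == true everywhere models the
// literal reading of 2.4p3; setting it only inside #include models the
// C99-style exception.
Token next_token(const std::string& src, std::size_t pos,
                 bool header_names_allowed) {
    if (header_names_allowed) {
        std::size_t n = match_header_name(src, pos);
        if (n != 0) return Token{Kind::HeaderName, src.substr(pos, n)};
    }
    return Token{Kind::Other, src.substr(pos, 1)};
}
```

Calling next_token on "if (a<3 and b>5)" at the position of '<' gives a
HeaderName token when the flag is true and a plain '<' when it is false.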

2) Considering the previous point, one would be tempted to modify
par. 3 by adding the following text: "There's one exception to this
rule: header-name preprocessing tokens are only formed within #include
directives" (which is the solution adopted by C99).

Unfortunately this leaves a big problem. Once that "exception" is
accepted, consider:

#define NAME    "file.h"
#include NAME

Now, when parsing the first line, I can only lex "file.h" as a
string-literal (I'm in the context of a #define, not of a #include).
Therefore, the line

#include NAME

will end up, after substitution of NAME, as

{#} {include} {"file.h"}

where the last token is a string-literal, not a header-name. This
doesn't match either of the forms in 16.2p2 and 16.2p3 and is therefore
ill-formed (and not implementation-defined).

This is, again, not the intent, as you can see from the example in
16.2p8.


The way I think we could rescue this from a logical perspective is to
"ignore" (for the purpose of, and at the moment of, executing the
#include) the fact that what follows the #include itself is a
string-literal, and to treat it again as the raw character sequence it
was before tokenization. This, indeed, seems to me to be the intent
(reading between the lines of the note to 16.2p4), but as far as I know
it's not written anywhere :(
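
As an illustration of that reading, here is a minimal sketch of the
"re-interpretation" step. It is my own model, not anything the standard
specifies: PpKind, PpToken and resolve_include_operand are hypothetical
names, and the operand is assumed to be the token sequence left after
#include once macro replacement has been done.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model of preprocessing tokens; only the kinds needed here.
enum class PpKind { StringLiteral, HeaderName, Other };

struct PpToken {
    PpKind kind;
    std::string spelling;  // exactly as written in the source
};

// Applies the reading suggested above: a single string-literal operand
// has its spelling re-read as  " q-char-sequence ".  Returns false when
// the operand cannot be read that way.
bool resolve_include_operand(const std::vector<PpToken>& operand,
                             PpToken& out) {
    if (operand.size() != 1) return false;  // e.g. "a" "b" is not joined
    const PpToken& t = operand[0];

    if (t.kind == PpKind::HeaderName) {  // already lexed as a header-name
        out = t;
        return true;
    }
    if (t.kind != PpKind::StringLiteral) return false;

    const std::string& s = t.spelling;
    // Need an opening quote, at least one q-char, and a closing quote.
    // A prefixed literal such as L"x" fails the first test.
    if (s.size() < 3 || s.front() != '"' || s.back() != '"') return false;
    // A q-char may not be '"' or new-line.
    if (s.find('"', 1) != s.size() - 1) return false;
    if (s.find('\n') != std::string::npos) return false;

    out = PpToken{PpKind::HeaderName, s};
    return true;
}
```

With the operand {"file.h"} from the example, the function succeeds and
hands back the same spelling tagged as a header-name; with two adjacent
string-literals it refuses, since nothing concatenates them at that
point.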


The situation outlined in 2) is BTW the same as in C99 which, as I
said, has the exception considered above (in 6.4/4). As a practical
matter, the vast majority of compilers show the expected behaviour with

#include NAME

but I think that conceptually the problem remains.


____ end sketch


Genny.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]