Topic: The preprocessor and token boundaries
Author: non-existent@iobox.com ("Sergey P. Derevyago")
Date: Wed, 18 Sep 2002 01:48:59 +0000 (UTC)
Hyman Rosen wrote:
> > BTW, have you tried
> > #define char_lit(arg) (#arg [0])
>
> That's not an integer constant expression, though.
No, it isn't. But was it required, though? :)
BTW, it seems C/C++ has a gap here: both "a"[0] and *"a" are values
known at compile time and, in principle, could be used as switch targets.
--
With all respect, Sergey. http://cpp3.virtualave.net/
mailto : ders at skeptik.net
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: kanze@gabi-soft.de (James Kanze)
Date: Fri, 13 Sep 2002 16:15:43 +0000 (UTC)
ron@sensor.com ("Ron Natalie") wrote in message
news:<jAMf9.164355$l_4.144211@atlpnn01.usenetserver.com>...
> "Hyman Rosen" <hyrosen@mail.com> wrote in message
> news:1031757740.975293@master.nyc.kbcfp.com...
> > For example, before ## came along, you could paste in some
> > preprocessors by doing this:
> > #define paste(a,b) a/**/b
And in others using:
#define paste(a,b) a\
b
(with an escaped new line, and no other white space).
> Actually, that was before the language rules were changed to say that
> comments were to be treated as whitespace. The ## was a result of
> this kludge falling apart.
I don't think that this kludge ever worked for all compilers.
> > And before # came along, you could sometimes quote by doing this:
> > #define quote(a) "a"
And sometimes, you couldn't quote at all.
> > And the current preprocessor still has no way of doing this:
> > #define char(a) 'a'
> There was actually some ambiguity in the wording of K&R as to whether
> the replacement should occur WITHIN the literals.
I don't know if K&R specified it, but the compilers which were shipped
with stock AT&T Unix didn't. The Reiser preprocessor (shipped with the
compilers from Berkeley Unix) did.
--
James Kanze mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Author: kanze@gabi-soft.de (James Kanze)
Date: Wed, 11 Sep 2002 16:50:43 +0000 (UTC)
cpdaniel@pacbell.net ("Carl Daniel") wrote in message
news:<pjtf9.445$B25.8179913@newssvr14.news.prodigy.com>...
> ""Ron Natalie"" <ron@sensor.com> wrote in message
> news:SGsf9.117209$l_4.62514@atlpnn01.usenetserver.com...
> > > What is the rationale for having preprocessing occur after
> > > tokenization, when existing practice at the time the standard was
> > > written would certainly have been just the opposite?
> > You're wrong there, existing practice was to tokenize first and to
> > have a (redundant) tokenizing step after the preprocessor ran. But
> > even in that case, tokens DID NOT JUST glue together. It would have
> > been problematic.
> I should have written more precisely - yes, that is what I assumed
> would have been common practice - the preprocessor tokenizes (using
> some set of lexical rules) before doing substitutions. Intuitively,
> I've always thought of the output of the preprocessor as simply text,
> which is then re-tokenized by the compiler (using a possibly different
> set of lexical rules).
Historically, the preprocessor tokenized, preprocessed, then converted
the tokens back to text. But even before C was being standardized,
there were compilers which skipped this extra step, and fed the tokens
directly into the parser.
> Ancient history question then: how, in the days of the preprocessor
> being a stand-alone "filter" (a true pre-processor) did tokens not
> "just glue together"? (Or perhaps that's the point - they did, and it
> was problematic?)
The problem is that by the time C was being standardized, there were
several existing preprocessors, and the way they regenerated text
varied. And there were already compilers with integrated preprocessors,
which fed the token stream directly to the parser.
Accidental token pasting is a potential problem, so given the choice,
the C standards committee decided that any token pasting must be
intentional.
--
James Kanze mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Author: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: Wed, 11 Sep 2002 18:30:50 +0000 (UTC)
cpdaniel@pacbell.net ("Carl Daniel") writes:
> Which still doesn't answer the question I asked: Why? What's the rationale
> for the preprocessor to have been defined this way, when existing practice
> at the time the standard was written was very likely character-based?
The existing practice was *not* that the preprocessor is character
based. Instead, every existing preprocessor did perform a tokenization
of the input stream. Without that, you cannot recognize preprocessor
directives. Many (but not all) implementations used to produce a byte
stream output, which was then re-tokenized by the compiler.
If the standard had mandated untokenization and retokenization, its
formulation would have been more complicated. In addition, the
implementation of advanced features (like macro debugging) would have
become more difficult.
Regards,
Martin
Author: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: Wed, 11 Sep 2002 18:30:55 +0000 (UTC)
cpdaniel@pacbell.net ("Carl Daniel") writes:
> Ancient history question then: how, in the days of the preprocessor being a
> stand-alone "filter" (a true pre-processor) did tokens not "just glue
> together"? (Or perhaps that's the point - they did, and it was
> problematic?)
Many preprocessors add white spaces when generating byte stream
output, to avoid accidental token concatenation.
Regards,
Martin
Author: hyrosen@mail.com (Hyman Rosen)
Date: Wed, 11 Sep 2002 18:32:54 +0000 (UTC)
Carl Daniel wrote:
> Ancient history question then: how, in the days of the preprocessor being a
> stand-alone "filter" (a true pre-processor) did tokens not "just glue
> together"? (Or perhaps that's the point - they did, and it was
> problematic?)
If you run some sample code through gcc's current preprocessor,
you will notice that it emits a space between tokens which would
otherwise coalesce. Presumably, old preprocessors could have done
the same, although there was such wide variation in pre-standard
behavior that the C standards committee just went ahead and invented its
own rules.
For example, before ## came along, you could paste in some
preprocessors by doing this:
#define paste(a,b) a/**/b
And before # came along, you could sometimes quote by doing this:
#define quote(a) "a"
And the current preprocessor still has no way of doing this:
#define char(a) 'a'
Author: ron@sensor.com ("Ron Natalie")
Date: Wed, 11 Sep 2002 20:08:45 +0000 (UTC)
"Hyman Rosen" <hyrosen@mail.com> wrote in message news:1031757740.975293@master.nyc.kbcfp.com...
> For example, before ## came along, you could paste in some
> preprocessors by doing this:
>
> #define paste(a,b) a/**/b
Actually, that was before the language rules were changed to say that comments
were to be treated as whitespace. The ## was a result of this kludge falling apart.
>
> And before # came along, you could sometimes quote by doing this:
>
> #define quote(a) "a"
>
> And the current preprocessor still has no way of doing this:
>
> #define char(a) 'a'
>
There was actually some ambiguity in the wording of K&R as to whether
the replacement should occur WITHIN the literals.
Author: hyrosen@mail.com (Hyman Rosen)
Date: Wed, 11 Sep 2002 22:15:54 +0000 (UTC)
Ron Natalie wrote:
> These were actually some ambiguity in the wording of K&R as to whether
> the replacement should occur WITHIN the literals.
Which is why I said "sometimes". I saw both kinds back then.
The ones that did the replacement were better, since there
was no other way to get the effect.
Author: non-existent@iobox.com ("Sergey P. Derevyago")
Date: Sat, 14 Sep 2002 22:11:24 +0000 (UTC)
Hyman Rosen wrote:
> And before # came along, you could sometimes quote by doing this:
>
> #define quote(a) "a"
>
> And the current preprocessor still has no way of doing this:
>
> #define char(a) 'a'
>
BTW, have you tried
#define char_lit(arg) (#arg [0])
?
--
With all respect, Sergey. http://cpp3.virtualave.net/
mailto : ders at skeptik.net
Author: hyrosen@mail.com (Hyman Rosen)
Date: Sun, 15 Sep 2002 13:19:37 +0000 (UTC)
Sergey P. Derevyago wrote:
> BTW, have you tried
> #define char_lit(arg) (#arg [0])
That's not an integer constant expression, though.
So no 'case char(A):', for instance.
Author: hyrosen@mail.com (Hyman Rosen)
Date: Sun, 15 Sep 2002 22:58:00 +0000 (UTC)
James Kanze wrote:
> I don't know if K&R specified it, but the compilers which were shipped
> with stock AT&T Unix didn't. The Rieser preprocessor (shipped with the
> compilers from Berkley Unix) did.
It was all a big giant mess. The C Standard people very sensibly
invented how the preprocessor should work, and didn't worry much
about being compatible with existing implementations since none
of them were compatible with each other.
And I really like tokens which are "painted blue" :-)
Author: hyrosen@mail.com (Hyman Rosen)
Date: Tue, 10 Sep 2002 08:44:07 +0000 (UTC)
Carl Daniel wrote:
> 1. Should it have compiled in the first place?
Yes.
> 2. Why?
Read 2.1 on phases of translation. The source is divided
into preprocessing tokens before macro expansion is done.
Once this happens, mere adjacency does not concatenate two
tokens into one.
It might be nice if the "just preprocess" option would
insert whitespace between adjacent tokens that coalesce,
but there would probably be many objections from people
who are actually seeking this effect.
There's this case as well. Does it return 0 or 1? :-)
#define div(a,b) a/b
int main() { int a=1, b=2, *p=&b; return div(a,*p) /**/; }
Author: kanze@gabi-soft.de (James Kanze)
Date: Tue, 10 Sep 2002 20:04:29 +0000 (UTC)
cpdaniel@pacbell.net ("Carl Daniel") wrote in message
news:<wD2f9.291$TF2.26050823@newssvr21.news.prodigy.com>...
> Consider:
> ----------------
> #include <vector>
> #define ALLOC(T) std::allocator<T>
> typedef std::vector<char,ALLOC(char)> V;
> -----------------
> On one popular compiler (and who knows how many others), this unit
> will compile as-is.
With all, I hope.
> With this same compiler, if one chooses to "Preprocess to a file", the
> file containing the preprocessor output will not compile, because
> std::vector<char,ALLOC(char)> has become
> std::vector<char,std::allocator<char>>.
So? Preprocessor output is interesting to see in certain cases, but the
actual "output" of the preprocessor is a token stream, not text, and any
textual output that you can obtain is implementation dependent.
> Questions:
> 1. Should it have compiled in the first place?
Yes. There are no ## operators to tell the preprocessor to concatenate
the tokens.
> 2. If the answer to 1 is 'yes', that would seem to imply that the
> preprocessor is not making simple character-level substitutions and
> begs the question: Why?
The preprocessor does NOT make simple character-level substitutions.
The preprocessor works on tokens, not characters.
--
James Kanze mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Author: cpdaniel@pacbell.net ("Carl Daniel")
Date: Tue, 10 Sep 2002 20:29:32 +0000 (UTC)
"Hyman Rosen" <hyrosen@mail.com> wrote in message
news:1031596554.887595@master.nyc.kbcfp.com...
> Carl Daniel wrote:
> > 2. Why?
>
> Read 2.1 on phases of translation. The source is divided
> into preprocessing tokens before macro expansion is done.
> Once this happens, mere adjacency does not concatenate two
> tokens into one.
I was pretty sure this was the case, actually (just too lazy to go look it
up...). But that doesn't really answer the question: why?
What is the rationale for having preprocessing occur after tokenization,
when existing practice at the time the standard was written would certainly
have been just the opposite?
-cd
Author: cpdaniel@pacbell.net ("Carl Daniel")
Date: Tue, 10 Sep 2002 20:30:00 +0000 (UTC)
"James Kanze" <kanze@gabi-soft.de> wrote in message
news:d6651fb6.0209100007.7f1b283e@posting.google.com...
> cpdaniel@pacbell.net ("Carl Daniel") wrote in message
> > Why?
>
> The preprocessor does NOT make simple character-level substitutions.
> The preprocessor works on tokens, not characters.
Which still doesn't answer the question I asked: Why? What's the rationale
for the preprocessor to have been defined this way, when existing practice
at the time the standard was written was very likely character-based?
-cd
Author: ron@sensor.com ("Ron Natalie")
Date: Tue, 10 Sep 2002 20:40:53 +0000 (UTC)
""Carl Daniel"" <cpdaniel@pacbell.net> wrote in message news:Pumf9.3462$1L.76478582@newssvr13.news.prodigy.com...
> I was pretty sure this was the case, actually (just too lazy to go look it
> up...). But that doesn't really answer the question: why?
Because if you don't tokenize first, how does the preprocessor figure out
where the replacement tokens begin and end? For example:
#define FOO 32
int FOOBAR;
You don't want it making the input stream to the compiler:
int 32BAR;
>
> What is the rationale for having preprocessing occur after tokenization,
> when existing practice at the time the standard was written would certainly
> have been just the opposite?
>
You're wrong there, existing practice was to tokenize first and to have a (redundant)
tokenizing step after the preprocessor ran. But even in that case, tokens DID NOT JUST
glue together. It would have been problematic.
Author: cpdaniel@pacbell.net ("Carl Daniel")
Date: Wed, 11 Sep 2002 02:56:04 +0000 (UTC)
""Ron Natalie"" <ron@sensor.com> wrote in message
news:SGsf9.117209$l_4.62514@atlpnn01.usenetserver.com...
> > What is the rationale for having preprocessing occur after tokenization,
> > when existing practice at the time the standard was written would
> > certainly have been just the opposite?
> >
> You're wrong there, existing practice was to tokenize first and to have a
> (redundant) tokenizing step after the preprocessor ran. But even in that
> case, tokens DID NOT JUST glue together. It would have been problematic.
I should have written more precisely - yes, that is what I assumed would
have been common practice - the preprocessor tokenizes (using some set of
lexical rules) before doing substitutions. Intuitively, I've always thought
of the output of the preprocessor as simply text, which is then re-tokenized
by the compiler (using a possibly different set of lexical rules).
Ancient history question then: how, in the days of the preprocessor being a
stand-alone "filter" (a true pre-processor) did tokens not "just glue
together"? (Or perhaps that's the point - they did, and it was
problematic?)
-cd
Author: cpdaniel@pacbell.net ("Carl Daniel")
Date: Mon, 9 Sep 2002 17:50:50 +0000 (UTC)
Consider:
----------------
#include <vector>
#define ALLOC(T) std::allocator<T>
typedef std::vector<char,ALLOC(char)> V;
-----------------
On one popular compiler (and who knows how many others), this unit will
compile as-is.
With this same compiler, if one chooses to "Preprocess to a file", the file
containing the preprocessor output will not compile, because
std::vector<char,ALLOC(char)> has become
std::vector<char,std::allocator<char>>.
Questions:
1. Should it have compiled in the first place?
2. If the answer to 1 is 'yes', that would seem to imply that the
preprocessor is not making simple character-level substitutions and begs the
question: Why?
-cd