Thread

Topic: alternative tokens

Author: llewelly.at@xmission.dot.com (llewelly)
Date: Wed, 21 Apr 2004 15:54:10 +0000 (UTC)

ank@despammed.com (Alexander Krotov) writes:

[snip]
> Further, section 2.12 says:
> <<
> preprocessing-op-or-punc: one of
> ..
>  new     delete ...
>  and     and_eq  bitand  bitor   compl   not     not_eq  or      or_eq
>           xor     xor_eq
>
> ..

Reserved words such as new and delete are different from alternative
    tokens. Reserved words are unconditionally treated as keywords in
    phase 7, but receive no special treatment in phase 4, where
    preprocessing directives are executed and macros are
    expanded. (See section 2.1, as well as 2.12.)

>>>
>
> Does this mean that "new", "delete", as well as alternative tokens
> cannot be used as macro names (as they are preprocessing-op-or-puncs,
> not identifiers) ?
>
> Section 2.12 says:
> <<
> 2 In all respects of the language, each alternative  token  behaves  the
>   same,  respectively,  as its primary token, except for its spelling5).
>   The set of alternative tokens is defined in Table 2.
>>>
>
> Does this mean that alternative tokens can be used in preprocessor
> conditional include conditions (#if) ?
>
> Test:
> <<
> #define not_eq X

ill-formed.

> #define new delete

well-formed.

>
> not_eq
> new
>
> #if 1 and 2

well-formed, condition is true.

> Foo.
> #endif
>
> #if 1 not_eq 2

well-formed, condition is true.

> Bar.
> #endif
>>>
>
> GCC 3.3 complains:
> tp.c:1:9: "not_eq" cannot be used as a macro name as it is an operator in C++
> and understands alternative tokens in #if conditions.

GCC is correct here. It is also correct to accept the two #if
    preprocessing conditionals.

> ICC 8.0 (as far as I know EDG based) complains
> tp.c(9): error #14: extra text after expected end of preprocessing directive
>   #if 1 not_eq 2

ICC is wrong in this case.


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: ank@despammed.com (Alexander Krotov)
Date: Mon, 15 Mar 2004 18:39:39 +0000 (UTC)

James Kuyper <kuyper@wizard.net> wrote:
>> >> Does this mean that alternative tokens can be used in preprocessor
>> >> conditional include conditions (#if) ?
>> >
>> > Yes, as in your sample code such as
>> >     #if 1 and 2
>> >
>> >
>> > The key to your questions is section 2.1, "Phases of translation". The
>> > alternative tokens are interpreted before any kind of macro processing.
>>
>> This is not that simple: 2.1 does not define when alternative tokens
>> are converted to tokens (it only defines when trigraphs are replaced).
>
> Yes it does. That occurs in phase 3: "The source file is decomposed
> into pre-processing tokens." The alternative tokens are not converted
> into their equivalents; the alternative tokens are simply an
> alternative way of spelling the same token, and that spelling gets
> recognised at the same time as the other spelling.

Thank you for this comment. It really answers all my questions but one.

The question remains: according to 2.12, "new" and "delete" match the
lexical forms of both preprocessing-op-or-punc and identifier.
Does this mean that these tokens should not be used as a macro name
or macro parameter name?

Does the following rule from 16.1 (Conditional inclusion) apply to
"new" and "delete"?
<<
After all replacements due to
  macro  expansion  and  the defined unary operator have been performed,
  all remaining identifiers are replaced with the pp-number 0, and  then
  each  preprocessing  token  is  converted into a token.
>>

[Both gcc 3.3 and ICC 8.0 do treat the tokens in question as identifiers].

If they are legal identifiers at phase 4 (could be used as macro names):
is there any particular reason why "new" and "delete"
are listed in preprocessing-op-or-punc definition ?

-ank

---

Author: ank@despammed.com (Alexander Krotov)
Date: Fri, 12 Mar 2004 17:02:10 +0000 (UTC)

The standard 2.11 says:
<<
2 Furthermore, the alternative representations shown in Table 4 for cer-
  tain operators and punctuators (_lex.digraph_) are reserved and  shall
  not be used otherwise:

                   Table 4--alternative representations

            +------------------------------------------------+
            |and      and_eq   bitand   bitor   compl    not |
            |not_eq   or       or_eq    xor     xor_eq       |
            +------------------------------------------------+
>>

Does this mean that these alternative tokens cannot be used as
macro names ?

Further, section 2.12 says:
<<
preprocessing-op-or-punc: one of
..
 new     delete ...
 and     and_eq  bitand  bitor   compl   not     not_eq  or      or_eq
          xor     xor_eq

..
>>

Does this mean that "new", "delete", as well as alternative tokens
cannot be used as macro names (as they are preprocessing-op-or-puncs,
not identifiers) ?

Section 2.12 says:
<<
2 In all respects of the language, each alternative  token  behaves  the
  same,  respectively,  as its primary token, except for its spelling5).
  The set of alternative tokens is defined in Table 2.
>>

Does this mean that alternative tokens can be used in preprocessor
conditional include conditions (#if) ?

Test:
<<
#define not_eq X
#define new delete

not_eq
new

#if 1 and 2
Foo.
#endif

#if 1 not_eq 2
Bar.
#endif
>>

GCC 3.3 complains:
tp.c:1:9: "not_eq" cannot be used as a macro name as it is an operator in C++
and understands alternative tokens in #if conditions.

ICC 8.0 (as far as I know EDG based) complains
tp.c(9): error #14: extra text after expected end of preprocessing directive
  #if 1 not_eq 2

Which means both compilers agree that alternative tokens can be used
in #if conditions, both agree that "new" can be used as macro name,
and disagree if "not_eq" can be used as macro name or not.

-ank

---

Author: francis@robinton.demon.co.uk (Francis Glassborow)
Date: Fri, 12 Mar 2004 21:49:58 +0000 (UTC)

In article <20040312155344.GA19274@solidtech.com>, Alexander Krotov
<ank@despammed.com> writes
>Which means both compilers agree that alternative tokens can be used
>in #if conditions, both agree that "new" can be used as macro name,
>and disagree if "not_eq" can be used as macro name or not.

It would be a little odd if that were not the case, because the primary
motive for introducing those alternatives was that some of the
relevant groups of operators used symbols that were not part of ISO 646
(if I remember correctly, certain Nordic platforms lacked some of those
symbols on their keyboards).

--
Francis Glassborow      ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

---

Author: stephen.clamage@sun.com (Steve Clamage)
Date: Fri, 12 Mar 2004 21:51:30 +0000 (UTC)

Alexander Krotov wrote:
> The standard 2.11 says:
> <<
> 2 Furthermore, the alternative representations shown in Table 4 for cer-
>   tain operators and punctuators (_lex.digraph_) are reserved and  shall
>   not be used otherwise:  ...
>
> Does this mean that these alternative tokens cannot be used as
> macro names ?
> ...
>
> Does this mean that "new", "delete", as well as alternative tokens
> cannot be used as macro names (as they are preprocessing-op-or-puncs,
> not identifiers) ?
>

It depends on your definition of "can". The meaning of "shall not" is that if
the compiler allows the code to compile, the results are undefined. See section
1.4, "Implementation compliance."


> Section 2.12 says:
> <<
> 2 In all respects of the language, each alternative  token  behaves  the
>   same,  respectively,  as its primary token, except for its spelling5).
>   The set of alternative tokens is defined in Table 2.
>
>
> Does this mean that alternative tokens can be used in preprocessor
> conditional include conditions (#if) ?

Yes, as in your sample code such as
     #if 1 and 2


The key to your questions is section 2.1, "Phases of translation". The
alternative tokens are interpreted before any kind of macro processing.

---
Steve Clamage, stephen.clamage@sun.com

---

Author: ank@despammed.com (Alexander Krotov)
Date: Sun, 14 Mar 2004 00:17:19 +0000 (UTC)

Steve Clamage <stephen.clamage@sun.com> wrote:
> Alexander Krotov wrote:
>> The standard 2.11 says:
>> <<
>> 2 Furthermore, the alternative representations shown in Table 4 for cer-
>>   tain operators and punctuators (_lex.digraph_) are reserved and  shall
>>   not be used otherwise:  ...
>>
>> Does this mean that these alternative tokens cannot be used as
>> macro names ?
>> ...
>>
>> Does this mean that "new", "delete", as well as alternative tokens
>> cannot be used as macro names (as they are preprocessing-op-or-puncs,
>> not identifiers) ?
>>
>
> It depends on your definition of "can". The meaning of "shall not" is that if
> the compiler allows the code to compile, the results are undefined. See section
> 1.4, "Implementation compliance."

Thank you.

>> Section 2.12 says:
>> <<
>> 2 In all respects of the language, each alternative  token  behaves  the
>>   same,  respectively,  as its primary token, except for its spelling5).
>>   The set of alternative tokens is defined in Table 2.
>>
>>
>> Does this mean that alternative tokens can be used in preprocessor
>> conditional include conditions (#if) ?
>
> Yes, as in your sample code such as
>     #if 1 and 2
>
>
> The key to your questions is section 2.1, "Phases of translation". The
> alternative tokens are interpreted before any kind of macro processing.

This is not that simple: 2.1 does not define when alternative tokens
are converted to tokens (it only defines when trigraphs are replaced).
2.5 footnote 5 says:
<<
5)  Thus the "stringized" values (_cpp.stringize_) of [ and <: will be
  different, maintaining the source spelling, but the tokens can  other-
  wise be freely interchanged.
>>

2.12 says:
<<
 Each preprocessing-op-or-punc is converted to a single token in trans-
  lation phase 7 (_lex.phases_).
>>
while macro processing happens on phase 4.

-ank

---

Author: kuyper@wizard.net (James Kuyper)
Date: Sun, 14 Mar 2004 18:22:54 +0000 (UTC)

ank@despammed.com (Alexander Krotov) wrote in message news:<20040313133541.GA27668@solidtech.com>...
> Steve Clamage <stephen.clamage@sun.com> wrote:
> > Alexander Krotov wrote:
..
> >> Section 2.12 says:
> >> <<
> >> 2 In all respects of the language, each alternative  token  behaves  the
> >>   same,  respectively,  as its primary token, except for its spelling5).
> >>   The set of alternative tokens is defined in Table 2.
> >>
> >>
> >> Does this mean that alternative tokens can be used in preprocessor
> >> conditional include conditions (#if) ?
> >
> > Yes, as in your sample code such as
> >     #if 1 and 2
> >
> >
> > The key to your questions is section 2.1, "Phases of translation". The
> > alternative tokens are interpreted before any kind of macro processing.
>
> This is not that simple: 2.1 does not define when alternative tokens
> are converted to tokens (it only defines when trigraphs are replaced).

Yes it does. That occurs in phase 3: "The source file is decomposed
into pre-processing tokens." The alternative tokens are not converted
into their equivalents; the alternative tokens are simply an
alternative way of spelling the same token, and that spelling gets
recognised at the same time as the other spelling.

> while macro processing happens on phase 4.

Which comes after phase 3.

---

Author: Christopher Eltschka <celtschk@web.de>
Date: Mon, 2 Jul 2001 18:22:17 GMT

MikeAlpha@NoSpam_csi.com (Martin Aupperle) writes:

> Hello,
>
> I read in the Standard that there is something like "alternative
> tokens".  And it says that these should be keywords of the language.
>
> MSVC refuses to recognize these, but provides a file iso646.h, where
> some of them are #defined.  Comeau does recognize them without
> anything else necessary.
>
> Now:
>
> 1.) Is it allowed to implement the Alternative Tokens with the help of
> the iso646.h-file, or should the Alternative Tokens be part of the
> language?

Language. I cannot imagine how a header can provide the token "<:" (of
course things like "and" or "bitor" are mostly doable with
#define).

>
> 2.) What is the difference between a trigraph and an alternative
> token?

A trigraph is a character written as three characters.
An alternative token is just another token with the same meaning.

Trigraphs are somewhat similar to Quoted Printable =xy: There are
three characters written, but they really mean one character, which
possibly cannot be written in the used character set (BTW, are such
character sets still in use?). You could run sed over a file to
produce/remove trigraphs. Indeed, in the very first step the compiler
removes the trigraphs by replacing them with the real characters.

Alternative tokens are just ordinary tokens; the only special thing
about them is that they have the exact same meaning as some other
token.

If your compiler handles trigraphs correctly, you might get some
surprises, for example:

  std::cout << "What's this???/no clue"

will output

What's this?
o clue

because the sequence ??/ is a trigraph encoding the backslash,
and therefore the above line really reads

  std::cout << "What's this?\no clue"

>
> 3.) What about replacement in string literals? AFAIK, trigraphs should
> be replaced everywhere, but alternative tokens should not? Am I right
> here?

Correct. Since trigraphs are character encodings, they apply
everywhere. At the time trigraphs are replaced, the compiler doesn't
even yet know about string constants and such things.

Alternative tokens, OTOH, are tokens, and they are not replaced. They
are just treated as identical to the original token in the grammar.
And the content of string literals is not tokenized anyway; the
complete string literal is one token in itself.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]

Author: scott douglass <sdouglass@arm.com>
Date: Wed, 20 Jun 2001 16:19:04 GMT

"James Kuyper Jr." wrote:
>
> Martin Aupperle wrote:
> >[...]
> > 2.) What is the difference between a trigraph and an alternative
> > token?
>
> Alternative tokens are in all respects exactly equivalent to the
> ones they're alternatives to. [...]

They are exactly equivalent in all respects except the result of "stringizing":
    #include <iostream>
    #define STR(x) #x   // stringize the argument, keeping its source spelling
    int main() {
      std::cout << STR(&&) << std::endl;
      std::cout << STR(and) << std::endl;
      std::cout << STR(#) << std::endl;
      std::cout << STR(%:) << std::endl;
    }
This should print
    &&
    and
    #
    %:

> A trigraph gets replaced by the appropriate token during translation
> phase one, and there are some subtle ways, that aren't very important,
> in which this differs from being identical to the replacement.

I'm curious: How can a trigraph behave differently from its replacement
character?

---

Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Wed, 20 Jun 2001 23:41:27 GMT

scott douglass wrote:
>
> "James Kuyper Jr." wrote:
> >
> > Martin Aupperle wrote:
> > >[...]
> > > 2.) What is the difference between a trigraph and an alternative
> > > token?
> >
> > Alternative tokens are in all respects exactly equivalent to the
> > ones they're alternatives to. [...]
>
> They are exactly equivalent in all respects except the result of "stringizing":

Correct.

>     #include <iostream>
>     #define STR(x) #x
>     int main() {
>       std::cout << STR(&&) << std::endl;
>       std::cout << STR(and) << std::endl;
>       std::cout << STR(#) << std::endl;
>       std::cout << STR(%:) << std::endl;
>     }
> This should print
>     &&
>     and
>     #
>     %:
>
> > A trigraph gets replaced by the appropriate token during translation
> > phase one, and there are some subtle ways, that aren't very important,
> > in which this differs from being identical to the replacement.
>
> I'm curious: How can a trigraph behave differently from its replacement
> character?

That was a bad choice of words on my part; I should have said
"equivalent" rather than "identical". The main way in which replacement
differs from equivalence lies precisely in the fact that a trigraph acts
the same as its replacement character. You probably already know what
I'm referring to, but just in case, I'll give an example that is just an
extension of yours:

 cout << STR([) << endl;
 cout << STR(??() << endl;
 cout << STR(<:) << endl;

Should print

 [
 [
 <:

As I said - subtle, and not very important difference.

---

Author: MikeAlpha@NoSpam_csi.com (Martin Aupperle)
Date: Mon, 18 Jun 2001 07:19:45 GMT

Hello,

I read in the Standard that there is something like "alternative
tokens".  And it says that these should be keywords of the language.

MSVC refuses to recognize these, but provides a file iso646.h, where
some of them are #defined.  Comeau does recognize them without
anything else necessary.

Now:

1.) Is it allowed to implement the Alternative Tokens with the help of
the iso646.h-file, or should the Alternative Tokens be part of the
language?

2.) What is the difference between a trigraph and an alternative
token?

3.) What about replacement in string literals? AFAIK, trigraphs should
be replaced everywhere, but alternative tokens should not? Am I right
here?

(Not that I want to use these things - just want to understand).

Thanks - Martin

------------------------------------------------
Martin Aupperle
MikeAlpha@NoSpam_csi.com
(remove NoSpam_)
------------------------------------------------

---

Author: Francis Glassborow <francis.glassborow@ntlworld.com>
Date: Mon, 18 Jun 2001 18:16:21 GMT

In article <3b2cd088.250443889@news.nikoma.de>, Martin Aupperle
<MikeAlpha@NoSpam_csi.com> writes
>Hello,
>
>I read in the Standard that there is something like "alternative
>tokens".  And it says that these should be keywords of the language.
>
>MSVC refuses to recognize these, but provides a file iso646.h,

That file is for C as amended in 1995. It has nothing to do with C++,
which tentatively had alternative spellings several years earlier.

>where
>some of them are #defined.  Comeau does recognize them without
>anything else necessary.

Correctly.
>
>Now:
>
>1.) Is it allowed to implement the Alternative Tokens with the help of
>the iso646.h-file, or should the Alternative Tokens be part of the
>language?

That is the way they should be provided in C89 as amended in 1995 and in
C99, but in C++ they are part of the language and should not depend on
including a header.

>
>2.) What is the difference between a trigraph and an alternative
>token?

A H*** of a lot. They are parsed in quite different ways. Not least that
a trigraph gets to mess with strings. Trigraphs were an emergency
measure introduced because of late objections to the proposed C standard
from various Nordic/Scandinavian countries based on C using characters
that were not in the ISO 646 character set.

>
>3.) What about replacement in string literals? AFAIK, trigraphs should
>be replaced everywhere, but alternative tokens should not? Am I right
>here?
Alternative tokens are exactly that, they are not literals that get
replaced by processing. 'and' and '&&' generate the same token. A string
is also a single token (just as any other literal - value) so:

cout << "a && b" << 'a' && 'b';

generates the same output as:

cout << "a && b" << 'a' and 'b';

but different output from:

cout << "a and b" << 'a' and 'b';



Francis Glassborow      ACCU
64 Southfield Rd
Oxford OX4 1PA          +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

---

Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Mon, 18 Jun 2001 23:54:52 GMT

Martin Aupperle wrote:
>
> Hello,
>
> I read in the Standard that there is something like "alternative
> tokens".  And it says that these should be keywords of the language.
>
> MSVC refuses to recognize these, but provides a file iso646.h, where
> some of them are #defined.  Comeau does recognize them without
> anything else necessary.
>
> Now:
>
> 1.) Is it allowed to implement the Alternative Tokens with the help of
> the iso646.h-file, or should the Alternative Tokens be part of the
> language?

They are supposed to be part of the language. Never make the mistake of
assuming that MS cares much about conformance to the standard. It's in
their best interests to get as many people as possible to write code
that can't be ported to any other compiler.

> 2.) What is the difference between a trigraph and an alternative
> token?

Alternative tokens are in all respects exactly equivalent to the
ones they're alternatives to. They are provided only to allow for more
easily readable code. It's never actually necessary to use them. They're
not particularly controversial.

A trigraph gets replaced by the appropriate character during translation
phase one, and there are some subtle ways, that aren't very important,
in which this differs from being identical to the replacement. The
important difference is in their purpose. Trigraphs are intended to
allow C to be written to compile even on platforms where the
corresponding replacement characters are not allowed in source code
files. For those platforms, you have no choice but to use trigraphs
(except for #, [, ], {, }, ^, |, and ~, where alternative tokens are also
available).

Trigraphs are very controversial - there are people who believe that
they were the wrong solution to the problem. Even many people who
believe they might be necessary in some contexts, believe that they are
such a clumsy solution, that they should never have been standardized.
They could have been allowed only as an extension. Code meant to be
ported between such platforms and the rest of the world would have to go
through a converter that does/undoes exactly the conversion specified
for trigraphs in translation phase 1. Such a utility is needed anyway,
because most people don't actually use trigraphs.

---