Topic: Parsing member function bodies within a class declaration
Author: Theo Norvell <theo@engr.mun.ca>
Date: Sat, 17 Feb 2001 22:59:12 GMT
William: Thanks for your detailed reply.
William M Miller wrote:
>
> theo@engr.mun.ca (Theo Norvell) wrote in <3A8AE46D.B8A4E94D@engr.mun.ca>:
>
> >[Discussion of rewriting technique described in the ARM]
> >
> >In the ISO standard, I can't seem to find a similar statement about
> >rewriting. Instead there is a somewhat confusing (to me, at least)
> >section on class scope (section 3.3.6).
>
> The rewriting idea was dropped for a couple of reasons. First, it
> sounded very much like an implementation technique, and it's not
> the job of the Standard to prescribe implementation techniques.
Fair enough, but of course you don't write language standards without
thinking about implementation techniques. It seems that the ISO standard
tries to allow the parsing of the member function bodies either
when they are encountered, or after the whole class declaration
has been seen (the rewriting technique). The first way
suggests that most analysis must be postponed until after the
parsing is complete; the second way allows parsing and analysis
in the same pass. Since I want to avoid building a syntax tree,
I'm trying to make the rewriting technique work, by saving the tokens
of the member function's body. Is this a really bad idea?
> Second, it doesn't really work for local classes -- there's no
> place you could write the definition of the member function of a
> local class outside the class definition, so it doesn't make
> sense to describe the effect of such a member function as being
> "just like" something that's impossible.
The ARM makes the same point. Can the rewritten definition not be
postponed until the compiler is outside all classes and functions?
I guess if a local member function refers to a variable or function
declared within the function, then it can't be done. Am I missing
something else?
> One effect of this change was to make your example ill-formed.
> Because of the context-sensitive nature of the C++ grammar, type
> names must be known to be such before they are used. Because
> the declaration of MemberType has not been seen at the point at
> which it is used in bar(), the text "MemberType i = 41;" is a
> syntax error, because you can't have two non-type identifiers
> together that way.
Interestingly neither Digital C++ nor Gnu g++ makes a diagnostic.
tera 74 | cat member-posted1.cpp
class Foo
{ public : int bar() { MemberType i = 41 ; return ++i ; }
private : typedef int MemberType ;
};
tera 78 | cxx -strict_ansi_std -c member-posted1.cpp
tera 79 | cxx -V
DIGITAL C++ V6.0-010 on DIGITAL UNIX V4.0 (Rev. 878)
tera 80 | g++ -ansi -pedantic -c member-posted1.cpp
tera 81 | g++ --version
2.8.1
tera 82 |
As an aside, compiling by parsing first and then analysing an abstract
syntax tree looks like a real pain to me. Consider this example from the
standard:
typedef int c ;
class X {
int f() { return sizeof(c); } // Ok X::c
char c ; } ;
sizeof applied to a type and sizeof applied to an expression could result
in different kinds of AST nodes, and the parser will produce the wrong kind
of node. If the above is legal, is this?
typedef int c ;
class X {
int f() { c (b) ; }
char c(int a) {return 'z';}
int b ;
} ;
Should c(b) be parsed as a function call or a declaration? It seems to me
that in both these examples, reordering the member declarations yields an
"alternate valid program", and hence they do not conform to the standard.
Yet the first example is right out of the standard. Obviously I am not
understanding something.
Cheers,
Theodore Norvell
(theo@engr.mun.ca)
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
[ Note that the FAQ URL has changed! Please update your bookmarks. ]
Author: wmm@fastdial.net
Date: Mon, 19 Feb 2001 17:38:22 GMT
In article <3A8E950A.A9CE4619@engr.mun.ca>, Theo Norvell <theo@engr.mun.ca>
writes:
>William: Thanks for your detailed reply.
>
>William M Miller wrote:
>>
>> theo@engr.mun.ca (Theo Norvell) wrote in <3A8AE46D.B8A4E94D@engr.mun.ca>:
>>
>> >[Discussion of rewriting technique described in the ARM]
>> >
>> >In the ISO standard, I can't seem to find a similar statement about
>> >rewriting. Instead there is a somewhat confusing (to me, at least)
>> >section on class scope (section 3.3.6).
>>
>> The rewriting idea was dropped for a couple of reasons. First, it
>> sounded very much like an implementation technique, and it's not
>> the job of the Standard to prescribe implementation techniques.
>
>Fair enough, but of course you don't write language standards without
>thinking about implementation techniques. It seems that the ISO standard
>tries to allow the parsing of the member function bodies either
>when they are encountered, or after the whole class declaration
>has been seen (the rewriting technique). The first way
>suggests that most analysis must be postponed until after the
>parsing is complete; the second way allows parsing and analysis
>in the same pass. Since I want to avoid building a syntax tree,
>I'm trying to make the rewriting technique work, by saving the tokens
>of the member function's body. Is this a really bad idea?
No, that's fine -- we obviously had that technique in mind when
we were devising the new specification. We didn't want to
invalidate it, just not to require it.
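For concreteness, here is a minimal sketch of the token-saving approach discussed above. It is an illustration under simplifying assumptions, not how any particular compiler does it: the names Tokens, Stashed and stash_member_bodies are hypothetical, tokens are plain strings, and a '{' directly after ')' is assumed to open a member function body (so const qualifiers, ctor-initializers and nested classes are ignored).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::string> Tokens;

// Result of scanning a class body: the member declarations with in-class
// function bodies cut out, plus the saved bodies in source order.
struct Stashed {
    Tokens declarations;
    std::vector<Tokens> bodies;
};

// Hypothetical helper. 'body' holds the tokens between the braces of a
// class specifier. A '{' directly after ')' is taken to open a function
// body, which is a deliberate simplification.
Stashed stash_member_bodies(const Tokens& body) {
    Stashed out;
    for (std::size_t i = 0; i < body.size(); ++i) {
        bool opens_body = body[i] == "{" && i > 0 && body[i - 1] == ")";
        if (!opens_body) {
            out.declarations.push_back(body[i]);
            continue;
        }
        // Copy tokens up to the matching '}' without interpreting them.
        Tokens saved;
        int depth = 0;
        for (; i < body.size(); ++i) {
            saved.push_back(body[i]);
            if (body[i] == "{") ++depth;
            if (body[i] == "}" && --depth == 0) break;
        }
        assert(depth == 0 && "unbalanced braces in member function body");
        out.bodies.push_back(saved);
        out.declarations.push_back(";");  // leave a plain declaration behind
    }
    return out;
}
```

After the closing brace of the class has been seen, a parser built this way would replay each saved body through the normal statement parser, at which point every member name (including typedefs declared later in the class) is already known.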
>> Second, it doesn't really work for local classes -- there's no
>> place you could write the definition of the member function of a
>> local class outside the class definition, so it doesn't make
>> sense to describe the effect of such a member function as being
>> "just like" something that's impossible.
>
>The ARM makes the same point. Can the rewritten definition not be
>postponed until the compiler is outside all classes and functions?
>I guess if a local member function refers to a variable or function
>declared within the function, then it can't be done. Am I missing
>something else?
void f() {
struct S {
void g();
};
// Can't define S::g() here because there are
// no nested function definitions
}
// Can't define S::g() here because S isn't in scope
// any more.
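A compilable variant of the same point, as a sketch (the names outer, counter and bump are made up for illustration): the in-class body of S::g() uses a static variable of the enclosing function as well as a member declared later in S, so there is no place outside outer() where an equivalent out-of-class definition could be written.

```cpp
// Returns 42 on the first call, 44 on the second, and so on.
int outer() {
    static int counter = 40;  // visible only inside outer()
    struct S {
        // Uses outer()'s static variable and the member bump(), which is
        // declared after this point in the class.
        int g() { return bump(counter); }
        int bump(int& n) { n += 2; return n; }
    };
    return S().g();
}
```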
>> One effect of this change was to make your example ill-formed.
>> Because of the context-sensitive nature of the C++ grammar, type
>> names must be known to be such before they are used. Because
>> the declaration of MemberType has not been seen at the point at
>> which it is used in bar(), the text "MemberType i = 41;" is a
>> syntax error, because you can't have two non-type identifiers
>> together that way.
>
>Interestingly neither Digital C++ nor Gnu g++ makes a diagnostic.
>
>tera 74 | cat member-posted1.cpp
>class Foo
> { public : int bar() { MemberType i = 41 ; return ++i ; }
> private : typedef int MemberType ;
> };
>tera 78 | cxx -strict_ansi_std -c member-posted1.cpp
>tera 79 | cxx -V
>DIGITAL C++ V6.0-010 on DIGITAL UNIX V4.0 (Rev. 878)
>tera 80 | g++ -ansi -pedantic -c member-posted1.cpp
>tera 81 | g++ --version
>2.8.1
>tera 82 |
>
>As an aside, compiling by parsing first and then analysing an abstract
>syntax tree looks like a real pain to me. Consider this example from the
>standard:
> typedef int c ;
> class X {
> int f() { return sizeof(c); } // Ok X::c
> char c ; } ;
>sizeof applied to a type and sizeof applied to an expression could result
>in different kinds of AST nodes, and the parser will produce the wrong kind
>of node. If the above is legal, is this?
> typedef int c ;
> class X {
> int f() { c (b) ; }
> char c(int a) {return 'z';}
> int b ;
> } ;
>Should c(b) be parsed as a function call or a declaration? It seems to me
>that in both these examples, reordering the member declarations yields an
>"alternate valid program", and hence they do not conform to the standard.
>Yet the first example is right out of the standard. Obviously I am not
>understanding something.
No, I was wrong about this part. Your original example is fine,
not an error as I incorrectly claimed, and both the examples you
cite above use the class members, not the global declarations.
I'm not sure exactly how I got off the beam, but the significant
part of the specification is found in 3.3.6p1, point #1:
The potential scope of a name declared in a class consists
not only of the declarative region following the name's
declarator, but also of all function bodies, default
arguments, and constructor ctor-initializers in that class
(including such things in nested classes).
That means that a "forward reference" of any sort, even one that
involves the type-name grammar dependency, from inside a member
function body is okay. The same goes for default arguments and
ctor-initializers. Every other use of a name in a class definition,
however, is lexically scoped and thus subject to the "shall refer
to the same declaration" and "alternate valid program" restrictions.
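The three examples from this thread can be put into compilable form to show the rule at work. This is a sketch: the access specifiers, the constructor, the calls counter and the name X2 are additions made only so that the effect can be observed from outside the classes.

```cpp
// A member function body may use a typedef declared later in the class.
class Foo {
public:
    int bar() { MemberType i = 41; return ++i; }
private:
    typedef int MemberType;
};

typedef int c;  // global type name, hidden inside X and X2 below

// Inside f(), c names the data member X::c, so this is sizeof(char).
class X {
public:
    int f() { return sizeof(c); }
private:
    char c;
};

// Inside f(), "c(b);" is a call to X2::c with the member b as argument,
// not a declaration of a local int named b.
class X2 {
public:
    X2() : b(5), calls(0) {}
    int f() { c(b); return calls; }
private:
    char c(int a) { calls += a; return 'z'; }
    int b;
    int calls;
};
```

If "c(b);" were treated as a declaration, calls would stay at zero; because it is a call, X2().f() yields the value of b.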
That reminds me of the other reason for the change from the
rewriting rule which I had forgotten: parameter lists are handled
in a way that isn't easily describable in terms of rewriting.
Consider a class with a "typedef int T;" at the very end. A
member function declared before the typedef can use "T" as part
of a default argument but not as part of the type of a parameter.
That is,
void f(T);
is a syntax error, but
void f(int = T());
is fine. To describe this in terms of a rewrite rule would mean
that the default arguments, not just the body, of the function
would need to be removed from the in-class declaration and placed
into the rewritten definition. That's kind of awkward to describe,
so it also contributed to the motivation for the change away from
the rewrite rule.
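A small sketch of that asymmetry (the class name Widget and the function names are made up for illustration). The commented-out declaration is the one that is rejected; the default argument is accepted because default arguments are part of the potential scope of T.

```cpp
struct Widget {
    // int g(T x);  // rejected: T is not yet known to name a type here

    // Accepted: the default argument is looked up as if the whole class
    // had been seen, so T() is a value-initialized int.
    int f(int x = T()) { return x; }

    typedef int T;
};
```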
Sorry for misleading you, and thanks for following up to help me
straighten out my thinking.
-- William M. Miller
Author: Theodore Norvell <theo@engr.mun.ca>
Date: Mon, 19 Feb 2001 22:05:35 GMT
Just to finish this up, here is an interesting example:
typedef int c ;
class X {
public: int f() { c (b) ; return 1 ;} // Call c.
private: char c(int a) {return 'z'} // Missing ;
private: static int b // Another missing ;
} ;
This example contains two syntax errors. Both Digital C++ and g++ report
the second one first and the first one second. Microsoft C++ reports only
the second, but will report the first, if the second is corrected.
Replacing the syntax errors with lexical errors results in no such inversion.
This suggests that all these compilers are squirreling the tokens of
the function definition away until after the class specifier is complete.
Cheers,
Theodore Norvell
----------------------------
Dr. Theodore Norvell theo@engr.mun.ca
Electrical and Computer Engineering http://www.engr.mun.ca/~theo
Engineering and Applied Science Phone: (709) 737-8962
Memorial University of Newfoundland Fax: (709) 737-4042
St. John's, NF, Canada, A1B 3X5
Author: "William M Miller" <wmm@fastdial.net>
Date: Thu, 15 Feb 2001 17:50:04 GMT
theo@engr.mun.ca (Theo Norvell) wrote in <3A8AE46D.B8A4E94D@engr.mun.ca>:
>Hi. In the ARM, Stroustrup and Ellis are pretty clear that
>member function definitions that occur within a class
>declaration should appear to be parsed and analysed at the end of
>the class declaration. (See section 9.3.2) E.g.
>
> class Foo
> { public : int bar() { MemberType i = 41 ; return ++i ; }
> private : typedef int MemberType ;
> } ;
>
>is perfectly legitimate. It is as if the compiler had rewritten
>the class as:
>
> class Foo
> { public : int bar() ;
> private : typedef int MemberType ;
> } ;
> inline int Foo::bar() { MemberType i = 41 ; return ++i ; }
>
>Using this idea, a compiler for C++ can clearly be one-pass, as long
>as it saves the tokens of the function bodies and parses those bodies
>right after the end of the outermost class declaration containing the
>function.
>
>In the ISO standard, I can't seem to find a similar statement about
>rewriting. Instead there is a somewhat confusing (to me, at least)
>section on class scope (section 3.3.6).
The rewriting idea was dropped for a couple of reasons. First, it
sounded very much like an implementation technique, and it's not
the job of the Standard to prescribe implementation techniques.
Second, it doesn't really work for local classes -- there's no
place you could write the definition of the member function of a
local class outside the class definition, so it doesn't make
sense to describe the effect of such a member function as being
"just like" something that's impossible. Instead, the concept
of class scope was defined to make forward references to other
class members permissible.
One effect of this change was to make your example ill-formed.
Because of the context-sensitive nature of the C++ grammar, type
names must be known to be such before they are used. Because
the declaration of MemberType has not been seen at the point at
which it is used in bar(), the text "MemberType i = 41;" is a
syntax error, because you can't have two non-type identifiers
together that way.
(This example illustrates the point about implementation
techniques. Because of the context-sensitive grammar, supporting
your example almost requires the rewriting implementation. That
would preclude compilers from an otherwise-reasonable approach of
parsing in-class member functions in situ and just patching up
potential member references at the end of the class definition.
We didn't see a pressing need to enforce the choice of one of
those implementation techniques.)
>I'm currently planning a front-end for a C++ subset interpreter.
>
>My questions are:
> --Does the ISO standard preclude the rewriting strategy
> suggested by Stroustrup and Ellis?
No, but a conforming implementation is required to issue a
diagnostic for the example above because it violates a syntax
rule. You'd have to have code to check for it explicitly. In
your case, however, since you're only supporting a subset of the
language anyway, accepting this code without a diagnostic would
probably be justifiable (the implementation wouldn't be conforming
regardless of the treatment of this syntax issue).
> --What exactly is the difference between the ISO standard
> and the ARM on this point?
I can't think of any other differences than the ones I mentioned
above.
> --Is there any other reason not to compile C++ in one-pass?
You need a similar kind of technique for templates.
> --The C standard came with a lovely companion called "the rationale".
> Is there a similar rationale for the ISO standard?
Unfortunately, no. The original intention of the Committee was to
produce a Rationale document like J11's, but there was difficulty
finding a volunteer to be responsible for it and the idea was
reluctantly dropped. However, combining the ARM with Stroustrup's
_Design and Evolution of C++_ covers the majority of what would
have been in the Rationale document.
--
William M. Miller, wmm@fastdial.net