Topic: Operator Precedence (C++0x)
Author: "David Thompson" <david.thompson1@worldnet.att.net>
Date: Sun, 9 Sep 2001 00:52:50 GMT Raw View
Niklas Matthies <news_comp.std.c++_expires-2001-10-01@nmhq.net> wrote
and for some reason my M$OE refuses to quote correctly,
although it works fine on other messages ??!, so I blockquote >>>
On Wed, 29 Aug 2001 21:59:58 GMT, Hans Aberg <remove.haberg@matematik.su.se>
wrote:
> In article <mtXi7.386002$EF2.47911591@typhoon.nyroc.rr.com>,
> gregod@cs.rpi.edu wrote:
[ ]
> >I picked up the terminology from concept specifications in the Tecton
> >language. I'll ask one of the authors where the term 'confix' came from.
....
Personally, I'd find "circumfix" a good word. Everyone will understand
at once that it refers to something that is "around" something other.
<<<
I haven't been following this thread, but if you mean parentheses
or something similar, the usual term for that IME is "matchfix".
--
- David.Thompson 1 now at worldnet.att.net
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Mon, 10 Sep 2001 16:53:35 GMT Raw View
In article <wVym7.12451$Uf1.1065574@bgtnsc06-news.ops.worldnet.att.net>,
"David Thompson" <david.thompson1@worldnet.att.net> wrote:
>>Personally, I'd find "circumfix" a good word. Everyone will understand
>>at once that it refers to something that is "around" something other.
...
>I haven't been following this thread, but if you mean parentheses
>or something similar, the usual term for that IME is "matchfix".
I don't know what "IME" is, but the terminology seems to vary. It's good
with some more suggestions, even though the naming question is a side
issue.
Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove.haberg@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: kgw-zamboni-news@stiscan.com
Date: Tue, 28 Aug 2001 21:02:14 GMT Raw View
On Tue, 28 Aug 2001 16:41:12, Douglas Gregor <gregod@cs.rpi.edu>
wrote: WEEKDAY
>Hans Aberg wrote:
>> In article <t31ym14cxl.fsf@watts.itp.tuwien.ac.at>, Christopher Eltschka
>> <celtschk@web.de> wrote:
>>
>>>remove.haberg@matematik.su.se (Hans Aberg) writes:
>>>Well, it's always the problem to find the right balance between being
>>>simple and being general and flexible. Precedence numbers are the
>>>simplest, but most unflexible solution. Operator pair rules are the
>>>most general and flexible, but also the most complex solution. I'm
>>>sure there are lots of solutions in between.
>>
C++ is already pushing parseing away from context free to the extreme!
If I have two classes each defining the same operators but with
different precedence and associativity there is no way anyone can
parse an expression without knowing the types of the variables
involved.
Implicit casting makes it even more difficult.
C and C++ already has too many design mistakes which lead to programer
errors.
Yes, I would like to add new operators to my programs,
but the operators MUST have one global precedence for the entire
program.
This would make it difficult to use "libraries" unless the precedence
and associativity where predefined for all possible operators.
I am entirely happy with this but see no other way out the the
problem!
--
Remove -zamboni to reply
All the above is hearsay and the opinion of no one in particular
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Tue, 28 Aug 2001 21:11:46 GMT Raw View
In article <wrPi7.382121$EF2.47461394@typhoon.nyroc.rr.com>,
gregod@cs.rpi.edu wrote:
>> A one in between suggestion is that operators are locally totally ordered,
>> that is, one might specify say "=" < "+", which would meant that p(=, +) =
>> shift, p(+, =) = reduce. For binary operators one will have to indicate
>> p(=, =) = shift, p(+, +) = reduce by declaring "=" right and "+" left
>> associative. If "=" < "+" and "+" < "*", then one will assume that "=" <
>> "*". If "+" = "-", then I used the rule p("+", "-") := p("+", "+").
>
>This solution strikes me as most natural. I recently implemented a parser
>for user-defined operators similar to what I would want to see in C++, and
>it used simple precedence relationships < (lower precedence than), = (equal
>precedence to), and > (greater precedence than). These precedence
>relationships are transitive, ...
This sounds great. If you would be able (cf. below) to make an
implementation that simulates sets of precedence rules attached to
namespaces (showing how to merge precedence graphs), then one would get a
very interesting proposal for C++. There would remain the syntax name
overloading question though.
BS said he did no want to have a major additions of C++ in this revision
of the standard, but funnily enough, I think it might be a rather minor
addition from the practical point of view of compiler writing, because it
is about adding a feature how to interpret an expression and nothing else.
So if there is some example code available, showing how to do it, it
should not be so difficult for compiler writers to implement it.
>Infix operators are defined to be left, right, or non-associative. If
>operators of the same precedence need to be compared, two left-associative
>operators result in left associativity; two right-associative operators
>result in right-associativity; all other combinations result in a parse
>error.
These rules sound OK with me. I think that for C++, one should consider to
drop non-associativity, because that could be handled by the type system.
I allowed one to mix right and left associative operators of the same
precedence, but one can just as well require the precedences to be
unequal, as you do.
>> Then one idea might be to associate such precedence rule with the
>> namespaces, in which case one will have to figure out to merge them when
>> more than one namespace is used. Conflicts can always be resolve by
>> inserting parentheses,
>> but one other suggestion is that one can write the operator names
>> explicitly, like x std::+ y if there is a conflict present. One should
>> also be able to override old precedence declarations.
>
>I'm going to reassert the possibility of using a precedence graph. A
>precedence graph is a directed graph where:
Yes, it is clear that one ends up with a (colored) directed graph.
> - Each vertex represents a set of operators at equal precedence. So a
>precedence relation '+' = '-' would merge the vertices of '+' and '-'.
> - Each edge (u, v) means that the operators in u have a lower precedence
>than the operators in v.
OK, unless one gets into some problems when merging the graphs. Then, as a
point of departure one could tray say having exactly one vertex for every
operator. If say '+' and '-' have the same precedence, then there will be
a special bidirectional arrow (or an edge) '+' <-> '-'. (The classical
graph thing would be to have two arrows '+' -> '-' and '-' -> '+'.)
As infix operators can be compared with themselves, give their vertex a
color "left" or "right". Then your rule above says that the combination
'+' <-> '='
left '+'
right '='
is not allowed.
Then one do simplifications as needed.
>Given two operators, x and y, if y is reachable from x, then y has a higher
>precedence than x; if x is reachable from y, then x has a higher precedence
>and y. If there is a cycle between x and y or if neither is reachable from
>the other, the relative precedence is unknown.
If there is a cycle, excluding equal precedence, then there should be an
error, as it violates the transitivity. But the error should only be
generated if there is an expression making us of the cycle. This may
require some thought, for example
A: infix ^ < infix + -- standard C precedence
B: infix + < infix ^ -- '^' as exponentiation
Then one would expect
namespace A {
a + b ^ c; // OK as + and ^ both are in namespace A.
}
namespace B {
a + b ^ c; // OK as + and ^ both are in namespace B.
}
a + b ^ c; // Error: an expression occurred with the loop
// infix ^ -> infix + -> infix ^
>Each namespace will have its own precedence graph. In an expression, the
>precedence graph from the current namespace is merged with the precedence
>graphs from all namespaces found through Koenig lookup. Merging of
>precedence graphs requires merging vertices with the same operator.
The problem is that one wants to allow loops if the merged graphs if the
set of operators involved are not used in an expression. So one needs a
clever system to detect these loops dynamically while parsing an
expression.
>Associativities are easy to merge. When merging infix operators x and y,
>then resulting associativity is:
> left: if x and y are both left-associative
> right: if x and y are both right-associative
> otherwise non-associative.
I think that one should have an error if the associativities disagree.
>Parentheses may always be used to disambiguate expressions, so I don't
>believe we need an explicit mechanism to specify which operator '+' we want
>in, e.g., x + y. That's for semantic analysis and not parsing.
This problem of resolving ambiguities is anyway independent of the problem
of merging precedence graphs. Otherwise, I like the idea to be able to
qualify the precedence explicitly by namespace, like in the example above,
writing say
a + b A::^ c;
The reason I think it might be needed is this: Suppose that you load the
polynomials package in namespace B, and then write
polynomial a, b, x;
...
x = a + b ^ 2;
Then the compiler will give an
Error: Precedence conflict A::^ and B::^
The quick way to fix this is to then merely insert a B::^, without
thinking about parsing rules.
But people do not think it is needed, I do not care so much.
>Note that this scheme requires a two-stage parser. The first stage just
>collects a list of tokens. For any operand token, the precedence graph for
>the namespace of that operand's type is merged into the current precedence
>graph for the current expression. Once all precedence graphs have been
>merged, the list of tokens can be reparsed as an expression.
No, I think of a one-step parsing procedure: Each namespace has its own
precedence graph. Whenever an expression is parsed, when two operators
needs to be precedence compared, one makes a lookup in the combined graph
system.
>Non-associativity naturally occurs when there is a conflict in
>associativity of operators. If I have a left-associative operator + and a
>right-associative operator * at the same precedence, and the expression
>x * y + z, the relative associativity of * and + is non-associative.
This just means that the operators cannot be compared, and one gets an
error, not that a new operator that is non-associative has been defined.
But this is just a detail on how one chooses to implement it.
>> The problem is really in figuring out which combinations that are
>> allowable. The problem of separating prefix, postfix, and binary name
>> overloading is done by checking both the token before and the one after,
>> which means that it will not fit into a LR(1) parser without tweaking the
>> lexer. So some such analysis must be done.
>
>It can be done with a one-token lookahead. One can construct a DFA with two
>states: pre-operand and post-operand. Transitions are based on the type of
>the incoming token, which can be an operand, prefix operator, postfix
>operator, infix operator, etc, etc. Intersecting the set of valid outgoing
>states with the possible types of the incoming token resolves many
>ambiguities. For instance:
I think what I do is the same as you, because I record each token, and
then use that information to determine which token the lexer should return
to the parser.
I did the overloading of {prefix, postfix, infix}, and some of other
overloading like "(x)" and "f(x)". But I do not know the solution of the
general problems when one puts all operators in a bag.
>I think it would be reasonable to avoid forcing compiler writers to deal
>with arbitrary lookahead.
As for C++, it is reasonable and probably a must to have restrictions that
makes the implementation simple. When I worked on it I restricted myself
to a simple Flex/Bison combination which sorts out the operator type, and
then I wrote a program with a separate operator stack that sorts out
precedence computations.
This way one cannot capture the syntax name overloading problem without a
careful explicit analysis which must be provided separately, but the
resulting implementation is very simple.
> Humans are parsers too, so constructs that are
>really nasty to parse programmatically are often nasty to read. When
>discussing the ambiguity problems with user-defined operators with a friend
>a few years ago, we kept coming back to the same ugly example:
>|x| | |y|
>
>where the confix
I use the terminology "delimited" for what you call "confix". I cannot
find "confix" in my dictionary, and the prefix "con-" does not seem right,
as it means negation, whereas for example the prefix "en-" can mean "put
into". So if you should invent a new word "enfix" seems better.
> operator | | represented absolute value, and the infix
>operator | represented bitwise or. Parsing this requires arbitrary
>lookahead, which we wanted to avoid.
I am not sure what you mean by arbitrary lookahead here: When the second
"|" is read, the lookahead token is "|", so that one can determine that
the first |x| must be complete. And LR(n) grammars can be rewritten as
LR(1), so your claim must be that it is not LR(n) for any n.
> After a while we realized how truly
>ugly that is. It's hard to read for a human, too, and I don't think it
>would be wrong to make the user write:
>abs(x) | abs(y)
But for C++, or any implementation, some restrictions seems suitable. I
think an important point is to keep the implementation simple from the
point of view of compiler writing..
>I've implemented an expression parser for user-defined operators based on
>the precedence graph and what I've mentioned above. It supports operators
>that are prefix, infix (left-, right-, or non-associative), postfix, confix
>(grouping), and function application.
If you should have success with C++, you must include ternary operators,
because of the a?b:c operator. Here are the example of ternary operators
that I found:
ternary:
infix
left
right x ? y : z -- Conditional expression in C/C++
prefix if x then y else z
postfix
delimited <x|A|y> -- Feynman "bra-ket" notation of Hilbert space
operators in physics.
I could not find any examples for higher order n-ary operators, except for
the obvious
n-ary
infix x_1, ..., x_n
delimited (x_1, ..., x_n)
So I assume that for C++ it would suffice to include up to ternary
operators. This simplifies the syntactic name overloading analysis: One
could probably plug in the different cases in Bison grammar, thus getting
a simplified model which is also very easy to implement.
> One of the included example/test
>programs is a language for user-defined operators. One uses simple
>statements to create operators and define their relative precedences.
>Following my signature is a partial definition for mathematical operators
>that can be used to parse expressions using those operators and emit the
>fully-parenthesized form of the expression tree.
>
>My hope is that I can define most (all?) of C++ expressions using this
>language. The parser is written in C++ and is available at:
>http://www.cs.rpi.edu/~gregod/Sugar/Sugar.tgz
In my theoretic model I included using token sequences in order to capture
C++ expressions like
if (x) y
a binary prefix operator where the first two tokens are "if" and "("
followed by the argument x, the token ")", and the argument y. Probably
one could capture most or all of the C++ grammar by some such extensions.
But as for a proposal for C++, I think one should restrict oneself to the
expressions: This is a relatively simple problem. And it should help a lot
in terms of code expressiveness.
Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove.haberg@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: Douglas Gregor <gregod@cs.rpi.edu>
Date: Wed, 29 Aug 2001 08:42:02 GMT Raw View
Hans Aberg wrote:
> In article <wrPi7.382121$EF2.47461394@typhoon.nyroc.rr.com>,
> gregod@cs.rpi.edu wrote:
>>This solution strikes me as most natural. I recently implemented a parser
>>for user-defined operators similar to what I would want to see in C++, and
>>it used simple precedence relationships < (lower precedence than), =
>>(equal precedence to), and > (greater precedence than). These precedence
>>relationships are transitive, ...
>
> This sounds great. If you would be able (cf. below) to make an
> implementation that simulates sets of precedence rules attached to
> namespaces (showing how to merge precedence graphs), then one would get a
> very interesting proposal for C++. There would remain the syntax name
> overloading question though.
An implementation will naturally follow once the syntax name overloading
question is answered. I've flip-flopped between several solutions a few
times:
1) allowing operator qualifications, i.e., std::^
2) 'expression contexts' that are analagous to (but distinct from)
namespaces
3) just require parentheses to disambiguate
I'm leaning again toward #1, but I haven't yet devised a merging scheme for
it.
> BS said he did no want to have a major additions of C++ in this revision
> of the standard, but funnily enough, I think it might be a rather minor
> addition from the practical point of view of compiler writing, because it
> is about adding a feature how to interpret an expression and nothing else.
> So if there is some example code available, showing how to do it, it
> should not be so difficult for compiler writers to implement it.
I'm not sure I agree about how 'minor' this really is. Look at partial
specialization: it is theoretically trivial to implement. Matching partial
specializations is just a simple case unification, and unification has been
well-understood for at least 25 years, yet three years after the C++
Standard was accepted we still have mainstream compilers that do not handle
partial specialization. Perhaps having a sample implementation tips the
scales favorably for user-defined operators, but making the language harder
to parse is going to have its opponents.
>>Infix operators are defined to be left, right, or non-associative. If
>>operators of the same precedence need to be compared, two left-associative
>>operators result in left associativity; two right-associative operators
>>result in right-associativity; all other combinations result in a parse
>>error.
>
> These rules sound OK with me. I think that for C++, one should consider to
> drop non-associativity, because that could be handled by the type system.
> I allowed one to mix right and left associative operators of the same
> precedence, but one can just as well require the precedences to be
> unequal, as you do.
In any case, non-associativity is a minor detail. It may be omitted from
the specification and simulated with types, or is trivial to implement if
desired. I think there are operators that are naturally non-associative,
i.e., the "\models" operator in "std::string \models CopyConstructible",
and they should be represented as such syntactically. In any case, there's
no reason to get hung up over this becaues it's trivial to handle either
resolution.
>>> Then one idea might be to associate such precedence rule with the
>>> namespaces, in which case one will have to figure out to merge them when
>>> more than one namespace is used. Conflicts can always be resolve by
>>> inserting parentheses,
>>> but one other suggestion is that one can write the operator names
>>> explicitly, like x std::+ y if there is a conflict present. One should
>>> also be able to override old precedence declarations.
>>
>>I'm going to reassert the possibility of using a precedence graph. A
>>precedence graph is a directed graph where:
>
> Yes, it is clear that one ends up with a (colored) directed graph.
>
>> - Each vertex represents a set of operators at equal precedence. So a
>>precedence relation '+' = '-' would merge the vertices of '+' and '-'.
>> - Each edge (u, v) means that the operators in u have a lower precedence
>>than the operators in v.
>
> OK, unless one gets into some problems when merging the graphs. Then, as a
> point of departure one could tray say having exactly one vertex for every
> operator. If say '+' and '-' have the same precedence, then there will be
> a special bidirectional arrow (or an edge) '+' <-> '-'. (The classical
> graph thing would be to have two arrows '+' -> '-' and '-' -> '+'.)
An earlier formulation used weighted edges. An edge of weight zero denoted
equality, so if precedence '+' = '-', there would be edges (+, -) and
(-, +) each with weight zero. If precedence '+' < '*', then there would be
an edge ('+', '*') with weight one.
I then would use a shortest-paths algorithm. If a path x->y has weight
zero, they had equal precedence. If a path x->y had a weight in
(0, infinity), x had a lower precedence than y. If a path x->y had infinity
weight (i.e., there is a cycle or there is no path), then the precedence is
unknown.
Allowing operator disambiguation using, e.g., std::^ is moving me back
toward this implementation. Merging two precedence graphs A and B then
becomes:
for every (x, y) in A X B such that x syntactically equals y (i.e., same
token and position), add edges (x, y) and (y, x) with weight zero.
Then if we explicitly quality an operator, we ignore those edges added by
the above algorithm (perhaps color the merge edges differently from the
normal edges). I believe that this will achieve the desired effect while
still keeping a relatively simple implementation.
> As infix operators can be compared with themselves, give their vertex a
> color "left" or "right". Then your rule above says that the combination
> '+' <-> '='
> left '+'
> right '='
> is not allowed.
>
> Then one do simplifications as needed.
Two points to clarify:
- I didn't state in my formulation that cycles in the graph were not
allowed. Indeed they are allowed, and they only cause problems if, for
instance, you are looking for the relative precedence of 'x' and 'y' and
there is a cycle containing 'x' and 'y'. Then the relative precedence is
unknown.
- Associativity is not taking into account in the precedence graph.
Associativity is merely a property of infix operators, and is resolved
_after_ precedence is resolved, and by different means.
So it is perfectly reasonable to have the above:
'+ <-> '='
left '+'
right '='
It's only when you have an expression like "x= y + z" where a comparison
must be made that you will run into a parse error. And I think that makes
perfect sense.
>>Given two operators, x and y, if y is reachable from x, then y has a
>>higher precedence than x; if x is reachable from y, then x has a higher
>>precedence and y. If there is a cycle between x and y or if neither is
>>reachable from the other, the relative precedence is unknown.
>
> If there is a cycle, excluding equal precedence, then there should be an
> error, as it violates the transitivity. But the error should only be
> generated if there is an expression making us of the cycle. This may
> require some thought, for example
> A: infix ^ < infix + -- standard C precedence
> B: infix + < infix ^ -- '^' as exponentiation
> Then one would expect
> namespace A {
> a + b ^ c; // OK as + and ^ both are in namespace A.
> }
> namespace B {
> a + b ^ c; // OK as + and ^ both are in namespace B.
> }
>
> a + b ^ c; // Error: an expression occurred with the loop
> // infix ^ -> infix + -> infix ^
Yes, this is covered by both formulations of the precedence graph.
>>Each namespace will have its own precedence graph. In an expression, the
>>precedence graph from the current namespace is merged with the precedence
>>graphs from all namespaces found through Koenig lookup. Merging of
>>precedence graphs requires merging vertices with the same operator.
>
> The problem is that one wants to allow loops if the merged graphs if the
> set of operators involved are not used in an expression. So one needs a
> clever system to detect these loops dynamically while parsing an
> expression.
No need for any cleverness here :). The loops really aren't a problem at
all.
>>Associativities are easy to merge. When merging infix operators x and y,
>>then resulting associativity is:
>> left: if x and y are both left-associative
>> right: if x and y are both right-associative
>> otherwise non-associative.
>
> I think that one should have an error if the associativities disagree.
Why throw an error unless the user tries to actually use the associativity?
Say I have:
left-associative '+' in namespace A
right-associativity '+' in namespace B
Now I'm using both A and B, so an expression "x + y + z" should cause a
parse error (which associativity is that '+'?), but expressions
"x A::+ y A::+ z" and "(x + y) + z" should not trigger any errors.
Declaring the merged operator '+' non-associative does this for us, and
lets the user resolve the problem if it does come up.
>>Parentheses may always be used to disambiguate expressions, so I don't
>>believe we need an explicit mechanism to specify which operator '+' we
>>want in, e.g., x + y. That's for semantic analysis and not parsing.
>
> This problem of resolving ambiguities is anyway independent of the problem
> of merging precedence graphs. Otherwise, I like the idea to be able to
> qualify the precedence explicitly by namespace, like in the example above,
> writing say
> a + b A::^ c;
>
> The reason I think it might be needed is this: Suppose that you load the
> polynomials package in namespace B, and then write
> polynomial a, b, x;
> ...
> x = a + b ^ 2;
> Then the compiler will give an
> Error: Precedence conflict A::^ and B::^
> The quick way to fix this is to then merely insert a B::^, without
> thinking about parsing rules.
>
> But people do not think it is needed, I do not care so much.
I do (now) think it is needed. The question isn't completely orthogonal
from precedence graph merging because qualifying an operator essentially
undoes part of the merge. That's why I'm reconsidering a precedence graph
merge that makes it easy to ignore selective parts of the merge (i.e.,
qualifying one operator doesn't mean that all merges are undone).
>>Note that this scheme requires a two-stage parser. The first stage just
>>collects a list of tokens. For any operand token, the precedence graph for
>>the namespace of that operand's type is merged into the current precedence
>>graph for the current expression. Once all precedence graphs have been
>>merged, the list of tokens can be reparsed as an expression.
>
> No, I think of a one-step parsing procedure: Each namespace has its own
> precedence graph. Whenever an expression is parsed, when two operators
> needs to be precedence compared, one makes a lookup in the combined graph
> system.
Where does the combined graph system come from? If I have an expression
"-x", where do I look for the '-' operator? I believe that the most natural
C++ solution would be to use Koenig lookup, so we would be looking in the
current namespace and the namespace of "x". This is what necessitates the
two-step approach.
>>Non-associativity naturally occurs when there is a conflict in
>>associativity of operators. If I have a left-associative operator + and a
>>right-associative operator * at the same precedence, and the expression
>>x * y + z, the relative associativity of * and + is non-associative.
>
> This just means that the operators cannot be compared, and one gets an
> error, not that a new operator that is non-associative has been defined.
> But this is just a detail on how one chooses to implement it.
Right. I implement unknown/no associativity as "non associative", but that
shouldn't cloud the theoretical view.
>>> The problem is really in figuring out which combinations that are
>>> allowable. The problem of separating prefix, postfix, and binary name
>>> overloading is done by checking both the token before and the one after,
>>> which means that it will not fit into a LR(1) parser without tweaking
>>> the lexer. So some such analysis must be done.
>>
>>It can be done with a one-token lookahead. One can construct a DFA with
>>two states: pre-operand and post-operand. Transitions are based on the
>>type of the incoming token, which can be an operand, prefix operator,
>>postfix operator, infix operator, etc, etc. Intersecting the set of valid
>>outgoing states with the possible types of the incoming token resolves
>>many ambiguities. For instance:
>
> I think what I do is the same as you, because I record each token, and
> then use that information to determine which token the lexer should return
> to the parser.
>
> I did the overloading of {prefix, postfix, infix}, and some of other
> overloading like "(x)" and "f(x)". But I do not know the solution of the
> general problems when one puts all operators in a bag.
Using the intersection method, it's easy to enumerate all of the cases that
cause problems. The "parser design" document I referenced in the last
message described all of the corner cases for the operator types supported.
>>I think it would be reasonable to avoid forcing compiler writers to deal
>>with arbitrary lookahead.
>
> As for C++, it is reasonable and probably a must to have restrictions that
> makes the implementation simple. When I worked on it I restricted myself
> to a simple Flex/Bison combination which sorts out the operator type, and
> then I wrote a program with a separate operator stack that sorts out
> precedence computations.
Sounds reasonable. The two-stack approach (one for operators, one for
operands) is relatively easy to implement.
> This way one cannot capture the syntax name overloading problem without a
> careful explicit analysis which must be provided separately, but the
> resulting implementation is very simple.
It fits in very nicely with assuring the correctness of the two-stack
approach. One needs to track the state to ensure that operators and
operands are coming in the right order. For instance, the input stream
could contain:
x -
where x is an operand and '-' is a prefix operator. The basic two-stack
parser will shift the operand, shift the '-', and then perform a reduction.
Of course, this string should be rejected because of the ordering of
operand and operator, and the easy way to ensure this rejection is with the
state machine I described. So disambiguation of operator types (e.g., for
prefix or infix '-' in mathematics) fits nicely into the algorithm.
>> Humans are parsers too, so constructs that are
>>really nasty to parse programmatically are often nasty to read. When
>>discussing the ambiguity problems with user-defined operators with a
>>friend a few years ago, we kept coming back to the same ugly example:
>>|x| | |y|
>>
>>where the confix
>
> I use the terminology "delimited" for what you call "confix". I cannot
> find "confix" in my dictionary, and the prefix "con-" does not seem right,
> as it means negation, whereas for example the prefix "en-" can mean "put
> into". So if you should invent a new word "enfix" seems better.
I picked up the terminology from concept specifications in the Tecton
language. I'll ask one of the authors where the term 'confix' came from.
>> operator | | represented absolute value, and the infix
>>operator | represented bitwise or. Parsing this requires arbitrary
>>lookahead, which we wanted to avoid.
>
> I am not sure what you mean by arbitrary lookahead here: When the second
> "|" is read, the lookahead token is "|", so that one can determine that
> the first |x| must be complete.
Perhaps a better example would clarify. Consider this:
| x | - y ... ?
So the first | is clearly the beginning of the absolute value of something.
Then we get x, then another |. What is this second '|'? Is it the close
confix/enfix/delimiter so we have |x|? Or is it an infix operator? The '-'
doesn't help us a bit in this case, because it could be prefix (x | (-y))
or infix ((|x|) - y). In fact, we might not know what this second '|' is
until we reach the end of the string, which may be any arbitrary length.
Thus, we must look ahead an arbitrary distance to find out what that '|'
really is.
>And LR(n) grammars can be rewritten as
> LR(1), so your claim must be that it is not LR(n) for any n.
For any n, let the '...' in the above string by n symbols and the lookahead
isn't long enough, so I'm claiming that it isn't LR(n) for any n.
>> After a while we realized how truly
>>ugly that is. It's hard to read for a human, too, and I don't think it
>>would be wrong to make the user write:
>>abs(x) | abs(y)
>
> But for C++, or any implementation, some restrictions seems suitable. I
> think an important point is to keep the implementation simple from the
> point of view of compiler writing..
Fair enough.
>>I've implemented an expression parser for user-defined operators based on
>>the precedence graph and what I've mentioned above. It supports operators
>>that are prefix, infix (left-, right-, or non-associative), postfix,
>>confix (grouping), and function application.
>
> If you should have success with C++, you must include ternary operators,
> because of the a?b:c operator. Here are the example of ternary operators
> that I found:
> ternary:
> infix
> left
> right x ? y : z -- Conditional expression in C/C++
> prefix if x then y else z
> postfix
> delimited <x|A|y> -- Feynman "bra-ket" notation of Hilbert space
> operators in physics.
>
> I could not find any examples for higher order n-ary operators, except for
> the obvious
> n-ary
> infix x_1, ..., x_n
> delimited (x_1, ..., x_n)
>
> So I assume that for C++ it would suffice to include up to ternary
> operators. This simplifies the syntactic name overloading analysis: One
> could probably plug in the different cases in Bison grammar, thus getting
> a simplified model which is also very easy to implement.
I haven't yet ruled out (perhaps only for my own satisfaction) the
possibility of using the current set of operator types to simulate ternary
(or n-ary) operators. For instance, your 'delimited' case (x_1, ..., x_n)
could be handled using a delimited (confix) operator ( ). Then the comma
operator just builds a list, and simple tree reshaping can turn it into an
'n-tuple'. The common rebuttal is that an expression such as:
((a, b), c) will be parsed as (a, b, c), which it will by current C++
rules, but confix/delimited operators are first-class citizens in this
expression parser, so ((a, b), c) will stay as ((a, b), c) and will be
different from the imaginary parentheses that group "a, b, c" based on
associativity.
>> One of the included example/test
>>programs is a language for user-defined operators. One uses simple
>>statements to create operators and define their relative precedences.
>>Following my signature is a partial definition for mathematical operators
>>that can be used to parse expressions using those operators and emit the
>>fully-parenthesized form of the expression tree.
>>
>>My hope is that I can define most (all?) of C++ expressions using this
>>language. The parser is written in C++ and is available at:
>>http://www.cs.rpi.edu/~gregod/Sugar/Sugar.tgz
>
> In my theoretic model I included using token sequences in order to capture
> C++ expressions like
> if (x) y
> a binary prefix operator where the first two tokens are "if" and "("
> followed by the argument x, the token ")", and the argument y. Probably
> one could capture most or all of the C++ grammar by some such extensions.
>
> But as for a proposal for C++, I think one should restrict oneself to the
> expressions: This is a relatively simple problem. And it should help a lot
> in terms of code expressiveness.
I'm a big fan of just sticking with C++ expressions :)
Doug
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Wed, 29 Aug 2001 21:59:58 GMT Raw View
In article <mtXi7.386002$EF2.47911591@typhoon.nyroc.rr.com>,
gregod@cs.rpi.edu wrote:
>> This sounds great. If you would be able (cf. below) to make an
>> implementation that simulates sets of precedence rules attached to
>> namespaces (showing how to merge precedence graphs), then one would get a
>> very interesting proposal for C++. There would remain the syntax name
>> overloading question though.
>
>An implementation will naturally follow once the syntax name overloading
>question is answered. I've flip-flopped between several solutions a few
>times:
> 1) allowing operator qualifications, i.e., std::^
> 2) 'expression contexts' that are analagous to (but distinct from)
>namespaces
> 3) just require parentheses to disambiguate
These questions are quite independent: The syntax name overloading
question just tells how to interpret operators of different syntax with
overlapping name components. This is then this independent of the question
how to merge precedence graphs. Which in its turn is independent of the
question if precedence graphs should be attached to namespaces or
something else. Which is rather independent of the question how to resolve
ambiguities.
The simplest thing is probably to just have 3) above, just require
parentheses to disambiguate, and then add something else if it appears to
be needed. It is more important to figure out how to merge the precedence
graphs and to solve the syntax name overloading problem.
>> BS said he did no want to have a major additions of C++ in this revision
>> of the standard, but funnily enough, I think it might be a rather minor
>> addition from the practical point of view of compiler writing...
...
>I'm not sure I agree about how 'minor' this really is. ...
Well, I do not know for sure, as I do not write C++ compilers. But
essentially, the parsing of an expression sorts it out to the same
semantic objects as before, so therefore it should not require any major
alterations: Just flip the precedence machinery in, and it should work. On
the semantic level, it does not alter any of the old function overloading.
>> OK, unless one gets into some problems when merging the graphs. Then, as a
>> point of departure one could tray say having exactly one vertex for every
>> operator. If say '+' and '-' have the same precedence, then there will be
>> a special bidirectional arrow (or an edge) '+' <-> '-'. (The classical
>> graph thing would be to have two arrows '+' -> '-' and '-' -> '+'.)
>An earlier formulation used weighted edges. An edge of weight zero denoted
>equality, so if precedence '+' = '-', there would be edges (+, -) and
>(-, +) each with weight zero. If precedence '+' < '*', then there would be
>an edge ('+', '*') with weight one.
>I then would use a shortest-paths algorithm. If a path x->y has weight
>zero, they had equal precedence. If a path x->y had a weight in
>(0, infinity), x had a lower precedence than y. If a path x->y had infinity
>weight (i.e., there is a cycle or there is no path), then the precedence is
>unknown.
One will have to fiddle around with some such models in order to find a
good implementation: The picture I have in my mind is that one has
precedence graphs hooks up to environments. Then when two operators need
to be precedence compared, it should be relatively fast to look at the
combined graph of some graph up to the environment root.
>Allowing operator disambiguation using, e.g., std::^ is moving me back
>toward this implementation. Merging two precedence graphs A and B then
>becomes:
> for every (x, y) in A X B such that x syntactically equals y (i.e., same
>token and position), add edges (x, y) and (y, x) with weight zero.
So rather than bother std::^ qualifications, I think one should think of a
quick dynamic environment precedence graph lookup (but it could happen
that one lands on the same equation).
>Two points to clarify:
> - I didn't state in my formulation that cycles in the graph were not
>allowed. Indeed they are allowed, and they only cause problems if, for
>instance, you are looking for the relative precedence of 'x' and 'y' and
>there is a cycle containing 'x' and 'y'. Then the relative precedence is
>unknown.
OK. This is what one wants, I think.
> - Associativity is not taking into account in the precedence graph.
>Associativity is merely a property of infix operators, and is resolved
>_after_ precedence is resolved, and by different means.
>So it is perfectly reasonable to have the above:
>'+ <-> '='
>left '+'
>right '='
>It's only when you have an expression like "x= y + z" where a comparison
>must be made that you will run into a parse error. And I think that makes
>perfect sense.
OK. That is fine with me.
>> The problem is that one wants to allow loops if the merged graphs if the
>> set of operators involved are not used in an expression. So one needs a
>> clever system to detect these loops dynamically while parsing an
>> expression.
>No need for any cleverness here :). The loops really aren't a problem at
>all.
OK. What mechanism do you use in order to detect the loops?
>> I think that one should have an error if the associativities disagree.
>Why throw an error unless the user tries to actually use the associativity?
>Say I have:
>left-associative '+' in namespace A
>right-associativity '+' in namespace B
>Now I'm using both A and B, so an expression "x + y + z" should cause a
>parse error (which associativity is that '+'?), but expressions
>"x A::+ y A::+ z" and "(x + y) + z" should not trigger any errors.
All such dynamic checking is fine with me.
>Declaring the merged operator '+' non-associative does this for us, and
>lets the user resolve the problem if it does come up.
This seems to be merely a question on how to implement is: In view of the
conflicting right/left associativities, the precedence function call
p('+', '+') will indicate that the comparison is not allowed whenever one
tries to use it.
>> This problem of resolving ambiguities is anyway independent of the problem
>> of merging precedence graphs. Otherwise, I like the idea to be able to
>> qualify the precedence explicitly by namespace, like in the example above,
>> writing say
>> a + b A::^ c;
...
>> But people do not think it is needed, I do not care so much.
>I do (now) think it is needed. The question isn't completely orthogonal
>from precedence graph merging because qualifying an operator essentially
>undoes part of the merge. That's why I'm reconsidering a precedence graph
>merge that makes it easy to ignore selective parts of the merge (i.e.,
>qualifying one operator doesn't mean that all merges are undone).
Well, you changed me to your former view, at least in the sense it is
important to get a working system without it first. Then one can consider
adding it. A more dynamic precedence lookup system will probably affect it
as well.
>>>Note that this scheme requires a two-stage parser. The first stage just
>>>collects a list of tokens. For any operand token, the precedence graph for
>>>the namespace of that operand's type is merged into the current precedence
>>>graph for the current expression. Once all precedence graphs have been
>>>merged, the list of tokens can be reparsed as an expression.
>> No, I think of a one-step parsing procedure: Each namespace has its own
>> precedence graph. Whenever an expression is parsed, when two operators
>> needs to be precedence compared, one makes a lookup in the combined graph
>> system.
>Where does the combined graph system come from? If I have an expression
>"-x", where do I look for the '-' operator? I believe that the most natural
>C++ solution would be to use Koenig lookup, so we would be looking in the
>current namespace and the namespace of "x". This is what necessitates the
>two-step approach.
By "Koenig lookup", I figure you mean the stuff in BS's "DEC++", sec. 6.3.1.
I do not see why you need to consider a whole expression and then reparse it:
In the course of the parsing, some operators s, t need to compared in
order to decide whether to shift or reduce, which done by a function p(s,
t). Say that we have
namespace A: infix ^ < infix +
namespace B: infix + < infix ^
and in the global namespace, we want to parse
a + b ^ c
Then, at some point in the parsing, we need to compute p(+, ^). So then we
compute in the admissible namespaces to compute p(+, ^).
Why would one have to consider other operators of the expression in that
computation?
>> I did the overloading of {prefix, postfix, infix}, and some of other
>> overloading like "(x)" and "f(x)". But I do not know the solution of the
>> general problems when one puts all operators in a bag.
>Using the intersection method, it's easy to enumerate all of the cases that
>cause problems. The "parser design" document I referenced in the last
>message described all of the corner cases for the operator types supported.
I had a look at it. I used a Flex/Bison combination, mainly because I
wanted to achieve a simple approach, is the state machine is hidden away
in the state machines that Flex and Bison produce.
For use with C++, I think both approaches might be important, because
compiler writers may use different approaches. It surely, in some sense
working out the details, as you do, as one then can get more control over
it.
>> As for C++, it is reasonable and probably a must to have restrictions that
>> makes the implementation simple. When I worked on it I restricted myself
>> to a simple Flex/Bison combination which sorts out the operator type, and
>> then I wrote a program with a separate operator stack that sorts out
>> precedence computations.
>Sounds reasonable. The two-stack approach (one for operators, one for
>operands) is relatively easy to implement.
That is the point of it, simplicity, even though I considered make a
single LR style stack in order to handle the syntax name overloading
problem. (If somebody wants the sources to play with, just let me know
via email.)
Another way to go might be to modify Bison, so that it can handle dynamic
precedences, but I haven't figured out that one yet. :-)
>> This way one cannot capture the syntax name overloading problem without a
>> careful explicit analysis which must be provided separately, but the
>> resulting implementation is very simple.
>It fits in very nicely with assuring the correctness of the two-stack
>approach. One needs to track the state to ensure that operators and
>operands are coming in the right order. For instance, the input stream
>could contain:
>x -
>where x is an operand and '-' is a prefix operator. The basic two-stack
>parser will shift the operand, shift the '-', and then perform a reduction.
>Of course, this string should be rejected because of the ordering of
>operand and operator, and the easy way to ensure this rejection is with the
>state machine I described. So disambiguation of operator types (e.g., for
>prefix or infix '-' in mathematics) fits nicely into the algorithm.
So in this case I resolve it by recording the tokens, and by that letting
the lexer that Flex generates sort it out to be a prefix operator. This
passes over a operator name table lookup, which checks which operators are
available.
>I picked up the terminology from concept specifications in the Tecton
>language. I'll ask one of the authors where the term 'confix' came from.
The Merriam "Webster's Third International Dictionary" includes word
etymology, and it says the "con-" is a negation, as opposed to "pro-"
which is a positivation. The prefixes "en-" and "em-" both derives from
the same words as "in-", but can also be used as "put into" as in
"encircle".
>>> operator | | represented absolute value, and the infix
>>>operator | represented bitwise or. Parsing this requires arbitrary
>>>lookahead, which we wanted to avoid.
>> I am not sure what you mean by arbitrary lookahead here: When the second
>> "|" is read, the lookahead token is "|", so that one can determine that
>> the first |x| must be complete.
>Perhaps a better example would clarify. Consider this:
>| x | - y ... ?
Ok. That is better.
>So the first | is clearly the beginning of the absolute value of something.
>Then we get x, then another |. What is this second '|'? Is it the close
>confix/enfix/delimiter so we have |x|? Or is it an infix operator? The '-'
>doesn't help us a bit in this case, because it could be prefix (x | (-y))
>or infix ((|x|) - y). In fact, we might not know what this second '|' is
>until we reach the end of the string, which may be any arbitrary length.
>Thus, we must look ahead an arbitrary distance to find out what that '|'
>really is.
I am not sure really how to resolve this, and I think it does not really
matter, as long as one strives for a simple implementation: After all, we
are not out to capture all corny syntax practises, and as you pointed out,
such practises do not help human parsing.
Simple rules would be to declare to always one-lookahead reduce or always
shift, or to prohibit such name overloading.
>> If you should have success with C++, you must include ternary operators,
>> because of the a?b:c operator.
...
>> I could not find any examples for higher order n-ary operators, except for
>> the obvious
>> n-ary
>> infix x_1, ..., x_n
>> delimited (x_1, ..., x_n)
>I haven't yet ruled out (perhaps only for my own satisfaction) the
>possibility of using the current set of operator types to simulate ternary
>(or n-ary) operators. For instance, your 'delimited' case (x_1, ..., x_n)
>could be handled using a delimited (confix) operator ( ).
As for ternary operators, I think they must be included (not simulated): A
frequent argument against using precedence rules in C++ is that it does
not cover the x?y:z operator. So I think it will be important to
demonstrate that one covers those in order to have any success with a
proposal.
And as for C++ and n-ary operators, I think is suffices with breaking up
(x_1, ..., x_n) into "(x)" and "y,z", as is the case now. Otherwise, in
math one would expect
x_1, ..., x_n
(x_1, ..., x_n)
<x_1, ..., x_n>
etc., to be all different. But for use in C++, one isn't trying to capture
all standard mathematical practises, so it won't matter much, I believe.
One could add it as an extra at some point, if the analysis isn't too
hard.
>>>My hope is that I can define most (all?) of C++ expressions using this
>>>language. The parser is written in C++ and is available at:
>>>http://www.cs.rpi.edu/~gregod/Sugar/Sugar.tgz
>> In my theoretic model I included using token sequences in order to capture
>> C++ expressions like
>> if (x) y
>> a binary prefix operator where the first two tokens are "if" and "("
>> followed by the argument x, the token ")", and the argument y. Probably
>> one could capture most or all of the C++ grammar by some such extensions.
>> But as for a proposal for C++, I think one should restrict oneself to the
>> expressions: This is a relatively simple problem. And it should help a lot
>> in terms of code expressiveness.
>I'm a big fan of just sticking with C++ expressions :)
As for C++ expression, the part I was not exactly sure of was the sizeof
and type cats, which have their own precedence rules. But other than that
(as it is not difficult to include ternary operators), I don't think there
were any problems.
With respect to C++ expressions, do you have any specific syntactic
problems in your mind?
Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove.haberg@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: Douglas Gregor <gregod@cs.rpi.edu>
Date: Thu, 30 Aug 2001 12:12:55 GMT Raw View
Hans Aberg wrote:
> In article <mtXi7.386002$EF2.47911591@typhoon.nyroc.rr.com>,
> These questions are quite independent: The syntax name overloading
> question just tells how to interpret operators of different syntax with
> overlapping name components. This is then this independent of the question
> how to merge precedence graphs. Which in its turn is independent of the
> question if precedence graphs should be attached to namespaces or
> something else. Which is rather independent of the question how to resolve
> ambiguities.
I misunderstood your term 'syntax overloading'. I had taken it as, for
instance, the ambiguity between prefix '-' and infix '-'. I now see that
you are referring to the problem of having, for instance, an infix '*', a
prefix '*', and an infix '**', so that a**b could be parsed as (a ** b) or
(a * (*b)).
This is surely a tough problem: the 'simplest' resolution is, of course, to
ban operator names that could cause ambiguities in existing code (e.g., **,
*-, etc.). For new operators, no such ban would exist but whitespace would
be required between operators. This could be seen as a stopgap measure to
keep compatibility with C and prior versions of C++, and could later be
removed without any ill effects.
>>> BS said he did no want to have a major additions of C++ in this revision
>>> of the standard, but funnily enough, I think it might be a rather minor
>>> addition from the practical point of view of compiler writing...
> ...
>>I'm not sure I agree about how 'minor' this really is. ...
>
> Well, I do not know for sure, as I do not write C++ compilers. But
> essentially, the parsing of an expression sorts it out to the same
> semantic objects as before, so therefore it should not require any major
> alterations: Just flip the precedence machinery in, and it should work. On
> the semantic level, it does not alter any of the old function overloading.
My concern is that the addition of the precedence machinery will cause
extra ambiguities. Templates appear to be the most problematic, because of
the reuse of < and >. I haven't studied this enough yet to determine if it
really is a problem, but my gut reaction is that we may be walking into a
minefield of parsing ambiguities.
>>Allowing operator disambiguation using, e.g., std::^ is moving me back
>>toward this implementation. Merging two precedence graphs A and B then
>>becomes:
>> for every (x, y) in A X B such that x syntactically equals y (i.e., same
>>token and position), add edges (x, y) and (y, x) with weight zero.
>
> So rather than bother std::^ qualifications, I think one should think of a
> quick dynamic environment precedence graph lookup (but it could happen
> that one lands on the same equation).
The qualifications may still be needed in any case. If an environment
contains two '+' operators from different namespaces that have different
precedence levels, it may cause an unknown precedence due to a cycle;
however, disambiguating a '+' by stating, e.g., regex::+could eliminate
this cycle because the edges linking the two '+''s would be ignored.
One view of this that appears to work is that each namespace represents a
subgraph in a giant graph that represents the entire compilation unit. An
"environment" is a set of edges (with weight zero) that connect
syntactically indistinguishable operators. These connections disappear if
the user disambiguates the operator using a namespace qualification. As an
example:
namespace A {
infix operator+;
}
namespace B {
infix operator+;
}
A is represented by a subgraph with a single vertex for '+'. B is
represented by a subgraph with a single (different) vertex for '+'.
If we are in an environment where both A's '+' and B's '+' are imported, we
add edges (A::+, B::+) and (B::+, A::+) with weight zero (i.e., equal
precedence). If in this environment a '+' is qualified as 'A::+' or 'B::+',
we ignore the edges (A::+, B::+) and (B::+, A::+) for that lookup _only_.
>>> The problem is that one wants to allow loops if the merged graphs if the
>>> set of operators involved are not used in an expression. So one needs a
>>> clever system to detect these loops dynamically while parsing an
>>> expression.
>
>>No need for any cleverness here :). The loops really aren't a problem at
>>all.
>
> OK. What mechanism do you use in order to detect the loops?
No need to detect the loops directly. When we look up the relative
precedence of two operators 'x' and 'y', we find the shortest path from x
to y and vice-versa. If non-infinite paths exist from x to y and y to x,
we've found a loop that actually causes a problem. Internal loops (i.e., if
there is a loop on the path from x to y that doesn't contain both x and y)
aren't even noticed by the shortest-path algorithm.
>>Declaring the merged operator '+' non-associative does this for us, and
>>lets the user resolve the problem if it does come up.
>
> This seems to be merely a question on how to implement is: In view of the
> conflicting right/left associativities, the precedence function call
> p('+', '+') will indicate that the comparison is not allowed whenever one
> tries to use it.
Agreed.
>>>>Note that this scheme requires a two-stage parser. The first stage just
>>>>collects a list of tokens. For any operand token, the precedence graph
>>>>for the namespace of that operand's type is merged into the current
>>>>precedence graph for the current expression. Once all precedence graphs
>>>>have been merged, the list of tokens can be reparsed as an expression.
>
>>> No, I think of a one-step parsing procedure: Each namespace has its own
>>> precedence graph. Whenever an expression is parsed, when two operators
>>> needs to be precedence compared, one makes a lookup in the combined
>>> graph system.
>
>>Where does the combined graph system come from? If I have an expression
>>"-x", where do I look for the '-' operator? I believe that the most
>>natural C++ solution would be to use Koenig lookup, so we would be looking
>>in the current namespace and the namespace of "x". This is what
>>necessitates the two-step approach.
>
> By "Koenig lookup", I figure you mean the stuff in BS's "DEC++", sec.
> 6.3.1.
>
> I do not see why you need to consider a whole expression and then reparse
> it:
I'm referring to 3.4.2 "Argument-dependent name lookup" in the C++
standard. It's the clause that states that the compiler should look for
declarations of functions and operators in the namespaces where the types
in the current expression reside. For instance:
--------------------------
namespace A {
class X;
}
namespace B {
class Y;
}
A::X x;
B::Y y;
x + y; // looks in namespaces A and B for operator+
--------------------------
> In the course of the parsing, some operators s, t need to compared in
> order to decide whether to shift or reduce, which done by a function p(s,
> t). Say that we have
> namespace A: infix ^ < infix +
> namespace B: infix + < infix ^
> and in the global namespace, we want to parse
> a + b ^ c
> Then, at some point in the parsing, we need to compute p(+, ^). So then we
> compute in the admissible namespaces to compute p(+, ^).
And what are the admissible namespaces? I would claim that they are the
same as for Koenig lookup, as above.
> Why would one have to consider other operators of the expression in that
> computation?
It's not the other operators that are considered, it's determining all of
the operators that could possibly occur based on the types in the
expression.
>>> I did the overloading of {prefix, postfix, infix}, and some of other
>>> overloading like "(x)" and "f(x)". But I do not know the solution of the
>>> general problems when one puts all operators in a bag.
>
>>Using the intersection method, it's easy to enumerate all of the cases
>>that cause problems. The "parser design" document I referenced in the last
>>message described all of the corner cases for the operator types
>>supported.
>
> I had a look at it. I used a Flex/Bison combination, mainly because I
> wanted to achieve a simple approach, is the state machine is hidden away
> in the state machines that Flex and Bison produce.
I'd like to look at this approach at some point.
> For use with C++, I think both approaches might be important, because
> compiler writers may use different approaches. It surely, in some sense
> working out the details, as you do, as one then can get more control over
> it.
Multiple, sufficiently different implementations would help greatly.
>>> As for C++, it is reasonable and probably a must to have restrictions
>>> that makes the implementation simple. When I worked on it I restricted
>>> myself to a simple Flex/Bison combination which sorts out the operator
>>> type, and then I wrote a program with a separate operator stack that
>>> sorts out precedence computations.
>
>>Sounds reasonable. The two-stack approach (one for operators, one for
>>operands) is relatively easy to implement.
>
> That is the point of it, simplicity, even though I considered make a
> single LR style stack in order to handle the syntax name overloading
> problem. (If somebody wants the sources to play with, just let me know
> via email.)
Interesting. It doesn't seem like it would work well when semantics need to
be considered: x**y could mean exponentiation (if x, y are ints) or it
could be x * (*y) (if y is a pointer type). And, of course, there could be
types in which both interpretations are valid.
> Another way to go might be to modify Bison, so that it can handle dynamic
> precedences, but I haven't figured out that one yet. :-)
Sounds like fun :)
>>where x is an operand and '-' is a prefix operator. The basic two-stack
>>parser will shift the operand, shift the '-', and then perform a
>>reduction. Of course, this string should be rejected because of the
>>ordering of operand and operator, and the easy way to ensure this
>>rejection is with the state machine I described. So disambiguation of
>>operator types (e.g., for prefix or infix '-' in mathematics) fits nicely
>>into the algorithm.
>
> So in this case I resolve it by recording the tokens, and by that letting
> the lexer that Flex generates sort it out to be a prefix operator. This
> passes over a operator name table lookup, which checks which operators are
> available.
Fair enough. So the lexer is required to be dynamic, then, to handle the
addition of new prefix operators?
>>I picked up the terminology from concept specifications in the Tecton
>>language. I'll ask one of the authors where the term 'confix' came from.
>
> The Merriam "Webster's Third International Dictionary" includes word
> etymology, and it says the "con-" is a negation, as opposed to "pro-"
> which is a positivation. The prefixes "en-" and "em-" both derives from
> the same words as "in-", but can also be used as "put into" as in
> "encircle".
The OBJ family of languages (which supports "mixfix" operators, where the _
character is used as a placeholder for an operand, and keywords can go
before, after, or between those operands) calls them "outfix" operators.
> As for ternary operators, I think they must be included (not simulated): A
> frequent argument against using precedence rules in C++ is that it does
> not cover the x?y:z operator. So I think it will be important to
> demonstrate that one covers those in order to have any success with a
> proposal.
I think I agree with you on this.
> And as for C++ and n-ary operators, I think is suffices with breaking up
> (x_1, ..., x_n) into "(x)" and "y,z", as is the case now. Otherwise, in
> math one would expect
> x_1, ..., x_n
> (x_1, ..., x_n)
> <x_1, ..., x_n>
> etc., to be all different. But for use in C++, one isn't trying to capture
> all standard mathematical practises, so it won't matter much, I believe.
> One could add it as an extra at some point, if the analysis isn't too
> hard.
We can make all of the above different: a comma-separated list can be built
using just an infix comma operator. Then the ( ) or < > "outfix" operators
can be given that comma-separated list operand to achieve a specific
meaning. The use of < > as an outfix operator might need to be banned
anyway because of templates.
>>I'm a big fan of just sticking with C++ expressions :)
>
> As for C++ expression, the part I was not exactly sure of was the sizeof
> and type cats, which have their own precedence rules. But other than that
> (as it is not difficult to include ternary operators), I don't think there
> were any problems.
>
> With respect to C++ expressions, do you have any specific syntactic
> problems in your mind?
Just the possibility of templates causing havoc with < and > operators. If
it is possible to take the C++ expression grammar and correctly separate
operands from operators without using precedence, there will be no problems
introducing user-defined operators. I haven't yet tried this transformation.
Doug
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: news_comp.std.c++_expires-2001-10-01@nmhq.net (Niklas Matthies)
Date: Thu, 30 Aug 2001 15:05:13 GMT Raw View
On Wed, 29 Aug 2001 21:59:58 GMT, Hans Aberg <remove.haberg@matematik.su.=
se> wrote:
> In article <mtXi7.386002$EF2.47911591@typhoon.nyroc.rr.com>,
> gregod@cs.rpi.edu wrote:
[=B7=B7=B7]
> >I picked up the terminology from concept specifications in the Tecton=20
> >language. I'll ask one of the authors where the term 'confix' came fro=
m.
> =20
> The Merriam "Webster's Third International Dictionary" includes word
> etymology, and it says the "con-" is a negation, as opposed to "pro-"
> which is a positivation. The prefixes "en-" and "em-" both derives fro=
m
> the same words as "in-", but can also be used as "put into" as in
> "encircle".
While con- can express negation, it (more?) often has the meaning of
"together"/"combined"/"along with". Examples: consent, context,
continent, contour, contraction, contribute, convention, convoy
confederation, concatenate, concentrate, concurrence, consequence,
consolidation, consonant, constituent, conspiration, construct, ...
Personally, I'd find "circumfix" a good word. Everyone will understand
at once that it refers to something that is "around" something other.
Btw, I follow your discussion with great interest, having had similar
thoughts with regard to operators myself. Please do go on. :)
-- Niklas Matthies
--=20
No question is too silly to ask, but, of course, some are too silly to
answer. -- Perl book
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Fri, 31 Aug 2001 17:30:37 GMT Raw View
In article
<slrn9oqrl3.ua.news_comp.std.c++_expires-2001-10-01@nightrunner.nmhq.net>,
news_comp.std.c++_expires-2001-10-01@nmhq.net (Niklas Matthies) wrote:
>While con- can express negation, it (more?) often has the meaning of
>"together"/"combined"/"along with".
Right. I perhaps didn't look too carefully in the dictionary: con-, com-,
col-, cor- means "with, together, jointly".
>Personally, I'd find "circumfix" a good word. Everyone will understand
>at once that it refers to something that is "around" something other.
Word like "circumscribe", "delimit", "encircle", and "fix", can be used as
synonyms, so "fix" variations like "enfix" and "circumfix" seems possible.
And "confix" would then mean both "prefix" and "postfix", I gather. The
picture I have in my mind is that it is "delimited", because the
delimiters sets boundaries for the parsing of the operator, and they are
called left/right delimiters in math.
This is really a side issue, but then again, it is important to people
that the terminology is such that they feel comfortable with it.
Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove.haberg@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Mon, 27 Aug 2001 18:41:33 GMT Raw View
In article <t31ym14cxl.fsf@watts.itp.tuwien.ac.at>, Christopher Eltschka
<celtschk@web.de> wrote:
>remove.haberg@matematik.su.se (Hans Aberg) writes:
>
>> In article <t3n14qfqoy.fsf@watts.itp.tuwien.ac.at>, Christopher Eltschka
>> <celtschk@web.de> wrote:
>> >Interesting. But then, one could replace the concept of a global
>> >precedence by just defining for each pair of "open right" and "open
>> >left" separately which one binds stronger.
>>
>> Right: With such local precedence rules, one would only get errors when
>> uncombinable operators appear next to each other during the course of the
>> reduction of an expression. In a system trying to combine different sets
>> of precedence rules, such a local precedence function could prove
>> important in order to help detecting errors.
>>
>> I just avoided the issue, using precedence numbers, because it is simpler,
>> and the question which precedence function one should use is irrelevant
>> for the aspects that I studied: When an expression is reduced, operators
>> may need to be compared, and it is irrelevant what function one is using..
>>
>> I think it would be great if somebody else would start to think about
>> designing such a precedence function.
>
>Well, it's always the problem to find the right balance between being
>simple and being general and flexible. Precedence numbers are the
>simplest, but most unflexible solution. Operator pair rules are the
>most general and flexible, but also the most complex solution. I'm
>sure there are lots of solutions in between.
A one in between suggestion is that operators are locally totally ordered,
that is, one might specify say "=" < "+", which would meant that p(=, +) =
shift, p(+, =) = reduce. For binary operators one will have to indicate
p(=, =) = shift, p(+, +) = reduce by declaring "=" right and "+" left
associative. If "=" < "+" and "+" < "*", then one will assume that "=" <
"*". If "+" = "-", then I used the rule p("+", "-") := p("+", "+").
Then one idea might be to associate such precedence rule with the
namespaces, in which case one will have to figure out to merge them when
more than one namespace is used. Conflicts can always be resolve by
inserting parentheses,
but one other suggestion is that one can write the operator names
explicitly, like x std::+ y if there is a conflict present. One should
also be able to override old precedence declarations.
>> ...What is needed is the following: Let V be the set of operators. Then
>> one needs a function p: S -> { reduce, shift }, where S is subset of V x
>> V. To "reduce" means to compute immediately, and to "shift" means to stack
>> the operator and compute it later. For example, p('+', '+') = reduce, and
>> p('=', '=') = shift.
>
>Well, shift/reduce are parser operations. While this is certainly what
>it finally reduces to (no pun intended), I prefer to remain in the
>problem domain as much as possible.
This is just a way to describe what is needed. One can think of more
codomain of p values than just shift/reduce; for example, in order to
express non-associativity, one might want to have "uncombinable". This is
then different form p(x, y) not defined, as the compiler will give
different diagnostics.
Once one has settled for a good model, one can wrap it up in a more user
friendly terminology.
>> Even though I figured that in the case of C++, one might not need
>> to define non-associativity, because that is taken care of the type
>> system. For example, if one does not want to allow a = b = c, then one can
>> define operator= for the assignment b = c return say "void", and if the
>> operator= of the class that a belongs to does not accept void as an
>> argument, then a = b = c will not be possible. This might then simplify
>> the assignment of operator precedence in C++.
>
>But that's not quite the same: Here, by returning void, you not only
>forbid "a = b = c", but also "(a = b) = c" and "a = (b = c)". In my
>power operator example above, neither "(a power b) power c" nor
>"a power (b power c)" are forbidden. Only "a power b power c" is. And
>a power operator which returns vois wouldn't be too useful anyway :-)
OK. In fact it speaks for not specifying non-associativity, and use types
instead. I think one should not try to put too much into the syntax and
instead use semantics when appropriate, as the syntax analysis tends to be
too crude.
>> Note that there is one additional tricky question, syntactic name
>> overloading. For example, the postfix binary "f(x)" is used side by side
>> with the delimited unary "(x)", and so on. How to decide which such name
>> overloadings are legal? -- I got distracted, doing other things, so I
>> haven't studied this part fully yet.
>
>This one is IMHO not really much different to the overloading of "-a"
>and "a - b", or of "a++" and "++a". However, there's another
>interesting case:
The problem is really in figuring out which combinations that are
allowable. The problem of separating prefix, postfix, and binary name
overloading is done by checking both the token before and the one after,
which means that it will not fit into a LR(1) parser without tweaking the
lexer. So some such analysis must be done.
>"f(a,b)" is a ternary operator with the arguments f, a and b. However,
>one could syntactically also interpret it as the binary operator
>"f(x)" applied to an expression formed by the binary comma operator.
Right. This is part of the syntactic name overloading problem. There is
also an additional operator which I did not mention in the list, used to
produce arguments of arbitrary length (like lists, etc), like in
<x_1|...|x_n>
With the syntactic components "<", "|", ">".
>Here C++ explicitly selects the first interpretation over the second
>one.
As for C++, I do not know if exactly all of its expressions can be pushed
through with an mechanism like the one we have discussed, so one may have
to augment it with a traditional grammar extensions.
>Looking at current C++ and applying an "operator oriented" view, I see
>the following rules:
>
>- Operands are separated by exactly one token; also there's at most
> one initial or terminal token.
Unless one wants to include "if (x) ... " and such. But for C++
expressions, this seems to be true.
>- In case of ambiguity, the longest operator token sequence wins
> (f.ex. "postfix (,)" wins over "postfix ()"). This is somewhat
> similar to the token forming rules.
No, in the case of C++, no such rule is necessary: When comparing "f(x,
y)" and "f(x)", at the point the parser has read "x", it will compare ","
and ")" in order to make an unambiguous decision.
>- In no case is one expression directly following another (there is
> no "whitespace operator").
All expressions are terminated by a ";", which is important for the
parsing: When the ";" arrives, one knows that all operators that have been
shifted can be reduced.
>There's also another problem: To identify which tokens are operators,
>and which tokens are operands.
The simplest way to solve this is to have a lookup table that the lexer uses.
Currently in C++ there is a set of
>tokens which is explicitly reserved for operators, and another set
>which can be used as operand. If one would allow to define arbitrary
>operators, one could either maintain the distinction (for example by
>naming operators "`foo'" instead of plain "foo", so "a `cross' b" is
>unambiguously identified as infix operator `cross' applied to a and
>b), or allow ordinary identifiers as operators, together with rules to
>identify the operator tokens.
The most advanced rule that I think of is to simply read one character at
a time and check the so found sequence of characters against the token
table; when the token table indicates that there are no longer matches,
one has found a token (or none, in which case there is an error).
For C++ it might be reasonable to have two classes operator names, and
identifiers. It would be simplest to not allow them mix, but one can think
of allowing identifier names to be operators, like in "x = a mod 8".
Then the simplest rule would be to prohibit identifier names used as
operators (like "mod" above) to be used as identifiers. That is, in
binary right mod; // Declare "mod" to be binary operator.
int operator mod(int x, int y) { ... } // Define "mod"
int x = 10 mod 8; // Legal
int mod = 7; // Illegal because "mod" has been declared as an operator.
The problem with this is clearly that is may screw up some code by
importing the declaration of the "mod" operator. But perhaps this can be
resolved if operator declarations are attached to namespaces. Then one
would have say
namespace std {
binary right mod; // Declare "mod" to be binary operator.
int operator mod(int x, int y) { ... } // Define "mod"
}
int x = 10 std::mod 8; // Legal
int mod = 7; // Legal
using std::mod;
mod = 8; // Illegal, confusing with the operator std::mod.
>Of course the question which operators should be overloadable still
>remains. For example, should one be allowed to overload operators
>which fit existing expressions? One might find overloading "infix * +"
>useful to provide an optimized version of the combined operator. But
>then it also poses problems: What if the expression a * b + c is
>encountered where a, b and c don't fit this operator, but * and +
>separately would work? We wouldn't really want the expression to fail.
>But then, should the operator be split off? Or should the parsing of
>the operator tokens be deferred until the types of the operands are
>known? But what if the operands are expressions themselves?
This is part of the syntax name overloading problem I have not yet fully
analyzed. I think one can resolve it by working towards a LR(1) parser.
Then it would look down the stack plus a lookahead token for matches until
an unambiguous choice.
The question is really how advanced one wants it to be: It should be as
simple as possible, so that the writer of the C++ compiler does not have
to implement a full dynamic LR(1) parser generator just for this problem.
>Another thing one would probably not want to allow (although it would
>technically make no problems) would be to overload e.g. "operator [ ,"
>because this would lead to unpaired '['. So one wants some additional
>restrictions on operator overloading. Maybe one should restrict just
>to explicitly named operators beside the standard ones.
In C++, operators such a "f[x]" would be predefined. But then one can add
new operators as long as there are no conflicts with the parsing system.
This seems simplest. (The question is otherwise somewhat like: "Should not
C++ prohibit identifier names that could be interpreted as offensive?")
Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove.haberg@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: Douglas Gregor <gregod@cs.rpi.edu>
Date: Tue, 28 Aug 2001 16:41:12 GMT Raw View
Hans Aberg wrote:
> In article <t31ym14cxl.fsf@watts.itp.tuwien.ac.at>, Christopher Eltschka
> <celtschk@web.de> wrote:
>
>>remove.haberg@matematik.su.se (Hans Aberg) writes:
>>Well, it's always the problem to find the right balance between being
>>simple and being general and flexible. Precedence numbers are the
>>simplest, but most unflexible solution. Operator pair rules are the
>>most general and flexible, but also the most complex solution. I'm
>>sure there are lots of solutions in between.
>
> A one in between suggestion is that operators are locally totally ordered,
> that is, one might specify say "=" < "+", which would meant that p(=, +) =
> shift, p(+, =) = reduce. For binary operators one will have to indicate
> p(=, =) = shift, p(+, +) = reduce by declaring "=" right and "+" left
> associative. If "=" < "+" and "+" < "*", then one will assume that "=" <
> "*". If "+" = "-", then I used the rule p("+", "-") := p("+", "+").
This solution strikes me as most natural. I recently implemented a parser
for user-defined operators similar to what I would want to see in C++, and
it used simple precedence relationships < (lower precedence than), = (equal
precedence to), and > (greater precedence than). These precedence
relationships are transitive, so:
'+' < '*';
'+' = '-';
'/' = '*';
Also implies '-' < '/', etc.
Infix operators are defined to be left, right, or non-associative. If
operators of the same precedence need to be compared, two left-associative
operators result in left associativity; two right-associative operators
result in right-associativity; all other combinations result in a parse
error.
> Then one idea might be to associate such precedence rule with the
> namespaces, in which case one will have to figure out to merge them when
> more than one namespace is used. Conflicts can always be resolve by
> inserting parentheses,
> but one other suggestion is that one can write the operator names
> explicitly, like x std::+ y if there is a conflict present. One should
> also be able to override old precedence declarations.
I'm going to reassert the possibility of using a precedence graph. A
precedence graph is a directed graph where:
- Each vertex represents a set of operators at equal precedence. So a
precedence relation '+' = '-' would merge the vertices of '+' and '-'.
- Each edge (u, v) means that the operators in u have a lower precedence
than the operators in v.
Given two operators, x and y, if y is reachable from x, then y has a higher
precedence than x; if x is reachable from y, then x has a higher precedence
and y. If there is a cycle between x and y or if neither is reachable from
the other, the relative precedence is unknown.
Each namespace will have its own precedence graph. In an expression, the
precedence graph from the current namespace is merged with the precedence
graphs from all namespaces found through Koenig lookup. Merging of
precedence graphs requires merging vertices with the same operator. For
instance, if I have two precedence graphs:
A: {infix +, infix -} < {infix *, infix /} < {prefix +} < {infix pow}
B: {infix +} < {infix *} < {postfix +, postfix *}
We will merge vertices based on operator names and positions:
A's {infix +, infix -} with B's {infix +}
A's {infix *, infix /} with B's {infix *}
The resulting precedence graph A+B will have edges:
({infix +, infix -}, {infix *, infix /})
({infix *, infix /}, {prefix +})
({infix *, infix /}, {postfix +, postfix *})
({prefix +}, {infix pow})
Associativities are easy to merge. When merging infix operators x and y,
then resulting associativity is:
left: if x and y are both left-associative
right: if x and y are both right-associative
otherwise non-associative.
Parentheses may always be used to disambiguate expressions, so I don't
believe we need an explicit mechanism to specify which operator '+' we want
in, e.g., x + y. That's for semantic analysis and not parsing.
Note that this scheme requires a two-stage parser. The first stage just
collects a list of tokens. For any operand token, the precedence graph for
the namespace of that operand's type is merged into the current precedence
graph for the current expression. Once all precedence graphs have been
merged, the list of tokens can be reparsed as an expression.
[snip]
>>> Even though I figured that in the case of C++, one might not need
>>> to define non-associativity, because that is taken care of the type
>>> system. For example, if one does not want to allow a = b = c, then one
>>> can define operator= for the assignment b = c return say "void", and if
>>> the operator= of the class that a belongs to does not accept void as an
>>> argument, then a = b = c will not be possible. This might then simplify
>>> the assignment of operator precedence in C++.
>>
>>But that's not quite the same: Here, by returning void, you not only
>>forbid "a = b = c", but also "(a = b) = c" and "a = (b = c)". In my
>>power operator example above, neither "(a power b) power c" nor
>>"a power (b power c)" are forbidden. Only "a power b power c" is. And
>>a power operator which returns vois wouldn't be too useful anyway :-)
>
> OK. In fact it speaks for not specifying non-associativity, and use types
> instead. I think one should not try to put too much into the syntax and
> instead use semantics when appropriate, as the syntax analysis tends to be
> too crude.
Non-associativity naturally occurs when there is a conflict in
associativity of operators. If I have a left-associative operator + and a
right-associative operator * at the same precedence, and the expression
x * y + z, the relative associativity of * and + is non-associative.
Associativity is a syntactic property, not a semantic one. Using a void
return type to make an associative operator non-associative is a hack that
should not be required.
>>> Note that there is one additional tricky question, syntactic name
>>> overloading. For example, the postfix binary "f(x)" is used side by side
>>> with the delimited unary "(x)", and so on. How to decide which such name
>>> overloadings are legal? -- I got distracted, doing other things, so I
>>> haven't studied this part fully yet.
>>
>>This one is IMHO not really much different to the overloading of "-a"
>>and "a - b", or of "a++" and "++a". However, there's another
>>interesting case:
>
> The problem is really in figuring out which combinations that are
> allowable. The problem of separating prefix, postfix, and binary name
> overloading is done by checking both the token before and the one after,
> which means that it will not fit into a LR(1) parser without tweaking the
> lexer. So some such analysis must be done.
It can be done with a one-token lookahead. One can construct a DFA with two
states: pre-operand and post-operand. Transitions are based on the type of
the incoming token, which can be an operand, prefix operator, postfix
operator, infix operator, etc, etc. Intersecting the set of valid outgoing
states with the possible types of the incoming token resolves many
ambiguities. For instance:
If we are in the pre-operand state, we can only get an operand, a prefix
operator, or a confix open operator (e.g., open parentheses). Thus if we
get an operator token that can be infix, prefix, or postfix (like '+' in
the above example), we intersect these sets and find that '+' in this
context can only be prefix.
See
http://www.cs.rpi.edu/~gregod/Sugar/doc/design.html#state_machine
for a diagram of the state machine I use.
One example of an ambiguity that requires a one-token lookahead is with a
postfix and infix operator. For instance, if we have a '+' operator that
can be infix or postfix, and the expression x + + y, we don't know that the
first '+' is postfix until we see the second '+'. This can be resolved
using the state machine mentioned above:
when we see the first '+', intersection of {postfix, infix} with the set of
valid transitions from the post-operand state and still have {postfix,
infix}. We know that the postfix transition keeps the state machine in the
post-operand state, whereas the infix transition moves to the pre-operand
state. So we look ahead to the next token: if that token always results in
an invalid state for one of the interpretations of the current token but
results in a valid state for the other interpretation of the current token,
we choose to interpret the current token so that a valid parse continues.
Continuing the example: when we see the second '+', we intersect its
positions {postfix, infix} with the valid positions outgoing from the
pre-operand state {prefix, confix open, operand} and get the empty set: so,
if the first '+' was an infix operator, we would end up in the pre-operand
state and have nowhere to go when we see the second '+'.
We then intersect '+''s positions with the valid positions outgoing from
the post-operand state {postfix, infix, function open, function close,
confix close} and get {postfix, infix}. Thus, the first '+' can only be
interpreted as a postfix operator to get to a successful parse.
Some operator combinations require arbitrary lookahead, of course. They are
enumerated in the document mentioned above.
>>"f(a,b)" is a ternary operator with the arguments f, a and b. However,
>>one could syntactically also interpret it as the binary operator
>>"f(x)" applied to an expression formed by the binary comma operator.
>
> Right. This is part of the syntactic name overloading problem. There is
> also an additional operator which I did not mention in the list, used to
> produce arguments of arbitrary length (like lists, etc), like in
> <x_1|...|x_n>
> With the syntactic components "<", "|", ">".
>
>>Here C++ explicitly selects the first interpretation over the second
>>one.
>
> As for C++, I do not know if exactly all of its expressions can be pushed
> through with an mechanism like the one we have discussed, so one may have
> to augment it with a traditional grammar extensions.
Reshaping the parse tree to resolve this syntax seems perfectly reasonable.
[snip]
>>Another thing one would probably not want to allow (although it would
>>technically make no problems) would be to overload e.g. "operator [ ,"
>>because this would lead to unpaired '['. So one wants some additional
>>restrictions on operator overloading. Maybe one should restrict just
>>to explicitly named operators beside the standard ones.
>
> In C++, operators such a "f[x]" would be predefined. But then one can add
> new operators as long as there are no conflicts with the parsing system.
> This seems simplest. (The question is otherwise somewhat like: "Should not
> C++ prohibit identifier names that could be interpreted as offensive?")
I think it would be reasonable to avoid forcing compiler writers to deal
with arbitrary lookahead. Humans are parsers too, so constructs that are
really nasty to parse programmatically are often nasty to read. When
discussing the ambiguity problems with user-defined operators with a friend
a few years ago, we kept coming back to the same ugly example:
|x| | |y|
where the confix operator | | represented absolute value, and the infix
operator | represented bitwise or. Parsing this requires arbitrary
lookahead, which we wanted to avoid. After a while we realized how truly
ugly that is. It's hard to read for a human, too, and I don't think it
would be wrong to make the user write:
abs(x) | abs(y)
It's a matter of taste, of course, but at some point the ugly examples that
make parsing hard are just that: ugly, and not productive.
I've implemented an expression parser for user-defined operators based on
the precedence graph and what I've mentioned above. It supports operators
that are prefix, infix (left-, right-, or non-associative), postfix, confix
(grouping), and function application. One of the included example/test
programs is a language for user-defined operators. One uses simple
statements to create operators and define their relative precedences.
Following my signature is a partial definition for mathematical operators
that can be used to parse expressions using those operators and emit the
fully-parenthesized form of the expression tree.
My hope is that I can define most (all?) of C++ expressions using this
language. The parser is written in C++ and is available at:
http://www.cs.rpi.edu/~gregod/Sugar/Sugar.tgz
It requires the Boost (http://www.boost.org) libraries.
Doug
prefix operator+;
prefix operator-;
left_assoc operator+;
left_assoc operator-;
left_assoc operator*;
left_assoc operator/;
right_assoc operator^;
postfix operator!;
left_assoc operator,;
confix operator [ ];
confix operator ( ];
confix operator ( );
confix operator [ );
precedence infix + = infix -;
precedence * = /;
precedence * > infix +;
precedence * < ^;
precedence ! > *;
precedence ! < ^;
precedence , < infix +;
precedence prefix + = prefix -;
precedence prefix - < ^;
precedence prefix - > *;
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: Christopher Eltschka <celtschk@web.de>
Date: Fri, 24 Aug 2001 17:08:15 GMT Raw View
remove.haberg@matematik.su.se (Hans Aberg) writes:
> In article <t3n14qfqoy.fsf@watts.itp.tuwien.ac.at>, Christopher Eltschka
> <celtschk@web.de> wrote:
> >Interesting. But then, one could replace the concept of a global
> >precedence by just defining for each pair of "open right" and "open
> >left" separately which one binds stronger.
>
> Right: With such local precedence rules, one would only get errors when
> uncombinable operators appear next to each other during the course of the
> reduction of an expression. In a system trying to combine different sets
> of precedence rules, such a local precedence function could prove
> important in order to help detecting errors.
>
> I just avoided the issue, using precedence numbers, because it is simpler,
> and the question which precedence function one should use is irrelevant
> for the aspects that I studied: When an expression is reduced, operators
> may need to be compared, and it is irrelevant what function one is using..
>
> I think it would be great if somebody else would start to think about
> designing such a precedence function.
Well, it's always the problem to find the right balance between being
simple and being general and flexible. Precedence numbers are the
simplest, but most unflexible solution. Operator pair rules are the
most general and flexible, but also the most complex solution. I'm
sure there are lots of solutions in between. Which one is the right
one is the hard question. Precedence numbers seem to unflexible to me,
but then I can see that local pair rules might be too complex.
However, local pair rules are what each way to do it finally describes
- with certain restrictions on the set of rules. Therefore I guess the
right way to analyse the problem is on the level of pair rules,
identifying the restriction one wants to impose, and then trying to
find the simplest way to specify those without loosing too much
flexibility.
>
> > If none binds stronger,
> >it's an error to have them "share" an argument.
>
> Note that binary operators like '+' and '=' can be described as left and
> right associative by telling whether the left hand precedence is
> higher/lower than the right hand precedence of the same operator. But
> other than this left/right associativity, operators infix operators used
> in practise seem to have the same left/right precedence.
>
> >Let's make an example with the following made-up syntax:
> >
> >"foo :( bar" means "'foo arg bar' is 'foo (arg bar)'"
> >"foo :) bar" means "'foo arg bar' is '(foo arg) bar'"
> >
> >This would also allow to remove the left/right distinction, by
> >just pairing the operator with itself:
> >
> >infix + :) infix + "a + b + c is (a + b) + c"
> >infix = :( infix = "a = b = c is a = (b = c)"
> >prefix * :( postfix ++ "*p++ is *(p++)
> >infix = :( infix ?: "a = b ? c : d is a = (b ? c : d)"
> >
> >Now if you have two operators which don't have such a pair defined,
> >you cannot have them "share" an argument.
>
> Right. What is needed is the following: Let V be the set of operators. Then
> one needs a function p: S -> { reduce, shift }, where S is subset of V x
> V. To "reduce" means to compute immediately, and to "shift" means to stack
> the operator and compute it later. For example, p('+', '+') = reduce, and
> p('=', '=') = shift.
Well, shift/reduce are parser operations. While this is certainly what
it finally reduces to (no pun intended), I prefer to remain in the
problem domain as much as possible. The problem is not the parser
mechanics, the problem is the interpretation of the expression (if I
interpret an expression, I'm quite sure that my brain does neither
shift nor reduce). Of course, the fact that this interpretation can be
translated 1:1 into parser rules is not unimportant (after all, it
should be parsed at some time).
>
> > For example, you might not
> >want to give a power operator a defined associativity, then you don't
> >define such a pair for power with itself, and therefore the expression
> >"a power b power c" is not allowed, but you have to explicitly write
> >either "(a power b) power c" or "a power (b power c)".
>
> Right. Even though I figured that in the case of C++, one might not need
> to define non-associativity, because that is taken care of the type
> system. For example, if one does not want to allow a = b = c, then one can
> define operator= for the assignment b = c return say "void", and if the
> operator= of the class that a belongs to does not accept void as an
> argument, then a = b = c will not be possible. This might then simplify
> the assignment of operator precedence in C++.
But that's not quite the same: Here, by returning void, you not only
forbid "a = b = c", but also "(a = b) = c" and "a = (b = c)". In my
power operator example above, neither "(a power b) power c" nor
"a power (b power c)" are forbidden. Only "a power b power c" is. And
a power operator which returns vois wouldn't be too useful anyway :-)
>
> Note that there is one additional tricky question, syntactic name
> overloading. For example, the postfix binary "f(x)" is used side by side
> with the delimited unary "(x)", and so on. How to decide which such name
> overloadings are legal? -- I got distracted, doing other things, so I
> haven't studied this part fully yet.
This one is IMHO not really much different to the overloading of "-a"
and "a - b", or of "a++" and "++a". However, there's another
interesting case:
"f(a,b)" is a ternary operator with the arguments f, a and b. However,
one could syntactically also interpret it as the binary operator
"f(x)" applied to an expression formed by the binary comma operator.
Here C++ explicitly selects the first interpretation over the second
one.
Looking at current C++ and applying an "operator oriented" view, I see
the following rules:
- Operands are separated by exactly one token; also there's at most
one initial or terminal token. (For example, operator += is one
token, instead of built by the separate tokens + and =).
Side remark: This one-token rule is the source of the template
error "foo<bar<baz>>": if operator>> were built from two '>',
there would not be the token '>>', and the template specification
above would work.
- In case of ambiguity, the longest operator token sequence wins
(f.ex. "postfix (,)" wins over "postfix ()"). This is somewhat
similar to the token forming rules.
- In no case is one expression directly following another (there is
no "whitespace operator").
There's also another problem: To identify which tokens are operators,
and which tokens are operands. Currently in C++ three is a set of
tokens which is explicitly reserved for operators, and another set
which can be used as operand. If one would allow to define arbitrary
operators, one could either maintain the distinction (for example by
naming operators "`foo'" instead of plain "foo", so "a `cross' b" is
unambiguously identified as infix operator `cross' applied to a and
b), or allow ordinary identifiers as operators, together with rules to
identify the operator tokens. The latter gets more complex, since even
if applying the "one-token-rule" of C++ built-in tokens, there are six
different ways to interpret the expression "a b c":
"prefix a(prefix b(c))"
"prefix a(postfix c(b))"
"postfix c(prefix a(b))"
"delimited a c(b)"
"prefix c(prefix b(a))"
"infix b(a,b)"
Now C++ already has this sort of stuff (what is "a * b;"?), but that
doesn't say that it's a good idea to add more such stuff. I'd use
clearly separated operator names ("`foo'" is just one possibility;
another one would be "@foo"; it's hard to find something which isn't
currently legal C++ syntax in any case)
Of course the question which operators should be overloadable still
remains. For example, should one be allowed to overload operators
which fit existing expressions? One might find overloading "infix * +"
useful to provide an optimized version of the combined operator. But
then it also poses problems: What if the expression a * b + c is
encountered where a, b and c don't fit this operator, but * and +
separately would work? We wouldn't really want the expression to fail.
But then, should the operator be split off? Or should the parsing of
the operator tokens be deferred until the types of the operands are
known? But what if the operands are expressions themselves?
Generally this problem occurs as soon as two operators of the same
"left kind" (i.e. both left delimited, or both left open) start with
the same token sequence. AFAIK the function call operators are the
only case where this occurs in C++. Here the resolution is clearly to
fail ("f(a,b)" is not resolved to "f((a,b))" if no two-argument
function, but an one-argument function as available).
Another thing one would probably not want to allow (although it would
technically make no problems) would be to overload e.g. "operator [ ,"
because this would lead to unpaired '['. So one wants some additional
restrictions on operator overloading. Maybe one should restrict just
to explicitly named operators beside the standard ones.
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: Christopher Eltschka <celtschk@web.de>
Date: Thu, 23 Aug 2001 21:52:37 GMT Raw View
remove.haberg@matematik.su.se (Hans Aberg) writes:
> In article <PseWnUAxTPT7Ewfy@ntlworld.com>, Francis Glassborow
> <francisG@robinton.demon.co.uk> wrote:
> >>Or what exactly do you have in your mind?
>
> >Simply that the concept of precedence works for C, but it does not for
> >C++ (specifically the problem is with assignments and the conditional
> >operator)
>
> I did have some worries about that too in the beginning, but it turns out
> that it is not a problem. I made a version handling that specifically, one
> defines for ? : precedence and associativity left/right.
>
> I work now on a more general operator precedence theory. One advantage,
> apart from the generality, is that the actual implementation becomes
> simpler, because one needs not work through so many special cases, as that
> is handled by the general theory. Some details:
>
> An n-ary operator f is defined to be a sequence of tokens (f_0, ..., f_n),
> where f_0 and f_n, the left/right delimiters may be empty. If one of the
> left/right delimiters are not present, one must have a precedence number
> available. It is supposed to act on the arguments x_1, ..., x_n
> syntactically as
> f_0 x_1 f_1 ... f_{n-1} x_n f_n
>
> If neither of the left/right delimiters are present, then the operator is
> called infix. Then the operator can bind left or right in the parsing
> (associativity). If only the left (resp. right) delimiter is present, the
> operator is called prefix (resp. postfix). If both left and right
> delimiters are present, it is called delimited.
>
> Thus an operator can have the attributes
> left right prefix postfix delimited
> And for the different arities n, one has the following possible types:
> nullary (n = 0): delimited.
> unary (n = 1): prefix, postfix, delimited.
> n >= 2: infix, prefix, postfix, delimited.
>
> And some examples of operators: Write arguments as x, y, z, f, A.
> nullary:
> delimited identifier producing a value.
>
> unary:
> prefix -x
> postfix x++ in C/C++
> delimited (x)
>
> binary:
> infix:
> left x + y
> right x = y -- C/C++ assignment
> prefix (x)f -- Function operand written to the left.
> if x then y
> postfix f(x) -- Function operand written to the right.
> delimited <x|y> -- Scalar product.
>
> ternary:
> infix:
> left
> right x ? y : z -- Conditional expression in C/C++
> prefix if x then y else z
> postfix
> delimited <x|A|y> -- Feynman "bra-ket" notation of Hilbert
> space operators in physics.
Interesting. But then, one could replace the concept of a global
precedence by just defining for each pair of "open right" and "open
left" separately which one binds stronger. If none binds stronger,
it's an error to have them "share" an argument.
Let's make an example with the following made-up syntax:
"foo :( bar" means "'foo arg bar' is 'foo (arg bar)'"
"foo :) bar" means "'foo arg bar' is '(foo arg) bar'"
This would also allow to remove the left/right distinction, by
just pairing the operator with itself:
infix + :) infix + "a + b + c is (a + b) + c"
infix = :( infix = "a = b = c is a = (b = c)"
prefix * :( postfix ++ "*p++ is *(p++)
infix = :( infix ?: "a = b ? c : d is a = (b ? c : d)"
Now if you have two operators which don't have such a pair defined,
you cannot have them "share" an argument. For example, you might not
want to give a power operator a defined associativity, then you don't
define such a pair for power with itself, and therefore the expression
"a power b power c" is not allowed, but you have to explicitly write
either "(a power b) power c" or "a power (b power c)".
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Fri, 24 Aug 2001 10:09:34 GMT Raw View
In article <t3n14qfqoy.fsf@watts.itp.tuwien.ac.at>, Christopher Eltschka
<celtschk@web.de> wrote:
>Interesting. But then, one could replace the concept of a global
>precedence by just defining for each pair of "open right" and "open
>left" separately which one binds stronger.
Right: With such local precedence rules, one would only get errors when
uncombinable operators appear next to each other during the course of the
reduction of an expression. In a system trying to combine different sets
of precedence rules, such a local precedence function could prove
important in order to help detecting errors.
I just avoided the issue, using precedence numbers, because it is simpler,
and the question which precedence function one should use is irrelevant
for the aspects that I studied: When an expression is reduced, operators
may need to be compared, and it is irrelevant what function one is using..
I think it would be great if somebody else would start to think about
designing such a precedence function.
> If none binds stronger,
>it's an error to have them "share" an argument.
Note that binary operators like '+' and '=' can be described as left and
right associative by telling whether the left hand precedence is
higher/lower than the right hand precedence of the same operator. But
other than this left/right associativity, operators infix operators used
in practise seem to have the same left/right precedence.
>Let's make an example with the following made-up syntax:
>
>"foo :( bar" means "'foo arg bar' is 'foo (arg bar)'"
>"foo :) bar" means "'foo arg bar' is '(foo arg) bar'"
>
>This would also allow to remove the left/right distinction, by
>just pairing the operator with itself:
>
>infix + :) infix + "a + b + c is (a + b) + c"
>infix = :( infix = "a = b = c is a = (b = c)"
>prefix * :( postfix ++ "*p++ is *(p++)
>infix = :( infix ?: "a = b ? c : d is a = (b ? c : d)"
>
>Now if you have two operators which don't have such a pair defined,
>you cannot have them "share" an argument.
Right. What is needed is the following: Let V be the set of operators. Then
one needs a function p: S -> { reduce, shift }, where S is subset of V x
V. To "reduce" means to compute immediately, and to "shift" means to stack
the operator and compute it later. For example, p('+', '+') = reduce, and
p('=', '=') = shift.
> For example, you might not
>want to give a power operator a defined associativity, then you don't
>define such a pair for power with itself, and therefore the expression
>"a power b power c" is not allowed, but you have to explicitly write
>either "(a power b) power c" or "a power (b power c)".
Right. Even though I figured that in the case of C++, one might not need
to define non-associativity, because that is taken care of the type
system. For example, if one does not want to allow a = b = c, then one can
define operator= for the assignment b = c return say "void", and if the
operator= of the class that a belongs to does not accept void as an
argument, then a = b = c will not be possible. This might then simplify
the assignment of operator precedence in C++.
Note that there is one additional tricky question, syntactic name
overloading. For example, the postfix binary "f(x)" is used side by side
with the delimited unary "(x)", and so on. How to decide which such name
overloadings are legal? -- I got distracted, doing other things, so I
haven't studied this part fully yet.
Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove.haberg@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Mon, 16 Jul 2001 21:40:27 GMT Raw View
In article <eDcSHHgdulIC-pn2-TKRW4cELdrMW@zamboni.stiscan.com>,
kgw-zamboni-news@stiscan.com wrote:
>>But one way to resolve it without namespace poly could be
>> poly::polynomial p, x;
>> p = x poly::^ 2 + 5;
>>that is indicating the full name of the operator. Or somehow indicating
>>that one is using the syntax of the namespace poly.
>>
>>It seems possible to work this out.
>>
>But would it be useful?
>If you start cluttering up expressions with qualifiers it becomes
>unreadable and you might as well use methods.
If ambiguities occur, one wants a method that guarantees top resolve it.
In addition, one wants some more convenient method for normal use.
In math, if $a + b$ becomes ambiguous, one can write out which '+' is
intended by using subscripts: $a +_\C b$, etc. I just made the nearest
equivalent in C++.
>(x->operator^( 2))->operator+(5);
>
>At least the class of the operand is not needed.
In this case you would still have to write
(x->poly::operator^( 2))->poly::operator+(5);
unless you have some additional mechanism like "using". So there is no
particular gain here, and with increased syntax, one is merely facing the
same kind of problems as before.
>The bigest problem to me is:
>
>x ^ y # z can have different precedence depending on the types.
Having such a system where precedence depends on types seems excluded for
use with C++ at least, because it will be too complicated.
>Maybe except for predefined operators, all operators not ending in
>= would have the same precedence less than the predefineds and be left
>associative.
>And operators ending in = would have the same precedence and
>associativitly as assignment.
It depends on how special system you want: If one somehow should enforce
new syntaxes to be C++ like, or if one should settle for something more
general.
I have no particular opinion on that: My motivation for investigating this
was to find something useful in other contexts than C++.
But one can note that for example <= can be also read as an arrow "implied
by". In a Mini-Prolog version I write on now, one can write
all x: x in natural => s(x) in natural.
or
all x: s(x) in natural <= x in natural.
instead of the standard Prolog
natural(s(x)) :- natural(x).
So, from a more general point of view, it may be bad to impose such defaults.
Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove.haberg@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Mon, 16 Jul 2001 21:42:53 GMT Raw View
In article <ydZ37.39782$T97.6172438@typhoon.nyroc.rr.com>,
gregod@cs.rpi.edu wrote:
>First of all, I'm assuming precedence relations are transitive.
I will describe exactly what is needed from the precedence system, in case
you or somebody else want to work out a suggestion:
One need only consider cases of operator components f_r and g_l (r =
right, l = left) as I consider operators with multiple components)
appearing syntactically next to each other, f before g, during the course
of evaluation such that f is not a right delimiter and g is not a left
delimiter. The expression "appearing syntactically next to each other"
need some clarification, as it may happen that an evaluation reduces away
an intermediate operator. For example, in (using C/C++ precedence)
a ^ b * c + d
first '^' and '*' will be stacked, but when '+' arrives, the '*' will be
computed leaving one with the situation
a ^ e + d
where e = b * c. So in the next comparison, f_r = '^' and g_l = '+'
despite the fact that they did not appear next to each other.
Now what is needed is a precedence function p(f, g) that for each such
operators where f does not have right delimiter and g does not have a left
delimiter, at the time such a comparison (f_r, g_l) happens in the course
of the evaluation can tell whether to reduce (immediately compute f
applied to its arguments) or shift (put g on the stack, deferring the
computation until later).
For example, the fact that '+' (resp. '=') is left (resp. right)
associative can be viewed as a consequence of that
p('+', '+') = reduce (resp. p('=', '=') = shift)
In when parsing a + b + c (resp. a = b = c), first 'a', '+' (resp. '=')
and 'b' are stacked, and when comparing p('+', '+') (resp. p('=', '='))
the move will be to compute immediately (resp. later on). So the parsing
becomes
(a+b)+c (resp. a=(b=c))
One way to think of p(f, g) is as saying that the right precedence of f is
> than the left precedence of g.
> A possible
>implementation is to use a directed acyclic graph.
So this is possible; one only have to add a left or right precedence rule
for infix operators, so that they can be compared with themselves. -- If
operators of the same precedence are mixed, I set an arbitrary precedence
order (lower to higher) right, left, prefix, postfix.
>A cycle denotes an intransitivity and should be considered an error
>condition.
That seems OK.
> Note that if we are in multiple namespaces (both 'integer' and
>'string', for instance'), we need only merge the two graphs and check for
>cycles.
Right. That is the trick: how to do it.
First finding a method that always can resolve the ambiguities, so that
the code is usable. Then finding some convenient shorthands that will work
better in general use.
>> But one way to resolve it without namespace poly could be
>> poly::polynomial p, x;
>> p = x poly::^ 2 + 5;
>> that is indicating the full name of the operator. Or somehow indicating
>> that one is using the syntax of the namespace poly.
>>
>> It seems possible to work this out.
>
>Yes, qualifying the operators would work. It would likely also help to
>include a variation of the 'using' declaration:
>using operator poly::^;
So the poly::^ qualification would be the method that always works. The
"using" variations would be the more convenient shorthands for general
use.
I think that the problems will probably be quite local: As long as there
are no precedence conflicts between operators actually occurring in an
expression, it does not matter how much precedence relations conflict.
So one probably needs a system to fix up some expressions locally here and
there.
>What one would really want would be the equivalent of argument-dependent
>lookup, but that gets very complicated very quickly.
That seems excluded: Apart form a significantly more complicated
implementation, it will be pretty difficult for the users to figure out
the ambiguities.
>... the question becomes: how hard is it to 'fix' a precedence
>problem between libraries? With only precedence relationships and the above
>approach using a directed acyclic graph it isn't very complicated.
You still need figure out how to do it. You suggest to use something like
a partially ordered set. I get the impression that your suggestion should
work like a system implicit type conversions, with some way to ensure that
ambiguities can be resolved.
>I'll use concrete examples [for demonstrating problems with number
precedences] for libraries A and B instead of this abstract
>mumbo-jumbo.
...
>Let's look at some possible choices of precedence numbers for each library.
>Parser library:
> '|' = 5
> '>>' = 6
>
>Regex:
> '+' = 1
> '*' = 2
>
>Within each library the precedence relations are correct, but when you use
>the libraries together the precedence between regex and BNF is backwards.
>You can fix it by changing the numbers, but that requires modification of
>the library.
It is clear that when using precedence numbers, one will have to find a
way to modify them when merging the two libraries: Merely redefine those
precedence numbers.
>At some point, _something_ will have to be done so that users don't have to
>change library code to make the precedence work favorably. Library authors
>code all agree on inter-library precedences and have a giant database of
>precedence numbers for each domain, or perhaps more likely the precedence
>numbers would be based on macros that the user gets to customize:
So this would not be needed then.
I think you will face the same problem with your system, namely you must
have a mechanism to override defined precedences in libraries, even though
that it may happen that the occasion that one may need to use them will be
rarer.
All that remains is that you (or someone) knocks out some code doing it: I
think it can be done. I may also turn out to be a very nice system.
Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove.haberg@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: Douglas Gregor <gregod@cs.rpi.edu>
Date: Fri, 13 Jul 2001 15:23:55 GMT Raw View
Hans Aberg wrote:
> In article <eDcSHHgdulIC-pn2-Vshkzj4EmxVn@zamboni.stiscan.com>,
> kgw-zamboni-news@stiscan.com wrote:
>>>So what do you suggest should happen when operators do not compare.
>>>Should they produce errors?
>
>>I do not think that will happen much.
>
> But when it does happen, the compiler must know how to handle it.
>
>>Within one "domain", its operators and the standard ones should be
>>totally ordered.
>
> So then one could just as well use precedence numbers.
>
> It suffices though that there is an defined order between each pair that
> is actually used in an expression.
>
> There is nothing wrong with your I idea -- I have considered it before. It
> is only a little more complicated to implement, because one needs a
> mechanism to infer the operator precedences (like checking that there are
> not intransitivities) rather than merely compare two numbers.
>
>>Between two independent domains, either they would not be used in the
>>same expression or
>>their relationship is not well defined and parentheses should be used
>>anyway.
>
> Most likely, some will write modules using different syntaxes, and
> somebody else would want to import such modules. Then it must be possible
> to resolve the conflicting syntaxes, or the system will be pretty
> useless..
>
[snip]
Using floating point numbers for precedence in lieu of individual
precedence relations doesn't easily allow the resolution of conflicting
syntax. Consider two libraries, A and B, developed separately with
different operator sets. Assume there is some precedence relationship
between the operators in A and the operators in B that is most natural for
a user. Now divide groups of users into the 'careful' and 'not careful'
categories.
The careful user would check to see if the expected precedence relationship
holds between the two libraries by looking into the implementation of each
and comparing the floating point values. If it works out, great; otherwise,
how does one solve the conflict? I don't have any good ideas on this.
The non-careful user would assume a natural precedence between the two
libraries and would start coding immediately, only to be surprised when the
code does not work as expected. Eventually noticing the precedence problem,
he will either a) become a careful user next time or b) use lots of
parentheses or function calls to avoid the issue.
Contrast this with the same user categories but with a system that uses
distinct precedence relationships (.e.g., '+' < '*', '+' == '-').
The careful user would check for the expected precedence relationship.
Since the libraries were developed independently, there would be no direct
relationships between the two defined, so she would write the set of
relationships between the operators of A and B that she expects in a
separate conflict-resolution module.
The non-careful user still starts coding away, but this time instead of
surprising results, he gets errors from the compiler telling him that his
expressions are ambiguous. He opens his web browser and searches for
"precedence", "A", and "B", and then borrows the conflict-resolution module
from a careful user that was kind enough to post hers.
I guarantee you that if a floating point value system were to be used,
library authors would invent elaborate preprocessing tricks to try to make
their libraries more adaptive to other libraries.
In summary, the advantages of defining precedence relationships
comparatively instead of using floating point numbers are:
- Conflict resolution is easy and non-intrusive.
- Undefined relationships cause compiler errors.
Doug
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Fri, 13 Jul 2001 18:08:21 GMT Raw View
In article <VyC37.30325$T97.5197602@typhoon.nyroc.rr.com>,
gregod@cs.rpi.edu wrote:
>Using floating point numbers for precedence in lieu of individual
>precedence relations...
A note: I do not use floating point numbers, but only numbers
syntactically represented in the range +/-999.999999.
The only reason for using numbers is that it is easy to implement and easy
to use in the sense that it is easy to understand. Resolving syntactic
conflicts is another matter.
>Contrast this with the same user categories but with a system that uses
>distinct precedence relationships (.e.g., '+' < '*', '+' == '-').
>From the point of implementation of what I do, the only thing that is
needed is that when two operators appear immediately after one another,
one can compare them. For example, when '+' and '*' in x + y * z it should
be possible to tell if '+' < '*' or not. Thus, if only the function that
is doing the operator precedence comparisons is available, there is
otherwise no change to the program.
One thing that becomes complicated when expressing precedence using
relations like '+' < '*' is to check for transitivity: If one in addition
defines '^' < '+', does it follow that '^' < '*'? -- If that is supposed
to follow, what happens when somebody later then defines '*' < '^'?
So, assume one is using relations like '+' < '*' in order to define
precedence, the system must be able to detect those intransitivities.
Further, suppose that syntax is defined in environments, perhaps say
attached to the namespaces. Then such intransitives has to be detected
relative the outer environments.
This probably can be resolved, perhaps no more complicated than resolving
implicit type conversions, but one must give that some thinking.
>If it works out, great; otherwise,
>how does one solve the [precedence] conflict? I don't have any good ideas
on this.
The way it will work, I figure, is that the compiler before reading an
expression must know what the precedence relations of adjacent operators
are.
If the compiler then reads the expression, and discovers more than one
precedence relation between operators appearing adjacent in that
expression, it generates an error. One will then have to indicate a way to
resolve the conflict.
If precedence is attached to namespaces, say I define for use with
polynomial exponentiation
namespace poly {
precedence '*' < '^';
...
class polynomial;
}
which conflicts with the C/C++ precedence of '^', which is lower than '+'.
If I use '^' within namespace poly no problem, as the local definitions
take precedence over the global.
But one way to resolve it without namespace poly could be
poly::polynomial p, x;
p = x poly::^ 2 + 5;
that is indicating the full name of the operator. Or somehow indicating
that one is using the syntax of the namespace poly.
It seems possible to work this out.
>Consider two libraries, A and B, developed separately with
>different operator sets. Assume there is some precedence relationship
>between the operators in A and the operators in B that is most natural for
>a user.
...
>The careful user would check to see if the expected precedence relationship
>holds between the two libraries by looking into the implementation of each
>and comparing the floating point values.
...
>The non-careful user would assume a natural precedence between the two
>libraries and would start coding immediately, only to be surprised when the
>code does not work as expected.
You have a point in that when defining modules with local syntax, it is
important that t should be as restricted as possible in order to generate
errors when appropriate. And defining precedence with direct relations
like '+' < '*' may have an advantage over using numbers in that respect.
Otherwise, what I am used to from math is that one always checks the
definitions before using something. I figure that applies to other
sciences, and also for any feature in a computer language. Syntax is not
any different.
There was an expressed interest earlier to an enhanced syntax in C++. It
is a fact of life that the preferred syntax of one area is not the same as
the one of another. It is the same story as with name conflicts.
>I guarantee you that if a floating point value system were to be used,
>library authors would invent elaborate preprocessing tricks to try to make
>their libraries more adaptive to other libraries.
The use of the word "guarantee" here probably means that it is a weak
argument :-). -- I really do not see what you have in your mind here.
Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove.haberg@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: Douglas Gregor <gregod@cs.rpi.edu>
Date: Mon, 16 Jul 2001 16:57:50 GMT Raw View
Hans Aberg wrote:
> In article <VyC37.30325$T97.5197602@typhoon.nyroc.rr.com>,
> gregod@cs.rpi.edu wrote:
>>Using floating point numbers for precedence in lieu of individual
>>precedence relations...
>
> A note: I do not use floating point numbers, but only numbers
> syntactically represented in the range +/-999.999999.
>
> The only reason for using numbers is that it is easy to implement and easy
> to use in the sense that it is easy to understand. Resolving syntactic
> conflicts is another matter.
>
>>Contrast this with the same user categories but with a system that uses
>>distinct precedence relationships (.e.g., '+' < '*', '+' == '-').
>
>>From the point of implementation of what I do, the only thing that is
> needed is that when two operators appear immediately after one another,
> one can compare them. For example, when '+' and '*' in x + y * z it should
> be possible to tell if '+' < '*' or not. Thus, if only the function that
> is doing the operator precedence comparisons is available, there is
> otherwise no change to the program.
>
> One thing that becomes complicated when expressing precedence using
> relations like '+' < '*' is to check for transitivity: If one in addition
> defines '^' < '+', does it follow that '^' < '*'? -- If that is supposed
> to follow, what happens when somebody later then defines '*' < '^'?
>
> So, assume one is using relations like '+' < '*' in order to define
> precedence, the system must be able to detect those intransitivities.
>
> Further, suppose that syntax is defined in environments, perhaps say
> attached to the namespaces. Then such intransitives has to be detected
> relative the outer environments.
>
> This probably can be resolved, perhaps no more complicated than resolving
> implicit type conversions, but one must give that some thinking.
First of all, I'm assuming precedence relations are transitive. A possible
implementation is to use a directed acyclic graph. Each vertex is an
equivalence class of operators (i.e., integer '+' and '-' would both be in
the same equivalence class because they have the same precedence). The
source vertex of an edge has a lower precedence than the destination vertex
of the edge. So if we have operators op1 and op2:
If there is a path from op1 to op2, op1 < op2
Else If there is a path from op2 to op1, op2 < op1
Otherwise there is no relationship between op1 and op2, and we'll probably
emit an error.
A cycle denotes an intransitivity and should be considered an error
condition. Note that if we are in multiple namespaces (both 'integer' and
'string', for instance'), we need only merge the two graphs and check for
cycles.
>>If it works out, great; otherwise,
>>how does one solve the [precedence] conflict? I don't have any good ideas
> on this.
>
> The way it will work, I figure, is that the compiler before reading an
> expression must know what the precedence relations of adjacent operators
> are.
> If the compiler then reads the expression, and discovers more than one
> precedence relation between operators appearing adjacent in that
> expression, it generates an error. One will then have to indicate a way to
> resolve the conflict.
My fault for using 'conflict' poorly. When using numbers in the range +/-
999.99999 instead of symbol relationships, if there is an operator from
library A and an operator from library 'B' that are adjacent (so we need to
disambiguate the parsing with the precedence relation), there is always an
ordering between them (because operator precedence is then a total order),
even if this was not intended. I see no way to solve this.
> If precedence is attached to namespaces, say I define for use with
> polynomial exponentiation
> namespace poly {
> precedence '*' < '^';
> ...
> class polynomial;
> }
> which conflicts with the C/C++ precedence of '^', which is lower than '+'.
> If I use '^' within namespace poly no problem, as the local definitions
> take precedence over the global.
>
> But one way to resolve it without namespace poly could be
> poly::polynomial p, x;
> p = x poly::^ 2 + 5;
> that is indicating the full name of the operator. Or somehow indicating
> that one is using the syntax of the namespace poly.
>
> It seems possible to work this out.
Yes, qualifying the operators would work. It would likely also help to
include a variation of the 'using' declaration:
using operator poly::^;
What one would really want would be the equivalent of argument-dependent
lookup, but that gets very complicated very quickly.
>>Consider two libraries, A and B, developed separately with
>>different operator sets. Assume there is some precedence relationship
>>between the operators in A and the operators in B that is most natural for
>>a user.
> ...
>>The careful user would check to see if the expected precedence
>>relationship holds between the two libraries by looking into the
>>implementation of each and comparing the floating point values.
> ...
>>The non-careful user would assume a natural precedence between the two
>>libraries and would start coding immediately, only to be surprised when
>>the code does not work as expected.
>
> You have a point in that when defining modules with local syntax, it is
> important that t should be as restricted as possible in order to generate
> errors when appropriate. And defining precedence with direct relations
> like '+' < '*' may have an advantage over using numbers in that respect.
>
> Otherwise, what I am used to from math is that one always checks the
> definitions before using something. I figure that applies to other
> sciences, and also for any feature in a computer language. Syntax is not
> any different.
Agreed, but the question becomes: how hard is it to 'fix' a precedence
problem between libraries? With only precedence relationships and the above
approach using a directed acyclic graph it isn't very complicated. With
numbers for precedence, you need to change the actual numbers that live in
the library source code.
> There was an expressed interest earlier to an enhanced syntax in C++. It
> is a fact of life that the preferred syntax of one area is not the same as
> the one of another. It is the same story as with name conflicts.
>
>>I guarantee you that if a floating point value system were to be used,
>>library authors would invent elaborate preprocessing tricks to try to make
>>their libraries more adaptive to other libraries.
>
> The use of the word "guarantee" here probably means that it is a weak
> argument :-). -- I really do not see what you have in your mind here.
>
> Hans Aberg * Anti-spam: remove "remove." from email address.
> * Email: Hans Aberg <remove.haberg@member.ams.org>
> * Home Page: <http://www.matematik.su.se/~haberg/>
> * AMS member listing: <http://www.ams.org/cml/>
I'll use concrete examples for libraries A and B instead of this abstract
mumbo-jumbo. Let's assume we have a parser library that uses a form similar
to BNF for description, with | to denote alternatives and >> to denote
concatenation:
Nonterminal<std::string> x = "foo" | "wibble" >> "bar";
The obvious precedence relation is '|' < '>>'.
And we have a lexer library that represents regular expressions with + for
alternatives and uses * for concatenation.
Regex r = "x" + "y"*; // one x or zero or more y's
The precedence relationship here is '+' < '*' (both binary - we don't need
to deal with unary precedence relationships at the moment).
So it would be reasonable to express a lexer and parser succinctly as:
Nonterminal<Regex> A = "foo" * "bar"+ | "wibble" >> A >> "woo" * "foo"*;
We know that the regular expression operators should have higher precedence
than the BNF operators, so we would need to add
'bnf::>>' < 'regex::'+'
to disambiguate this expression.
Let's look at some possible choices of precedence numbers for each library.
Parser library:
'|' = 5
'>>' = 6
Regex:
'+' = 1
'*' = 2
Within each library the precedence relations are correct, but when you use
the libraries together the precedence between regex and BNF is backwards.
You can fix it by changing the numbers, but that requires modification of
the library.
At some point, _something_ will have to be done so that users don't have to
change library code to make the precedence work favorably. Library authors
code all agree on inter-library precedences and have a giant database of
precedence numbers for each domain, or perhaps more likely the precedence
numbers would be based on macros that the user gets to customize:
Parser library:
'|' = BNF_BASE_PRECEDENCE
'>>' = BNF_BASE_PRECEDENCE + BNF_PRECEDENCE_SPREAD
Regex library:
'+' = REGEX_BASE_PRECEDENCE
'*' = REGEX_BASE_PRECEDENCE + BNF_PRECEDENCE_SPREAD
Then the user just passes:
BNF_BASE_PRECEDENCE=1
BNF_PRECEDENCE_SPREAD=1
REGEX_BASE_PRECEDENCE=3
REGEX_PRECEDENCE_SPREAD=1
and everything magically works. That's a lot uglier and more fragile than:
'bnf::>>' < 'regex::'+'
Doug
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: kgw-zamboni-news@stiscan.com
Date: Mon, 16 Jul 2001 16:58:21 GMT Raw View
On Fri, 13 Jul 2001 18:08:21, remove.haberg@matematik.su.se (Hans
Aberg) wrote: WEEKDAY
>In article <VyC37.30325$T97.5197602@typhoon.nyroc.rr.com>,
>gregod@cs.rpi.edu wrote:
>
>But one way to resolve it without namespace poly could be
> poly::polynomial p, x;
> p = x poly::^ 2 + 5;
>that is indicating the full name of the operator. Or somehow indicating
>that one is using the syntax of the namespace poly.
>
>It seems possible to work this out.
>
But would it be useful?
If you start cluttering up expressions with qualifiers it becomes
unreadable
and you might as well use methods.
(x->operator^( 2))->operator+(5);
At least the class of the operand is not needed.
The bigest prblem to me is:
x ^ y # z can have different precedence depending on the types.
Even writting x A::^ y A::# z and xx B::^ yy B::# z does not help the
reader
if the precedences change order and he must keep refering to the
declarations
to resolve them.
Maybe except for predefined operators, all operators not ending in
= would have the same precedence less than the predefineds and be left
associative.
And operators ending in = would have the same precedence and
associativitly as assignment.
--
Remove -zamboni to reply
All the above is hearsay and the opinion of no one in particular
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Wed, 11 Jul 2001 17:00:29 GMT Raw View
In article <IDjn8PCBygS7EwEZ@ntlworld.com>, Francis Glassborow
<francisG@robinton.demon.co.uk> wrote:
>>>You mean like there is between assignment operators and the conditional
>>>operator ( ? : ) ? :)
>>
>>Yes, like the order defined in Harbison & Steele, "C a reference manual",
>>sec 7.2.1 (4th ed).
>
>They were writing about C, C++ is different.
This is right, but adding more operators that exist in C++ but not in C is
no problem and one can also assign types to the operators and that way
restrict their use. In addition, it is easy to complement by adding
portions of a fixed grammar, in case the operators with precedence would
not suffice. So this gives room for playing around in order to capture the
C++ grammar. (I have roughly the portions in the C++ standard, Appendix
A.4, "Expressions", in my mind.)
Or what exactly do you have in your mind?
Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove.haberg@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: Francis Glassborow <francis.glassborow@ntlworld.com>
Date: Thu, 12 Jul 2001 01:14:09 GMT Raw View
In article <remove.haberg-1007011012000001@du138-226.ppp.su-
anst.tninet.se>, Hans Aberg <remove.haberg@matematik.su.se> writes
>Or what exactly do you have in your mind?
Simply that the concept of precedence works for C, but it does not for
C++ (specifically the problem is with assignments and the conditional
operator)
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Thu, 12 Jul 2001 13:09:08 GMT Raw View
In article <PseWnUAxTPT7Ewfy@ntlworld.com>, Francis Glassborow
<francisG@robinton.demon.co.uk> wrote:
>>Or what exactly do you have in your mind?
>Simply that the concept of precedence works for C, but it does not for
>C++ (specifically the problem is with assignments and the conditional
>operator)
I did have some worries about that too in the beginning, but it turns out
that it is not a problem. I made a version handling that specifically, one
defines for ? : precedence and associativity left/right.
I work now on a more general operator precedence theory. One advantage,
apart from the generality, is that the actual implementation becomes
simpler, because one needs not work through so many special cases, as that
is handled by the general theory. Some details:
An n-ary operator f is defined to be a sequence of tokens (f_0, ..., f_n),
where f_0 and f_n, the left/right delimiters may be empty. If one of the
left/right delimiters are not present, one must have a precedence number
available. It is supposed to act on the arguments x_1, ..., x_n
syntactically as
f_0 x_1 f_1 ... f_{n-1} x_n f_n
If neither of the left/right delimiters are present, then the operator is
called infix. Then the operator can bind left or right in the parsing
(associativity). If only the left (resp. right) delimiter is present, the
operator is called prefix (resp. postfix). If both left and right
delimiters are present, it is called delimited.
Thus an operator can have the attributes
left right prefix postfix delimited
And for the different arities n, one has the following possible types:
nullary (n = 0): delimited.
unary (n = 1): prefix, postfix, delimited.
n >= 2: infix, prefix, postfix, delimited.
And some examples of operators: Write arguments as x, y, z, f, A.
nullary:
delimited identifier producing a value.
unary:
prefix -x
postfix x++ in C/C++
delimited (x)
binary:
infix:
left x + y
right x = y -- C/C++ assignment
prefix (x)f -- Function operand written to the left.
if x then y
postfix f(x) -- Function operand written to the right.
delimited <x|y> -- Scalar product.
ternary:
infix:
left
right x ? y : z -- Conditional expression in C/C++
prefix if x then y else z
postfix
delimited <x|A|y> -- Feynman "bra-ket" notation of Hilbert
space operators in physics.
I have found no examples of the ternary left and postfix operator or any
good examples for the higher n-aries except for obvious examples such as
variations of "(x,...,y)", so if you know of some, please let me know.
Also, one can generalize a bit and let the operator components f_i be
sequences of tokens, so that the C/C++ "if (x) y else z" construct becomes
a ternary prefix operator (the first component being the sequence of
tokens "if" and "(").
The parsing technique I use is a special case of LR(1) parsing, so one can
probably extend the parsing a bit by closing in on LR(1).
Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove.haberg@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Thu, 12 Jul 2001 19:45:57 GMT Raw View
In article <PseWnUAxTPT7Ewfy@ntlworld.com>, Francis Glassborow
<francisG@robinton.demon.co.uk> wrote:
>>Or what exactly do you have in your mind?
>Simply that the concept of precedence works for C, but it does not for
>C++ (specifically the problem is with assignments and the conditional
>operator)
I did have some worries about that too in the beginning, but it turns out
that it is not a problem. I made a version handling that specifically, one
defines for ? : precedence and associativity left/right.
I work now on a more general operator precedence theory. One advantage,
apart from the generality, is that the actual implementation becomes
simpler, because one needs not work through so many special cases, as that
is handled by the general theory. Some details:
An n-ary operator f is defined to be a sequence of tokens (f_0, ..., f_n),
where f_0 and f_n, the left/right delimiters may be empty. If one of the
left/right delimiters are not present, one must have a precedence number
available. It is supposed to act on the arguments x_1, ..., x_n
syntactically as
f_0 x_1 f_1 ... f_{n-1} x_n f_n
If neither of the left/right delimiters are present, then the operator is
called infix. Then the operator can bind left or right in the parsing
(associativity). If only the left (resp. right) delimiter is present, the
operator is called prefix (resp. postfix). If both left and right
delimiters are present, it is called delimited.
Thus an operator can have the attributes
left right prefix postfix delimited
And for the different arities n, one has the following possible types:
nullary (n = 0): delimited.
unary (n = 1): prefix, postfix, delimited.
n >= 2: infix, prefix, postfix, delimited.
And some examples of operators: Write arguments as x, y, z, f, A.
nullary:
delimited identifier producing a value.
unary:
prefix -x
postfix x++ in C/C++
delimited (x)
binary:
infix:
left x + y
right x = y -- C/C++ assignment
prefix (x)f -- Function operand written to the left.
if x then y
postfix f(x) -- Function operand written to the right.
delimited <x|y> -- Scalar product.
ternary:
infix:
left
right x ? y : z -- Conditional expression in C/C++
prefix if x then y else z
postfix
delimited <x|A|y> -- Feynman "bra-ket" notation of Hilbert
space operators in physics.
I have found no examples of the ternary left and postfix operator or any
good examples for the higher n-aries except for obvious examples such as
variations of "(x,...,y)", so if you know of some, please let me know.
Also, one can generalize a bit and let the operator components f_i be
sequences of tokens, so that the C/C++ "if (x) y else z" construct becomes
a ternary prefix operator (the first component being the sequence of
tokens "if" and "(").
The parsing technique I use is a special case of LR(1) parsing, so one can
probably extend the parsing a bit by closing in on LR(1).
Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove.haberg@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: kgw-zamboni-news@stiscan.com
Date: Fri, 6 Jul 2001 04:00:43 GMT Raw View
On Thu, 5 Jul 2001 23:20:30, remove.haberg@matematik.su.se (Hans
Aberg) wrote: WEEKDAY
>In article <eDcSHHgdulIC-pn2-eSGhXgM2krHz@zamboni.stiscan.com>,
>kgw-zamboni-news@stiscan.com wrote:
>>Rather than using numbers for precedence, how about using precedence
>>relations?
>>
>>new-operator < old-operator
>>new-operator = old-operator
>>new-operator > old-operator
>>
>>This only gives you a partial ordering but that will usually only be
>>between operators are not likely to be in the same expression probably
>>from separate name spaces.
>
>So what do you suggest should happen when operators do not compare. Should
>they produce errors?
>
>---
I do not think that will happen much.
Within one "domain", its operators and the standard ones should be
totally ordered.
Between two independent domains, either they would not be used in the
same expression or
their relationship is not well defined and parentheses should be used
anyway.
This is no worse than both defining the same operator with different
precedences.
Consider the problem of one domain defining opx < opy and the other
opy < opx.
There is no way to satisfy them at the same time.
If you say that the operand types will resolve which operator, this
means you must
produce all alternative parses and more than one may be valid and thus
ambiguous.
Total ordering would give you a result but not necessarily the one you
expect.
What happens when one domain use an identifier for an operator and
another
uses the same identifier for a class name or enum?
I have almost come to the conclusion that user define operators would
need to be lexically
distinct from identifiers?
--
Remove -zamboni to reply
All the above is hearsay and the opinion of no one in particular
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Fri, 6 Jul 2001 18:48:04 GMT Raw View
In article <eDcSHHgdulIC-pn2-Vshkzj4EmxVn@zamboni.stiscan.com>,
kgw-zamboni-news@stiscan.com wrote:
>>So what do you suggest should happen when operators do not compare. Should
>>they produce errors?
>I do not think that will happen much.
But when it does happen, the compiler must know how to handle it.
>Within one "domain", its operators and the standard ones should be
>totally ordered.
So then one could just as well use precedence numbers.
It suffices though that there is an defined order between each pair that
is actually used in an expression.
There is nothing wrong with your I idea -- I have considered it before. It
is only a little more complicated to implement, because one needs a
mechanism to infer the operator precedences (like checking that there are
not intransitivities) rather than merely compare two numbers.
>Between two independent domains, either they would not be used in the
>same expression or
>their relationship is not well defined and parentheses should be used
>anyway.
Most likely, some will write modules using different syntaxes, and
somebody else would want to import such modules. Then it must be possible
to resolve the conflicting syntaxes, or the system will be pretty
useless..
Therefore I think that these syntaxes should be associated with
namespaces, together with a way to indicate which syntax to use, when they
conflict.
>Consider the problem of one domain defining opx < opy and the other
>opy < opx.
>There is no way to satisfy them at the same time.
>If you say that the operand types will resolve which operator, this
>means you must
>produce all alternative parses and more than one may be valid and thus
>ambiguous.
Right. One could probably work it out with a Prolog like implementation,
but it would not be easy to use, or at least too complicated for a
language like C++.
>Total ordering would give you a result but not necessarily the one you
>expect.
When creating new types operators, that is most likely done in order to be
compatible with the already existing ones. So this ought not to be a big
problem. But otherwise, one can always put in parentheses.
>What happens when one domain use an identifier for an operator and
>another
>uses the same identifier for a class name or enum?
There is already the namespace mechanism in C++, which should help to
solve name conflicts.
Otherwise, there are many possible solutions here: One can have the
operator and identifier name sets to be disjoint. Or use special syntax
for operator names like x `mod` y. Or have some other mechanism to
distinguish say
namespace A {
class plus { ... };
}
namespace B {
left plus 12;
}
void f() {
using namespace A;
using namespace B;
plus x = 1 plus 2; // class A::plus and operator B::plus
...
}
Those things must be worked out, even though I decided to not work on that
aspect.
>I have almost come to the conclusion that user define operators would
>need to be lexically
>distinct from identifiers?
So that is one possibility. But would it not be nice to be able to write
z = x div y;
a = b mod c;
instead of
z = div(a, b);
a = mod(b, c),
When expressions become more complicated, such syntaxes can make it easier
for humans to read.
Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove.haberg@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: Francis Glassborow <francis.glassborow@ntlworld.com>
Date: Sat, 7 Jul 2001 23:04:35 GMT Raw View
In article <remove.haberg-0607011108470001@du130-226.ppp.su-
anst.tninet.se>, Hans Aberg <remove.haberg@matematik.su.se> writes
>It suffices though that there is an defined order between each pair that
>is actually used in an expression.
You mean like there is between assignment operators and the conditional
operator ( ? : ) ? :)
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Sun, 8 Jul 2001 23:07:38 GMT Raw View
In article <oHvv9kC9YkR7EwYV@ntlworld.com>, Francis Glassborow
<francisG@robinton.demon.co.uk> wrote:
>In article <remove.haberg-0607011108470001@du130-226.ppp.su-
>anst.tninet.se>, Hans Aberg <remove.haberg@matematik.su.se> writes
>>It suffices though that there is an defined order between each pair that
>>is actually used in an expression.
>
>You mean like there is between assignment operators and the conditional
>operator ( ? : ) ? :)
Yes, like the order defined in Harbison & Steele, "C a reference manual",
sec 7.2.1 (4th ed).
This order defined there is a total order, but what one actually needs is
only to be able to compare each pair (f, g) of operators as they appear in
an actual expression when read from left to right.
I use a Bison generated parser to sort out what is unary, binary, ternary,
and parenthesis operators. As the tokens come by, arguments are put on a
special value stack, and the operators are put on an operator stack if the
precedence rules so requires. When a new operator comes by, one compares
it with the operator on the stack, and from that determines what to do --
to "shift" (put it on the stack) or to "reduce" (apply operator evaluation
to arguments), according to standard bottom-up parsing terminology. It is
not possible to get access to Bison generated stack for dynamic
manipulations, so therefore I must build my own.
Multi-component operators such as (...) or ...?...:... requires some
special treatment, as one knows that the argument must be between the
components (...) and ?...: and not by using precedence rules there. (I am
still working in the details.) For the ternary ? : operator, there are
strictly speaking two precedences, one for the left most argument, and one
for the right most argument. Also, an expression a ? b : c ? d : e can
bind
left: (a ? b : c) ? d : e
or right: a ? b : (c ? d : e)
Right is what is used in C/C++. I have also added operator() pairs, which
can be used as f(...) for every matching delimiter pair "(" ")" chosen. --
Like [ ]; but I do not yet know what happens if one is using <...>
together with "<" and ">" as binary operators. (This is perhaps irrelevant
from the point of view of C++ additions, as one always for use with C++
can choose something more special than what I now implement.)
But as an addition to C++, if one assumes that the system chosen so it is
easy to do the operator precedence comparisons before any other operator
semantics (like name overloading) is investigated, then it looks as though
it would not be difficult to add it to a C++ compiler.
Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove.haberg@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: Francis Glassborow <francis.glassborow@ntlworld.com>
Date: Mon, 9 Jul 2001 19:57:55 GMT Raw View
In article <remove.haberg-0807011249520001@du132-226.ppp.su-
anst.tninet.se>, Hans Aberg <remove.haberg@matematik.su.se> writes
>>You mean like there is between assignment operators and the conditional
>>operator ( ? : ) ? :)
>
>Yes, like the order defined in Harbison & Steele, "C a reference manual",
>sec 7.2.1 (4th ed).
They were writing about C, C++ is different.
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Tue, 3 Jul 2001 17:02:23 GMT Raw View
Here is a suggestion for allowing the C++ to define operators with precedence:
I worked with Flex/Bison in order to attain a version that is easily
implementable.
Then the suggestion is that on can introduce declarations of the form
operator-precedence-definition:
operator-type operator-name precedence-number ';'
operator-type:
'prefix' | 'postfix' | 'left' | 'right'
I admit the precedence-number to be in the range +/-999.999999. The idea
with the decimals is to avoid that one feels compelled to insert
"just-as-in-case" space between numbers, in a style known from say the PL
Basic. The range is chosen just so that they fit into a 32-bit int, but I
played around with other ideas: hexadecimal numbers, infinite decimal
part, no integral part.
As for the operator-type, 'prefix' and 'postfix' are unary operators, and
'left' and 'right' are binary infix operators with the indicated parsing
(sometimes called "associativity").
Prolog and Haskell also allows one to define "non-associative" operators,
but I figure it will not be necessary in C++, as one can let the type
system take care of such restrictions. That is, if one does not want say
operator= to be associative, just let it return "void" or some other type
that does not combine with the other arguments at hand.
As for operator-name overloading with respect to operator-type, it turns
out that one can have one binary operator, one prefix operator and one
postfix operator with the same name (a maximimum of three operators with
the same of which at most one can be binary): Prefix operators can be
identified by looking at the token before, and postfix operators by
looking at the token after.
If one then has the operators in place, it appears to be straightforward
to have operator-name overloading with respect to argument type as well.
But if one wants operator precedence to vary with argument types, then
parsing will become very complicated, and it seems that it is not a
suitable system for C++. But syntax conflicts will somehow have to worked
out anyway, and I will indicate the problem by some examples:
Suppose I write a polynomials package, and I want the code to look like:
monomial x("x"), y("y"), z("z"); // Giving runtime names to some variables.
polynomial p1 = 2*x^2*y^3 + z^4, p2 = x^2 + 3;
Now this does not work out at all with the C/C++ grammar, because
operator^ has lower precedence than + and *, so p2 will parse as x^(2 +
3).
So suppose the C grammar is entered as:
// Following Harbison & Steele, "C, A Reference Manual", sec. 7.2.1.
// primary <name> 17; // Names, literals.
// postfix a[k] 17; // Subscripting (index).
// postfix f(...) 17; // Function call.
right -> 17; right . 17; // Declared as postfix in Harbison & Steele
postfix ++ 17; postfix -- 17;
prefix ++ 15; prefix -- 15; // Prefix in-/de-crement.
// prefix sizeof 15; // Size operator.
prefix ! 15; prefix ~ 15; // Logical and bitwise not.
prefix - 15; prefix + 15; // Negation and unary plus.
prefix & 15; // Address (reference).
prefix * 15; // Indirection (dereference).
// prefix (type name) 14 // Type cast.
left * 13; left / 13; left % 13;
left + 12; left - 12;
left << 11; left >> 11;
left < 10; left <= 10; left > 10; left >= 10;
left == 9; left != 9;
left & 8;
left ^ 7;
left | 6;
left && 5;
left || 4;
// Ternary ?: right 3;
right = 2; right += 2; right -= 2; right *= 2; right /= 2; right %= 2;
right &= 2; right ^= 2; right |= 2; right <<= 2; right >>= 2;
left , 1;
Then I would want my
monomial operator^(const monomial&, int);
be declared
right ^ 13.5;
The question is then how to resolve that parsing conflict with the
exclusive or operator ^ that has precedence 7 in the table above. The
system I arrive at is something like this:
Precedence rules can be localized to namespaces. That is, within a
namespace, one is first using the precedence rules declared there. Outside
namespaces, if there is a precedence conflict between operators that are
actually used in an expression, one has to indicate which syntax is to
have precedence.
So if the polynomial package are defined within the "poly" namespace:
namespace poly {
right ^ 13.5;
class monomial { ... };
class polynomial { ... };
monomial operator^(const monomial&, int);
}
then
namespace poly {
void f() {
monomial x("x"), y("y"), z("z");
// Use "poly" precedence of ^:
polynomial p1 = 2*x^2*y^3 + z^4, p2 = x^2 + 3;
}
}
Without namespace poly, one gets a syntax conflict:
void f() {
monomial x("x"), y("y"), z("z");
// Syntax conflict between "poly" and global precedence of ^:
polynomial p1 = 2*x^2*y^3 + z^4, p2 = x^2 + 3;
}
so one is forced to write say
void f() {
monomial x("x"), y("y"), z("z");
// Resolving syntax conflict between "poly" and global precedence of ^:
syntax poly, std; // First use syntax of poly, then std namespace in order.
polynomial p1 = 2*x^2*y^3 + z^4, p2 = x^2 + 3;
}
or explicitly by
void f() {
monomial x("x"), y("y"), z("z");
// Resolving syntax conflict between "poly" and global precedence of ^:
right ^ 13.5;
polynomial p1 = 2*x^2*y^3 + z^4, p2 = x^2 + 3;
}
But the disadvantage here is that one must know what the original
precedence types and numbers of ^ were.
Well, something like that comes to my mind.
Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove.haberg@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: kgw-zamboni-news@stiscan.com
Date: Thu, 5 Jul 2001 16:44:28 GMT Raw View
On Tue, 3 Jul 2001 17:02:23, remove.haberg@matematik.su.se (Hans
Aberg) wrote: WEEKDAY
>Here is a suggestion for allowing the C++ to define operators with precedence:
>
>I worked with Flex/Bison in order to attain a version that is easily
>implementable.
>
>Then the suggestion is that on can introduce declarations of the form
> operator-precedence-definition:
> operator-type operator-name precedence-number ';'
>
> operator-type:
> 'prefix' | 'postfix' | 'left' | 'right'
>
>I admit the precedence-number to be in the range +/-999.999999. The idea
>with the decimals is to avoid that one feels compelled to insert
>"just-as-in-case" space between numbers, in a style known from say the PL
>Basic. The range is chosen just so that they fit into a 32-bit int, but I
>played around with other ideas: hexadecimal numbers, infinite decimal
>part, no integral part.
>
>As for the operator-type, 'prefix' and 'postfix' are unary operators, and
--
Rather than using numbers for precedence, how about using precedence
relations?
new-operator < old-operator
new-operator = old-operator
new-operator > old-operator
This only gives you a partial ordering but that will usually only be
between operators are not likely to be in the same expression probably
from separate name spaces.
The only problem I have with new operator precedence is if there is
more than one
precedence for an operator in one program, it can become confusing to
the reader.
I am almost to the point of giving all new operators the same lowest
level precedense
and saying "use parentheses". It is almost easier just to write
everything in RPN.
Remove -zamboni to reply
All the above is hearsay and the opinion of no one in particular
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Thu, 5 Jul 2001 23:20:30 GMT Raw View
In article <eDcSHHgdulIC-pn2-eSGhXgM2krHz@zamboni.stiscan.com>,
kgw-zamboni-news@stiscan.com wrote:
>Rather than using numbers for precedence, how about using precedence
>relations?
>
>new-operator < old-operator
>new-operator = old-operator
>new-operator > old-operator
>
>This only gives you a partial ordering but that will usually only be
>between operators are not likely to be in the same expression probably
>from separate name spaces.
So what do you suggest should happen when operators do not compare. Should
they produce errors?
>The only problem I have with new operator precedence is if there is
>more than one
>precedence for an operator in one program, it can become confusing to
>the reader.
So what do you suggest to do about that? -- This is the hard one.
>I am almost to the point of giving all new operators the same lowest
>level precedense
>and saying "use parentheses".
In other words, instead of writing
a = b = c + d * e;
you want to be forced to write
a = (b = (c + (d * e));
Pascal had something like confusing precedence levels of logical operators
I think, which was the reason to abandon it more modern languages.
> It is almost easier just to write
>everything in RPN.
Is this a suggestion for the next revision of the C++ standard?
Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove.haberg@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]