Topic: Prior art for exponentiation operator

Author: daveg@synaptics.com (Dave Gillespie)
Date: 11 Jul 92 08:56:13 GMT

In article <MATT.92Jul9110824@physics.berkeley.edu> matt@physics.berkeley.edu (Matt Austern) writes:
> Well, that makes life simpler, then, if the fortran and C++ unary
> minus operators behave so similarly: we can duplicate the fortran
> precedence rules for ** by just making operator~ bind more tightly
> than unary minus.

There is a strong reason not to do this:  Right now in C and C++,
the unary operators all have the same precedence, and this is the
highest precedence except for brackets and function calls.  While
this may not be "right," it is a nice symmetry that C programmers
know and it would cause huge confusion to break it.  Unary minus
should continue to be a highest-precedence operator, even if this
makes C++ slightly different from Fortran.

(Mind you, I personally prefer the approach where unary minus has
the same precedence as binary addition and subtraction, but I also
recognize that it's far too late to change it now.)

        -- Dave
--
Dave Gillespie
  daveg@synaptics.com, uunet!synaptx!daveg
  or: daveg@csvax.cs.caltech.edu

Author: karl@ima.isc.com (Karl Heuer)
Date: Sat, 11 Jul 1992 18:54:15 GMT

I just happened across this discussion.  I've long been in favor of adding a
power operator to C, so of course I would also support its presence in C++.
This article is a slight adaptation of one that I wrote for comp.lang.c some
time ago.  (Please read the entire article before responding; I do address the
most common objections later on.)

The usual explanation for the omission of a power operator is that C
operators tend to mirror common hardware instructions, with anything more
complex being handled as a library function instead.  I find this
explanation to be lacking, since (a) there *is* no standard library function
to raise numbers to *integer* powers, and (b) research has shown that almost
all uses of an exponentiation operator involve a small, constant, integer
exponent--and that subcase *is* simple enough that the function-call
overhead can easily dominate.


Here's the formal version of the proposal:

power-expression:
 cast-expression
 cast-expression *^ power-expression

multiplicative-expression:
 power-expression
 multiplicative-expression * power-expression
 multiplicative-expression / power-expression
 multiplicative-expression % power-expression

(Also, "*^=" should be added to the list of assignment operators.)

In the expression x *^ y, x can be any integral or floating expression;
its type is the type of the result.  The exponent y must be an integral
expression; if x has integral type, then y must be nonnegative.  If y==0,
then the value is 1 (even if x==0 also).
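
The semantics just described can be sketched as an ordinary C++ function
template rather than an operator.  The name ipow and the use of
exponentiation by squaring are assumptions made for illustration; the
proposal does not prescribe an algorithm, and this sketch rejects negative
exponents for every base type, which is stricter than the proposal.

```cpp
#include <cassert>

// Hypothetical helper modelling "x *^ y": the result has the type of x,
// the exponent is integral, and anything to the power 0 is 1 (even 0).
// For brevity this sketch requires y >= 0 for every base type.
template <typename T>
T ipow(T x, int y) {
    assert(y >= 0);          // negative exponents are left out of this sketch
    T result = 1;
    while (y > 0) {
        if (y & 1) {
            result *= x;     // fold in the current power when the low bit is set
        }
        y >>= 1;
        if (y > 0) {
            x *= x;          // square only when another bit remains
        }
    }
    return result;
}
```

With this sketch, ipow(2, 10) is an int and ipow(1.5, 2) is a double,
mirroring the rule that the type of x is the type of the result.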


Here are my prebuttals to the usual objections.

] Why is it spelled "*^"?

Because the obvious symbols "^" and "**" are taken (the latter currently means
TIMES FETCH), and proposing a nonstandard extension that breaks existing code
is very seldom a wise move.  "^^" is free, but it breaks the symmetry between
the single-char bitwise operators and the double-char booleans; even if we
never add a logical xor, I'd rather not use that symbol.  "!" isn't too bad
(SNOBOL uses it, as I recall), but then we couldn't have a corresponding
assignment operator since "!=" is taken--besides which we'd be fighting
against the Danish proposal to bring back BCPL's "!" as an infix subscripting
operator.  Think of "*^" as a hybrid of Fortran's "**" and Basic's "^".

(The recent discussion has used binary "~", which is acceptable--but it
seems to be less mnemonic, and I don't think the operator will be used so
often that the shorter name should be preferred on the grounds of brevity.)

] Why not just use pow() from <math.h>?  It can be implemented as a builtin.

"(int)pow((double)n, 3.0)" is aesthetically inferior to "n*^3" (or even the
existing "n*n*n").  (Even in C, I personally would never omit those casts,
and I would hope that the compiler would have an option to warn me if I do.)
Also, on some machines (int)pow(2.0, 3.0) yields 7 instead of 8.  The C
Standard does not guarantee the perfect accuracy of the floating-point
library.
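
The truncation hazard can be shown without depending on any particular math
library.  In this sketch the value 7.999999999999999 is a stand-in (an
assumption) for what an imprecise pow(2.0, 3.0) might return, and both
function names are hypothetical.

```cpp
#include <cmath>

// Converting with a cast truncates toward zero, so a result that falls
// short of 8 by even a tiny amount becomes 7.
int to_int_truncating(double value) {
    return static_cast<int>(value);
}

// Rounding to nearest first is one common workaround.
int to_int_rounding(double value) {
    return static_cast<int>(std::lround(value));
}
```

The rounding version hides the problem for small results, but it still
routes an integer computation through floating point.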

] How about a separate ipow() function then?  (Or overload pow() to do the
] right thing.)

That's my second choice.  It has the advantage of not requiring compiler
changes, and in C++ you don't have the problem of needing a whole family of
functions named ipow(), lpow(), upow(), ulpow(), fpow(), dpow(), ldpow() to
cover the full range of a single type-generic *^ operator.  But it's still
a cumbersome notation.  Also, *^ would be subject to compile-time constant
folding: "int bigtable[10*^DIGITS]".
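
For comparison, present-day C++ (C++14 and later) can get the same
constant-folding effect from a constexpr function.  This is a sketch and
not part of the proposal: ipow_const is a hypothetical name and the value
chosen for DIGITS is only illustrative.

```cpp
#include <cstddef>

// Hypothetical compile-time integer power (relies on C++14 constexpr rules).
constexpr std::size_t ipow_const(std::size_t base, unsigned exponent) {
    std::size_t result = 1;
    for (unsigned i = 0; i < exponent; ++i) {
        result *= base;
    }
    return result;
}

constexpr unsigned DIGITS = 3;          // illustrative value
int bigtable[ipow_const(10, DIGITS)];   // array bound folded at compile time

static_assert(sizeof(bigtable) / sizeof(bigtable[0]) == 1000,
              "10 to the power DIGITS is evaluated by the compiler");
```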

] Why is 3 *^ -1 undefined instead of 0?

For the same reason that 4 << -1 isn't guaranteed to yield 2.  It's extra work
at run-time, and I can't imagine an algorithm depending on that behavior.
(Does anyone think 3u *^ -1 should yield 2863311531u on a 32-bit machine?)
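
The number in that parenthetical is the multiplicative inverse of 3 modulo
2^32, which a one-line check confirms.  A sketch assuming a fixed-width
32-bit unsigned type; the function name is hypothetical.

```cpp
#include <cstdint>

// Unsigned arithmetic wraps modulo 2^32, so 3 * 2863311531 comes out as 1.
std::uint32_t times_three(std::uint32_t value) {
    return static_cast<std::uint32_t>(3u * value);
}
```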

] Why define 0*^0?  It's mathematically indeterminate; should be undefined.

Even in mathematical usage, it's indeterminate only when the exponent is a
continuous variable (and even then the result is "almost always" 1).  Integer
exponents are commonly defined such that 0*^0=1.  Otherwise you'd screw up
polynomials: sum a[i]*x*^i.
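
The polynomial argument can be made concrete.  In this sketch (function
names are hypothetical) the constant term a[0] survives at x == 0 only
because the power helper returns 1 for 0 to the power 0.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical integer power with the 0-to-the-0-is-1 convention.
long ipow(long x, unsigned n) {
    long result = 1;
    for (unsigned i = 0; i < n; ++i) {
        result *= x;
    }
    return result;
}

// Evaluates sum a[i] * x^i term by term, exactly as the formula is written.
long eval_poly(const std::vector<long>& a, long x) {
    long sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * ipow(x, static_cast<unsigned>(i));
    }
    return sum;
}
```

For the coefficients {5, 2, 3}, evaluating at x == 0 gives 5; if 0 to the
power 0 were 0 or undefined, the constant term would be lost.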

] The precedence is bad: if somebody writes -x*^2, they probably meant
] -(x*^2), not (-x)*^2.

Defining it that way is more trouble than it's worth.  It's simpler to just
have the compiler (and/or linter) issue a warning for an unparenthesized
-x*^2.  Besides, the proposed precedence agrees with that of bc.

] Why no floating-point exponents?

Only because this is intended to be a minimal proposal.  If you accept this
much, and want to extend it to include floating-point exponents, I certainly
have no objection.  (I note that the ANSI C Committee chose not to define
"%" on floating-point operands, on the grounds that there was already a
standard function to do it; similar reasoning could be applied to argue that
the existing pow() function is adequate for floating-point exponents.  But
personally, I think it should be done properly, with an operator.)

] Why didn't this get added to C during the ANSI C standardization?

It was proposed, and quite rightfully rejected, since there was no prior art.
I had hoped that RMS might adopt the feature for gcc, so it would be prior
art next time around, but apparently he didn't think it was worth it.  (But
since gcc is free, anyone could add it, of course.)  I was informed that one
compiler project (they did not wish to be named) has accepted the proposal,
possibly with a change in the syntax.

Karl W. Z. Heuer (karl@ima.isc.com or uunet!ima!karl), The Walking Lint

Author: matt@physics.berkeley.edu (Matt Austern)
Date: 13 Jul 92 10:54:06

In article <DAVEG.92Jul11015613@synaptx.synaptics.com> daveg@synaptics.com (Dave Gillespie) writes:

> There is a strong reason not to do this:  Right now in C and C++,
> the unary operators all have the same precedence, and this is the
> highest precedence except for brackets and function calls.  While
> this may not be "right," it is a nice symmetry that C programmers
> know and it would cause huge confusion to break it.  Unary minus
> should continue to be a highest-precedence operator, even if this
> makes C++ slightly different from Fortran.

Making the exponentiation operator have a higher precedence than unary
minus is a good idea not just because of compatibility with fortran
(and other computer languages), but also because it is the
mathematically sensible definition.

There are very few common cases in C++ where a unary minus is used.
In all of those cases, the existing precedence yields the correct
result; in the case of the exponentiation operator, however, it does
not.  Consider, for example (where ~ denotes exponentiation):
 x = -a + b;
 x = -a - b;
 x = -a * b;
 x = -a / b;
 x = -a ~ 2;

I think it is clear that interpreting the last line as (-a)~2 would
be, to say the least, surprising.
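
The two candidate readings of the last line differ in sign, which is easy
to see with a small value.  In this sketch, squaring stands in for the
proposed "~ 2"; all three function names are hypothetical.

```cpp
// Hypothetical helper: squaring stands in for the proposed "~ 2".
int square(int value) {
    return value * value;
}

// Reading 1: unary minus binds more tightly, i.e. (-a)~2.
int negate_then_square(int a) {
    return square(-a);
}

// Reading 2: exponentiation binds more tightly, i.e. -(a~2).
int square_then_negate(int a) {
    return -square(a);
}
```

With a == 3 the first reading gives 9 and the second gives -9.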

I believe it would be best if we just said that unary minus has the
precedence that it does because it is mathematically correct, and that
making the precedence of the exponentiation operator higher is also
mathematically correct.
--
Matthew Austern              I dreamt I was being followed by a roving band of
(510) 644-2618               young Republicans, all wearing the same suit,
matt@physics.berkeley.edu    taunting me and shouting, "Politically correct
austern@theorm.lbl.gov       multiculturist scum!"... They were going to make
austern@lbl.bitnet      me kiss Jesse Helms's picture when I woke up.

Author: matt@physics.berkeley.edu (Matt Austern)
Date: 13 Jul 92 11:01:40

In article <1992Jul11.185415.8631@ima.isc.com> karl@ima.isc.com (Karl Heuer) writes:


> ] Why no floating-point exponents?
> Only because this is intended to be a minimal proposal.  If you accept this
> much, and want to extend it to include floating-point exponents, I certainly
> have no objection.

I agree that the case with integer exponents is by far the most
important one; it wouldn't be a terrible burden to have to use pow()
for floating-point exponents.  On the other hand, it seems to me like
a rather pointless restriction, and one that new programmers would
find surprising.  Once the work of adding a new operator is done, how
much extra work is it to have it defined for floating-point exponents
as well?  Nobody would expect any terribly aggressive optimizations;
it would be completely satisfactory to just have the compiler replace
x *^ y with pow(x,y).
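
One way to picture this split is a pair of overloads: an integral exponent
is handled by repeated multiplication, and a floating-point exponent is
simply forwarded to pow().  This is a sketch under the assumption that the
operator would behave like such a function; the name power is
hypothetical.

```cpp
#include <cmath>

// Integral exponent: repeated multiplication, result has the type of x.
double power(double x, int n) {
    double result = 1.0;
    const bool negative = n < 0;
    // Widen before negating so the most negative int does not overflow.
    long long count = negative ? -static_cast<long long>(n) : n;
    for (long long i = 0; i < count; ++i) {
        result *= x;
    }
    return negative ? 1.0 / result : result;
}

// Floating-point exponent: simply defer to the library, as suggested.
double power(double x, double y) {
    return std::pow(x, y);
}
```

Overload resolution picks the first function for power(2.0, 3) and the
second for power(9.0, 0.5).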


> ] Why is it spelled "*^"?
>
> Because the obvious symbols "^" and "**" are taken (the latter
> currently means TIMES FETCH), and proposing a nonstandard extension
> that breaks existing code is very seldom a wise move.  "^^" is free,
> but it breaks the symmetry between the single-char bitwise operators
> and the double-char booleans; even if we never add a logical xor,
> I'd rather not use that symbol.  "!" isn't too bad (SNOBOL uses it,
> as I recall), but then we couldn't have a corresponding assignment
> operator since "!=" is taken--besides which we'd be fighting against
> the Danish proposal to bring back BCPL's "!" as an infix
> subscripting operator.

I certainly have no objection to *^.  The only (unimportant) reason to
prefer ~ instead is that using *^ requires adding a new token to the
language.  As far as I can tell, *^ and ~ are the two obvious choices;
there are real objections to every other symbol.
--
Matthew Austern              I dreamt I was being followed by a roving band of
(510) 644-2618               young Republicans, all wearing the same suit,
matt@physics.berkeley.edu    taunting me and shouting, "Politically correct
austern@theorm.lbl.gov       multiculturist scum!"... They were going to make
austern@lbl.bitnet      me kiss Jesse Helms's picture when I woke up.

Author: matt@physics (Matt Austern)
Date: 8 Jul 1992 20:43:15 GMT

With all of the discussion about adding an exponentiation operator to
C++, I thought it would be interesting to see how this is handled in
other languages.  (This is one of the most important points to
remember: unlike most other proposed extensions to C++, there is a
great deal of relevant experience with exponentiation operators in
other languages.)

I have summarized the rules for the fortran exponentiation operator,
**.  I chose fortran mainly because it has had a heavily used
exponentiation operator for decades; any serious flaws in its
semantics are likely to have come to light already.

Here, then, is how ** works in fortran:

(1)  PRECEDENCE.  ** binds more tightly than any other arithmetic
     operator.  (This includes unary minus; however, in fortran, the
     minus sign in "-4" isn't a unary minus, but part of the number.)

(2)  ASSOCIATIVITY.  ** associates right to left, so x**y**z means
     x**(y**z).

(3)  DATA TYPE OF THE RESULT.  The general rule is that the result has
     the same data type as the operands; if the two operands don't have
     the same type, then one may be promoted, using the hierarchy
     integer, real, double precision, complex.  (In particular, this
     means that if i and j are integers, then i**j is an integer.)
(3a) A SPECIAL CASE: the expression X**n, where n is an integer.  In this
     case, the result is the same data type as X, but n is not promoted
     to that type.  This is so that an expression like -2.1**3 is
     evaluated correctly.

(4)  ERRORS. It is an error to raise 0 to a zero or negative power.  It
     is an error to raise a real negative number to a real power.
     However, it is legal to raise a negative number to a complex or
     integral power.
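
Rule (2) matters because the two groupings give different answers.  A
sketch in C++, using a hypothetical ipow helper in place of the fortran
operator:

```cpp
// Hypothetical integer power by repeated multiplication (exponent >= 0).
long ipow(long base, long exponent) {
    long result = 1;
    for (long i = 0; i < exponent; ++i) {
        result *= base;
    }
    return result;
}

// Rule (2): x**y**z groups from the right, as x**(y**z).
long right_assoc(long x, long y, long z) {
    return ipow(x, ipow(y, z));
}

// The left-to-right grouping (x**y)**z, shown only for contrast.
long left_assoc(long x, long y, long z) {
    return ipow(ipow(x, y), z);
}
```

For 2**3**2 the right-to-left grouping gives 2**9 = 512, while the
left-to-right grouping would give 8**2 = 64.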


Applying the "principle of least surprise," I think it is best if the
C++ exponentiation operator is as similar as possible to the
exponentiation operator in other languages, and to fortran in
particular. How much of the fortran behavior is relevant to C++?
Well, of course, we can't call it ** in C++.  Several options are
possible; the best, I think, is operator~.  Some less trivial
comments about applying the fortran behavior to C++:

(1) The part dealing with complex numbers is irrelevant, since C++ has
no complex data type.  (However, authors of complex number classes
might want to use the fortran rules as a guide when deciding how to
overload operator~ for their class.)

(2) The C++ numerical data types (char,short,int,long,float,double)
don't precisely correspond to the fortran data types.  The general idea
of promotion is the same, though, and can just work the same way that
C++ handles promotion for all other operators.

(3) The fortran rule about precedence means that -4**2 is interpreted as
(-4)**2, but -x**y is interpreted as -(x**y).  This is because the two
minus signs mean different things.  This is not the case in C++, where
the minus sign in "-4" is just the unary minus operator.  Both
expressions will be interpreted in the same way, and we simply have to
decide whether operator~ binds more or less tightly than unary
operator-.  Personally, I think that it is better for it to bind more
tightly, i.e., to have -4~2 mean -(4~2), and -x~y mean -(x~y).  I
don't care all *that* much, though; if I were writing either of those
expressions, in any language, I'd parenthesize them anyway.

(4) Fortran says that 0**0 is an error; to me, though, it seems more
in the spirit of C++ to allow this behavior to be implementation-
dependent.  (If a particular architecture has a hardware exponentiation
instruction, it would be a shame to specify the behavior of operator~
so tightly that it couldn't be easily used.)  Again, though, I regard
this as a minor issue: no matter what the standard says, programmers
had better be very careful about relying too much on the precise
behavior of 0.~0.


--
Matthew Austern              I dreamt I was being followed by a roving band of
(510) 644-2618               young Republicans, all wearing the same suit,
matt@physics.berkeley.edu    taunting me and shouting, "Politically correct
austern@theorm.lbl.gov       multiculturist scum!"... They were going to make
austern@lbl.bitnet      me kiss Jesse Helms's picture when I woke up.

Author: burley@geech.gnu.ai.mit.edu (Craig Burley)
Date: 9 Jul 92 03:37:09 GMT

In article <13fk13INNaa5@agate.berkeley.edu> matt@physics (Matt Austern) writes:

   Here, then, is how ** works in fortran:

   (1)  PRECEDENCE.  ** binds more tightly than any other arithmetic
        operator.  (This includes unary minus; however, in fortran, the
        minus sign in "-4" isn't a unary minus, but part of the number.)

A minor nit, but in Fortran, unary plus/minus _is_ an operator with the same
precedence as binary plus/minus, and in fact no two operators may appear in
succession (this is Fortran 77; maybe they relaxed that rule in Fortran 90?).
It doesn't matter that it is followed by a constant in this case -- that
"rule" permitting a minus sign to precede a constant applies, I believe, only
to cases where the context requires a constant but disallows a more general
form of an expression.  So:

    -4**2   yields -16, not 16
    2**-4   is invalid, though many compilers allow it anyway
    2+-4    is invalid as above, again because of two consecutive operators
    -A**B   is evaluated as -(A**B), not (-A)**B as would seem obvious

In fact it's a real hassle to handle, on, say, 32-bit two's-complement
machines (there are a few out there :-), the fact that 2147483648 is an
out-of-range integer (being == 2 ** 31) but -2147483648 (-(2 ** 31)) is not,
because the easy way to write a compiler means it'd be a choice of either
complaining about _both_ cases or neither.  (One expensive Fortran front end
I worked with complained about both, and we had to write the latter as
"20000000000, i.e. in octal.)  I'm handling it by delaying the complaint
until after it is certain whether there is a preceding unary minus that,
precedence-wise, applies _only_ to the questionable constant.  So,
-2147483648+2.0 is ok, but -2147483648*2.0 is not (even though it seems
"obvious" what to do, it says to multiply an out-of-range positive integer
by 2.0 after conversion, then negate the result).
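
C and C++ have the same lexical wrinkle, since a negative literal there is
a unary minus applied to a positive constant.  A sketch assuming a 32-bit
two's-complement int; the constant name is hypothetical.

```cpp
#include <cstdint>
#include <limits>

// 2147483648 does not fit in a 32-bit int, so the most negative value is
// conventionally spelled as an expression whose pieces are all in range.
constexpr std::int32_t kMostNegative = -2147483647 - 1;

static_assert(kMostNegative == std::numeric_limits<std::int32_t>::min(),
              "the in-range spelling denotes the same value");
```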

   (3a) A SPECIAL CASE: the expression X**n, where n is an integer.  In this
        case, the result is the same data type as X, but n is not promoted
        to that type.  This is so that an expression like -2.1**3 is
        evaluated correctly.

Again, more precisely, it allows (-2.1)**3 to be evaluated correctly;
-2.1**3.0 is correct anyway, because it is like -(2.1**3.0).  (I.e. (-2.1)**3
is computed with the exponent kept as an integer, so a negative number may be
raised to a power, since the logarithm of the number isn't strictly needed,
or may be obtained by getting the log of the absolute value of the number.)
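
The distinction can be sketched in C++.  Both function names are
hypothetical, and the exp/log formulation is only one assumed way a real
exponent might be computed; the NaN behavior noted in the comment is that
of IEEE 754 implementations.

```cpp
#include <cmath>

// Integer exponent: repeated multiplication needs no logarithm, so a
// negative base is fine.
double pow_int_exponent(double base, unsigned exponent) {
    double result = 1.0;
    for (unsigned i = 0; i < exponent; ++i) {
        result *= base;
    }
    return result;
}

// Real exponent via exp(y * log(x)): on IEEE 754 systems the log of a
// negative number is NaN, so a negative base cannot be handled this way.
double pow_via_log(double base, double exponent) {
    return std::exp(exponent * std::log(base));
}
```

Here pow_int_exponent(-2.1, 3) gives about -9.261, while
pow_via_log(-2.1, 3.0) gives NaN.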

   (3) The fortran rule about precedence means that -4**2 is interpreted as
   (-4)**2, but -x**y is interpreted as -(x**y).  This is because the two
   minus signs mean different things.  This is not the case in C++, where
   the minus sign in "-4" is just the unary minus operator.  Both
   expressions will be interpreted in the same way, and we simply have to
   decide whether operator~ binds more or less tightly than unary
   operator-.  Personally, I think that it is better for it to bind more
   tightly, i.e., to have -4~2 mean -(4~2), and -x~y mean -(x~y).  I
   don't care all *that* much, though; if I were writing either of those
   expressions, in any language, I'd parenthesize them anyway.

Again, I believe that there is no problem here, that Fortran 77 treats the
minus as a unary minus operator as well even for an integer constant.  In
fact I think it'd be real annoying to have -4**2 be treated differently from
-FOUR**2, where FOUR is an INTEGER named constant with the value 4.  If
your compiler does this, I think it's either broken or not in conformance with
ANSI FORTRAN 77.  (As a point of reference, not only would my compiler evaluate
-4**2 as -16, but so does AT&T's free f2c program.)

   (4) Fortran says that 0**0 is an error; to me, though, it seems more
   in the spirit of C++ to allow this behavior to be implementation-
   dependent.  (If a particular architecture has a hardware exponentiation
   instruction, it would be a shame to specify the behavior of operator~
   so tightly that it couldn't be easily used.)  Again, though, I regard
   this as a minor issue: no matter what the standard says, programmers
   had better be very careful about relying too much on the precise
   behavior of 0.~0.

In Fortran 77, when the standard says something is an error, it means the
behavior is completely undefined.  It does not mean any error message has
to be produced or anything; just that whatever happens, happens.  So if
the same can apply to C++, we're in luck.

PLEASE, if I'm wrong about any of this Fortran stuff, somebody let me know
(hence the crosspost to comp.lang.fortran)!!  I'm writing a compiler, y'see.
(Well, actually it's already written, but I'd rather correct bugs now than
later.)
--

James Craig Burley, Software Craftsperson    burley@gnu.ai.mit.edu
Member of the League for Programming Freedom (LPF)

Author: adk@sun13.SCRI.FSU.EDU (Tony Kennedy)
Date: 9 Jul 92 16:31:15 GMT

>> Craig Burley <burley@geech.gnu.ai.mit.edu> writes:

   burley> -4**2 yields -16, not 16
   burley> 2**-4 is invalid, though many compilers allow it anyway

   burley> -A**B is evaluated as -(A**B), not (-A)**B as would seem
   burley> obvious

A**-B*C is invalid, but many compilers allow it anyway. Unfortunately
they interpret it to mean A**(-(B*C)) because unary minus has the same
priority as binary minus. Quite counter-intuitive.
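
The two readings give different values, as a small C++ sketch shows
(hypothetical function names; std::pow stands in for the fortran
operator):

```cpp
#include <cmath>

// The reading described above: A**(-(B*C)).
double minus_covers_product(double a, double b, double c) {
    return std::pow(a, -(b * c));
}

// The reading with the minus attached to B alone: (A**(-B))*C.
double minus_covers_b_only(double a, double b, double c) {
    return std::pow(a, -b) * c;
}
```

With A = 2, B = 1 and C = 2 the first reading gives 2**(-2) = 0.25 and the
second gives 0.5 * 2 = 1.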

Author: matt@physics.berkeley.edu (Matt Austern)
Date: 9 Jul 92 11:08:24

In article <BURLEY.92Jul8233709@geech.gnu.ai.mit.edu> burley@geech.gnu.ai.mit.edu (Craig Burley) writes:

>    Here, then, is how ** works in fortran:
>
>    (1)  PRECEDENCE.  ** binds more tightly than any other arithmetic
>         operator.  (This includes unary minus; however, in fortran, the
>         minus sign in "-4" isn't a unary minus, but part of the number.)
>
> A minor nit, but in Fortran, unary plus/minus _is_ an operator with the same
> precedence as binary plus/minus, and in fact no two operators may appear in
> succession (this is Fortran 77; maybe they relaxed that rule in Fortran 90?).

Oops, you're right.  Shows what you get for trying to summarize the
rules of a language you don't use very often.  (My excuse: I was
misled by the discussion of the syntax rules for a real constant.)

Well, that makes life simpler, then, if the fortran and C++ unary
minus operators behave so similarly: we can duplicate the fortran
precedence rules for ** by just making operator~ bind more tightly
than unary minus.
--
Matthew Austern              I dreamt I was being followed by a roving band of
(510) 644-2618               young Republicans, all wearing the same suit,
matt@physics.berkeley.edu    taunting me and shouting, "Politically correct
austern@theorm.lbl.gov       multiculturist scum!"... They were going to make
austern@lbl.bitnet      me kiss Jesse Helms's picture when I woke up.

Author: cflatter@nrao.edu (Chris Flatters,208,7209,homephone)
Date: 9 Jul 92 19:50:28 GMT

In article 92Jul8233709@geech.gnu.ai.mit.edu, burley@geech.gnu.ai.mit.edu (Craig Burley) writes:
>Again, I believe that there is no problem here, that Fortran 77 treats the
>minus as a unary minus operator as well even for an integer constant.  In
>fact I think it'd be real annoying to have -4**2 be treated differently from
>-FOUR**2, where FOUR is an INTEGER named constant with the value 4.  If
>your compiler does this, I think it's either broken or not in conformance with
>ANSI FORTRAN 77.  (As a point of reference, not only would my compiler evaluate
>-4**2 as -16, but so does AT&T's free f2c program.)
>
> ...
>
>PLEASE, if I'm wrong about any of this Fortran stuff, somebody let me know
>(hence the crosspost to comp.lang.fortran)!!  I'm writing a compiler, y'see.
>(Well, actually it's already written, but I'd rather correct bugs now than
>later.)

You are correct (and so is the behaviour of your parser and F2C).  The syntax
rules in the Fortran 90 standard make this explicit.  The parsing is unambiguous.
Omitting a few trivial expansions, the parse tree looks like this:


                           -4**2
                           (expr)
                              |
                              |
                           -4**2
                        (level-2-expr)
                              |
                  +-----------+-------------+
                  |                         |
                  -                        4**2
              (add-op)                 (add-operand)
                                            |
                                            |
                                           4**2
                                       (mult-operand)
                                            |
                            +---------------+-------------------+
                            |               |                   |
                            4               **                  2
                       (level-1-expr)     (pow-op)        (mult-operand)
                            |                                   |
                            |                                   |
                            4                                   2
                 (int-literal-constant)               (int-literal-constant)


Which should, of course, evaluate to -16.
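
The same grouping falls out of a small recursive-descent evaluator.  This
is a sketch of a fragment of the grammar only, not any real compiler's
parser: the class and function names are hypothetical, and it handles
integers, "+", "-", "*", "**" and parentheses with no blanks and no error
checking.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Minimal evaluator for a fragment of the grammar in the tree above:
//   level-2-expr := [add-op] add-operand { add-op add-operand }
//   add-operand  := mult-operand { "*" mult-operand }
//   mult-operand := primary [ "**" mult-operand ]
//   primary      := int-literal | "(" level-2-expr ")"
// Exponents are assumed to be nonnegative.
class Parser {
public:
    explicit Parser(const std::string& text) : text_(text), pos_(0) {}

    long level2() {
        bool negate = false;
        if (peek() == '+' || peek() == '-') {
            negate = (text_[pos_++] == '-');   // leading sign is an add-op
        }
        long value = addOperand();
        if (negate) value = -value;            // applied to the whole operand
        while (peek() == '+' || peek() == '-') {
            char op = text_[pos_++];
            long rhs = addOperand();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

private:
    char peek() const { return pos_ < text_.size() ? text_[pos_] : '\0'; }
    bool atPower() const {
        return peek() == '*' && pos_ + 1 < text_.size() && text_[pos_ + 1] == '*';
    }

    long addOperand() {
        long value = multOperand();
        while (peek() == '*' && !atPower()) {
            ++pos_;
            value *= multOperand();
        }
        return value;
    }

    long multOperand() {
        long base = primary();
        if (!atPower()) return base;
        pos_ += 2;
        long exponent = multOperand();         // right-recursive: ** groups right
        long result = 1;
        for (long i = 0; i < exponent; ++i) result *= base;
        return result;
    }

    long primary() {
        if (peek() == '(') {
            ++pos_;
            long value = level2();
            ++pos_;                            // skip ')'
            return value;
        }
        long value = 0;
        while (std::isdigit(static_cast<unsigned char>(peek()))) {
            value = value * 10 + (text_[pos_++] - '0');
        }
        return value;
    }

    std::string text_;
    std::size_t pos_;
};

long evaluate(const std::string& text) {
    return Parser(text).level2();
}
```

Because the leading sign is consumed at the level-2 rule and the power is
consumed inside the operand, evaluate("-4**2") yields -16 and
evaluate("(-4)**2") yields 16.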

>that "rule" permitting a minus sign to precede a constant applies, I
>believe, only to cases where the context requires a constant but
>disallows a more general form of an expression.

signed-real-literal-constant only appears as an expansion of 3 non-terminals
in the Fortran 90 grammar.

        real-part (of a complex constant)       (R418)
        imag-part (of a complex constant)       (R419)
        data-stmt-constant                      (R533)

signed-int-literal-constant appears as an expansion in the same rules and
just one other: the constant preceding the P edit descriptor (R1011).

 Chris Flatters
 cflatter@nrao.edu

Author: HDK@psuvm.psu.edu (H. D. Knoble)
Date: 10 Jul 92 15:01:22 GMT

Re exponentiation legalities and anomalies:

How many compilers out there flag (at compile-time with at least a warning)
integer exponentiation of the form I**(-n), where abs(I) is (almost) any
integer (not equal to 0 or 1) and n is a positive integer constant?  E.g.,
why doesn't a subexpression like I**(-2) at least generate a warning since
its value is most likely 0?  Doesn't it seem more prudent to have the compiler
give one warning than to attempt to manage the likelihood of this being part
of a "wrong answer" at run-time (potentially in a loop)?
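
Why the value is "most likely 0" can be sketched in C++, assuming that
integer I**(-n) is computed as the truncating integer quotient 1 / I**n.
The function name is hypothetical and the base must be nonzero.

```cpp
// Hypothetical model of integer I**(-n) for n > 0, computed as the
// truncating integer quotient 1 / I**n. The base must be nonzero.
int int_pow_negative(int base, unsigned n) {
    int power = 1;
    for (unsigned i = 0; i < n; ++i) {
        power *= base;
    }
    return 1 / power;   // integer division truncates toward zero
}
```

Only bases of 1 and -1 give a nonzero result; every other base collapses
to 0, which is the case a compile-time warning would catch.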