Topic: Using extended precision in floating point
Author: "Greg Herlihy" <greghe@pacbell.net>
Date: Fri, 19 May 2006 11:48:29 CST
kanze wrote:
> As a result of a recent discussion in the French language
> newsgroup, I am now unsure as to when the compiler can use
> extra precision in floating point operations.
>
> 5/10 is clear with regards to expressions: "The values of the
> floating operands and the results of floating expressions may be
> represented in greater precision and range than required by the
> type; the types are not changed thereby." It seems clear enough
> here that in an expression like "(a + b) * c", the compiler can
> do the entire expression in long double, even if all of the
> variables are float -- and even if "a + b", processed as a
> float, was equal to a. However, there is a footnote
> (non-normative, I know) to the effect that "The cast and
> assignment operators must still perform their specific
> conversions as described in 5.4, 5.2.9 and 5.17." And 5.17/1
> says quite clearly that "The result of the assignment operation
> is the value stored in the left operand after the assignment has
> taken place;"
I think we can conclude that gcc's floating point optimization is
both legal and unambiguously allowed by the Standard. And to do so,
we simply have to assume for a moment the other possibility - that the
optimization is somehow prohibited by the Standard. But if the
optimization were not in fact allowed, then the C++ Standard would be
placed in the slightly ridiculous position of allowing floating point
computations to be conducted with a greater degree of accuracy than the
size of the types inherently support - while simultaneously ensuring
that the results obtained are not "too accurate."
Now how likely is it that the Standard expressly seeks to prohibit
computations in C++ that are "too accurate"? Assuming that we believe
that the C++ Standards committee is composed of (predominately)
rational individuals - who meet and partake in (largely) rational
discussions - and who reach decisions on a (more-often-than-not)
rational basis - then it is simply inconceivable that such a group
would ever decide that a particular computation in C++ was "too
accurate" for a program's own good and had to be made less so.
Otherwise we should well expect that the Committee will also decide
that C++ is "too useful" or "solves too many real-world problems" or
that the language is currently "too easy to learn" or - worse yet -
that "the C++ computer language is superior to every other computer
language by an intolerably wide margin". So until the Committee takes
on those more pressing issues, I think it is safe to say that we will
never have a chance to read the minutes of the meeting at which the C++
Committee deemed a set of computations as simply "too accurate" for the
language to countenance. It will never happen.
> I would have normally interpreted this to mean that if I write
> something like "(a += b) * c", the value multiplied by c must be
> the actual value of the float, and not the extended precision
> value which the compiler was allowed to use when there was no
> assignment. Even more so, I would have interpreted this to mean
> that in a bit of code like:
>
>     float total = 0.0 ;
>     for ( size_t i = 0 ; i < size ; ++ i ) {
>         total += array[ i ] ;
>     }
>
> where array is float*, the compiler must ensure that the results
> are exactly the same as those that it would have obtained by
> storing the value to total each time in the loop, and rereading
> the stored value in the expression.
Moreover it is not necessary to decide that gcc's optimization is legal
solely on our faith in the collective lucidity of the C++ Standards
committee (and in fact some individuals may be understandably hesitant
to place such a wager in the first place). Fortunately, the text of the
Standard itself provides all of the information needed to reach the
same conclusion.
The first observation to make is that the variable "total" in the above
loop is a floating point operand - and remains a floating operand for
each iteration of the loop. Therefore its value representation may
exceed the precision inherently supported by a float. The second
observation to make is that - despite its extended precision - total
remains a float type ("the types are not changed thereby"). And since
the righthand side of the expression is also a float type, the entire
operation is a strict float-to-float compound assignment, so no
conversion ever takes place. In other words, the footnote cited has no
relevance in this situation.
And viewing this loop in practical terms, there is no reason at all why
the compiler would have to discard total's extended precision upon each
iteration of the loop. The extended precision would have to be
discarded only if total were not an operand - for example, if total
were passed as a parameter to a function call. So only in those cases
in which the size of the float's represented value becomes a factor -
such as for I/O operations or for storage - would the compiler be
forced to discard any extended precision. To do so at any other time
would simply impair computational accuracy - and while an
implementation is free to follow such a course - the Standard for its
part could have no motivation (other than to prevent overly accurate
computations) for requiring such behavior.
In fact, were the compiler to discard total's extended precision upon
each iteration of the loop, then its extended precision would be
rendered useless - the final tally would be unchanged from the
precision supported by a float inherently. In other words, instead of
limiting the degree of additional accuracy for a floating point
calculation, the Standard would be nullifying it altogether in this
case. And no matter how dim one's view of the C++ Standard or the
degree to which one may question the motives of its committee's members
- I don't think any (rational) person would ever sincerely believe that
the Standard really sets out to permit extended precision floating
calculations only to the extent that the values of any results computed
- are not to be affected.
Greg
Author: wade@stoner.com
Date: Fri, 19 May 2006 12:58:11 CST
kanze wrote:
> From what I understand, adding the [g++] option -ffloat-store should
> cause conformant behavior.
> ... I think that the intent [of the standard] is that even when extended precision is
> used in sub-expressions, the programmer can force normal
> precision by means of a cast or an assignment. The standard can
> certainly be interpreted this way, but I'm not sure that other
> interpretations are not possible.
I think that the documented behavior of -ffloat-store matches this
"intent." However, the g++ emails suggest that on x86, -ffloat-store
applies to named variables, but not to temporaries. This would suggest
that a cast may not force normal precision (unless the result of the
cast is directly used in an assignment).
This is mostly a quibble. I'd expect most programmers would use
assignment-to-a-named-variable for this operation.
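To make the quibble concrete, here is a minimal sketch (the function
names are mine, purely illustrative) of the two forms on an x87 target:

    // The cast produces an unnamed temporary that feeds directly into
    // further arithmetic; per the g++ mails mentioned above,
    // -ffloat-store may not force that temporary into memory, so x87
    // excess precision can survive the cast.
    float via_cast(float a, float b, float c)
    {
        return static_cast<float>(a + b) * c;
    }

    // The intermediate value goes through a named variable, which is
    // the case -ffloat-store is documented to cover: the variable is
    // stored to memory and re-read, rounding it to float.
    float via_named_variable(float a, float b, float c)
    {
        float t = a + b;
        return t * c;
    }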
Author: kanze.james@neuf.fr (James Kanze)
Date: Sun, 21 May 2006 07:33:50 GMT
wade@stoner.com wrote:
> kanze wrote:
>> From what I understand, adding the [g++] option -ffloat-store
>> should cause conformant behavior.
>> ... I think that the intent [of the standard] is that even
>> when extended precision is used in sub-expressions, the
>> programmer can force normal precision by means of a cast or
>> an assignment. The standard can certainly be interpreted
>> this way, but I'm not sure that other interpretations are not
>> possible.
> I think that the documented behavior of -ffloat-store matches
> this "intent." However, the g++ emails suggest that on x86,
> -ffloat-store applies to named variables, but not to
> temporaries. This would suggest that a cast may not force
> normal precision (unless the result of the cast is directly
> used in an assignment).
The intent is for it only to apply to temporaries that result
from a cast. I suspect that this is not necessarily trivial in
the compiler -- if an expression is of type double, and I cast
it to double, it's quite possible that there is no way of
expressing this in the intermediate language that is passed to
the back end.
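To illustrate (a hypothetical fragment, not from the thread), the cast
below is to the expression's own type, so its only intended effect is
to discard excess precision:

    double round_sum(double x, double y)
    {
        // On an x87 target, x + y may be evaluated in long double; the
        // cast back to double - the expression's own type - is meant
        // only to discard that excess precision, and a same-type cast
        // may have no direct representation in the intermediate
        // language handed to the back end.
        return static_cast<double>(x + y);
    }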
> This is mostly a quibble. I'd expect most programmers would use
> assignment-to-a-named-variable for this operation.
Well, I would:-).
--
James Kanze kanze.james@neuf.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34
Author: Jean-Marc Bourguet <jm@bourguet.org>
Date: Sun, 21 May 2006 09:16:24 CST
"kanze" <kanze@gabi-soft.fr> writes:
> Jean-Marc Bourguet wrote:
> > "kanze" <kanze@gabi-soft.fr> writes:
>
> > > I'd always been pretty sure of this, but on analysing some
> > > code which wasn't giving the expected results (they were too
> > > accurate for the algorithm being used !), I found that g++
> > > (on a PC) used extended precision throughout the loop, at
> > > least when optimization was requested. Is this an error in
> > > g++, or have I misunderstood something?
[...]
> > I think this is a bug in gcc. But see:
>
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323
[...]
> From what I understand, adding the option -ffloat-store
> should cause conformant behavior.
I don't remember where I got the impression that it
has that effect most of the time but not always.
Yours,
--
Jean-Marc
Author: kanze.james@neuf.fr (James Kanze)
Date: Sun, 21 May 2006 14:16:56 GMT
Greg Herlihy wrote:
> kanze wrote:
>> As a result of a recent discussion in the French language
>> newsgroup, I am now unsure as to when the compiler can use
>> extra precision in floating point operations.
>> §5/10 is clear with regards to expressions: "The values of
>> the floating operands and the results of floating expressions
>> may be represented in greater precision and range than
>> required by the type; the types are not changed thereby." It
>> seems clear enough here that in an expression like "(a + b) *
>> c", the compiler can do the entire expression in long double,
>> even if all of the variables are float -- and even if "a +
>> b", processed as a float, was equal to a. However, there is
>> a footnote (non-normative, I know) to the effect that "The
>> cast and assignment operators must still perform their
>> specific conversions as described in 5.4, 5.2.9 and 5.17."
>> And §5.17/1 says quite clearly that "The result of the
>> assignment operation is the value stored in the left operand
>> after the assignment has taken place;"
> I think we can conclude that gcc's floating point
> optimization is both legal and unambiguously allowed by
> the Standard.
I think that it is possible to interpret the standard this way.
I'm pretty sure that this isn't the intent, or the most
reasonable interpretation.
> And to do so, we simply have to assume for a
> moment the other possibility - that the optimization is
> somehow prohibited by the Standard.
We don't have to assume anything. We just have to read the
standard.
> But if the optimization
> were not in fact allowed, then the C++ Standard would be
> placed in the slightly ridiculous position of allowing
> floating point computations to be conducted with a greater
> degree of accuracy than the size of the types inherently
> support - while simultaneously ensuring that the results
> obtained are not "too accurate."
If you want to do rigorous numeric calculations, you do need to
enforce the precision at some points, to exactly that which you
want (or at least to something defined). Otherwise, the results
are not reproducible.
> Now how likely is it that the Standard expressly seeks to prohibit
> computations in C++ that are "too accurate"? Assuming that we believe
> that the C++ Standards committee is composed of (predominately)
> rational individuals - who meet and partake in (largely) rational
> discussions - and who reach decisions on a (more-often-than-not)
> rational basis - then it is simply inconceivable that such a group
> would ever decide that a particular computation in C++ was "too
> accurate" for a program's own good and had to be made less so.
I'd say the situation is that the standard is trying to satisfy
both camps. If you want the exact precision, you can assign or
cast to the correct type, and if you can accept greater
precision, you get that by default. Most of the time, I suspect
that the greater precision is an advantage, but if you are
trying to prove the mathematical stability of a convergence,
I'm not sure. The point is, of course, if you're doing that
sort of thing, you'd better know numerics enough to know when
you need the deterministic results, and insert the assignments
or the casts to enforce them.
In the case of g++, as we see, it is impossible to get
deterministic results. (Actually it's not -- there's an option
to do so that I hadn't seen.) Without a possibility to get
deterministic results, I don't think that you could use the
compiler (or the language, if this were what the standard
actually says) for any serious numerical work.
(There was a lot of discussion about this when C was being
standardized. In particular, in K&R C, the compiler was allowed
to rearrange expressions according to mathematical identities.
This was banned by the C standard, because it made the language
unusable for any serious floating point work.)
> Otherwise we should well expect that the Committee will also decide
> that C++ is "too useful" or "solves too many real-world problems" or
> that the language is currently "too easy to learn" or - worse yet -
> that "the C++ computer language is superior to every other computer
> language by an intolerably wide margin". So until the Committee takes
> on those more pressing issues, I think it is safe to say that we will
> never have a chance to read the minutes of the meeting at which the C++
> Committee deemed a set of computations as simply "too accurate" for the
> language to countenance. It will never happen.
It would help if 1) you read up on numeric processing, and 2)
you actually read the standard. It's not a question of "too
accurate", but of "deterministic results". It's very hard to
reason about anything if the results of a given expression are
not always the same.
>> I would have normally interpreted this to mean that if I write
>> something like "(a += b) * c", the value multiplied by c must be
>> the actual value of the float, and not the extended precision
>> value which the compiler was allowed to use when there was no
>> assignment. Even more so, I would have interpreted this to mean
>> that in a bit of code like:
>>     float total = 0.0 ;
>>     for ( size_t i = 0 ; i < size ; ++ i ) {
>>         total += array[ i ] ;
>>     }
>> where array is float*, the compiler must ensure that the results
>> are exactly the same as those that it would have obtained by
>> storing the value to total each time in the loop, and rereading
>> the stored value in the expression.
> Moreover it is not necessary to decide that gcc's optimization
> is legal solely on our faith in the collective lucidity of the
> C++ Standards committee (and in fact some individuals may be
> understandably hesitant to place such a wager in the first
> place). Fortunately, the text of the Standard itself provides
> all of the information needed to reach the same conclusion.
My contention is that it doesn't. At least not well enough --
the intent, and the most reasonable reading, is that this
optimization is illegal. And while it doesn't cause problems
here, without some way of suppressing it, it becomes seriously
difficult to do certain types of numeric processing.
> The first observation to make is that the variable "total" in
> the above loop is a floating point operand - and remains a
> floating operand for each iteration of the loop. Therefore its
> value representation may exceed the precision inherently
> supported by a float. The second observation to make is that -
> despite its extended precision - total remains a float type
> ("the types are not changed thereby"). And since the righthand
> side of the expression is also a float type, the entire
> operation is a strict float-to-float compound assignment, so
> no conversion ever takes place. In other words, the footnote
> cited has no relevance in this situation.
As it happens, total is also a variable which is being assigned
to. The footnote to §5/10 (not normative, but very indicative
of intent) states clearly that "the cast and assignment
operators must still perform their specific conversions as
described in 5.4, 5.2.9 and 5.17." §5.17 says that 1) "The
result of the assignment operation is the value stored in the
left operand after the assignment has taken place", and 2) "If
the left operand is not of class type, the expression is
implicitly converted (clause 4) to the type of the left
operand." And in =A74.8, we find that "If the source value is
between two adjacent destination values, the result of the
conversion is an implementation-defined choice of either of
those values." No liberty of extended precision anywhere there.
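One step worth spelling out (my gloss, not part of the post): 5.17/7
makes a compound assignment equivalent to the corresponding simple
assignment, so the conversion requirement applies to the loop's += as
well:

    // 5.17/7: the behaviour of E1 op= E2 is equivalent to
    // E1 = E1 op E2, except that E1 is evaluated only once.  So each
    // iteration,
    //     total += array[ i ] ;
    // behaves like
    //     total = total + array[ i ] ;
    // and 5.17/3 then requires the right-hand side to be converted
    // (clause 4) to float, with 4.8 choosing one of the two adjacent
    // representable values - which is the conversion at issue here.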
> And viewing this loop in practical terms, there is no reason
> at all why the compiler would have to discard total's extended
> precision upon each iteration of the loop.
Except that I, the programmer, told it to. Using the means the
standard provided for me to do so.
> The extended precision would have to be
> discarded only if total were not an operand - for example, if total
> were passed as a parameter to a function call.
That's in direct contradiction with §5.17.
> So only in those cases
> in which the size of the float's represented value becomes a factor -
> such as for I/O operations or for storage - would the compiler be
> forced to discard any extended precision. To do so at any other time
> would simply impair computational accuracy - and while an
> implementation is free to follow such a course - the Standard for its
> part could have no motivation (other than to prevent overly accurate
> computations) for requiring such behavior.
> In fact, were the compiler to discard total's extended
> precision upon each iteration of the loop, then its extended
> precision would be rendered useless - the final tally would be
> unchanged from the precision supported by a float inherently.
In this case. Because that's what I asked for. If I had wanted
the extended precision, presumably, I would have asked for long
double. Extended precision is legal within an expression,
unless I explicitly tell the compiler not to use it, by means of
a cast. (C99 goes much further in this regard, and provides a
pragma to forbid the extended precision. Due to demand from the
numerics community.)
> In other words, instead of
> limiting the degree of additional accuracy for a floating point
> calculation, the Standard would be nullifying it altogether in this
> case. And no matter how dim one's view of the C++ Standard or the
> degree to which one may question the motives of its committee's members
> - I don't think any (rational) person would ever sincerely believe that
> the Standard really sets out to permit extended precision floating
> calculations only to the extent that the values of any results computed
> - are not to be affected.
It obviously affects the results. Which is why many numeric
experts condemn it, and why C99 felt it necessary to provide a
global means of suppressing it. The problem here is that I've
actually used the means provided by the C++ standard to suppress
it, and the compiler didn't.
And that, regretfully, the part of the text which makes it 100%
clear that it should be suppressed is in a footnote, and not
normative.
--
James Kanze kanze.james@neuf.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34
Author: wade@stoner.com
Date: Wed, 17 May 2006 15:45:44 CST
kanze wrote:
> As a result of a recent discussion in the French language
> newsgroup, I am now unsure as to when the compiler can use
> extra precision in floating point operations.
I think it is a QOI issue. The standard doesn't say. Or perhaps, 5/5
is saying that for the typical floating point representation (where
one-third is not representable), 1.0/3.0 is UB.
In "The Standard C Library", PJP says "... the C Standard is mostly
descriptive in the area of floating-point arithmetic. It endeavors to
define enough terms to talk about the parameters of floating point.
But it says little that is prescriptive about getting the right
answer." I don't believe C++ does significantly better.
Regardless of what the standard says, the current best you'll find is a
vendor that tells you the facts about what his compiler does.
I'm not an expert in this stuff. I read the documentation, and then
try to find a set of compiler options that make paranoia.c (from
netlib) happy when other optimizations are turned on.
MS documents some floating point modes called "fast", "precise", and
"strict." My reading is that
- fast: Compiler is free to assume that fp-math obeys the rules of
real-math. Extra precision may be carried over between statements (or
statements may be rewritten, if real math would mean the old statements
and new statements give the same result).
- precise: Compiler may perform optimizations that give the correct
result assuming that you leave the fp processor in its default state
(default rounding mode, division by zero produces a non-signaling NAN,
etc.). Extended precision is "tossed" at assignments, casts, and
function calls. The compiler may use low-level contraction (a*b+c)
instructions, even when that instruction can give a "better" answer
than multiply followed by add.
- strict: No contractions. Extended precision is "tossed" at
assignments, casts, and function calls. Compiler does not assume that
you are leaving the floating point mode alone, so
    float x = 1.0 / 3.0;
    Bar();
    float y = 1.0 / 3.0;
the compiler does not assume x==y (since Bar() might have changed the
rounding mode), and 1.0 / 3.0 is not a compile-time constant, for the
same reason.
As far as I can tell from the MS documentation, paranoia.c should be
happy with "precise." However, with VC8, paranoia.c (double precision)
isn't happy until "strict." I don't know if this is a VC bug (can't
find it in their knowledge base, and when I try a bug report, MS
passport stuff gets in the way) or if I am misreading the
documentation.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_vstechart/html/floapoint.asp
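A sketch of how the three modes would treat that fragment, based only
on the reading of the documentation given above (the /fp:fast,
/fp:precise and /fp:strict switches are the ones the article
describes; Bar() is the same hypothetical function as above):

    // Compile as, e.g.:  cl /O2 /fp:fast    example.cpp
    //                    cl /O2 /fp:precise example.cpp
    //                    cl /O2 /fp:strict  example.cpp
    void Bar();   // might change the floating point control state

    double example()
    {
        float x = 1.0 / 3.0;
        Bar();
        float y = 1.0 / 3.0;

        // fast:    the function may be rewritten as if fp-math obeyed
        //          the rules of real-math; x and y may be folded and
        //          treated as equal.
        // precise: excess precision is tossed at the assignments to x
        //          and y; the compiler may still assume the default
        //          rounding mode survives Bar(), and may contract
        //          a*b+c.
        // strict:  no contractions, and 1.0 / 3.0 is not a
        //          compile-time constant, since Bar() might have
        //          changed the rounding mode; x == y is not assumed.
        return static_cast<double>(x) - y;
    }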
Author: jm@bourguet.org (Jean-Marc Bourguet)
Date: Thu, 18 May 2006 15:19:30 GMT
"kanze" <kanze@gabi-soft.fr> writes:
> I'd always been pretty sure of this, but on analysing some code
> which wasn't giving the expected results (they were too accurate
> for the algorithm being used !), I found that g++ (on a PC) used
> extended precision throughout the loop, at least when
> optimization was requested. Is this an error in g++, or have I
> misunderstood something?
>
> (For those who are familiar with Intel assembler, the body of
> the loop is simply:
>
> .L9:
> fadd DWORD PTR [%ecx+%eax*4]
> inc %eax
> cmp %eax, %edx
> jb .L9
>
> .)
I think this is a bug in gcc. But see:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323
There are something like 70 duplicates.
One comment in the bug log is:
Steven> ...to end this pointless discussion.
Steven>
Steven> Some people call this a bug in the x87 series. Other call it a bug in
Steven> gcc. See these mails at least for the reason why this could be considered
Steven> a bug in gcc:
Steven> http://gcc.gnu.org/ml/gcc/2003-08/msg01195.html
Steven> http://gcc.gnu.org/ml/gcc/2003-08/msg01234.html
Steven> http://gcc.gnu.org/ml/gcc/2003-08/msg01257.html
Steven>
Steven> Regardless of where one wishes to put the blame, this problem will _not_ be
Steven> fixed. Period.
Steven>
Yours,
--
Jean-Marc
Author: "kanze" <kanze@gabi-soft.fr>
Date: Fri, 19 May 2006 06:40:47 CST
Jean-Marc Bourguet wrote:
> "kanze" <kanze@gabi-soft.fr> writes:
> > I'd always been pretty sure of this, but on analysing some
> > code which wasn't giving the expected results (they were too
> > accurate for the algorithm being used !), I found that g++
> > (on a PC) used extended precision throughout the loop, at
> > least when optimization was requested. Is this an error in
> > g++, or have I misunderstood something?
> > (For those who are familiar with Intel assembler, the body of
> > the loop is simply:
> > .L9:
> > fadd DWORD PTR [%ecx+%eax*4]
> > inc %eax
> > cmp %eax, %edx
> > jb .L9
> >
> > .)
> I think this is a bug in gcc. But see:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323
> There are something like 70 duplicates.
> One comment in the bug log is:
> Steven> ...to end this pointless discussion.
> Steven>
> Steven> Some people call this a bug in the x87 series. Other call it a bug in
> Steven> gcc. See these mails at least for the reason why this could be considered
> Steven> a bug in gcc:
> Steven> http://gcc.gnu.org/ml/gcc/2003-08/msg01195.html
> Steven> http://gcc.gnu.org/ml/gcc/2003-08/msg01234.html
> Steven> http://gcc.gnu.org/ml/gcc/2003-08/msg01257.html
> Steven>
> Steven> Regardless of where one wishes to put the blame, this problem will _not_ be
> Steven> fixed. Period.
From what I understand, adding the option -ffloat-store should
cause conformant behavior. Maybe they should mention this in
the documentation for options like -std=c++98. (IMHO, the
current behavior of g++ is probably preferable in most cases.
My question only concerned standards conformance.)
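For reference, a sketch of how the option is applied to the summation
loop under discussion (file name illustrative):

    // Compile as, e.g.:  g++ -O2 -std=c++98 -ffloat-store sum.cpp
    // With -ffloat-store, total is written back to memory - and thus
    // rounded to float - on each iteration; without it, on x86 the
    // running sum may stay in an 80-bit x87 register for the whole
    // loop, which is the behaviour in question here.
    #include <cstddef>

    float sum(float const* array, std::size_t size)
    {
        float total = 0.0f;
        for (std::size_t i = 0; i < size; ++i) {
            total += array[i];
        }
        return total;
    }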
I also think that the standard isn't as clear as it could be. I
think that the intent is that even when extended precision is
used in sub-expressions, the programmer can force normal
precision by means of a cast or an assignment. The standard can
certainly be interpreted this way, but I'm not sure that other
interpretations are not possible.
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Author: "kanze" <kanze@gabi-soft.fr>
Date: Wed, 17 May 2006 11:36:30 CST
As a result of a recent discussion in the French language
newsgroup, I am now unsure as to when the compiler can use
extra precision in floating point operations.
5/10 is clear with regards to expressions: "The values of the
floating operands and the results of floating expressions may be
represented in greater precision and range than required by the
type; the types are not changed thereby." It seems clear enough
here that in an expression like "(a + b) * c", the compiler can
do the entire expression in long double, even if all of the
variables are float -- and even if "a + b", processed as a
float, was equal to a. However, there is a footnote
(non-normative, I know) to the effect that "The cast and
assignment operators must still perform their specific
conversions as described in 5.4, 5.2.9 and 5.17." And 5.17/1
says quite clearly that "The result of the assignment operation
is the value stored in the left operand after the assignment has
taken place;"
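The remark that "a + b", processed as a float, can equal a is easy to
demonstrate; a minimal sketch (the values are chosen only for
illustration):

    #include <iostream>

    int main()
    {
        float a = 1.0f;
        float b = 1e-8f;    // less than half a ulp of 1.0f

        // With strict float arithmetic the sum rounds back to a, so
        // the first value printed is 1.  If the compiler evaluates
        // a + b in extended precision and keeps it there - the
        // behaviour at issue here - it may print 0 instead.
        std::cout << ((a + b) == a) << ' '
                  << ((static_cast<double>(a) + b) > a) << '\n';
        return 0;
    }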
I would have normally interpreted this to mean that if I write
something like "(a += b) * c", the value multiplied by c must be
the actual value of the float, and not the extended precision
value which the compiler was allowed to use when there was no
assignment. Even more so, I would have interpreted this to mean
that in a bit of code like:
    float total = 0.0 ;
    for ( size_t i = 0 ; i < size ; ++ i ) {
        total += array[ i ] ;
    }
where array is float*, the compiler must ensure that the results
are exactly the same as those that it would have obtained by
storing the value to total each time in the loop, and rereading
the stored value in the expression.
I'd always been pretty sure of this, but on analysing some code
which wasn't giving the expected results (they were too accurate
for the algorithm being used !), I found that g++ (on a PC) used
extended precision throughout the loop, at least when
optimization was requested. Is this an error in g++, or have I
misunderstood something?
(For those who are familiar with Intel assembler, the body of
the loop is simply:
.L9:
fadd DWORD PTR [%ecx+%eax*4]
inc %eax
cmp %eax, %edx
jb .L9
.)
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34