Thread

Topic: Error in floating point calculations

Author: Valentin Bonnard <Bonnard.V@wanadoo.fr>
Date: 1999/04/29 Raw View

Gabriel Dos_Reis wrote:

> is computing:
>
>         (0.0 + a * b) - a * b;  // #1
>
> whereas the author is expecting the result to be the same as
>
>         0.0 + (a * b - a * b);  // #2

> This is one manifestation of lack of associativity of
> floating point addition.

It also means that 0. isn't a neutral element for +, which to
me means that the compiler is broken.

--

Valentin Bonnard

[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: James.Kanze@dresdner-bank.com
Date: 1999/04/29 Raw View

In article <3727BB45.1381@wanadoo.fr>,
  Valentin Bonnard <Bonnard.V@wanadoo.fr> wrote:
>
> Gabriel Dos_Reis wrote:
>
> > is computing:
> >
> >         (0.0 + a * b) - a * b;  // #1
> >
> > whereas the author is expecting the result to be the same as
> >
> >         0.0 + (a * b - a * b);  // #2
>
> > This is one manifestation of lack of associativity of
> > floating point addition.
>
> It also means that 0. isn't a neutral element for +, which to
> me means that the compiler is broken.

This would be true *if* Gabriel's rewrite were an exact equivalent to
the original code.  However, I'm willing to bet that the expression:

    (0.0 + a * b) - a * b == 0.0 + (a * b - a * b)

*will* evaluate to true.

The problem is complex, and even less intuitive that standard machine
floating point.  The "error" seen is due to the assignment operator
changing precision -- I don't have my copy of the C standard here to
find the exact place, but I'm pretty sure that this is allowed.  And it
does make high quality number crunching more difficult.  (With regards
to number crunching: if it isn't high quality, it's wrong.)

--
James Kanze                         mailto: James.Kanze@dresdner-bank.com
Conseils en informatique orientie objet/
                        Beratung in objekt orientierter Datenverarbeitung
Ziegelh|ttenweg 17a, 60598 Frankfurt, Germany  Tel. +49 (069) 63 19 86 27

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: Gabriel Dos_Reis <gdosreis@korrigan.inria.fr>
Date: 1999/04/30 Raw View

James.Kanze@dresdner-bank.com writes:

[...]

| This would be true *if* Gabriel's rewrite were an exact equivalent to
| the original code.  However, I'm willing to bet that the expression:
|
|     (0.0 + a * b) - a * b == 0.0 + (a * b - a * b)
|
| *will* evaluate to true.

Well, in my previous post I explicitly made the assumption:

> Now, I'm sure you see how lack of associativity
> affects the result if you assume further that expressions in "( )" are
> first stored in memory, then retrived to carry on the computation.
 ^^^^^^^^^^^^^^^^^^^^^^^^

[...]

| ... And it
| does make high quality number crunching more difficult.

accurate number crunching is always difficult.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr

[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: James.Kanze@dresdner-bank.com
Date: 1999/04/28 Raw View

In article <37250906.3C88@pop.hip.cam.org>,
  hickin@Hydro.CAM.ORG wrote:
>
> James.Kanze@dresdner-bank.com wrote:
> >
>
> > > You will get a 0 out. This shows that floating point doesn't satisfy the
> > > distributive law. In fact, it isn't commutative.
> >
> > What should you get out, if not 0?  Machine floating point isn't
> > distributive, nor associative, but your example doesn't show this.
> > Machine floating point should normally be commutative, and it should be
>
> Given that the original result seemed to be a surprise, I was suggesting
> that the original poster might have tried my example to further probe
> the situation. On the theory that the floating hardware was broken my
> example certainly might have produced a non-zero result.
>
> I fail to see how floating point can be commutative if a finite sum has
> two distinct values summed in different orders.

As soon as you sum more than two values, associativity also comes into
play.  And while generally commutative (a+b == b+a), computer arithmetic
definitely isn't associative ((a+b)+c != a+(b+c)).  Your example fails
because of this.

Anyway, I think we agree on the principle: machine floating points are
not real numbers, the results may sometimes be surprising for naive
users, and you have to know what you are doing to ensure correct
results.  My only point was that the example in the initial posting of
this thread is *not* due to a problem implicit in all machine floating
point, but rather to a particularity of the Intel (and possibly others)
floating point implementation *and* the way the compiler generated
code.  And that even with Intel processors, it is avoidable, at a
certain cost.  (It has been almost 20 years since I last did floating
point on an Intel, but I seem to remember that the 8087 had a control
flag to ensure correct rounding of intermediate results.  Precisely
because of this specific problem.)

--
James Kanze                         mailto: James.Kanze@dresdner-bank.com
Conseils en informatique orientie objet/
                        Beratung in objekt orientierter Datenverarbeitung
Ziegelh|ttenweg 17a, 60598 Frankfurt, Germany  Tel. +49 (069) 63 19 86 27

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: Gabriel Dos_Reis <gdosreis@korrigan.inria.fr>
Date: 1999/04/28 Raw View

Valentin Bonnard <Bonnard.V@wanadoo.fr> writes:

| John D. Hickin wrote:
|
| > I fail to see how floating point can be commutative if a finite sum has
| > two distinct values summed in different orders. Try the following
| > example:
|
| > and you may see that for some machines the two calculations are not
| > idenitical. From this I conclude that machine floating point cannot be
| > commutative on all floating point processors in use today.
|
| F.p. addition is certainly commutative (at least on a 'normal'
| processor), but it isn't associative.
|
| Your example shows that addition isn't commutative and associative,
| and that doesn't answer your questions (it is commutative ? it is
| associative ?)

His example only shows that floating point addition isn't
associative. It's certainly commutative.
At least in the scope of IEC 559.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: Barry Margolin <barmar@bbnplanet.com>
Date: 1999/04/28 Raw View

In article <xaj90bey75s.fsf@korrigan.inria.fr>,
Gabriel Dos_Reis  <gdosreis@korrigan.inria.fr> wrote:
>Commutavity involves _only_ *two* operands:
>
> a op b == b op a.
>
>Associativity involves *more* than two operands:
>
> (a op b) op c == a op (b op c).
>
>It's this latter property that is lacking in floating ppint
>arithmctic, *not* commutativity, and that affects long summation.

I'm not sure what associativity has to do with the case at hand.  The
example program seems to be computing:

 (a * b) - (a * b)

and surprising the author because the result isn't 0.0.  How would lack of
associativity explain this?

Sorry, I haven't looked at at Higham's book.

--
Barry Margolin, barmar@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: "John D. Hickin" <hickin@hydro.cam.org>
Date: 1999/04/28 Raw View

Valentin Bonnard wrote:
>
>
> Your example shows that addition isn't commutative and associative,
> and that doesn't answer your questions (it is commutative ? it is
> associative ?)
>

Gabriel Dos_Reis wrote:

> No, it doesn't follow.
> What your program is supposed to examplify is a well known problem
> in floating point arithmetic. I strongly suggest you take  a serious
> look at chapters 2 and 4 of


Gentlemen,

I am dead wrong. I've been thick headed about this one, havn't I?
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: Gabriel Dos_Reis <gdosreis@korrigan.inria.fr>
Date: 1999/04/29 Raw View

Barry Margolin <barmar@bbnplanet.com> writes:

| In article <xaj90bey75s.fsf@korrigan.inria.fr>,
| Gabriel Dos_Reis  <gdosreis@korrigan.inria.fr> wrote:
| >Commutavity involves _only_ *two* operands:
| >
| > a op b == b op a.
| >
| >Associativity involves *more* than two operands:
| >
| > (a op b) op c == a op (b op c).
| >
| >It's this latter property that is lacking in floating ppint
| >arithmctic, *not* commutativity, and that affects long summation.
|
| I'm not sure what associativity has to do with the case at hand.  The
| example program seems to be computing:
|
|  (a * b) - (a * b)

No. The original program

  float a,b,c;
  a = 0.1;
  b = 0.1;
  c = 0.0;

  c += a * b;
  c -=  a * b;

is computing:

 (0.0 + a * b) - a * b;  // #1

whereas the author is expecting the result to be the same as

 0.0 + (a * b - a * b);  // #2

| and surprising the author because the result isn't 0.0.  How would lack of
| associativity explain this?

Let's assume, with James Kanze, that on the original poster's machine,
FP registers are 80 bits wide and a double is stored in main memory in
a 64 bit format. Now, I'm sure you see how lack of associativity
affects the result if you assume further that expressions in "( )" are
first stored in memory, then retrived to carry on the
computation. This is one manifestation of lack of associativity of
floating point addition.
Another manifestation was demonstrated by John Hickin's post.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: "John D. Hickin" <hickin@Hydro.CAM.ORG>
Date: 1999/04/27 Raw View

James.Kanze@dresdner-bank.com wrote:
>

> > You will get a 0 out. This shows that floating point doesn't satisfy the
> > distributive law. In fact, it isn't commutative.
>
> What should you get out, if not 0?  Machine floating point isn't
> distributive, nor associative, but your example doesn't show this.
> Machine floating point should normally be commutative, and it should be

Given that the original result seemed to be a surprise, I was suggesting
that the original poster might have tried my example to further probe
the situation. On the theory that the floating hardware was broken my
example certainly might have produced a non-zero result.

I fail to see how floating point can be commutative if a finite sum has
two distinct values summed in different orders. Try the following
example:

double func( double x ) { return x*x; }

double fd( int N )
{
        double sum = 0.0;

        for ( int n=1; n<=N; ++n )
        {
                double factor = 1.0/double(n);
                sum = sum + func(factor);
        }

        return sum;
}

double fa( int N )
{
        double sum = 0.0;

        for ( int n=N; n>=1; --n )
        {
                double factor = 1.0/double(n);
                sum = sum + func(factor);
        }

        return sum;
}

#include <iostream>
#include <iomanip>

main()
{
        std::cout << std::setprecision(16)
                          << "1 2345678901234567890\n"
                             "- -------------------\n"
                      << fa(1000000) << '\n'
                      << fd(1000000) << '\n'
                          ;
}

and you may see that for some machines the two calculations are not
idenitical. From this I conclude that machine floating point cannot be
commutative on all floating point processors in use today.

Another poster noted that the proper setup for rounding [on Win32
systems] might be had be inserting a statement _controlfp(_PC_24,
_MCW_PC). This amplifies the disparity between the two calculations
above.

Regards, John.

[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: Valentin Bonnard <Bonnard.V@wanadoo.fr>
Date: 1999/04/27 Raw View

John D. Hickin wrote:

> I fail to see how floating point can be commutative if a finite sum has
> two distinct values summed in different orders. Try the following
> example:

> and you may see that for some machines the two calculations are not
> idenitical. From this I conclude that machine floating point cannot be
> commutative on all floating point processors in use today.

F.p. addition is certainly commutative (at least on a 'normal'
processor), but it isn't associative.

Your example shows that addition isn't commutative and associative,
and that doesn't answer your questions (it is commutative ? it is
associative ?)

--

Valentin Bonnard

[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: Gabriel Dos_Reis <gdosreis@korrigan.inria.fr>
Date: 1999/04/27 Raw View

"John D. Hickin" <hickin@Hydro.CAM.ORG> writes:

| James.Kanze@dresdner-bank.com wrote:
| >
|
| > > You will get a 0 out. This shows that floating point doesn't satisfy the
| > > distributive law. In fact, it isn't commutative.
| >
| > What should you get out, if not 0?  Machine floating point isn't
| > distributive, nor associative, but your example doesn't show this.
| > Machine floating point should normally be commutative, and it should be
|
| Given that the original result seemed to be a surprise, I was suggesting
| that the original poster might have tried my example to further probe
| the situation. On the theory that the floating hardware was broken my
| example certainly might have produced a non-zero result.
|
| I fail to see how floating point can be commutative if a finite sum has
| two distinct values summed in different orders. Try the following
| example:

[...]

| and you may see that for some machines the two calculations are not
| idenitical. From this I conclude that machine floating point cannot be
| commutative on all floating point processors in use today.

No, it doesn't follow.
What your program is supposed to examplify is a well known problem
in floating point arithmetic. I strongly suggest you take  a serious
look at chapters 2 and 4 of
 http://www.ma.man.ac.uk/~higham/asna.html.

Commutavity involves _only_ *two* operands:

 a op b == b op a.

Associativity involves *more* than two operands:

 (a op b) op c == a op (b op c).

It's this latter property that is lacking in floating ppint
arithmctic, *not* commutativity, and that affects long summation.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr




[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: James.Kanze@dresdner-bank.com
Date: 1999/04/23 Raw View

In article <7fmkk2$65q$1@proxy.st-georgen.gft.de>,
  "Joerg Schaible" <Joerg.Schaible.A@T.gft.de> wrote:
> Try Pascal - it won't work either <g>

I'm not sure about Pascal, but it *is* guaranteed to work in Fortran.
It was also guaranteed to work in Java, but I think they changed this:
it wrecks havoc with performance on Intel architectures.  (Most RISC
machines get it right.)

> Most languages are dependend on the underlaying processor capabilities to
> represent float values.

Correct.  But many languages *do* make requirements with regards to the
reproducibility of the results -- a*b may not be exact, but it is always
the same value.  C/C++ don't: the actual results of a*b may depend on
what you do with it.  In particular, for most (all?) implementations on
Intel hardware, the results will depend on whether you store the value,
or whether it is a temporary, used immediately in the expression.  In
fact, the results may depend on how many floating point register happen
to be free at the moment (which in turn may depend on code widely
separated from the expression in question).

--
James Kanze                         mailto: James.Kanze@dresdner-bank.com
Conseils en informatique orient   e objet/
                        Beratung in objekt orientierter Datenverarbeitung
Ziegelh   ttenweg 17a, 60598 Frankfurt, Germany  Tel. +49 (069) 63 19 86 27

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own

[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: James.Kanze@dresdner-bank.com
Date: 1999/04/23 Raw View

In article <xajr9pdkos1.fsf@korrigan.inria.fr>,
  Gabriel Dos_Reis <gdosreis@korrigan.inria.fr> wrote:
>
> "SANDBERG JOHAN" <johan.sandberg.mail@telia.com> writes:
>
> | Hi
> | The following simple test
> |
> | #include <iostream.h>
> | #include <conio.h>
>
> I was unable to get your program compile with these hearders.
>
> |
> | main() {
> |
> |  float a,b,c;
> |  a = 0.1;
> |  b = 0.1;
> |  c = 0.0;
> |
> |  c += a * b;
> |  c -=  a * b;
> |
> |  cout << c << endl;
> | }
> |
> | > 4.09782e-010
> |
> | Shows that the floating point part of C++ does not work right !??
> | Is this really right, or have a missed something ?
>
> Yes. See 'floating point arithmetic' and promises C++ makes about
> floating point aritmetic. As far as I can tell, the result is
> consistent although not what you were expecting.

I think that his whole point was that it wasn't consistent, and not that
it wasn't precise.  Note well that the initial value of c is 0.0.  We
add a*b to it.  The calcul of a*b might not be precise, but it will give
some number, x.  Which, added to 0.0, should still give x -- it's a
mighty poor implementation which looses precision when adding 0.0.  Then
we recalculate a*b.  Multiplication may not be precise, but it should be
predictable -- the result of the multiplication should be the same value
x as the preceding a*b.  Then we subtract this from the current value of
c, which logically is also x.  Again, while machine arithmetic isn't
necessarily precise in the usual sense, x-x should always give 0.0, for
all representable x.

As I pointed out in another posting, the only explination in this case
(except an intentionally perverse implementation) is that internal
calculations are taking place with more precision than the values are
stored.  For example, I'm willing to bet that if c -= a * b were
replaced by the two statement sequence tmp = a * b ; c -= tmp, the
results would be 0.0.

Anyway, if memory serves me right, the C standard explicitly allows
intermediate results to be maintained in more precision than stored
values, so the implementation is conforming.  But a good implementation
will have compile time options to force intermediate results to the
correct precision; IMHO, these options should be the default, with an
explicit option to allow wider intermediate results.

--
James Kanze                         mailto: James.Kanze@dresdner-bank.com
Conseils en informatique orient   e objet/
                        Beratung in objekt orientierter Datenverarbeitung
Ziegelh   ttenweg 17a, 60598 Frankfurt, Germany  Tel. +49 (069) 63 19 86 27

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own

[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: James.Kanze@dresdner-bank.com
Date: 1999/04/23 Raw View

In article <371E6555.F03@pop.hip.cam.org>,
  hickin@Hydro.CAM.ORG wrote:
>
> SANDBERG JOHAN wrote:
> >
> > Hi
> > The following simple test
> >
> > #include <iostream.h>
> > #include <conio.h>
> >
> > main() {
> >
> >  float a,b,c;
>    double a, b, c;
> >  a = 0.1;
> >  b = 0.1;
> >  c = 0.0;
> >
> >  c += a * b;
> >  c -=  a * b;
> >
> >  cout << c << endl;
> > }
> >
> > > 4.09782e-010

> 0

> > Shows that the floating point part of C++ does not work right !??
> > Is this really right, or have a missed something ?

> You missed two things.

> 1) To produce a result accurate to N significant figures intermediate
> computations need precision M>N.

This should be transparent to the user.

> 2) It may not be possible to exactly represent a given figure in the
> hardware. In this case the number 0.1000... would need an infinitely
> repeating fraction when expressed in base 2. What ends up in the
> floating point register is something that is close to 0.1 but not exact.
>
> Try the following additional experiment with your original code:
>
>    float d = a*(b-b);
>
> You will get a 0 out. This shows that floating point doesn't satisfy the
> distributive law. In fact, it isn't commutative.

What should you get out, if not 0?  Machine floating point isn't
distributive, nor associative, but your example doesn't show this.
Machine floating point should normally be commutative, and it should be
predictable: a*b should always result in the same value, even if that
value isn't precise. Also, a few elementary operations, like a-a should
be precise, for all possible values of a.

--
James Kanze                         mailto: James.Kanze@dresdner-bank.com
Conseils en informatique orient   e objet/
                        Beratung in objekt orientierter Datenverarbeitung
Ziegelh   ttenweg 17a, 60598 Frankfurt, Germany  Tel. +49 (069) 63 19 86 27

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own

[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: Gabriel Dos_Reis <gdosreis@korrigan.inria.fr>
Date: 1999/04/23 Raw View

James.Kanze@dresdner-bank.com writes:


[...]

| > Try the following additional experiment with your original code:
| >
| >    float d = a*(b-b);
| >
| > You will get a 0 out. This shows that floating point doesn't satisfy the
| > distributive law. In fact, it isn't commutative.
|
| What should you get out, if not 0?  Machine floating point isn't
| distributive, nor associative, but your example doesn't show this.
| Machine floating point should normally be commutative,

As far as IEC 559 is concerned, floating point arithmetic *is*
commutative. But as you said, it isn't distributive nor associative.

| ... and it should be
| predictable: a*b should always result in the same value, even if that
| value isn't precise.

I'd rather say reproducible, not predictable. But that is another story...

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: James.Kanze@dresdner-bank.com
Date: 1999/04/23 Raw View

In article <MPG.118906f419d0ecbd9898e1@news.mindspring.com>,
  brahms@mindspring.com (Stan Brown) wrote:

> A better-known example of this is the classic for-loop:
>
>  for (double d = 0.1; d! = 1.0; d += 0.1)
>      ;
>
> This loop will never terminate, because 1.0 can be represented exactly in
> binary but 0.1 cannot be, so adding 0.1 to itself will be slightly
> different from 1.0.

Strictly speaking, it is not defined whether the loop will terminate or
not, and in fact, I've used machines where it did terminate.  (In one
case, the floating point was decimal; in other cases, the rounding just
happened to work out right.)

What is certain is that:

    for ( double d = 0.0 ; d != 1.0 ; d += 1.0/n )
        ;

will fail to terminate for some integral n, even if it happens to work
for 10.

--
James Kanze                         mailto: James.Kanze@dresdner-bank.com
Conseils en informatique orient   e objet/
                        Beratung in objekt orientierter Datenverarbeitung
Ziegelh   ttenweg 17a, 60598 Frankfurt, Germany  Tel. +49 (069) 63 19 86 27

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: Gabriel Dos_Reis <gdosreis@korrigan.inria.fr>
Date: 1999/04/23 Raw View

James.Kanze@dresdner-bank.com writes:

[...]

| > Yes. See 'floating point arithmetic' and promises C++ makes about
| > floating point aritmetic. As far as I can tell, the result is
| > consistent although not what you were expecting.
|
| I think that his whole point was that it wasn't consistent, and not that
| it wasn't precise.  Note well that the initial value of c is 0.0.  We
| add a*b to it.  The calcul of a*b might not be precise, but it will give
| some number, x.  Which, added to 0.0, should still give x -- it's a
| mighty poor implementation which looses precision when adding 0.0.  Then
| we recalculate a*b.  Multiplication may not be precise, but it should be
| predictable -- the result of the multiplication should be the same value
| x as the preceding a*b.  Then we subtract this from the current value of
| c, which logically is also x.  Again, while machine arithmetic isn't
| necessarily precise in the usual sense, x-x should always give 0.0, for
| all representable x.

Unless, the optimizer enters in play. It's not hard to imagine that
part of the computation is done during constant folding and that
'in memory' floating point representation is less precise than
'register' conterpart:

 c += a * b; // MOV 0.01, %r1  ! 0.01 results from CF
   // STORE %r1, [c] ! ->rounding happens here<-
 c -= a * b; // LOAD [c], %r2
   // SUB %r2, %r1, %r2
   // STORE %r2, [c]

|
| As I pointed out in another posting, the only explination in this case
| (except an intentionally perverse implementation) is that internal
| calculations are taking place with more precision than the values are
| stored.  For example, I'm willing to bet that if c -= a * b were
| replaced by the two statement sequence tmp = a * b ; c -= tmp, the
| results would be 0.0.

First I'd ensure that optimization is turned off.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: "Michael" <mvl@rocketmail.com>
Date: 1999/04/23 Raw View

SANDBERG JOHAN <johan.sandberg.mail@telia.com> wrote in message
news:371e373a.0@d2o30.telia.com...
>
> Hi
> The following simple test
>
> #include <iostream.h>
> #include <conio.h>
>
> main() {
>
>  float a,b,c;
>  a = 0.1;
>  b = 0.1;
>  c = 0.0;
>
>  c += a * b;
>  c -=  a * b;
>
>  cout << c << endl;
> }
>
> > 4.09782e-010
>
> Shows that the floating point part of C++ does not work right !??
> Is this really right, or have a missed something ?

Since you use floats, set up properly your FPU rounding mode. In case of
MSVC this can be done by
_controlfp(_PC_24, _MCW_PC).
In this case c is equal to 0.0
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: "SANDBERG JOHAN" <johan.sandberg.mail@telia.com>
Date: 1999/04/21 Raw View

Hi
The following simple test

#include <iostream.h>
#include <conio.h>

main() {

 float a,b,c;
 a = 0.1;
 b = 0.1;
 c = 0.0;

 c += a * b;
 c -=  a * b;

 cout << c << endl;
}

> 4.09782e-010

Shows that the floating point part of C++ does not work right !??
Is this really right, or have a missed something ?

Regards
Johan Sandberg




[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: juergen@monocerus.demon.co.uk (Juergen Heinzl)
Date: 1999/04/22 Raw View

In article <371e373a.0@d2o30.telia.com>, SANDBERG JOHAN wrote:
>
>Hi
>The following simple test
[...]
>> 4.09782e-010
>
>Shows that the floating point part of C++ does not work right !??
>Is this really right, or have a missed something ?

You've missed something and you might do a search on Yahoo regarding
floating point arithmetic and its problems. Using IEEE arithmetic it
is at least possible to tell in what range the error is (at least
one compiler I know of implemented that and used intervall arithmetic,
though not a C++ compiler).

Cheers,
Juergen

--=20
\ Real name     : J=FCrgen Heinzl                 \       no flames      =
/
 \ EMail Private : juergen@monocerus.demon.co.uk \ send money instead /
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: "John D. Hickin" <hickin@Hydro.CAM.ORG>
Date: 1999/04/22 Raw View

SANDBERG JOHAN wrote:
>
> Hi
> The following simple test
>
> #include <iostream.h>
> #include <conio.h>
>
> main() {
>
>  float a,b,c;
   double a, b, c;
>  a = 0.1;
>  b = 0.1;
>  c = 0.0;
>
>  c += a * b;
>  c -=  a * b;
>
>  cout << c << endl;
> }
>
> > 4.09782e-010

0

>
> Shows that the floating point part of C++ does not work right !??
> Is this really right, or have a missed something ?
>

You missed two things.

1) To produce a result accurate to N significant figures intermediate
computations need precision M>N.

2) It may not be possible to exactly represent a given figure in the
hardware. In this case the number 0.1000... would need an infinitely
repeating fraction when expressed in base 2. What ends up in the
floating point register is something that is close to 0.1 but not exact.

Try the following additional experiment with your original code:

   float d = a*(b-b);

You will get a 0 out. This shows that floating point doesn't satisfy the
distributive law. In fact, it isn't commutative.

If you are trying to add an infinite series of positive terms, for
example, sort the terms in ascending order before adding. There are
_forward_ algorithms that can come close; they are also really hard to
figure out. There is an excellent text on numerical analysis by Bjorck
and (I think) Andersen (circa 1971).

Regards, John.

[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: Yannick de Kercadio <kercadio@limsi.fr>
Date: 1999/04/22 Raw View

SANDBERG JOHAN wrote:
>
> #include <iostream.h>
> #include <conio.h>
>
> main() {
>
>  float a,b,c;
>  a = 0.1;
>  b = 0.1;
>  c = 0.0;
>
>  c += a * b;
>  c -=  a * b;
>
>  cout << c << endl;
> }
>
> > 4.09782e-010
>
> Shows that the floating point part of C++ does not work right !??
> Is this really right, or have a missed something ?

First of all, <conio.h> is not required here. Actually, it isn't a
standard header file (as far as I know...)

Your code outputs 0 when compiled on an SGI machine with g++ 2.8.1.

You should remember that "float" is quite often a 32-bit storage, with
an 8-bit exponent and a 24-bit mantissa. 4.09782e-10 is about
0.88 * 2^-31. There should be some valid explanation, but I should have
a look at what your compiler did, what were the settings of the FPU unit
of the processor and so on. But there's really no interest in doing
that. Just replace "float" by "double"...

--
Yannick de Kercadio
kercadio@limsi.fr

[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: Gabriel Dos_Reis <gdosreis@korrigan.inria.fr>
Date: 1999/04/22 Raw View

"SANDBERG JOHAN" <johan.sandberg.mail@telia.com> writes:

| Hi
| The following simple test
|
| #include <iostream.h>
| #include <conio.h>

I was unable to get your program compile with these hearders.

|
| main() {
|
|  float a,b,c;
|  a = 0.1;
|  b = 0.1;
|  c = 0.0;
|
|  c += a * b;
|  c -=  a * b;
|
|  cout << c << endl;
| }
|
| > 4.09782e-010
|
| Shows that the floating point part of C++ does not work right !??
| Is this really right, or have a missed something ?

Yes. See 'floating point arithmetic' and promises C++ makes about
floating point aritmetic. As far as I can tell, the result is
consistent although not what you were expecting.

--
Gabriel Dos Reis, dosreis@cmla.ens-cachan.fr


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: James.Kanze@dresdner-bank.com
Date: 1999/04/22 Raw View

In article <371e373a.0@d2o30.telia.com>,
  "SANDBERG JOHAN" <johan.sandberg.mail@telia.com> wrote:
>
> The following simple test
>
> #include <iostream.h>
> #include <conio.h>
>
> main() {
>
>  float a,b,c;
>  a = 0.1;
>  b = 0.1;
>  c = 0.0;
>
>  c += a * b;
>  c -=  a * b;
>
>  cout << c << endl;
> }
>
> > 4.09782e-010
>
> Shows that the floating point part of C++ does not work right !??
> Is this really right, or have a missed something ?

Well, the standard doesn't say anything about precision or
reproducibility, so the implementation does conform.

I started to bang out my stock response concerning the vagracies of
floating point arithmetic on non-infinite hardware, until I looked
closely at the program.  Exact precision is impossible in floating
point; this is an established fact.  But in this case tests
reproducibility, not precision.  And one normally expects the same
(possibly incorrect) results from a given operation or sequence of
operations.

What is happening is obvious for those familiar with IEEE
implementations using extended precision.  If we take the Intel chips as
a typical example, internal operations are carried out in an extended
precision, 64 bits instead of 24 (or 53 for double).  Of course, since
the chip doesn't support direct modification (other than writing) of an
in-memory variable, the compiler basically rewrites c += a * b to c = c
+ (a * b).  The entire right side is evaluated in the floating point
processor, generating a 64 bit result (which is exactly the same as
simply a * b).  This result is then rounded to 53 bits and stored in c.
The same rewrite happens with the second expression, which becomes c = c
- (a * b).  In this case, the 64 bit result of a * b is subtracted from
the value in c, which is the result a * b *rounded* to 24 bits. Since
neither the initial values (0.1) nor the result (0.01) are exactly
representable in any finite number of binary digits, the rounded results
are *not* equal to the unrounded results -- in this case, the difference
is roughly 4.1e-10.

All of this is perfectly conformant to the current C and C++ standards.
(I think that C9x is adding some additional constraints to floating
point arithmetic, but I'm not at all familiar with that aspect of the
new standard.)  The only question can be one of quality of
implementation.  IMHO, any implementation which uses a larger internal
format document this fact, and offer options (preferrably, the default
case) to force intermediate results to the desired precision.

--
James Kanze                         mailto: James.Kanze@dresdner-bank.com
Conseils en informatique orientie objet/
                        Beratung in objekt orientierter Datenverarbeitung
Ziegelh|ttenweg 17a, 60598 Frankfurt, Germany  Tel. +49 (069) 63 19 86 27

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: brahms@mindspring.com (Stan Brown)
Date: 1999/04/22 Raw View

[This followup was also e-mailed to the cited author.]

Dixitque johan.sandberg.mail@telia.com (SANDBERG JOHAN) in comp.std.c++:

[f.p. calculation should have yielded zero but yielded 4.09782e-010]

>Shows that the floating point part of C++ does not work right !??
>Is this really right, or have a missed something ?

The floating-point numbers are not the reals: they are a limited subset
of the rationals.

You start with 0.1, which must be converted to binary. But there is no
exact representation of 0.1 in binary: 1/10 in binary is the equivalent
of a repeating decimal. For the same reason that a calculator can only
represent an approximation of 1/3, a binary computer can only represent
an approximation of 1/10.

A better-known example of this is the classic for-loop:

 for (double d = 0.1; d! = 1.0; d += 0.1)
     ;

This loop will never terminate, because 1.0 can be represented exactly in
binary but 0.1 cannot be, so adding 0.1 to itself will be slightly
different from 1.0.

--
Stan Brown, Oak Road Systems, Cleveland, Ohio, USA
                                    http://www.mindspring.com/~brahms/
My reply address is correct as is. The courtesy of providing a correct
reply address is more important to me than time spent deleting spam.


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: christian.bau@isltd.insignia.com (Christian Bau)
Date: 1999/04/22 Raw View

In article <371e373a.0@d2o30.telia.com>, "SANDBERG JOHAN"
<johan.sandberg.mail@telia.com> wrote:

> Hi
> The following simple test
>
> #include <iostream.h>
> #include <conio.h>
>
> main() {
>
>  float a,b,c;
>  a = 0.1;
>  b = 0.1;
>  c = 0.0;
>
>  c += a * b;
>  c -=  a * b;
>
>  cout << c << endl;
> }
>
> > 4.09782e-010
>
> Shows that the floating point part of C++ does not work right !??
> Is this really right, or have a missed something ?

You missed something. About 50 years of development of floating point
arithmetic :-). The result is roughly what one could expect.

C and C++ dont give any guarantees on the precision of floating point
arithmetic. Furthermore, C and C++ give compilers quite a lot of freedom
in implementing floating point arithmetic, so you cant talk about C++
here, but about the implementation of C++ that you are using. Three
different C++ compilers could legally give three different answers.

And anyway,

   #include <conio.h>

invokes undefined behavior, which means ANYTHING can happen.

[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: "Joerg Schaible" <Joerg.Schaible.A@T.gft.de>
Date: 1999/04/22 Raw View

Try Pascal - it won't work either <g>

Most languages are dependend on the underlaying processor capabilities to
represent float values.

Greetings, Jvrg
--
BTW: It is normally better to answer to the group! For direct mail reply
exchange the ".A@T." by "@"
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]