Topic: Overflow during floating to unsigned integral conversion


Author: Pavel Minaev <int19h@gmail.com>
Date: Tue, 1 Sep 2009 23:15:18 CST
What is the expected behavior when converting a float or double value
that is out of range for an unsigned int? For example:

   unsigned u = -1;
   double d = (unsigned)u;
   ++d;
   u = d;

On one hand, there's 3.9.1[basic.fundamental]/4, and the associated
footnote:

"Unsigned integers, declared unsigned, shall obey the laws of
arithmetic modulo 2^n where n is the number of bits in the value
representation of that particular size of integer.

This implies that unsigned arithmetic does not overflow because a
result that cannot be represented by the resulting unsigned integer
type is reduced modulo the number that is one greater than the largest
value that can be represented by the resulting unsigned integer type."

On the other, there's 4.9[conv.fpint]:

"An rvalue of a floating point type can be converted to an rvalue of
an integer type. The conversion truncates; that is, the fractional
part is discarded. The behavior is undefined if the truncated value
cannot be represented in the destination type."

In the code sample above, is the truncated value of double d
unrepresentable in destination type unsigned int? Or is it
representable by the rules of modulo arithmetic? In other words, is
the effect of "u = d" there U.B., or is it guaranteed to produce u==0?

--
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@netlab.cs.rpi.edu]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: Helmut Zeisel <zei2006q1@liwest.at>
Date: Wed, 2 Sep 2009 11:37:47 CST
On Sep 2, 7:15 am, Pavel Minaev <int...@gmail.com> wrote:
> What is the expected behavior when converting a float or double value
> that is out of range for an unsigned int?

I do not know what the standard says, but clearly float and double
values have finite precision and rounding errors will occur.

> For example:
>
>    unsigned u = -1;

OK, u is the largest unsigned value.

>    double d = (unsigned)u;

Depending on the machine dependent number representations, d might be
numerically different from u because of rounding errors.

>    ++d;

For large floating point values, d+1 might equal d (because of finite
mantissa)

>    u = d;

Machine dependent

> On one hand, there's 3.9.1[basic.fundamental]/4, and the associated
> footnote:
>
> "Unsigned integers, declared unsigned, shall obey the laws of
> arithmetic modulo 2^n where n is the number of bits in the value
> representation of that particular size of integer.

This cannot work when floating point conversion occurs

> On the other, there's 4.9[conv.fpint]:
>
> "An rvalue of a floating point type can be converted to an rvalue of
> an integer type. The conversion truncates; that is, the fractional
> part is discarded. The behavior is undefined if the truncated value
> cannot be represented in the destination type."

This is IMHO the realistic behavior.

Helmut







Author: Pavel Minaev <int19h@gmail.com>
Date: Wed, 2 Sep 2009 17:41:56 CST
On Sep 2, 10:37 am, Helmut Zeisel <zei200...@liwest.at> wrote:
> On Sep 2, 7:15 am, Pavel Minaev <int...@gmail.com> wrote:
>
> > What is the expected behavior when converting a float or double value
> > that is out of range for an unsigned int?
>
> I do not know what the standard says, but clearly float and double
> values have finite precision and rounding errors will occur.

Rounding behavior is clearly specified; it's not a question here. I'm
specifically interested in overflow.

> > For example:
>
> >    unsigned u = -1;
>
> OK, u is the largest unsigned value.
>
> >    double d = (unsigned)u;
>
> Depending on the machine dependent number representations, d might be
> numerically different from u because of rounding errors.
> >    ++d;
>
> For large floating point values, d+1 might equal d (because of finite
> mantissa)

For the sake of simplicity, let's assume 32-bit int and 64-bit IEEE
double (the most common case) - or, in general, any implementation in
which my code would not lose any precision in unsigned-to-double
conversion, and in which double is wide enough that (d + 1) - 1 == d.

> >    u = d;
>
> Machine dependent

There is no such thing in the Standard. Do you mean undefined,
unspecified, or implementation-defined behavior?

> > On one hand, there's 3.9.1[basic.fundamental]/4, and the associated
> > footnote:
>
> > "Unsigned integers, declared unsigned, shall obey the laws of
> > arithmetic modulo 2^n where n is the number of bits in the value
> > representation of that particular size of integer.
>
> This cannot work when floating point conversion occurs

Not directly, obviously, but it could reasonably apply after the floating
point value is truncated to produce an integer, if that integer is
outside the range of unsigned int.

> > On the other, there's 4.9[conv.fpint]:
>
> > "An rvalue of a floating point type can be converted to an rvalue of
> > an integer type. The conversion truncates; that is, the fractional
> > part is discarded. The behavior is undefined if the truncated value
> > cannot be represented in the destination type."
>
> This is IMHO the realistic behavior.

This is debatable - for example, personally, I would expect it to
behave in the same way as if I assigned an unsigned long value that is too
large to fit in unsigned int (assuming that long is wider than int) -
i.e., well-defined wraparound. Several other C++ developers, when
asked, expected this behavior to be the same - in fact, they were
quite surprised that this may be undefined, when other unsigned
operations are not. That is why I decided to ask for clarification here.







Author: Jack Klein <jackklein@spamcop.net>
Date: Thu, 3 Sep 2009 13:08:45 CST
On Wed,  2 Sep 2009 17:41:56 CST, Pavel Minaev <int19h@gmail.com>
wrote in comp.std.c++:

> On Sep 2, 10:37 am, Helmut Zeisel <zei200...@liwest.at> wrote:
> > On Sep 2, 7:15 am, Pavel Minaev <int...@gmail.com> wrote:
> >
> > > What is the expected behavior when converting a float or double value
> > > that is out of range for an unsigned int?
> >
> > I do not know what the standard says, but clearly float and double
> > values have finite precision and rounding errors will occur.
>
> Rounding behavior is clearly specified; it's not a question here. I'm
> specifically interested in overflow.
>
> > > For example:
> >
> > >    unsigned u = -1;
> >
> > OK, u is the largest unsigned value.
> >
> > >    double d = (unsigned)u;

Redundant cast above, and one that achieves absolutely nothing since
the conversion is implicit in the assignment anyway.  In an admittedly
unlikely implementation where UINT_MAX was outside the range
representable by double, the undefined behavior would be the same with
or without the cast.

> > Depending on the machine dependent number representations, d might be
> > numerically different from u because of rounding errors.
> > >    ++d;
> >
> > For large floating point values, d+1 might equal d (because of finite
> > mantissa)
>
> For the sake of simplicity, let's assume 32-bit int and 64-bit IEEE
> double (the most common case) - or, in general, any implementation in
> which my code would not lose any precision in unsigned-to-double
> conversion, and in which double is wide enough that (d + 1) - 1 == d.
>
> > >    u = d;
> >
> > Machine dependent
>
> There is no such thing in the Standard. Do you mean undefined,
> unspecified, or implementation-defined behavior?

There is no such thing in the standard, but there is such a thing in
implementations.  If d == (d + 1), as Helmut suggested, then most likely
the assignment back to 'u' is well defined, but perhaps not.  On the
other hand, if ++d is at least equivalent to the value of 'u' + 1.0,
then the assignment is undefined.

So it depends on the relationship among several implementation-defined
specifics, whether the behavior is well-defined or undefined.

Here's another one:

  int i = 32767;
  ++i;

...is the behavior defined?  You cannot say without knowing whether
INT_MAX on the implementation is greater than 32767 or not.

> > > On one hand, there's 3.9.1[basic.fundamental]/4, and the associated
> > > footnote:
> >
> > > "Unsigned integers, declared unsigned, shall obey the laws of
> > > arithmetic modulo 2^n where n is the number of bits in the value
> > > representation of that particular size of integer.
> >
> > This cannot work when floating point conversion occurs
>
> Not directly, obviously, but it could reasonably apply after the floating
> point value is truncated to produce an integer, if that integer is
> outside the range of unsigned int.

I think there are two things you are failing to take into account. The
first is the "spirit of C", where both of these rules originate, also
inherited to a certain extent as the "spirit of C++", namely that you
don't pay for what you don't use.

And the second is, what you are talking about would be an awful lot of
work, that everyone would have to pay for.

> > > On the other, there's 4.9[conv.fpint]:
> >
> > > "An rvalue of a floating point type can be converted to an rvalue of
> > > an integer type. The conversion truncates; that is, the fractional
> > > part is discarded. The behavior is undefined if the truncated value
> > > cannot be represented in the destination type."
> >
> > This is IMHO the realistic behavior.
>
> This is debatable - for example, personally, I would expect it to
> behave in the same way as if I assigned an unsigned long value that is too
> large to fit in unsigned int (assuming that long is wider than int) -
> i.e., well-defined wraparound. Several other C++ developers, when
> asked, expected this behavior to be the same - in fact, they were
> quite surprised that this may be undefined, when other unsigned
> operations are not. That is why I decided to ask for clarification here.

Why would you expect it to behave in any way at all when the standard
specifically tells you that it is undefined behavior?

Perhaps you are giving too much weight to theoretical ideas of what
would be nice or proper, instead of considering the cost you would be
imposing on every conversion of a floating point value to an unsigned
integer type.

Consider:

There is very little overhead involved in assigning a wider unsigned
integer type to a narrower one.  Consider the case where something
actually needs to be done, such as in the case of a 32-bit RISC
processor without 8-bit or 16-bit registers.

To assign a 32-bit unsigned long (or int) to an 8-bit unsigned char,
the assembly language is typically something like this:

  and  r1, r0, 0xff

...and r1 now contains the desired result.  In other words, it is
usually quite inexpensive, or totally automatic, to discard higher
bits and only keep the lower ones.

So now you have a floating point value, and you use one of the
overloads of modf() to discard the fractional part.  You still have a
floating point type, with a significand and an exponent.  There is
almost certainly no subset of low bits that can be merely masked off
and dropped into an unsigned integer type.

That would mean that each implementation would need to convert the
floating point value, after fractional truncation, into some form of
integer representation, just so it could grab the lowest 'n' bits that
fit into the destination type, even though that representation could
require many times the number of bits in the widest integer type.

So to do what you would expect, the implementation must carry out a
complex and almost certainly expensive set of operations on every
single floating point value, or at least on every floating point value
greater than the maximum value of the destination type.  This means,
at the very least, the compiler must generate code for every single
such assignment to compare the floating point value to the maximum
value of the destination type.

I think you would find that a very large percentage of C++ programmers
would be upset about having to pay the price for something that is
very likely a program defect in the first place.

On the other hand, if you are willing to pay that price, you can write
your own function, templated or inline, to do the extra work in the
places where your program requires it.

--
Jack Klein http://JK-Technology.Com
FAQs for
news:comp.lang.c http://c-faq.com/
news:comp.lang.c++ http://www.parashift.com/c++-faq-lite/
news:alt.comp.lang.learn.c-c++
http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html






Author: Helmut Zeisel <zei2006q1@liwest.at>
Date: Thu, 3 Sep 2009 13:10:34 CST
On Sep 3, 1:41 am, Pavel Minaev <int...@gmail.com> wrote:
> On Sep 2, 10:37 am, Helmut Zeisel <zei200...@liwest.at> wrote:

> For the sake of simplicity, let's assume 32-bit int and 64-bit IEEE
> double (the most common case) - or, in general, any implementation in
> which my code would not lose any precision in unsigned-to-double
> conversion, and in which double is wide enough that (d + 1) - 1 == d.

OK

 unsigned u = -1;

Now u = 2^32-1

  double d = (unsigned)u;

Without rounding errors: d=2^32-1

  ++d;

Without rounding errors: d=2^32

  u = d;

u = 2^32 mod 2^32, which means u=0

VC 9.0 and GCC 4.3.2 (Cygwin) indeed give u=0.

> > Machine dependent

> There is no such thing in the Standard. Do you mean undefined,
> unspecified, or implementation-defined behavior?

According to  4.9[conv.fpint]: undefined.

> > > "An rvalue of a floating point type can be converted to an rvalue of
> > > an integer type. The conversion truncates; that is, the fractional
> > > part is discarded. The behavior is undefined if the truncated value
> > > cannot be represented in the destination type."
>
> > This is IMHO the realistic behavior.
>
> This is debatable - for example, personally, I would expect it to
> behave in the same way as if I assigned an unsigned long value that is too
> large to fit in unsigned int (assuming that long is wider than int) -
> i.e., well-defined wraparound.

This is only possible as long as the involved integers can be
represented exactly as floating point numbers.
In my tests this was indeed the case and the result was as expected.
It cannot work, however, in situations where d==d+1
(e.g. when double is replaced by 32-bit float)

> Several other C++ developers, when
> asked, expected this behavior to be the same - in fact, they were
> quite surprised that this may be undefined, when other unsigned
> operations are not.

They never did numerical mathematics, did they?

Helmut








Author: brangdon@cix.co.uk (Dave Harris)
Date: Thu, 3 Sep 2009 16:04:52 CST
int19h@gmail.com (Pavel Minaev) wrote (abridged):
> In the code sample above, is the truncated value of double d
> unrepresentable in destination type unsigned int?

I'd say so. If the destination type is 16 bits, it can represent values
from 0 to 65535 inclusive. The value you are assigning, once truncated,
is outside that range: it's 65536.0. It can't be represented.


> Or is it representable by the rules of modulo arithmetic?

At this point its type is still double (albeit truncated), so the rules
of modulo arithmetic haven't kicked in yet. It's not like incrementing an
unsigned int.

That's my reading of the bits of standard you cited. From my knowledge of
my local hardware, I'd expect the truncation/conversion to happen within
the IEEE floating point stack, and as such I'd expect it to follow
floating point rules, which often include throwing a hardware exception
on overflow.

(Also, I don't think 0 "represents" 65536 in the usual sense of the
word.)

-- Dave Harris, Nottingham, UK.






Author: Pavel Minaev <int19h@gmail.com>
Date: Sat, 5 Sep 2009 11:08:02 CST
On Sep 3, 12:10 pm, Helmut Zeisel <zei200...@liwest.at> wrote:
> > On Sep 2, 10:37 am, Helmut Zeisel <zei200...@liwest.at> wrote:
> > For the sake of simplicity, let's assume 32-bit int and 64-bit IEEE
> > double (the most common case) - or, in general, any implementation in
> > which my code would not lose any precision in unsigned-to-double
> > conversion, and in which double is wide enough that (d + 1) - 1 == d.
>
> OK
>
>  unsigned u = -1;
>
> Now u = 2^32-1
>
>   double d = (unsigned)u;
>
> Without rounding errors: d=2^32-1
>
>   ++d;
>
> Without rounding errors: d=2^32
>
>   u = d;
>
> u = 2^32 mod 2^32, which means u=0
>
> VC 9.0 and GCC 4.3.2 (Cygwin) indeed give u=0.

I've also checked it on various implementations, but, naturally, their
results, even when they agree, cannot be taken as a definite proof
that this is indeed what the Standard mandates. In particular, your
transformation of:

    u = d;

to

    u = 2^32 mod 2^32

is what I'm unsure about, as I cannot find any wording in the Standard
that would unambiguously require it to go that way (and the only other
place that applies seems to hint at U.B.).

> > > Machine dependent
> > There is no such thing in the Standard. Do you mean undefined,
> > unspecified, or implementation-defined behavior?
>
> According to  4.9[conv.fpint]: undefined.

Do you mean this in general (i.e. for an arbitrary value of `u` not known
in advance), or under the specific constraints outlined above (for
which you also gave the evaluation as: u = 2^32 mod 2^32)? I'm
confused now, because I'm not sure whether you're arguing for it being
well-defined _for this particular case_, or whether it is still U.B.

> > > > "An rvalue of a floating point type can be converted to an rvalue of
> > > > an integer type. The conversion truncates; that is, the fractional
> > > > part is discarded. The behavior is undefined if the truncated value
> > > > cannot be represented in the destination type."
>
> > > This is IMHO the realistic behavior.
>
> > This is debatable - for example, personally, I would expect it to
> > behave in the same way as if I assigned an unsigned long value that is too
> > large to fit in unsigned int (assuming that long is wider than int) -
> > i.e., well-defined wraparound.
>
> This is only possible as long as the involved integers can be
> represented exactly as floating point numbers.
> In my tests this was indeed the case and the result was as expected.
> It cannot work, however, in situations where d==d+1
> (e.g. when double is replaced by 32-bit float)

I am specifically only interested in a "normal" situation, where (d +
1) - d == 1. Not in the most general case for all possible legal
values of d.

> > Several other C++ developers, when
> > asked, expected this behavior to be the same - in fact, they were
> > quite surprised that this may be undefined, when other unsigned
> > operations are not.
>
> They never did numerical mathematics, did they?

They (and I) have common sense. I don't see why these should be any
different:

    unsigned int ui;

    unsigned long ul = <some-number-too-large-for-unsigned-int>;
    ui = ul; // okay, wraparound

    double d = <exact-same-number>.0;
    ui = d; // U.B.?

Yet, reading the Standard, it seems that U.B. is more likely than not.
That is why I wanted to clarify this, and possibly find out the
rationale.

Again, since we keep returning to this, let me repeat
that I'm only considering the case where d can represent the value
exactly with no loss of precision. No need to go into corner cases
where (d+1)==d etc. The only "unusual" thing happening here is
potential overflow.







Author: Helmut Zeisel <zei2006q1@liwest.at>
Date: Sat, 5 Sep 2009 22:26:16 CST
On 5 Sep., 19:08, Pavel Minaev <int...@gmail.com> wrote:

>     u = 2^32 mod 2^32
>
> is what I'm unsure about, as I cannot find any wording in the Standard
> that would unambiguously require it to go that way (and the only other
> place that applies seems to hint at U.B.).

Yes, my understanding is also that the applicable clause says U.B.

> > According to  4.9[conv.fpint]: undefined.

>: u = 2^32 mod 2^32. I'm
> confused now, because I'm not sure whether you're arguing for it being
> well-defined _for this particular case_, or whether it is still U.B.

U.B. If you want to write portable code, do not use such
conversions.

> I am specifically only interested in a "normal" situation, where (d +
> 1) - d == 1. Not in the most general case for all possible legal
> values of d.

OK, I agree. 4.9[conv.fpint] could be worded to depend on the
precision of the floating type, and to define modulo reduction for an
unsigned destination type as long as the floating point number is
"small enough" to satisfy (d+1)-d==1.
What should it do, however, for a signed destination type (which IMHO
is the more important case)?
Maybe you can explain why you want to use this;
from my experience it is better not to trust float-to-int conversions
too much.

> > They never did numerical mathematics, did they?
>
> They (and me) have common sense.

Yes. But numerical mathematics sometimes contradicts common sense
(for example, the associative law no longer holds).

Helmut

