Topic: Shall I avoid every Undefined Behavior?
Author: James Kanze <james.kanze@gmail.com>
Date: Sun, 2 May 2010 14:25:06 CST
On Apr 30, 12:52 am, croberts <chrisvrobe...@gmail.com> wrote:
> As has already been stated: undefined behaviour should be avoided.
But at what level? The original poster seemed to be trying to
write a low level function which would not result in undefined
behavior, regardless of the arguments passed to it. This
simply isn't possible.
> A balanced view should be reached depending on the purpose of the
> code. For example, if the code is destined for a safety critical
> system then all of the examples you gave can and should be avoided.
They can't be avoided at the level the original poster was
asking about.
> This level of checking, and restrictions on implementation, are likely
> to be unsuitable for other applications.
> > How can we check (in platform independent manner) that 0x27AA98C0 is
> > a valid memory location?
This is a good example. You can't. Period.
> Here the parameter could be passed by reference (preferably const
> reference) to an object which is not heap or free store allocated.
>
>     void fun( /*const*/ Type &ptr ) {
>         ptr.run();
>     }
>
>     Type t;
>     fun( t );
Which doesn't really change anything. You can get an invalid
reference just as easily as you can get an invalid pointer.
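For instance (a minimal sketch; fun and Type as in the quoted code, the
rest invented for illustration):

    Type *p = new Type;
    Type &r = *p;   // r is bound to the object *p
    delete p;       // the object is gone; r now refers to dead storage
    fun( r );       // fun receives an invalid reference: UB when it
                    // calls run() through it

Nothing inside fun can detect that the reference is dangling.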
> For safety critical systems, dynamic allocation is often restricted or
> forbidden.
Agreed, but irrelevant. (In fact, a safer option with regards
to valid pointers would be to require all pointers and
references to point to dynamically allocated memory, ban the &
operator, and use garbage collection. The reason critical
systems ban dynamic allocation is not related to pointer
validity.)
> > Second one is what happens if we nest too many function
> > calls. It was already pointed out to me that the C++
> > standard doesn't require an instruction stack for the
> > program, but clearly whatever the implementation would be
> > something will happen when we nest too many function calls.
> This is most likely to happen when recursion is used, which
> should not be the case for safety critical code. All recursive
> implementations can be rewritten, for example, by implementing
> your own stack which can be monitored and handled safely (and
> in a defined way) should it become full.
Alternatively, any implementation designed for use in critical
systems could "define" the behavior that occurs when the stack
overflows.
> Analysis can be performed to assess the depth that the call
> stack will reach in the absence of recursion (both self
> recursion and larger cycles in the function call graph). This
> can then be verified to fall within the limits of the
> particular platform (I appreciate that this is therefore not a
> platform independent solution).
> > Every use of practically any arithmetic operation may cause UB.
> Yes, although as you yourself identified, checks can be made
> to identify when this will occur and allow error handling to
> be performed. This could represent a significant latency
> overhead but may be required to meet safety requirements.
> (Underflow and overflow bits can also be monitored and handled
> on interrupts on some embedded platforms.)
This is actually a red herring, since even in non-critical
applications, any decent programmer will take the necessary
precautions to ensure that overflow can't occur. But at the
input level, not before each and every operator.
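For instance (a minimal sketch; MAX_OPERAND and readOperand are
invented for illustration, assuming the application can state a bound
on its inputs):

    #include <istream>
    #include <limits>
    #include <stdexcept>

    // Validate once, at the input boundary. Any two values that pass
    // this check can later be added without overflow, so no
    // per-operator checks are needed downstream.
    unsigned long long const MAX_OPERAND =
        std::numeric_limits<unsigned long long>::max() / 2;

    unsigned long long readOperand( std::istream &in )
    {
        unsigned long long v;
        if ( !(in >> v) || v > MAX_OPERAND )
            throw std::runtime_error( "operand out of range" );
        return v;
    }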
> > So, I guess my question is, what should be a good
> > programmer's attitude to UB?
> It depends entirely on the application for which the code is
> being generated.
Yes and no. Regardless of the application, you don't deliver a
program which has undefined behavior. But you likely do depend
on a lot of behavior which is defined outside of the standard,
by the implementation.
> Undefined behaviour should be avoided... but the amount of time and
> effort during development, and the runtime overhead, that is
> considered acceptable depends entirely on the given application.
Eliminating undefined behavior generally reduces your total
development cost, and only very rarely has a measurable impact
on performance.
--
James Kanze
--
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@netlab.cs.rpi.edu]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]
Author: restor <akrzemi1@gmail.com>
Date: Fri, 23 Apr 2010 11:00:28 CST
Hi,
This is a list of questions and observations about how seriously the
avoidance of the Undefined Behavior should be treated, and if it is at
all possible to avoid all Undefined Behaviors in the program.
As far as I understand, Undefined Behavior is a characteristic of the
program rather than of the code; 'program' meaning here the code plus
the particular input data that was fed to it. Thus the code:
    void fun( Type * ptr ) {
        ptr->run(); // ptr == NULL ??
    }
cannot be said to have UB, but the program that executes it will have
UB when ptr's value happens to point at some invalid memory.
Also, no single place in the code can be considered directly
responsible for the UB. In the above example, dereferencing a bad
pointer is UB, but there is nothing in the function that could be
done to avoid it. How can we check (in a platform-independent
manner) that 0x27AA98C0 is a valid memory location?
So that is one of the examples of UB that I want to consider. The
second one is what happens if we nest too many function calls. It was
already pointed out to me that the C++ standard doesn't require an
instruction stack for the program, but clearly, whatever the
implementation, something will happen when we nest too many function
calls. I believe the standard states somewhere that whatever behavior
is not explicitly defined is to be considered undefined.
There is no way to detect this condition (at least in a portable way),
so we need to allow for this possibility of Undefined Behavior.
Third one is the following example:
    {
        unsigned long long int i, j;
        file >> i >> j;
        std::cout << process(i, j);
    }

    unsigned long long int process( unsigned long long i,
                                    unsigned long long j ) {
        return i + j;
    }
If the sum of i and j exceeds the maximum value of long long type, the
behavior of the program is undefined. Every use of practically any
arithmetic operation may cause UB. Suppose I wanted to protect my
program against UB; what should my function 'process' check as
precondition?
    i + j <= numeric_limits<unsigned long long>::max(); //?

but it performs the dangerous addition already.

    i <= numeric_limits<unsigned long long>::max() - j; //?

but isn't the result of subtraction supposed to be a signed type (and
overflow in this case for small j)?

    i <= numeric_limits<unsigned long long>::max() / 2 &&
    j <= numeric_limits<unsigned long long>::max() / 2; //?

Well, it prevents some valid combinations of i and j, but if I expect
small values it might do. But it is not a 'technically right'
solution.
    i > j
        ? ( j < numeric_limits<unsigned long long>::max() / 2 &&
            i > numeric_limits<unsigned long long>::max() / 2
                ? ( i - numeric_limits<unsigned long long>::max() / 2 + j
                    < numeric_limits<unsigned long long>::max() / 2 )
                : true )
        : ( .......
Perhaps there exists an expression that does not overflow but checks
the overflow condition; still, I cannot imagine a C++ "best practice"
saying "whenever you add numbers, verify the addition's precondition;
write your own check, as there is no standard one". I have never
checked for such a precondition, although I add ints quite often.
Should I be considered an "incautious" programmer, or is it simply the
right way not to care about any possible UB that might appear on
_some_ theoretically possible input? I don't believe I have ever seen
a precondition in anyone's code (even in the form of a comment) that
i + j must not overflow. Even in the C++ Standard, where
std::accumulate could make use of such a precondition, it is not
employed.
So, I guess my question is: what should be a good programmer's
attitude to UB? Shall (s)he make sure none can ever happen, or should
UB be tolerated "within reason"? I suppose the latter, and perhaps
I was scared too much by advice like "UB can do anything in your
program, so do not have it."
Regards,
&rzej
Author: usenet@mkarcher.dialup.fu-berlin.de (Michael Karcher)
Date: Sun, 25 Apr 2010 16:09:36 CST
restor <akrzemi1@gmail.com> wrote:
> particular input data that was fed to the program. Thus the code:
>
>     void fun( Type * ptr ) {
>         ptr->run(); // ptr == NULL ??
>     }
>
> cannot be said to have a UB, but the program that executes will have a
> UB when ptr's value happens to be pointing at some invalid memory.
Exactly. The function "fun" has the precondition that ptr points to valid
memory, that's nothing special in C++.
> Also, a single place in the code cannot be considered directly
> responsible for the UB.
The code path that lets a bad pointer reach the call to "fun" is
responsible for the UB. If you document the preconditions along the
call stack, you can find exactly where one of them was violated so
that an invalid pointer got here - that is the single place in the
code that is responsible.
> In the above example, dereferencing a bad pointer is UB, but there is
> nothing in the function that could be done to avoid the UB. How can we
> check (in platform independent manner) that 0x27AA98C0 is a valid memory
> location?
You can't.
> So that is one of the examples of UB that I want to consider. Second
> one is what happens if we nest too many function calls.
The maximum number of nested calls is an implementation-defined limit IIRC.
So you can check the documentation of your compiler to find out the stack
limit, or to find out how to get information about the stack limit (at
least if the documentation is good). Using whole-program analysis
algorithms, it might be possible to prove that the stack size or nesting
level that is guaranteed by the specific implementation is not exceeded.
> Third one is the following example:
>
>
>     {
>         unsigned long long int i, j;
>         file >> i >> j;
>         std::cout << process(i, j);
>     }
>
>     unsigned long long int process( unsigned long long i,
>                                     unsigned long long j ) {
>         return i + j;
>     }
>
> If the sum of i and j exceeds the maximum value of long long type, the
> behavior of the program is undefined.
If "i" and "j" were signed, this is right. But for unsigned integer types,
the behaviour of overflow is clearly defined: It's modular arithmetic with a
modulus of (std::numeric_limis<unsigned long long>::max() + 1).
> Every use of practically any
> arithmetic operation may cause UB. Suppose I wanted to protect my
> program against UB; what should my function 'process' check as
> precondition?
>
>     i + j <= numeric_limits<unsigned long long>::max(); //?
> but it performs the dangerous addition already.
Right, this won't work: i+j is (see above) guaranteed to be at most
std::numeric_limits<unsigned long long>::max(), so the comparison can
never be false. For signed integers, you would get undefined behaviour
in this check itself.
>     i <= numeric_limits<unsigned long long>::max() - j; //?
> but isn't the result of subtraction supposed to be a signed type (and
> overflow in this case for small j)?
The difference of unsigned quantities is again an unsigned quantity in
C++. The same comment about modular arithmetic applies. So if your
system has 32-bit ints, the term ((unsigned)5-(unsigned)6) has the
value 0xFFFFFFFF, which is a bit more than 4 billion.
> Perhaps, there exists an expression that does not overflow, but checks
> the overflow condition, but I cannot imagine a C++ "best practice"
> saying "whenever you add numbers verify the addition's precondition".
For unsigned ints, it's quite easy: if (i+j < i), an overflow happened.
For signed ints, it gets much messier:

    if (   ( i > 0 && j > 0
             && (unsigned)i + (unsigned)j
                > (unsigned)std::numeric_limits<int>::max() )
        || ( i < 0 && j < 0
             && (0u - (unsigned)i) + (0u - (unsigned)j) - 1
                > (unsigned)std::numeric_limits<int>::max() ) )
        /* overflow */

Note the "-1" in the second condition: it keeps the comparison correct
even in the extreme case where both i and j are
std::numeric_limits<int>::min() and the unsigned sum wraps to zero.
The negations are written as 0u - (unsigned)i so that they, too, are
carried out in unsigned arithmetic; since all the arithmetic is done
on unsigned values, the check itself can't invoke undefined behaviour.
> "Write your own, as there is no standard".
Yes, that's the only approach.
> I have never checked for
> such a precondition, although I add int-s quite often. Should I be
> considered an "incautious" programmer, or is it simply the right way
> not to care about any possible UB that might appear on _some_,
> theoretically possible input?
You should document the limit. In that case, the caller is responsible if he
violates it.
> I don't believe I have ever seen a
> precondition in anyone's code (even in form of a comment) that i + j
> must not overflow. Even in the C++ Standard, where std::accumulate
> could make use of such a precondition it is not employed.
std::accumulate just calls the accumulation function, the undefined
behaviour thus does not occur within std::accumulate. If you pass arguments
to std::accumulate that cause undefined behaviour, it's your fault. In this
case, you could (if you are not already at the biggest integer size) just
accumulate into the next bigger integer type to be safe.
> So, I guess my question is, what should be a good programmer's
> attitude to UB? Shall (s)he make sure none must ever happen, or should
> they be considered "within reason"?
If your program is security relevant (i.e. runs with elevated
privileges) you had better make sure you don't do anything with
undefined behaviour. The best attitude, in my opinion, is to avoid it
where possible and to document the limits where the called function
cannot sensibly avoid the UB itself.
> I suppose the latter, and perhaps
> I was too much scared by the advice like "UB can do anything in your
> program, so do not have it."
If you have ever traced bugs rooted in memory mismanagement, you
quickly get picky about avoiding this kind of undefined behaviour. For
signed integer overflow, you can get quite far by just ignoring the
possible undefined behaviour, but there might be some implementation
that makes your program crash at that point - you must not rely on
undefined behaviour to behave. For unsigned integers, there is no
undefined behaviour caused by overflowing arithmetic, but even
overflowing unsigned integer arithmetic may void your program's logic
assumptions and cause unintended things to happen, like in
    const unsigned int array_size = 3;
    int array[array_size] = {1, 2, 3};
    /* ... */
    unsigned int idx = 0xFFFFFFFE; /* assume a 32 bit system */
    /* ... */
    /* verify that we don't go past the array */
    if (idx + 2 >= array_size) return ERROR;
    return array[idx] + array[idx + 2];
This code *does* contain undefined behaviour! idx+2 is 0 on a 32-bit
system. That line itself is fine, and since 0 is smaller than 3, the
"return ERROR" statement is not executed. But array[0xFFFFFFFE] then
accesses memory outside of the array, and that causes the undefined
behaviour.
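One way to repair the check, as a sketch, is to compare against the
remaining room instead of computing idx + 2:

    /* idx < array_size makes the subtraction safe; fewer than 3     */
    /* slots of room means array[idx + 2] would be out of bounds.    */
    if (idx >= array_size || array_size - idx <= 2) return ERROR;
    return array[idx] + array[idx + 2];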
Regards,
Michael Karcher
Author: Öö Tiib <ootiib@hot.ee>
Date: Sun, 25 Apr 2010 17:04:46 CST
On 23 apr, 20:00, restor <akrze...@gmail.com> wrote:
>
>     void fun( Type * ptr ) {
>         ptr->run(); // ptr == NULL ??
>     }
>
> cannot be said to have a UB, but the program that executes will have a
> UB when ptr's value happens to be pointing at some invalid memory.
> Also, a single place in the code cannot be considered directly
> responsible for the UB.
There is always a single place, in every case where UB happens: the
place that propagated something that looks like a Type* but is
actually pointing at invalid memory or at an object that is not a
Type. Instead of wondering how to make fun() discover whether ptr is a
bad pointer, you should review the techniques that produce invalid
pointers outside of fun(). fun() can (and should) check that it is not
passed a null pointer.
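A sketch of that check (it catches null, but of course cannot catch a
non-null pointer into invalid memory):

    #include <cassert>

    void fun( Type * ptr ) {
        assert( ptr != NULL );  // defined failure instead of silent UB
        ptr->run();
    }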
> Second
> one is what happens if we nest too many function calls. It was already
> pointed out to me that the C++ standard doesn't require an instruction
> stack for the program, but clearly whatever the implementation would
> be something will happen when we nest too many function calls.
> [...]
> There is no way to detect that (at least in portable way), so we need
> to allow this possibility of Undefined Behavior.
You mean recursion that goes too deep? There are techniques that let
you avoid too deep recursion. You should apply these to your
implementations and algorithms.
Your subject line asked whether you should avoid UB, yet for the
second time you describe code that did not avoid it and now tries to
detect and cure it. Wrong. Thou Shalt Avoid.
> Third one is the following example:
>
>     {
>         unsigned long long int i, j;
>         file >> i >> j;
>         std::cout << process(i, j);
>     }
>
>     unsigned long long int process( unsigned long long i,
>                                     unsigned long long j ) {
>         return i + j;
>     }
>
> If the sum of i and j exceeds the maximum value of long long type, the
> behavior of the program is undefined.
unsigned long long is not a C++ type in the established standard. But
if we imagine that it is, and that it behaves like all the unsigned
types, then there is no UB here: the result is well defined by the
rules of modular arithmetic.
If you want to detect whether the result overflowed, then detect that
in your code:
    unsigned sum_with_throw_on_overflow( unsigned a, unsigned b )
    {
        unsigned ret = a + b;   // well defined: wraps modulo 2^N
        if ( ret < a )          // it wrapped: the true sum did not fit
            throw std::runtime_error( "unsigned sum did overflow" );
        return ret;
    }
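A hypothetical call site (a and b are placeholders) then gets a
defined error path instead of a silent wraparound:

    try {
        unsigned total = sum_with_throw_on_overflow( a, b );
        // ... use total ...
    } catch ( std::runtime_error const &e ) {
        // overflow was detected; handle it here
    }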
Author: Mathias Gaunard <loufoque@gmail.com>
Date: Mon, 26 Apr 2010 00:24:47 CST
On 23 avr, 18:00, restor <akrze...@gmail.com> wrote:
> Hi,
> This is a list of questions and observations about how seriously the
> avoidance of the Undefined Behavior should be treated, and if it is at
> all possible to avoid all Undefined Behaviors in the program.
Yes, a good program shall never invoke undefined behaviour.
> As far as I understand Undefined Behavior is a characteristic of the
> program rather than the code; 'program' meaning here the code and the
> particular input data that was fed to the program. Thus the code:
>
>     void fun( Type * ptr ) {
>         ptr->run(); // ptr == NULL ??
>     }
>
> cannot be said to have a UB, but the program that executes will have a
> UB when ptr's value happens to be pointing at some invalid memory.
> Also, a single place in the code cannot be considered directly
> responsible for the UB. In the above example, dereferencing a bad
> pointer is UB, but there is nothing in the function that could be
> done to avoid the UB. How can we check (in platform independent
> manner) that 0x27AA98C0 is a valid memory location?
You don't check because that doesn't make sense.
It's up to your program to guarantee your pointers point to objects
when you dereference them.
> So that is one of the examples of UB that I want to consider. Second
> one is what happens if we nest too many function calls.
I don't believe it triggers undefined behaviour. It's just
implementation limits or something like that if I recall correctly.
> Third one is the following example:
>
>     {
>         unsigned long long int i, j;
>         file >> i >> j;
>         std::cout << process(i, j);
>     }
>
>     unsigned long long int process( unsigned long long i,
>                                     unsigned long long j ) {
>         return i + j;
>     }
>
> If the sum of i and j exceeds the maximum value of long long type, the
> behavior of the program is undefined.
That is wrong.
Not only is the behaviour well-defined, the value you obtain is also
precisely specified as a function of the maximum value.
> Every use of practically any
> arithmetic operation may cause UB.
Not on unsigned integer types.
For signed ones, I don't remember whether it triggers undefined
behaviour or whether it just results in an implementation-defined
value. The latter would be better, of course. Note those are two
vastly different things.
If it is UB, it might be worth treating it as merely an
implementation-defined value anyway, unless you care about the
platforms where it really is undefined.
> Suppose I wanted to protect my
> program against UB; what should my function 'process' check as
> precondition?
Detecting whether an unsigned operation overflows or not can certainly
be done without any issue.
> Perhaps, there exists an expression that does not overflow, but checks
> the overflow condition, but I cannot imagine a C++ "best practice"
> saying "whenever you add numbers verify the addition's precondition".
> "Write your own, as there is no standard". I have never checked for
> such a precondition, although I add int-s quite often. Should I be
> considered an "incautious" programmer, or is it simply the right way
> not to care about any possible UB that might appear on _some_,
> theoretically possible input? I don't believe I have ever seen a
> precondition in anyone's code (even in form of a comment) that i + j
> must not overflow.
Probably because most programs work regardless of whether overflow
happens or not.
If you want to detect overflow, there are libraries that wrap integers
and throw on overflow, or that let you tell whether an operation is
going to overflow.
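A minimal sketch of what such a wrapper might look like (checked_uint
is invented for illustration, not a real library):

    #include <stdexcept>

    class checked_uint {
        unsigned v;
    public:
        explicit checked_uint( unsigned x ) : v( x ) {}
        checked_uint operator+( checked_uint other ) const {
            unsigned r = v + other.v;  // well-defined modular addition
            if ( r < v )               // wrapped: the sum did not fit
                throw std::overflow_error( "unsigned addition overflowed" );
            return checked_uint( r );
        }
        unsigned value() const { return v; }
    };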
> Even in the C++ Standard, where std::accumulate
> could make use of such a precondition it is not employed.
Why should std::accumulate check for that? Maybe the overflow
behaviour is what the user intends.
> So, I guess my question is, what should be a good programmer's
> attitude to UB?
First, understand what is UB, what is implementation-defined, and what
is well-defined.
Then there is the standard, and there are actual implementations. Some
implementations may turn conditions that normally trigger undefined
behaviour into something well-defined, so if you ignore portability
those are not truly UB anymore.
Then you should write your code in a way that never triggers UB; there
is no need to check for it, simply statically guarantee it.
> Shall (s)he make sure none must ever happen, or should
> they be considered "within reason"?
Never happen short of a bug.
Author: "Kenneth 'Bessarion' Boyd" <zaimoni@zaimoni.com>
Date: Mon, 26 Apr 2010 00:23:47 CST
On Apr 23, 12:00 pm, restor <akrze...@gmail.com> wrote:
> Hi,
> This is a list of questions and observations about how seriously the
> avoidance of the Undefined Behavior should be treated, and if it is at
> all possible to avoid all Undefined Behaviors in the program.
>
> ....
>
> Third one is the following example:
>
>     {
>         unsigned long long int i, j;
>         file >> i >> j;
>         std::cout << process(i, j);
>     }
>
>     unsigned long long int process( unsigned long long i,
>                                     unsigned long long j ) {
>         return i + j;
>     }
>
> If the sum of i and j exceeds the maximum value of long long type, the
> behavior of the program is undefined. Every use of practically any
> arithmetic operation may cause UB. Suppose I wanted to protect my
> program against UB; what should my function 'process' check as
> precondition?
>
> ....
>
>     i <= numeric_limits<unsigned long long>::max() - j; //?
>
> but isn't the result of subtraction supposed to be a signed type (and
> overflow in this case for small j)?
Actually, no (subtraction uses the same integer type promotion rules
as addition).
> Perhaps, there exists an expression that does not overflow, but checks
> the overflow condition,
You already named it.
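Spelled out (a sketch):

    #include <limits>

    // max() - j is computed in unsigned long long, so it cannot
    // overflow, and the comparison is exactly the no-overflow
    // precondition for i + j.
    bool add_would_overflow( unsigned long long i, unsigned long long j )
    {
        return i > std::numeric_limits<unsigned long long>::max() - j;
    }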
> but I cannot imagine a C++ "best practice"
> saying "whenever you add numbers verify the addition's precondition".
Then expand your imagination:
https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure+that+operations+on+signed+integers+do+not+result+in+overflow
(Yes, that is C, but the same issue exists in C++, and CERT is working
on a secure coding standard for C++ into which this rule would be
imported.) I do recommend skimming the entire CERT C coding standard.
I haven't personally skimmed the entire MISRA C coding standard, but I
do need to schedule that as well.
> ....
>
> So, I guess my question is, what should be a good programmer's
> attitude to UB?
My attitude is:
* use it when required by some other applicable standard (e.g., POSIX
and the casting of function pointers to void pointers for certain
POSIX library functions);
* use it when some other standard in force guarantees defined behavior
(e.g., when IEEE floating point arithmetic is actually known to
apply);
* use it, with full expectation of non-portability even to different
versions of the same compiler, when the portable analog fails. Have
test cases in hand for this situation. (union vs. static_cast of
pointers comes to mind here; one of the projects I work on has two
instances of this.)
Author: James Kanze <james.kanze@gmail.com>
Date: Mon, 26 Apr 2010 23:00:32 CST
On Apr 23, 6:00 pm, restor <akrze...@gmail.com> wrote:
> This is a list of questions and observations about how
> seriously the avoidance of the Undefined Behavior should be
> treated, and if it is at all possible to avoid all Undefined
> Behaviors in the program.
> As far as I understand Undefined Behavior is a characteristic
> of the program rather than the code; 'program' meaning here
> the code and the particular input data that was fed to the
> program. Thus the code:
>     void fun( Type * ptr ) {
>         ptr->run(); // ptr == NULL ??
>     }
> cannot be said to have a UB, but the program that executes
> will have a UB when ptr's value happens to be pointing at some
> invalid memory. Also, a single place in the code cannot be
> considered directly responsible for the UB. In the above
> example, dereferencing a bad pointer is UB, but there is
> nothing in the function that could be done to avoid the UB.
> How can we check (in platform independent manner) that
> 0x27AA98C0 is a valid memory location?
You can't. Locally, if some calling function gives you a bad
pointer (say a pointer to an already deleted block, or just
random bits), there's not much you can do about it.
> So that is one of the examples of UB that I want to consider.
> Second one is what happens if we nest too many function calls.
> It was already pointed out to me that the C++ standard doesn't
> require an instruction stack for the program, but clearly
> whatever the implementation would be something will happen
> when we nest too many function calls. I believe that the
> standard states somewhere, that whatever behavior is not
> explicitly defined is supposed to be considered undefined.
There is a rule somewhere (at least in the C standard, but
I think in C++ as well) that resource exhaustion is undefined
behavior. Stack overflow is a case of resource exhaustion.
Depending on the platform, it may have defined behavior, or
(less frequently) there may be some way of detecting it in
advance, and not triggering the actual condition. (Under most
Unix, for example, it is "defined" as generating a core dump.
Of course, depending on how virtual memory is configured, you
may start thrashing before, so much as to render the system
unusable.)
> There is no way to detect that (at least in portable way), so
> we need to allow this possibility of Undefined Behavior.
> Third one is the following example:
>     {
>         unsigned long long int i, j;
>         file >> i >> j;
>         std::cout << process(i, j);
>     }
>
>     unsigned long long int process( unsigned long long i,
>                                     unsigned long long j ) {
>         return i + j;
>     }
> If the sum of i and j exceeds the maximum value of long long
> type, the behavior of the program is undefined.
Not with unsigned.
> Every use of practically any arithmetic operation may cause
> UB.
Not really. You define the functions to take input within
a certain range, which you can prove won't overflow, and you
validate all input to make sure that the input is in that range.
> Suppose I wanted to protect my program against UB; what
> should my function 'process' check as precondition?
Protecting a program against undefined behavior must be done at
a higher level.
> Perhaps, there exists an expression that does not overflow, but checks
> the overflow condition, but I cannot imagine a C++ "best practice"
> saying "whenever you add numbers verify the addition's precondition".
You don't do it for every addition. You do it when you input
data.
> "Write your own, as there is no standard". I have never checked for
> such a precondition, although I add int-s quite often. Should I be
> considered an "incautious" programmer, or is it simply the right way
> not to care about any possible UB that might appear on _some_,
> theoretically possible input?
If you haven't validated your input, and "proven" that overflow
can't occur for all values which pass the validation, you're an
incautious programmer.
> I don't believe I have ever seen a
> precondition in anyone's code (even in form of a comment) that i + j
> must not overflow.
You're looking at it at too low a level. The "preconditions"
aren't implemented in each individual function, but rather in
the validation stage of input.
> Even in the C++ Standard, where std::accumulate
> could make use of such a precondition it is not employed.
> So, I guess my question is, what should be a good programmer's
> attitude to UB? Shall (s)he make sure none must ever happen, or should
> they be considered "within reason"? I suppose the latter, and perhaps
> I was too much scared by the advice like "UB can do anything in your
> program, so do not have it."
In a certain sense, you can never totally avoid UB, at least as
far as the standard is concerned. The standard provides no
means of determining, in advance, how much resources you will
need, and it provides no way of verifying in advance that you'll
not exceed them. In any given environment, however, you can
usually achieve 0% UB, since most environments do define some of
the things left undefined in the standard.
--
James Kanze
Author: croberts <chrisvroberts@gmail.com>
Date: Thu, 29 Apr 2010 17:52:58 CST
As has already been stated: undefined behaviour should be avoided.
A balanced view should be reached depending on the purpose of the
code. For example, if the code is destined for a safety critical
system then all of the examples you gave can and should be avoided.
This level of checking, and restrictions on implementation, are likely
to be unsuitable for other applications.
> How can we check (in platform independent manner) that 0x27AA98C0 is
> a valid memory location?
Here the parameter could be passed by reference (preferably const
reference) to an object which is not heap or free store allocated.
    void fun( /*const*/ Type &ptr ) {
        ptr.run();
    }

    Type t;
    fun( t );
For safety critical systems, dynamic allocation is often restricted or
forbidden.
> Second one is what happens if we nest too many function calls. It was
> already pointed out to me that the C++ standard doesn't require an
> instruction stack for the program, but clearly whatever the
> implementation would be something will happen when we nest too many
> function calls.
This is most likely to happen when recursion is used, which should
not be the case for safety critical code. All recursive
implementations can be rewritten, for example, by implementing your
own stack, which can be monitored and handled safely (and in a defined
way) should it become full.
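As a sketch of that technique (Node, sum_tree, and the limit are
invented for illustration):

    #include <cstddef>
    #include <stack>

    struct Node { int value; Node *left; Node *right; };

    // Iterative traversal with an explicit stack. Unlike the call
    // stack, this one can be monitored, and hitting the limit is
    // handled in a defined way (returning false) rather than by
    // overflowing.
    bool sum_tree( Node const *root, long long &out, std::size_t limit )
    {
        std::stack<Node const *> pending;
        if ( root ) pending.push( root );
        out = 0;
        while ( !pending.empty() ) {
            if ( pending.size() > limit ) return false;
            Node const *n = pending.top();
            pending.pop();
            out += n->value;
            if ( n->left )  pending.push( n->left );
            if ( n->right ) pending.push( n->right );
        }
        return true;
    }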
Analysis can be performed to assess the depth that the call stack will
reach in the absence of recursion (both self recursion and larger
cycles in the function call graph). This can then be verified to fall
within the limits of the particular platform (I appreciate that this
is therefore not a platform independent solution).
> Every use of practically any arithmetic operation may cause UB.
Yes, although as you yourself identified, checks can be made to
identify when this will occur and allow error handling to be
performed. This could represent a significant latency overhead but may
be required to meet safety requirements. (Underflow and overflow bits
can also be monitored and handled on interrupts on some embedded
platforms.)
> So, I guess my question is, what should be a good programmer's
> attitude to UB?
It depends entirely on the application for which the code is being
generated.
Undefined behaviour should be avoided... but the amount of time and
effort during development, and the runtime overhead, that is
considered acceptable depends entirely on the given application.