Topic: Access uninitialised object intrinsic type.


Author: "Greg Herlihy" <greghe@pacbell.net>
Date: Mon, 20 Nov 2006 00:00:31 CST
"Bo Persson" wrote:
> No, I'm arguing that the rules for allowing uninitialized data are *only* to
> be compatible with C. If C++ were to specify that all ints are initialized,
> someone would simply write
>
> void bench(int i)
> {
>    int   x[10000000];
>
>    x[i] = 0;
> }
>
>
> int main()
> {
>    int test = 0;
>
>    for ( ; test < 1000000; test++)
>       bench(test);
>
> }
>
>
> to "prove" that C is a million times faster than C++.

This program is more likely to show the opposite: that C++ (even with
zero-initialization of automatically-allocated, fundamental types) is a
million times "faster" than C. An optimizing C++ compiler should
eliminate the call to bench() since the call has no side effects; so a
C++ program should complete the loop in a single iteration and not the
1,000,000 iterations that a C program could well require. And even if
the C++ program does call bench(), a C++ optimizing compiler could skip
zeroing out the other 9,999,999 array members - since their initial
values have no effect on the observable behavior of the program. In
other words, what the C++ program doesn't know (namely, the values in
the x array) can't hurt it. So there is no reason to expect a C++
program to be any slower than the C program - and in fact, by declaring
bench() as an inline function, the C++ program could well be faster.
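
As a rough illustration of the point about observable behavior, the sketch below compares a zero-initialized array with an uninitialized one. The names bench_zeroed, bench_plain and kSize are hypothetical, the array is much smaller than the one in the quoted program to keep stack usage modest, and the functions return a value only so that there is something to observe. Both functions read nothing but the element they have just written, so an optimizer would be permitted (though not required) to drop the zeroing in the first one.

```cpp
#include <cassert>
#include <cstddef>

// Assumed size for illustration; far smaller than the array in the thread.
const std::size_t kSize = 1000;

// Hypothetical variant of bench() with explicit zero-initialization.
int bench_zeroed(std::size_t i)
{
    assert(i < kSize);
    int x[kSize] = {};  // every element starts out as 0
    x[i] = 7;
    return x[i];        // only the element just written is ever read
}

// Same function under the current rules: elements start out indeterminate.
int bench_plain(std::size_t i)
{
    assert(i < kSize);
    int x[kSize];       // no initialization at all
    x[i] = 7;
    return x[i];        // still well defined: x[i] was assigned first
}
```

Since the two functions cannot be told apart by their results, any difference between them is a quality-of-implementation matter rather than something the language rules force.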

> In real C++ we would use a std::vector to get away from the initialization
> problem. C can't do that. To level the field, we have to keep the C rules
> for C-style code.

One of the design goals of C++ is to be "a better C than C". And that
goal cannot be attained if C++ retains the same rules as C and does
nothing any differently - or any better - than C.

The initialization of automatic variables certainly appears to be one
area where C++ could improve upon C - and yet incur no additional cost
in performance for doing so. As the program above demonstrates, a rule
that every local variable is initialized does not necessarily require
the compiler to emit code that initializes every local variable. Any
local variable that is never used - and any that is initialized with,
or assigned, a value before being used - could be skipped. So the only
uninitialized variables that would actually have to be initialized are
the ones whose values the program is using. But any program that relies
on the value of an uninitialized variable is a program that has - at
best - indeterminate behavior; and such a program is unlikely to be
considered correct by its programmer.

By providing initial values to the uninitialized variables that affect
a program's behavior, a C++ compiler would turn indeterminate program
behavior (under the current rules) into defined program behavior.
Granted, the defined program behavior might not be the same as the
desired behavior, but the defined behavior will bring that fact to
light immediately - whereas the indeterminate behavior may - if
experience is a guide - merely mask the problem until the program is
run on the customer's machine.

After all, the design of C owes more to the constraints of a PDP-11
development environment and 70s-era compiler technology than it does to
the pure demands of maximum efficiency. So if the reason why C++ does
not zero-initialize local variables (of fundamental type) is to
preserve compatibility with C - then that is simply not a good
enough reason not to do it.

Greg

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: fgothamNO@SPAM.com (Frederick Gotham)
Date: Sun, 12 Nov 2006 16:21:14 GMT
We know that the behaviour of the following is undefined:

    int i,j;

    i = j;

There has been a debate over whether the behaviour of the following is
undefined:

    char unsigned i,j;

    i = j;

, reason being that an unsigned char cannot have any padding bits, and
therefore any possible bit-pattern must represent a valid value. Also,
there's debate over whether the following must evaluate to true:

    char i;

    i == i;

Before I discuss that though, I want to talk about copy-construction. When
you return-by-value from a function, a copy-construction is performed
(let's forget about construction elision for the moment). Should the
behaviour of the following be undefined?

    int GetVal()
    {
        int i;

        return i;
    }

My initial thoughts are Yes, the behaviour _is_ undefined. But how about
something like the following:

struct FileInfo {
    bool is_valid;
    char const *name;
    long unsigned size;
};

FileInfo GetFileInfo(SomeOtherType const x)
{
    FileInfo fi;

    if (CheckFileIsOK(x))
    {
        /* Fill the structure */

        return fi;
    }

    fi.is_valid = false;

    return fi;
}

If the file is not OK, then we simply set "is_valid" to false and ignore
the other fields. At first glance, this seems to me to be undefined
behaviour, but I believe others have argued that it should be OK.

With regard to the following snippet:

    char unsigned a;

    a == a;

    int i = a;

, I'm not quite sure whether the Standard _should_ necessitate that "a"
have a valid (albeit random) value.

I'm a great fan of efficiency. I don't take enough satisfaction from simply
solving a problem, I'm only pleased when the problem has been solved in the
greatest possible way I can fathom. Recently, I wrote a class which could
serve as a generic function pointer, e.g.:

    double Func1(int);  /* Defined elsewhere */
    bool Func2(char*);  /* Defined elsewhere */

    FuncPtr a = Func1, b = Func2;

    b = a;

    ((double(*)(int))b)(4);  /* Works perfectly */

In writing the class, I wanted it to behave exactly like an intrinsic type.
Therefore, I gave it a default constructor which did the bare minimum to
keep the object in a defined state. However, I didn't go so far as to allow
the following:

    FuncPtr a,b;

    a = b;

This would produce undefined behaviour because the class's code doesn't
make allowances for copying or assigning from an uninitialised FuncPtr.
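
The FuncPtr class itself is not shown in this post, so the following is only a guess at what such a generic function pointer might look like. It assumes the class stores the address as a void(*)() and converts back with reinterpret_cast; the member p_ and the helper ExampleHalf are hypothetical. Unlike the class described above, this sketch sets the pointer to null in its default constructor, so copying a default-constructed object is defined here.

```cpp
#include <cassert>

// Hypothetical sketch of a generic function pointer; not the author's class.
class FuncPtr {
    typedef void (*Generic)();
    Generic p_;  // address stored under a common function pointer type

public:
    FuncPtr() : p_(0) {}  // assumption: null rather than indeterminate

    // Accept a pointer to a function of any type.
    template <typename F>
    FuncPtr(F *f) : p_(reinterpret_cast<Generic>(f)) {}

    // Convert back to whatever function pointer type the caller asks for.
    // Calling the result is only valid if that type matches the original.
    template <typename F>
    operator F *() const { return reinterpret_cast<F *>(p_); }
};

// Hypothetical function used to exercise the class.
double ExampleHalf(int x) { return x / 2.0; }
```

Converting a function pointer to another function pointer type and back yields the original value, which is what makes the call through the cast in the snippet above work.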

So what's this got to do with anything? Well let's say we have a template
class, and we create an object of it as follows:

    MyType<char unsigned> obj;

The code for this template class may not make any allowances for assigning
or copying from an uninitialised object, e.g.:

    MyType<char unsigned> a,b;

    a = b;  /* Behaviour is undefined */

However, depending on the nature of the class, it may be designed to act as
if it were an object of its template parameter type, i.e. in this case, it
would have to act like an unsigned char.

This would cause an itsy bitsy problem if someone were to rely on the fact
that they can access an uninitialised unsigned char. Something like:

    typedef MyType<char unsigned> TypeUsedByFunction;

    void Func()
    {
        TypeUsedByFunction a,b;

        /* Whatever... */

        a = b; /* Regardless of whether it was initialised */
    }

The solution for this would be simple, just value-initialise the object:

    void Func()
    {
        TypeUsedByFunction arr1_a[1] = {},
                           &a = *arr1_a,
                           arr1_b[1] = {},
                           &b = *arr1_b;
    }
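
A shorter way to get the same effect, sketched here with a hypothetical helper named make_zeroed, is to copy from a value-initialised temporary. For an intrinsic type or a POD struct, T() yields an object whose members are all zero. The struct ExamplePOD is an assumed stand-in for illustration.

```cpp
#include <cassert>

// Assumed stand-in for a POD type passed as the template argument.
struct ExamplePOD {
    int x, y, z;
    double a;
    char *p;
    unsigned char data[64];
};

// Hypothetical helper: returns a value-initialised object of type T.
// For intrinsic types and PODs this means every member is zero.
template <typename T>
T make_zeroed()
{
    return T();
}
```

Usage would be along the lines of `TypeUsedByFunction a = make_zeroed<TypeUsedByFunction>();`, which avoids the one-element arrays and references.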

However, this might introduce overhead for POD's such as:

    struct MyPOD {
        int x,y,z;
        double a,b,c,d,e,f,g;
        char *p, *q;
        char unsigned data[64];
    };

Anyhow... what should be done? Should we just outlaw the accessing of
uninitialised intrinsic types altogether?

--

Frederick Gotham






Author: bop@gmb.dk ("Bo Persson")
Date: Mon, 13 Nov 2006 16:52:22 GMT
Frederick Gotham wrote:

snip, snip, snip

>
> However, this might introduce overhead for POD's such as:
>
>    struct MyPOD {
>        int x,y,z;
>        double a,b,c,d,e,f,g;
>        char *p, *q;
>        char unsigned data[64];
>    };

Anyone who does that deserves a lot of overhead. Can we specify at least a
10 s delay each time it is used?

>
> Anyhow... what should be done? Should we just outlaw the accessing
> of uninitialised intrinsic types altogether?

If possible, we should outlaw it for newly written C++ code, but allow it
for old C-style code used in benchmarks.

It all boils down to C++ having to be just as fast as C. Otherwise we will
have to defend using a slow language.


Bo Persson







Author: brangdon@cix.co.uk (Dave Harris)
Date: Mon, 13 Nov 2006 22:04:53 GMT
fgothamNO@SPAM.com (Frederick Gotham) wrote (abridged):
> There has been a debate over whether the behaviour of the following is
> undefined:
>
>     char unsigned i,j;
>
>     i = j;
>
> , reason being that an unsigned char cannot have any padding bits, and
> therefore any possible bit-pattern must represent a valid value.

Lack of padding bits could just mean that the undefined behaviour is
unlikely to be a crash, but that the compiler is still allowed to assume
such accesses don't happen. It could optimise the assignment away, for
example, so that i and j did not compare equal.

However, isn't it legal to memcpy almost anything as unsigned char, even
uninitialised values of other types? This is a special dispensation for
unsigned char.
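
For reference, the uncontroversial half of that dispensation looks like the sketch below: the bytes of an initialised, trivially copyable object are copied out through an unsigned char buffer and back again. The struct Pair and the function copy_via_bytes are hypothetical names. Whether the same is guaranteed to work when the source object was never initialised is exactly the open question here, and the sketch does not try to settle it.

```cpp
#include <cassert>
#include <cstring>

// Assumed example type; any POD type would do.
struct Pair {
    int a;
    int b;
};

// Copy an object by way of its underlying bytes, viewed as unsigned char.
Pair copy_via_bytes(Pair const &src)
{
    unsigned char bytes[sizeof(Pair)];
    std::memcpy(bytes, &src, sizeof(Pair));  // object -> raw bytes

    Pair dst;
    std::memcpy(&dst, bytes, sizeof(Pair));  // raw bytes -> object
    return dst;
}
```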


> Anyhow... what should be done? Should we just outlaw the accessing of
> uninitialised intrinsic types altogether?

Unsigned char is a special type. In my view we shouldn't expect to be able
to make user-defined classes which mimic it exactly.


> If the file is not OK, then we simply set "is_valid" to false and
> ignore the other fields. At first glance, this seems to me to be
> undefined behaviour, but I believe others have argued that it should be
> OK.

I've seen real code like that crash. My struct had a double rather than an
unsigned long, and sometimes the double's uninitialised value would happen
to be the bit pattern of a signalling NaN, which would raise an unexpected
exception in the copy constructor.

It was annoying, but I believe the compiler was correct. If you want a
class like that, you need to write explicit copy constructor and
assignment operators which test is_valid and only copy the known-valid
fields.
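
A minimal sketch of that approach, reusing the FileInfo struct from the original post rather than my own struct, might look as follows. The default constructor that sets is_valid to false is an added assumption; the other fields are deliberately left alone when the object is not valid.

```cpp
#include <cassert>

struct FileInfo {
    bool is_valid;
    char const *name;
    long unsigned size;

    // Assumption: a new object starts out invalid; name and size are
    // deliberately left uninitialised until the object becomes valid.
    FileInfo() : is_valid(false) {}

    // Copy only the fields that are known to hold values.
    FileInfo(FileInfo const &other) : is_valid(other.is_valid)
    {
        if (is_valid) {
            name = other.name;
            size = other.size;
        }
    }

    FileInfo &operator=(FileInfo const &other)
    {
        is_valid = other.is_valid;
        if (is_valid) {
            name = other.name;
            size = other.size;
        }
        return *this;
    }
};
```

The cost is that the type is no longer a POD, and every copy pays for a test of is_valid.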

-- Dave Harris, Nottingham, UK.






Author: fgothamNO@SPAM.com (Frederick Gotham)
Date: Mon, 13 Nov 2006 22:07:20 GMT
"Bo Persson":

> Anyone who does that deserves a lot of overhead. Can we specify at least
> a 10 s delay each time it is used?


Is 5 000 000 a big number? Is Germany a big country?

You can't answer these questions unless you decide what to compare against.

If a particular algorithm runs for 68 Gs, then an increase of 10 s
constitutes a percentage increase of: 0.0000000147%. We can probably say
that this is negligible.

If a particular algorithm runs for 183 us, then an increase of 10 s
constitutes a percentage increase of: 5464480%

There's no use in throwing arbitrary figures like "10 s" at me if you don't
give me a frame of reference.
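
The two figures above can be checked with a few lines of code. The function name percent_increase is hypothetical, and both durations are taken in seconds.

```cpp
#include <cassert>

// Percentage by which a run time grows when added_seconds is added to it.
double percent_increase(double base_seconds, double added_seconds)
{
    assert(base_seconds > 0.0);
    return added_seconds / base_seconds * 100.0;
}
```

With a base of 68e9 seconds the result is about 1.47e-8 percent, and with a base of 183e-6 seconds it is about 5464480 percent.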


>> Anyhow... what should be done? Should we just outlaw the accessing
>> of uninitialised intrinsic types altogether?
>
> If possible, we should outlaw it for newly written C++ code, but allow
> it for old C-style code used in benchmarks.


The term "old C-style code" doesn't make sense to me. Are you referring to
C++ code which only uses features that are common to both C and C++? If so,
then by what logic should we differentiate between "my C++ code" and "your
C++ code". Judging by the way you have worded your post, I would assume
that you pay much attention to achieving the goal in programming, and pay
little attention to how efficiently you achieved it. If somebody cares for
efficiency, then are they all of a sudden writing "old C-style code"?


> It all boils down to C++ having to be just as fast as C.


No, it boils down to efficiency. If you want to measure against C, then
fair enough.


> Otherwise we will have to defend using a slow language.


Of course we do. Are you arguing that we should make our code and our
languages slower and slower so that they don't resemble C?

--

Frederick Gotham






Author: bop@gmb.dk ("Bo Persson")
Date: Mon, 13 Nov 2006 23:17:28 GMT
Frederick Gotham wrote:
> "Bo Persson":
>
>
>>> Anyhow... what should be done? Should we just outlaw the accessing
>>> of uninitialised intrinsic types altogether?
>>
>> If possible, we should outlaw it for newly written C++ code, but
>> allow it for old C-style code used in benchmarks.
>
>
> The term "old C-style code" doesn't make sense to me. Are you
> referring to C++ code which only uses features that are common to
> both C and C++? If so, then by what logic should we differentiate
> between "my C++ code" and "your C++ code". Judging by the way you
> have worded your post, I would assume that you pay much attention
> to achieving the goal in programming, and pay little attention to
> how efficiently you achieved it. If somebody cares for efficiency,
> then are they all of a sudden writing "old C-style code"?
>
>
>> It all boils down to C++ having to be just as fast as C.
>
>
> No, it boils down to efficiency. If you want to measure against C,
> then fair enough.
>
>
>> Otherwise we will have to defend using a slow language.
>
>
> Of course we do. Are you arguing that we should make our code and
> our languages slower and slower so that they don't resemble C?

No, I'm arguing that the rules for allowing uninitialized data are *only* to
be compatible with C. If C++ were to specify that all ints are initialized,
someone would simply write

void bench(int i)
{
   int   x[10000000];

   x[i] = 0;
}


int main()
{
   int test = 0;

   for ( ; test < 1000000; test++)
      bench(test);

}


to "prove" that C is a million times faster than C++.


In real C++ we would use a std::vector to get away from the initialization
problem. C can't do that. To level the field, we have to keep the C rules
for C-style code.

There is no reason to use it in real programs though.
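
For comparison, a std::vector version of bench() could look like the sketch below. The name bench_vector and the reduced size are assumptions, and the function returns a value only so that the result can be checked. Note that the sized constructor of std::vector value-initialises every element, so a vector never holds indeterminate ints in the first place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical std::vector counterpart of bench(), with a smaller size.
int bench_vector(std::size_t i)
{
    std::vector<int> x(1000);  // all 1000 elements start out as 0
    assert(i < x.size());
    x[i] = 7;

    // Reading a neighbouring element is safe here: it is known to be 0.
    return x[i] + x[(i + 1) % x.size()];
}
```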


Bo Persson







Author: fgothamNO@SPAM.com (Frederick Gotham)
Date: Tue, 14 Nov 2006 00:07:25 GMT
"Bo Persson":

> No, I'm arguing that the rules for allowing uninitialized data are *only*
> to be compatible with C. If C++ were to specify that all ints are
> initialized, someone would simply write
>
> void bench(int i)
> {
>    int   x[10000000];
>
>    x[i] = 0;
> }
>
>
> int main()
> {
>    int test = 0;
>
>    for ( ; test < 1000000; test++)
>       bench(test);
>
> }
>
>
> to "prove" that C is a million times faster than C++.


And they would be right! (assuming the initialisation is redundant)


> In real C++ we would use a std::vector to get away from the
> initialization problem. C can't do that. To level the field, we have to
> keep the C rules for C-style code.


I don't desire efficiency in order to be on-par with C -- I desire
efficiency for the sake of being efficient. In writing C++ code, I aim to
surpass the efficiency I could obtain with C. (So you know, I program both
in C and in C++).

Consider the function, "realloc", both in C and C++. If it relocates the
buffer, it copies the data which was held at the previous location. On more
than one occasion, I have used "realloc" in such a way that the copying was
redundant (and therefore inefficient). This is a bad thing.
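
When the old contents are not needed, one way to avoid the copy is to release the old block and allocate a fresh one, roughly as sketched below. The function name resize_discarding is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: obtain a block of new_size bytes when the data in
// old_block no longer matters. Nothing is copied. Returns a null pointer if
// the allocation fails, in which case old_block has already been freed.
void *resize_discarding(void *old_block, std::size_t new_size)
{
    std::free(old_block);          // free(0) is a harmless no-op
    return std::malloc(new_size);  // contents of the new block are indeterminate
}
```

The trade-off is that, unlike realloc, this can never grow the block in place, and on failure the old block is gone.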

What I am arguing is that we shouldn't throw all these redundancies into
the language making it less and less efficient.

--

Frederick Gotham
