Thread

Topic: Intelligent subscript checking (was: The future of C++)

Author: nagle@animats.com (John Nagle)
Date: Thu, 9 Jan 2003 22:31:41 +0000 (UTC) Raw View

Ken Hagan wrote:

 > "John Nagle" <nagle@animats.com> wrote...
 >
 >>    Here's a good start: I'd like to have "assert",
 >>or something like "assert", known to the compiler,
 >>with the following semantic rule:
 >>
 >>    "An assertion can be reported as failing as soon
 >>as its failure becomes inevitable."
 >>
 >
 > I'd prefer a rule that lets a sufficiently smart compiler
 > fail the compilation if an assertion failure can be proven
 > at compile time,

      That's hard.  I've built a full-scale proof of
correctness system (see POPL '83).  It can be done, but
it's not something that's routinely done in a compiler.

      A little of that technology can help quite a
bit.  Some assertions fail at compile time, some
become true (and can be dropped) at compile time, some
get simplified or hoisted, and some are executed
as written.  This is much easier to do than trying
to prove the hard cases.  In practice, you can prove
over 90% of null/overflow/subscript assertions with minimal
analysis.  Then it gets hard.  The hard cases have to
be checked at run time.

      DEC SRL, before Compaq/HP closed them down,
did some nice work in this area for Java.  See

     http://research.compaq.com/SRC/esc

 > Microsoft's compiler has an __assume extension which generates
 > code that "trusts" assertions.

      That's so Microsoft.

      A better way to look at "assert" is that you have to
show it true at the assertion, but then can assume it true
after the assertion.  This allows using "assert" as a hint
to the optimizer without losing safety.

      All this requires that the compiler know that
"assert", or something like it, is special.

    John Nagle
    Animats


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: kanze@gabi-soft.de (James Kanze)
Date: Fri, 10 Jan 2003 18:16:45 +0000 (UTC) Raw View

scottm@toast.net ("Scott Mayo") wrote in message
news:<3e1cd189@news.toast.net>...
> "John Nagle" <nagle@animats.com> wrote in message
> news:3E1C87E8.70409@animats.com...

> >     The big problem with C is that the compiler doesn't know about
> > arrays.  I'd argue that it's time for C++ compilers to know more
> > about the STL, so that they can check arrays.

> >     Here's a good start: I'd like to have "assert", or something
> > like "assert", known to the compiler, with the following semantic
> > rule:

> >     "An assertion can be reported as failing as soon
> > as its failure becomes inevitable."
>  ...
> >   vector<int> tab(100);
> >    int j;
> >    // j gets assigned some value
> >    for (int i=0; i <  j; i++)
> >    { tab[i] = 0; } // ... code not including break, return, or
> >assignment to i }
> >The loop gets expanded into
> >    for (int i=0; i <  j; i++)
> >   {  assert(i >= 0);
> >       assert(i < tab.size());
> >       tab[i] = 0;
> >    }

> These are interesting ideas, but I don't believe they are C++.  They
> belong to C++'s offspring, to some much higher level language in which
> there is no implicit guarantee that any statement reduces to the
> minimum, obvious hardware operations.

There's no such guarantee in C++, or even in C.  In fact, most compilers
don't follow such rules.

> "Everyone knows", to pirate your example, that
>     for (i=0;i<j;++i)
> reduces to a single instruction that zeros i, another that tests i vs
> j, and another one or two that increments i and jumps to the
> test. (Yes, there are lots of possible quibbles here, but everyone
> recognises the "spirit of C" I'm getting at.)

It's been some time since I've worked on compilers, but the ones I
worked on didn't do it that way.  To begin with, if i wasn't used after
the loop, it would generally never even exist.  They'd generate
something alont the lines of:

    int* tmp1 = &tab[ 0 ] ;
    int  tmp2 = j ;
    while ( tmp2 > 0 ) {
        *tmp1 ++ = 0 ;
        -- tmp2 ;
    }

On an Intel (or the NSC chip, which had similar instructions), they'd
generate:

    mov cx,j
    xor ax,ax
    mov di,tab
    rep stosw

Your description of what "everybody knows" was false back then (15 years
ago), and I would expect it to be even more false today.

> In the language you describe, you don't know any such thing.

You don't know it about C, either.  Never did, since it wasn't true
fifteen years ago.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orient   e objet/
                    Beratung in objektorientierter Datenverarbeitung

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: scottm@toast.net ("Scott Mayo")
Date: Thu, 9 Jan 2003 05:10:56 +0000 (UTC) Raw View

"John Nagle" <nagle@animats.com> wrote in message
news:3E1C87E8.70409@animats.com...

>     The big problem with C is that the compiler doesn't
> know about arrays.  I'd argue that it's time for
> C++ compilers to know more about the STL, so that
> they can check arrays.
>
>     Here's a good start: I'd like to have "assert",
> or something like "assert", known to the compiler,
> with the following semantic rule:
>
>     "An assertion can be reported as failing as soon
> as its failure becomes inevitable."
...
>   vector<int> tab(100);
>    int j;
>    // j gets assigned some value
>    for (int i=0; i <  j; i++)
>    { tab[i] = 0; } // ... code not including break, return, or
>assignment to i }
>The loop gets expanded into
>    for (int i=0; i <  j; i++)
>   {  assert(i >= 0);
>       assert(i < tab.size());
>       tab[i] = 0;
>    }


These are interesting ideas, but I don't believe they are C++.
They belong to C++'s offspring, to some much higher level
language in which there is no implicit guarantee that any
statement reduces to the minimum, obvious hardware
operations. "Everyone knows", to pirate your example, that
    for (i=0;i<j;++i)
reduces to a single instruction that zeros i, another that tests
i vs j, and another one or two that increments i and jumps to
the test. (Yes, there are lots of possible quibbles here, but
everyone recognises the "spirit of C" I'm getting at.)

In the language you describe, you don't know any such thing.
That breaks one of the most significant contracts C makes
with programmers.

I agree that C++ has a weaker grip on that contract than C; C++
is  more tricky and sly. But this sort of thing strikes me as a whole
different (I hate to use the word) paradigm.




---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: K.Hagan@thermoteknix.co.uk ("Ken Hagan")
Date: Thu, 9 Jan 2003 18:44:58 +0000 (UTC) Raw View

"John Nagle" <nagle@animats.com> wrote...
>
>     Here's a good start: I'd like to have "assert",
> or something like "assert", known to the compiler,
> with the following semantic rule:
>
>     "An assertion can be reported as failing as soon
> as its failure becomes inevitable."

I'd prefer a rule that lets a sufficiently smart compiler
fail the compilation if an assertion failure can be proven
at compile time, and lets less smart compilers assume the
truth of the assertion. Any failure would then enter the
realm of undefined behaviour, permitting a debug build to
check the assertion (with the hoisting you suggest) and
letting the release build off the hook altogether.

Yes, such a facility can be abused, most obviously by those
who use assert as a sort of trace statement, but so can the
current assert macro.

Microsoft's compiler has an __assume extension which generates
code that "trusts" assertions. Gimpel's LINT can be told to
trust assertions when examining code, which gives me the "build
failure" semantics. I find both facilities useful.


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: kanze@gabi-soft.de (James Kanze)
Date: Thu, 9 Jan 2003 18:45:13 +0000 (UTC) Raw View

nagle@animats.com (John Nagle) wrote in message
news:<3E1C87E8.70409@animats.com>...
> Tom Hyer wrote:

> > For larger and more numerical tasks, you really want to trade safety
> > for speed.

>     If you need to trade safety for speed, the language design is bad.
> Safety may require a compile-time penalty, but if there's a
> substantial run-time penalty, the language and/or compiler design is
> bad.

>     The big problem with C is that the compiler doesn't know about
> arrays.  I'd argue that it's time for C++ compilers to know more about
> the STL, so that they can check arrays.

>     Here's a good start: I'd like to have "assert", or something like
> "assert", known to the compiler, with the following semantic rule:

>     "An assertion can be reported as failing as soon as its failure
> becomes inevitable."

>     This allows a crucial optimization.  Consider

>     vector<int> tab(100);
>     int j;
>     // j gets assigned some value
>     for (int i=0; i <  j; i++)
>     { tab[i] = 0; } // ... code not including break, return, or
> assignment to i }

> The loop gets expanded into

>     for (int i=0; i <  j; i++)
>     {  assert(i >= 0);
>        assert(i < tab.size());
>        tab[i] = 0;
>     }

> Right now, it's illegal for the compiler to hoist those asserts out of
> the loop, because they'd fail "early".

Right now, it is perfectly legal for the compiler to hoist those asserts
out of the loop, because doing so does not affect the observable
behavior of the program.

The first assert should disappear completely with any decent compiler.
The compiler can trivially prove that it can never occur.

The second assert can be hoisted out of the loop, and applied to j, as
long as there is no IO and no access to volatile variables in the loop.
In practice, with current compiler technology, this generally means as
long as there is no call to a non-inlined function in the loop.

In most cases, if there is a function too large to be inlined in the
loop, the time it will take will make the time required for the assert
negligible.

Finally, of course, the compiler could optimize the code to use two
loops, one with the asserts (when they might fail), and one without.

    [...]
>     If we get this right, the day will come when STL arrays will equal
> C arrays in performance.

For this to happen, the compiler will need to do some sort of analysis
to determine that the array will never change size.  Not necessarily
trivial, it a reference or a pointer to the array is passed to some
other function.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orient   e objet/
                    Beratung in objektorientierter Datenverarbeitung

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: nagle@animats.com (John Nagle)
Date: Thu, 9 Jan 2003 22:07:28 +0000 (UTC) Raw View

James Kanze wrote:

> nagle@animats.com (John Nagle) wrote in message
> news:<3E1C87E8.70409@animats.com>...
>
>>Tom Hyer wrote:

>>Right now, it's illegal for the compiler to hoist those asserts out of
>>the loop, because they'd fail "early".
>>
>
> Right now, it is perfectly legal for the compiler to hoist those asserts
> out of the loop, because doing so does not affect the observable
> behavior of the program.
....


    Unfortunately, no.  Early assertion failure does affect the
observable behavior of the program.   Whatever was supposed to
happen on the iterations of the loop before the assertion failed
won't get to happen if the assert is hoisted out of the loop
and fails before the loop is even entered.
The compiler doesn't know that assert causes termination and
is considered an error condition.


....
> Finally, of course, the compiler could optimize the code to use two
> loops, one with the asserts (when they might fail), and one without.

....

    You could do that.  But it would mean generating considerable
code that's only meaningful for programs whose behavior is
erronious, which is wasteful of space.

    John Nagle
    Animats

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: nagle@animats.com (John Nagle)
Date: Thu, 9 Jan 2003 22:29:59 +0000 (UTC) Raw View

Scott Mayo wrote:

> "John Nagle" <nagle@animats.com> wrote in message
> news:3E1C87E8.70409@animats.com...

>>    "An assertion can be reported as failing as soon
>>as its failure becomes inevitable."

> These are interesting ideas, but I don't believe they are C++.
> They belong to C++'s offspring, to some much higher level
> language in which there is no implicit guarantee that any
> statement reduces to the minimum, obvious hardware
> operations. "Everyone knows", to pirate your example, that
>     for (i=0;i<j;++i)
> reduces to a single instruction that zeros i, another that tests
> i vs j, and another one or two that increments i and jumps to
> the test. (Yes, there are lots of possible quibbles here, but
> everyone recognises the "spirit of C" I'm getting at.)
>
> In the language you describe, you don't know any such thing.
> That breaks one of the most significant contracts C makes
> with programmers.


     That's not a contract the C++ standard makes.
All you're promised is correct results for correct
programs.  The steps the compiler
performs to get them is not guaranteed.  Compilers
reorder code all the time.  In the presence of errors,
reordering may produce results visible to the programmer.
This happens all the time in optimized code.  CPUs
reorder code, too.  IA-32 machines hide this quite
well, for backwards compatibility with the DOS era.
Most other superscalar CPUs don't bother, and
the state of the computation after a machine exception
may be quite strange.

     Trap on a dereference of a null
pointer, and some computation just before the dereference
may not take place.  Or some computation supposedly after the
dereference but not dependent upon it may complete normally.

     More exotic systems, like the Itanium compilers or
the SGI parallelizing compiler violate sequentiality
assumptions in even bigger ways.  Again, you're promised
correctness only for correct programs.

     It's quite possible to implement
STL collections with checked iterators, and at least
two major STL implementations offer this in debug mode.
The problem is that the performance penalty for that
checking is excessive.  That's what I'm trying to
address.

     John Nagle
     Animats

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: nagle@animats.com (John Nagle)
Date: Wed, 8 Jan 2003 20:52:15 +0000 (UTC) Raw View

Tom Hyer wrote:

> For larger and more numerical tasks, you
> really want to trade safety for speed.


    If you need to trade safety for speed, the language
design is bad.  Safety may require a compile-time
penalty, but if there's a substantial run-time penalty,
the language and/or compiler design is bad.

    The big problem with C is that the compiler doesn't
know about arrays.  I'd argue that it's time for
C++ compilers to know more about the STL, so that
they can check arrays.

    Here's a good start: I'd like to have "assert",
or something like "assert", known to the compiler,
with the following semantic rule:

    "An assertion can be reported as failing as soon
as its failure becomes inevitable."

    This allows a crucial optimization.  Consider

    vector<int> tab(100);
    int j;
    // j gets assigned some value
    for (int i=0; i <  j; i++)
    { tab[i] = 0; } // ... code not including break, return, or
assignment to i }

The loop gets expanded into

    for (int i=0; i <  j; i++)
    {  assert(i >= 0);
       assert(i < tab.size());
       tab[i] = 0;
    }

Right now, it's illegal for the compiler to hoist
those asserts out of the loop, because they'd fail
"early".  But if we allow the optimization I suggest,
we can optimize to

    assert(j >= 0);
    assert(j < tab.size());
    for (int i=0; i <  j; i++)
    {  tab[i] = 0;
    }

This requires some program verification type analysis,
but that's well-understood today.  References on
request.

Asserts have been hoisted out of the loop; they're only
computed once.  This is a huge win.  But the program
aborts before the loop is entered, not on the failing
iteration.  It's a slight incompatibility with existing
semantics. But the programs for which the incompability
is visible are just about to crash anyway.

    I'd like to see the standard allow the optimization of
such inevitable assertion failures.

    If we get this right, the day will come when STL
arrays will equal C arrays in performance.

    John Nagle
    Animats

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]