Thread

Topic: C++ should catch frequent mistakes (was Boundschecker)

Author: hendrik@vedge.com (Hendrik Boom)
Date: Thu, 07 Oct 1993 16:34:54 GMT

rrowe@halcyon.com (Robin Rowe) writes:

: If correcting the bugs of others is your goal then you must be very
: very fortunate that their errors are limited to pointer misuse and
: not logic or other types of errors! ;-)
:
:      Robin
:

I can't imagine how you got this backwards.  Logic errors are the easiest
to find, because the symptom is related to the problem via the program
structure.  It's pointer misuse that causes problems, because it
sort of drops a bomb on a random, unrelated part of the program,
which explodes an arbitrary amount of time later.  The symptom
leaves no clue as to the origin of the bomb.

I could understand your observations if you only ever debug tiny programs,
but somehow I doubt that. Please explain. Is there some obvious technique
I've been missing out on for the last 25 years?

 hendrik.
--
-------------------------------------------------------
Try one or more of the following addresses to reply.
at work: hendrik@vedge.com,  iros1!vedge!hendrik
at home: uunet!ozrout!topoi!hendrik

Author: monnier@hebe.nectar.cs.cmu.edu (Stefan Monnier)
Date: Mon, 11 Oct 1993 16:53:11 GMT

In article <mb3r1jINNj5b@exodus.eng.sun.com>,
David Chase <chased@rbbb.Eng.Sun.COM> wrote:
> cbarber@apricot-fddi.bbn.com (Chris Barber) writes:
> > You are right that
> >reference counting is not thread-safe as commonly implemented, but then,
> >garbage collection and memory allocation in general are single-threaded
> >tasks and impose the same performance restrictions in a multi-threaded
> >environment that reference-counting does.
>
> This is definitely not true.  In both cases, the performance
> restrictions are potentially serious, but they are definitely
> different, and I think that the opportunities for improvement are
> different as well.  In the case of reference counting, you've got to
> synchronize (or use a fetch-and-add if your memory system supports it)
> at every reference count adjustment.  For garbage collection on a
> multiprocessor, the big cost is a potential "stop-the-world" step.
> There's been ample work done (Appel,Ellis&Li and Hans Boehm are two
> that come to mind immediately) on reducing the cost of the
> stop-the-world step and offloading the collection effort onto a
> separate processor.  Most of the tricks that I've seen involve
> assistance from the VM system (marking pages as either read-protect or
> copy-on-write).

[here speaks a GC bigot who doesn't care about 20% overhead:]
Note that any GC has to synchronize with mutator(s), be it by stopping
the world or by mutexes or by VM tricks. Note also that this
synchronization is necessary for every pointer update (at least), just
like ref-count. Stop-the-world being unacceptable in most cases and VM
tricks being way too slow because of OS overhead, you end up having
exactly the same requirements for GC as for ref-count (as far as
synchronization goes). GC in a multithreaded world is tricky and
potentially very heavy. (but of course, you still get the advantage
that the GC can be designed with the compiler and you can take
advantage of it (compile-time GC, inline allocation, or ...))

Don't forget also that any GC requires (much) more memory in order to
keep an acceptable speed (which may mean that the acceptable speed is
just not reachable because of lack of memory or (better) thrashing).

 Stefan

Author: ellis@parc.xerox.com (John Ellis)
Date: Mon, 11 Oct 1993 20:57:13 GMT

Stefan Monnier writes about VM synchronization for multi-threaded GC:

    ...and VM tricks being way too slow because of OS overhead...

By "too slow", I assume you mean "too slow compared to reference
counting".

Published data indicates that in fact, VM synchronization on many of
today's platforms may be quite competitive with multi-threaded
reference counting, and it shouldn't be dismissed with the label "too
slow".  See the papers by Boehm et al., Bartlett and Yip, and Detlefs
and Zorn for results on VM-synchronized collection, the papers by
Detlefs and others on smart-pointer reference counting, and the papers
by DeTreville for measurements of a high-performance multi-threaded
reference-counting collector.  The data in these papers can't be
directly compared, but comparisons of the relative overheads of the
various techniques indicate that VM synchronization is quite
competitive.

There's also plenty of fatty overhead in the OS interfaces and
implementations that provide access to VM protection facilities.  On
many systems, that overhead could be reduced by an order of magnitude
through careful but straightforward redesign.  When mainstream vendors
start selling GC for C and C++ (next year), if a sufficient revenue
stream develops, pressure will build on OS vendors to make the VM
facilities more efficient.   As it is now, they have no rational
reason to improve those facilities.

Author: kanze@us-es.sel.de (James Kanze)
Date: 14 Oct 93 15:21:28

In article <1993Oct11.205713.5239@parc.xerox.com> ellis@parc.xerox.com
(John Ellis) writes:

|> When mainstream vendors
|> start selling GC for C and C++ (next year), if a sufficient revenue
|> stream develops, pressure will build on OS vendors to make the VM
|> facilities more efficient.

OK, I'll bite.  Do you actually have information that a mainstream
vendor will start selling GC for C and C++, or is this just wishful
thinking on your part?
--
James Kanze                             email: kanze@us-es.sel.de
GABI Software, Sarl., 8 rue du Faisan, F-67000 Strasbourg, France
Conseils en informatique industrielle --
                   -- Beratung in industrieller Datenverarbeitung

Author: ellis@parc.xerox.com (John Ellis)
Date: Thu, 14 Oct 1993 17:24:19 GMT

James Kanze asks:

    Do you actually have information that a mainstream vendor will
    start selling GC for C and C++, or is this just wishful thinking
    on your part?

At the OOPSLA workshop on memory management, it came out that DEC has
a GC product for C++ and C going into field test soon.   DEC is not
Microsoft or Borland, but it isn't insignificant either.

Author: rrowe@halcyon.com (Robin Rowe)
Date: 3 Oct 1993 12:20:34 -0700

In article <28a1dm$h2s@liege.ics.uci.edu>,
Douglas C. Schmidt <schmidt@ics.uci.edu> wrote:
>
>Have you ever tried to track down a stray pointer or array problem in
>a large program that you didn't write and really don't want to spend
>weeks learning?  Yikes!

Ah, we are talking about debugging, not design. Yes, bounds checkers and
other debugging tools are very useful. But, why would we want the
overhead of those tools forced upon us in our own properly designed
and debugged code?

If correcting the bugs of others is your goal then you must be very
very fortunate that their errors are limited to pointer misuse and
not logic or other types of errors! ;-)

     Robin

Author: cbarber@apricot-fddi.bbn.com (Chris Barber)
Date: 4 Oct 93 14:30:22

In article <mabql8INNii7@exodus.Eng.Sun.COM>
chased@rbbb.Eng.Sun.COM (David Chase) writes:

   Untrue.  C contains cycle-wasting behaviors, but you just don't know
   about them.  In particular, C-as-practiced, as opposed to
   C-as-specified, contains lots of cycle-wasting behaviors.  (NOBODY
   writes strictly conforming programs, though ANSI C is much better than
   pre-ANSI C.)

What are you talking about?

   [...]
   Another example -- C++ doesn't have garbage collection, which is reputed
   to be slower. Instead, when things get hairy, people put
   reference-counting into their constructors and destructors, which is not
   only slower than garbage collection (usually), but also not thread-safe,
   unless you wrap a mutex around the reference-count operations.

You are confused about what reference counted classes are used for.  It is
true that some garbage collectors use some form of reference counting, but
when used in the implementation of a C++ class it is not for the purpose of
garbage collection but in order to save memory on copying.  For example, a
common practice is to build a reference-counted String class which allows
one to assign one String variable to another and have them both share a
common representation until such a time as one of the variables is
modified.  In other words, reference-counting is used to implement
copy-on-write semantics.  Even if C++ had a garbage collecting memory
allocator, it would not provide these semantics.  You are right that
reference counting is not thread-safe as commonly implemented, but then,
garbage collection and memory allocation in general are single-threaded
tasks and impose the same performance restrictions in a multi-threaded
environment that reference-counting does.
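
A minimal sketch of the copy-on-write idea described above may help. It is not taken from any particular String library: the class name CowString and its members (shares_with, set_char, detach) are made up for illustration. The count is a plain integer, so, as the post concedes, this version is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical copy-on-write string; every name here is illustrative.
class CowString {
    struct Rep {
        std::size_t refs;   // number of CowString objects sharing this buffer
        std::size_t len;    // length without the terminating NUL
        char* data;
    };
    Rep* rep_;

    static Rep* make(const char* s, std::size_t len) {
        Rep* r = new Rep;
        r->refs = 1;
        r->len = len;
        r->data = new char[len + 1];
        std::memcpy(r->data, s, len);
        r->data[len] = '\0';
        return r;
    }

    void release() {
        if (--rep_->refs == 0) {
            delete[] rep_->data;
            delete rep_;
        }
    }

    // Take a private copy before writing, but only if the buffer is shared.
    void detach() {
        if (rep_->refs > 1) {
            Rep* fresh = make(rep_->data, rep_->len);
            --rep_->refs;
            rep_ = fresh;
        }
    }

public:
    CowString(const char* s) : rep_(make(s, std::strlen(s))) {}

    // Copying shares the representation and bumps the count.
    CowString(const CowString& other) : rep_(other.rep_) { ++rep_->refs; }

    CowString& operator=(const CowString& other) {
        ++other.rep_->refs;   // increment first so self-assignment is safe
        release();
        rep_ = other.rep_;
        return *this;
    }

    ~CowString() { release(); }

    const char* c_str() const { return rep_->data; }
    bool shares_with(const CowString& other) const { return rep_ == other.rep_; }

    // The only mutating operation in this sketch.
    void set_char(std::size_t i, char c) {
        assert(i < rep_->len);
        detach();
        rep_->data[i] = c;
    }
};
```

After "CowString b = a;" both objects point at one buffer; the first call to b.set_char() gives b its own copy and leaves a untouched.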

   That's ok, you didn't want your code to run 8 times faster on an
   8-processor machine, anyway (but Fortran can).

I didn't know that Fortran had thread-safe garbage collection :-)  In
any case, there are very few programs that can actually gain an N-factor
speedup in the presence of N processors, no matter what the language.

--
Christopher Barber
(cbarber@bbn.com)

Author: chased@rbbb.Eng.Sun.COM (David Chase)
Date: 5 Oct 1993 21:50:43 GMT

cbarber@apricot-fddi.bbn.com (Chris Barber) writes:
> I wrote:

>   Untrue.  C contains cycle-wasting behaviors, but you just don't know
>   about them.  In particular, C-as-practiced, as opposed to
>   C-as-specified, contains lots of cycle-wasting behaviors.  (NOBODY
>   writes strictly conforming programs, though ANSI C is much better than
>   pre-ANSI C.)

>What are you talking about?

In general, aliasing.  C's confusion between pointers and arrays, and
widely ignored type-casting rules, force optimizers to be very
conservative when scheduling C code.  There's also the occasional
problem with people using "signals for semantics" (the Bourne Shell,
at least one version of it, is a notable offender here) in which a
segmentation violation actually has meaning to the program.  Most C
programmers (and C texts) don't pay much attention to whether the code
that they write is easy to schedule or not.
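
One way to see the aliasing cost is a small sketch like the following. The function names are invented for illustration. In scale(), the compiler cannot keep *factor in a register, because a store through out might change it; scale_hoisted() is the code an optimizer would prefer to emit, and it is only equivalent when the pointers do not overlap.

```cpp
#include <cassert>

// As written: *factor has to be re-read on every iteration, because the
// store to out[i] may have modified the object it points to.
void scale(int* out, const int* in, const int* factor, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        out[i] = in[i] * *factor;
}

// What an optimizer would like to generate: load *factor once.
// This matches scale() only when factor does not point into out[].
void scale_hoisted(int* out, const int* in, const int* factor, int n) {
    assert(n >= 0);
    const int f = *factor;
    for (int i = 0; i < n; i++)
        out[i] = in[i] * f;
}

// Calls fn with factor aliasing out[0] and returns the last element written.
int last_after_aliased_call(void (*fn)(int*, const int*, const int*, int)) {
    int buf[3] = {2, 3, 4};
    fn(buf, buf, &buf[0], 3);
    return buf[2];
}
```

With the aliased call the two versions disagree (the last element is 16 for scale() and 8 for scale_hoisted()), which is why a conservative compiler has to reload inside the loop unless it can prove the pointers are distinct.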

Now, IF you could assume strictly conforming ANSI C, you could exploit
other rules to make the aliasing approximation less conservative.  IF
you could assume strictly conforming ANSI C, there are various loop
optimizations that you could do (e.g., if you assume no integer
overflow and no wraparound).

>You are confused about what reference counted classes are used for.  It is
>true that some garbage collectors use some form of reference counting, but
>when used in the implementation of a C++ class it is not for the purpose of
>garbage collection but in order to save memory on copying.

My error.  Still, maintaining the reference counts is not cheap.

> You are right that
>reference counting is not thread-safe as commonly implemented, but then,
>garbage collection and memory allocation in general are single-threaded
>tasks and impose the same performance restrictions in a multi-threaded
>environment that reference-counting does.

This is definitely not true.  In both cases, the performance
restrictions are potentially serious, but they are definitely
different, and I think that the opportunities for improvement are
different as well.  In the case of reference counting, you've got to
synchronize (or use a fetch-and-add if your memory system supports it)
at every reference count adjustment.  For garbage collection on a
multiprocessor, the big cost is a potential "stop-the-world" step.
There's been ample work done (Appel,Ellis&Li and Hans Boehm are two
that come to mind immediately) on reducing the cost of the
stop-the-world step and offloading the collection effort onto a
separate processor.  Most of the tricks that I've seen involve
assistance from the VM system (marking pages as either read-protect or
copy-on-write).
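
The fetch-and-add alternative mentioned above can be sketched with std::atomic. That facility arrived in C++11, long after this thread, so treat this as a present-day illustration rather than what was available in 1993. The names Counted, acquire, and release are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared object with an intrusive, thread-safe count.
struct Counted {
    std::atomic<int> refs{1};   // the creator holds the first reference
};

// One atomic read-modify-write per new reference; no mutex needed.
inline void acquire(Counted* p) {
    p->refs.fetch_add(1, std::memory_order_relaxed);
}

// Returns true when the caller has just dropped the last reference,
// meaning it is now responsible for destroying the object.
inline bool release(Counted* p) {
    int previous = p->refs.fetch_sub(1, std::memory_order_acq_rel);
    assert(previous > 0);
    return previous == 1;
}
```

Even without a lock, every adjustment is still a synchronized memory operation, which is the per-update cost the paragraph above contrasts with a collector's occasional stop-the-world pause.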

>   That's ok, you didn't want your code to run 8 times faster on an
>   8-processor machine, anyway (but Fortran can).

>I didn't know that Fortran had thread-safe garbage collection :-)  In
>any case, there are very few programs that can actually gain an N-factor
>speedup in the presence of N processors, no matter what the language.

True enough about Fortran and garbage collection (though Fortran-90
holds its pointer representation close enough to permit it.  In
theory, but not always in practice, F-77 programs can coexist with a
garbage collector.  See remarks above about conforming to standards
:-).  There are enough programs that do get linear or superlinear
speedup (cache effects can give you this) to make multiprocessors
interesting, and it happens that some of these programs solve
important problems (typically, simulation of physical things, like oil
reservoirs, car crashes, weather, aerodynamics, molecules, and heat
flow).  (Nonetheless, people do write programs in C++ to solve these
problems on multiprocessors.)

Remember that I was replying to a "C++ doesn't have these features
because they are slow" flame, and my intention is to point out that in
fact, there are all sorts of ways that C++ is not fast that people are
either ignorant of or comfortable with (or both).

Furthermore, people (C programmers) seem to be quite good at imagining
bad implementations of the features that they don't want to see in
C++.  In particular, when people think of things like boundschecking,
they imagine only the most naive and clunky implementation of it, and
not what they could actually get if someone made a serious attempt at
implementing it well (and IF it were in C++, people would make a
serious attempt at implementing it well).  Similarly, when someone
says "garbage collection" to your average C programmer, they probably
don't imagine something like the Appel-Ellis-Li or Boehm-Weiser or
Bartlett or Ungar collectors, all of which incorporate some very
interesting tricks to reduce overhead, or latency, or both.  There's
probably a good book worth of language implementation tricks (maybe I
should write it) that people are ignorant of.  Some of them are
compiler optimizations, some of them are clever run-time data
structures, and some of them are both.  One thing that C (and C++, to
some extent) do well with their "access to the raw bits" approach to
programming is to deny the programmer access to many of these clever
implementation tricks.

David Chase
Sun (speaking for myself)

Author: schmidt@liege.ics.uci.edu (Douglas C. Schmidt)
Date: 28 Sep 1993 11:56:53 -0700

In article <28790r$4er@nwfocus.wa.com>, Robin Rowe <rrowe@halcyon.com> wrote:
++  1. Array indices straying out of bounds.
++  2. Stray or null pointers.
++
++ How serious a problem are these really? With array bounds using an
++ asymmetrical boundary test will avoid most problems (i.e., always use
++ the < sign to compare on the upper bound). Assert() will protect against
++ problems in more complex situations. If pointers are always set to zero
++ when not actually pointing to something and (where reasonable) tested
++ for zero before use then most bugs there will be avoided.

I agree with your point to a certain extent.  For example, when
writing my own code, I program defensively and rarely have pointer or
array problems occur.  However, it has been my experience that these
types of pointer problems occur most frequently when maintaining
and/or enhancing code that was written by someone else.  Many moons
ago, I spent a lot of time installing and debugging GNU tools on
various platforms.  Invariably, these kinds of problems lurked deep
within the bowels of some relatively large, complex application (e.g.,
GNU tar).  Problems like this would manifest themselves in odd ways,
depending on the platform.  If I was lucky, a segmentation fault would
occur.  If I was unlucky, the program would simply behave incorrectly.

Have you ever tried to track down a stray pointer or array problem in
a large program that you didn't write and really don't want to spend
weeks learning?  Yikes!  Fortunately, we had an evaluation copy of
Saber C (now Centerline C) on site.  Within about 10 minutes of using
Saber I had located the sources of the problems and fixed them.  That
experience taught me a great deal of respect for
environments/tools/languages that will detect these subtle types of
bugs automatically.

 Doug
--
His life was gentle, and the elements so            | Douglas C. Schmidt
Mixed in him that nature might stand up             | schmidt@ics.uci.edu
And say to all the world: "This was a man."         | ucivax!schmidt
   -- In loving memory of Terry Williams (1971-1991)| (714) 856-4101

Author: ellis@parc.xerox.com (John Ellis)
Date: Wed, 29 Sep 1993 19:09:34 GMT

Ed Osinski writes:

    ...I don't think that the language definition precludes
    implementations from doing complete run-time checking of array
    references, etc.  It's just that few or no implementors have done
    so.

The language definition does allow implementations to check array
references, and some environments like Centerline do that checking.

Unfortunately, the semantics of C/C++ arrays make that checking very
expensive, since pointer arithmetic and/or dereferences must be
checked as well as subscripting.  Checking the use of pointers
requires either doubling their size to carry a descriptor, or storing
information on the side.  Doubling the size of pointers introduces
severe compatibility problems when dealing with code that's been
compiled without the safety checks or compiled by another vendor's
product.
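
To make the "pointer plus descriptor" option concrete, here is a rough sketch of such a checked pointer written as an ordinary class. It is not how Centerline or any compiler implements it; FatPtr and its members are hypothetical names. It stores an offset instead of a moving raw pointer so that the arithmetic itself never steps outside the array.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical "fat" pointer: the array's base address plus a descriptor
// (element count and current offset) that travels with every copy.
template <class T>
class FatPtr {
    T* base_;
    std::ptrdiff_t size_;     // number of elements in the underlying array
    std::ptrdiff_t offset_;   // current position, may be out of range
public:
    FatPtr(T* base, std::ptrdiff_t size) : base_(base), size_(size), offset_(0) {
        assert(size >= 0);
    }

    // Pointer arithmetic only moves the offset; nothing is checked yet.
    FatPtr operator+(std::ptrdiff_t k) const {
        FatPtr r(*this);
        r.offset_ += k;
        return r;
    }

    // The check happens on dereference.
    T& operator*() const {
        if (offset_ < 0 || offset_ >= size_)
            throw std::out_of_range("FatPtr: dereference outside array");
        return base_[offset_];
    }
};

// The representation no longer matches a plain T*, which is the
// compatibility problem described above.
static_assert(sizeof(FatPtr<int>) > sizeof(int*), "descriptor adds space");
```

Code compiled against plain int* cannot accept a FatPtr<int> without conversion, and every copy of the pointer drags the descriptor along with it.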

Storing array-bounds information on the side retains representational
compatibility but makes pointer arithmetic and/or dereferencing very
expensive.  For example, Purify slows down programs by factors of two
to four or more, and Purify doesn't even check references to local and
static arrays.

While such development tools can be quite useful, their performance
overhead prevents them being used in situations where execution speed is
important, e.g. in alpha- and beta-testing, in embedded systems,
graphics and CAD, etc.  Such situations are often where the nastiest
bugs arise.

Languages that enable cheap, safe bounds checking give programmers
more flexibility in deciding when to turn off the safety checks, i.e.
to trade safety for performance.  Some product teams might keep bounds
checking on through beta testing and turn it off only in the released
versions.  Other product teams might value safety very highly and
decide to keep bounds checking enabled in the released version; then
the program will fail-stop rather than continue execution when it
encounters an array bug (many mission-critical Ada systems are shipped
with checking enabled).

This is one area where other languages actually give programmers more
flexibility than C/C++.

Ed Osinski also writes:

    In effect, it seems to be impossible to define an Array class that
    has the same capabilities as a built-in array and yet is
    completely "safe" in that out-of-bounds references and similar
    errors are guaranteed to be caught at run-time.

See the appendix of "Safe, Efficient Garbage Collection for C++", by
Dave Detlefs and myself (ftp from parcftp.xerox.com:/pub/ellis/gc).

We defined three template classes that provide the same capabilities
as built-in arrays but also provide optional bounds checking, ensuring
complete safety:

    An Array<T, n> is an array of n elements of type T.

    A DynArray<T> is a heap-allocated array whose size is chosen when
    it is created.

    A SubArray<T> references a contiguous sub-sequence of elements in
    another Array, DynArray, or SubArray.

These classes are intended to provide the same functionality and
time/space performance as built-in arrays; they are not meant to be
replacements for the higher-level array abstractions that people often
build.  Since the classes just wrap optional bounds-checking around
built-in arrays, any compiler that does good inlining will yield
performance matching that of built-in arrays.  There are a few
carefully chosen implicit conversions between these to make the common
cases syntactically concise.

Here's a standard C++ fragment and its equivalent with the safe
arrays:

----------------------------------------------------
built-in arrays:                    safe arrays:

int a[N];                           Array<int, N> a;
f(a, N);                            f(a);
f(&a[i], 3);                        f(a.Sub(i, 3));
...                                 ...
void f(int a[], int n) {            void f(SubArray<int> a) {
    for (int i = 0; i < n; i++) {       for (int i = 0; i < a.Number(); i++) {
        ...a[i]...}}                        ...a[i]...}}
----------------------------------------------------

The safe array classes also provide aggregate operations such as element-wise
copying and comparing.
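
For readers without the paper at hand, the following is a guess at the general shape of two of these classes, reconstructed only from the fragment above. The names Array, SubArray, Sub, and Number come from the post; everything else (the member layout, throwing std::out_of_range, the Sum helper) is an assumption, and the real classes in the paper will differ. The checks are always on here; the optional switch is left out for brevity.

```cpp
#include <cassert>
#include <stdexcept>

// A view onto a contiguous run of elements owned by something else.
template <class T>
class SubArray {
    T* elems_;
    int number_;
public:
    SubArray(T* elems, int number) : elems_(elems), number_(number) {
        assert(number >= 0);
    }
    int Number() const { return number_; }

    T& operator[](int i) const {
        if (i < 0 || i >= number_) throw std::out_of_range("SubArray index");
        return elems_[i];
    }

    // A narrower view; the requested range must lie inside this one.
    SubArray Sub(int start, int count) const {
        if (start < 0 || count < 0 || start > number_ - count)
            throw std::out_of_range("SubArray::Sub range");
        return SubArray(elems_ + start, count);
    }
};

// A fixed-size array that just wraps a built-in array.
template <class T, int n>
class Array {
    T elems_[n];
public:
    int Number() const { return n; }

    T& operator[](int i) {
        if (i < 0 || i >= n) throw std::out_of_range("Array index");
        return elems_[i];
    }

    SubArray<T> Sub(int start, int count) {
        return SubArray<T>(elems_, n).Sub(start, count);
    }

    // Implicit conversion so an Array can be passed where a SubArray is expected.
    operator SubArray<T>() { return SubArray<T>(elems_, n); }
};

// Counterpart of f() in the fragment: the length travels with the argument.
int Sum(SubArray<int> a) {
    int total = 0;
    for (int i = 0; i < a.Number(); i++) total += a[i];
    return total;
}
```

With this, Sum(a) and Sum(a.Sub(i, 3)) correspond to the f(a) and f(a.Sub(i, 3)) calls in the fragment, and a bad range is caught when the sub-array is formed, before any element is touched.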

There are two criticisms that can be made about the safe array
classes:

    1. You have to change existing code to use them.

    2. You don't get that last bit of syntactic conciseness with
    pointer arithmetic.  (With good compilers, there is no performance
    advantage to using pointer arithmetic instead of subscripting;
    some compilers even do worse with pointer arithmetic.)  On the
    other hand, the class's aggregate operations provide more concise
    replacements for the most common short, idiomatic uses of pointer
    arithmetic.

Changing existing code is a serious problem.  But having to use
subscript notation instead of pointer arithmetic is a trivial price
most experienced programmers would willingly pay to get safety.

Compared to development tools like Centerline and Purify, the safe
array classes are cheap enough to be used through all stages of
development and testing and even in release (programmers choose when
they want to turn off the safety checks).  Since they are just code
wrappers for the built-in arrays, there are no problems with
representational compatibility with other code.

If you have detailed questions about these classes, I recommend you
read our paper first.

Author: kanze@us-es.sel.de (James Kanze)
Date: 30 Sep 93 18:49:11

In article <287na2$ca6@slinky.cs.nyu.edu> osinski@panini.cs.nyu.edu
(Ed Osinski) writes:

|> In article <27skieINNarf@grumpy.symantec.com>, Kostya Vasilyev <Kostya@Symantec.com> writes:
|> |> In article <CDst5w.CKw@hatch.socal.com> Brendan Jones, bj@hatch.socal.com
|> |> writes:
|> |> >There must be tens of thousands of people who have wasted hundreds
|> |> >of hours each hunting these type of bugs down.  Think of all the
|> |> >wasted time that adds up to!  It's about time the C++ language
|> |> >specification was amended so these, the most common bugs, can
|> |> >be trapped.

|> |> >C and C++'s power is its flexibility....

|> |> Exactly, it has power and flexibility for anyone to implement what you're
|> |> asking for without modifying the language.

|> |> It's very easy to implement bounds checking in arrays in C++. Just
|> |> implement arrays as classes, define operator[], and add any checking
|> |> code you like!  This requires no changes to the C++ language, and you
|> |> have complete flexibility and independence as to what kind of checking
|> |> is performed, how errors are reported, when it is turned off, etc.

|> I beg to differ.  This is far easier said than done.  Remember that you can do
|> all sorts of things to raw arrays that are hard to duplicate in a class.  For
|> example, you take the address of an array element and treat it as a pointer to
|> a smaller portion of the original array:

|>  int a[10];
|>  int *p = &a[4];  // p points to a 6-element sub-array

If this feature is an essential part of your code, I would suggest
checking out the array classes in the Rogue Wave library.

Although I have not needed this feature, and have not actually used
the library, I have seen a description of the interface, and it would
seem to offer all of the flexibility you want, and more.
--
James Kanze                             email: kanze@us-es.sel.de
GABI Software, Sarl., 8 rue du Faisan, F-67000 Strasbourg, France
Conseils en informatique industrielle --
                   -- Beratung in industrieller Datenverarbeitung

Author: Kostya Vasilyev <Kostya@Symantec.com>
Date: 23 Sep 1993 16:57:50 GMT

In article <CDst5w.CKw@hatch.socal.com> Brendan Jones, bj@hatch.socal.com
writes:
>There must be tens of thousands of people who have wasted hundreds
>of hours each hunting these type of bugs down.  Think of all the
>wasted time that adds up to!  It's about time the C++ language
>specification was amended so these, the most common bugs, can
>be trapped.

>C and C++'s power is its flexibility....

Exactly, it has power and flexibility for anyone to implement what you're
asking for without modifying the language.

It's very easy to implement bounds checking in arrays in C++. Just
implement arrays as classes, define operator[], and add any checking
code you like!  This requires no changes to the C++ language, and you
have complete flexibility and independence as to what kind of checking
is performed, how errors are reported, when it is turned off, etc.
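
As a rough illustration of that suggestion, the sketch below puts the check behind a policy parameter, so the kind of checking and the error reporting can be swapped or switched off per array type. CheckedArray, ThrowOnError, and NoCheck are hypothetical names, and the template default argument is just one of several ways to provide the switch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Policy: report an out-of-range index by throwing.
struct ThrowOnError {
    static void check(std::size_t i, std::size_t n) {
        if (i >= n) throw std::out_of_range("index out of bounds");
    }
};

// Policy: no checking at all; an inlining compiler should remove the call.
struct NoCheck {
    static void check(std::size_t, std::size_t) {}
};

template <class T, std::size_t N, class Policy = ThrowOnError>
class CheckedArray {
    T data_[N];
public:
    CheckedArray() : data_() {}   // value-initialize the elements

    T& operator[](std::size_t i) {
        Policy::check(i, N);
        return data_[i];
    }
    const T& operator[](std::size_t i) const {
        Policy::check(i, N);
        return data_[i];
    }
    std::size_t size() const { return N; }
};
```

A CheckedArray<int, 3> throws on a[3], while CheckedArray<int, 3, NoCheck> behaves like a built-in array. This covers subscripting only; it does nothing for raw pointers taken from outside the class.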

--------------------------------------------------------------------------
Kostya Vasilyev, SYMANTEC Corporation, Bedrock group
Cytomax Junkie  10201 Torre Avenue
   Cupertino, CA 95014
   (408) 446-7165
   eMail: Kostya_Vasilyev_at_SYMCU-DEV@Symantec.com
--------------------------------------------------------------------------

Author: frampton@vicuna.ocunix.on.ca (Steve Frampton)
Date: 24 Sep 93 00:17:45 GMT

elan@tasha.cheme.cornell.edu (Elan Feingold) writes:

> I am finishing up a memory checker library that writes a full log on memory
> usage, checks for invalid frees, and can detect an illegal memory access that
> ran beyond (either above or below) a memory region.  The code using the
> debug library runs slower, but it is very much worth it as it catches a lot
> of bugs.  All I have to do is take out the -DDEBUG and everything disappears
> and is reduced to malloc()/free().
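
The quoted library itself is not shown in the thread, so the following is only a sketch of one common way to build such a checker: pad each allocation with guard bytes and verify them on free. All names (dbg_malloc, dbg_intact, dbg_free, MALLOC, FREE) and the sizes are assumptions. It covers overrun and underrun detection only, not the logging or invalid-free checks mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Block layout: [size + leading guard: kFront bytes][user bytes][trailing guard]
// kFront is assumed to be a multiple of the platform's strictest alignment.
const std::size_t kFront = 32;
const std::size_t kBack = 8;
const unsigned char kFill = 0xA5;

void* dbg_malloc(std::size_t n) {
    unsigned char* raw =
        static_cast<unsigned char*>(std::malloc(kFront + n + kBack));
    if (!raw) return 0;
    std::memset(raw, kFill, kFront);
    std::memcpy(raw, &n, sizeof n);               // remember the user size
    std::memset(raw + kFront + n, kFill, kBack);
    return raw + kFront;
}

// True if the guard bytes on both sides of the user region are untouched.
// Limitation: a stray write that hits only the stored size is not detected.
bool dbg_intact(const void* p) {
    const unsigned char* user = static_cast<const unsigned char*>(p);
    const unsigned char* raw = user - kFront;
    std::size_t n;
    std::memcpy(&n, raw, sizeof n);
    for (std::size_t i = sizeof n; i < kFront; i++)
        if (raw[i] != kFill) return false;        // write below the region
    for (std::size_t i = 0; i < kBack; i++)
        if (user[n + i] != kFill) return false;   // write above the region
    return true;
}

// Frees the block; returns false if a guard region had been overwritten.
bool dbg_free(void* p) {
    if (!p) return true;
    bool ok = dbg_intact(p);
    std::free(static_cast<unsigned char*>(p) - kFront);
    return ok;
}

// Without -DDEBUG the wrappers reduce to plain malloc()/free().
#ifdef DEBUG
#define MALLOC(n) dbg_malloc(n)
#define FREE(p)   ((void)dbg_free(p))
#else
#define MALLOC(n) std::malloc(n)
#define FREE(p)   std::free(p)
#endif
```

A real tool would report the failure with file and line information instead of returning a flag, and would also track which pointers are live so that double frees can be caught.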

Sounds great...are you going to release it onto the net as shareware?

----< Voting is now in progress for soc.couples.intercultural >----
Steve Frampton               Home:   <frampton@vicuna.ocunix.on.ca>
Kingston, Ontario            School: <2843600@jeff-lab.queensu.ca>

Author: leech@cs.unc.edu (Jon Leech)
Date: 24 Sep 1993 00:17:49 GMT

In article <27skieINNarf@grumpy.symantec.com>, Kostya Vasilyev <Kostya@Symantec.com> writes:
|> >C and C++'s power is its flexibility....
|>
|> Exactly, it has power and flexibility for anyone to implement what you're
|> asking for without modifying the language.

    Assuming you never use builtin C pointer and array types outside your
library classes, sure. This is not realistic in most cases, and still leaves
the library code open to failure, although that's a smaller problem.

    Fortunately, runtime detection of these problems does not require
amending the language, but rather instrumenting the runtime code. Purify is
one good example of this. There's no reason compilers couldn't generate such
runtime checks without modifying the definition of the language, and they
could likely do a much better and faster job than postprocessing approaches
like Purify which have no knowledge of the source code. This would be used
mostly in development, of course, and turned off when speed was an issue.

    C++ took a tiny related step in this direction when the array delete
syntax was changed to require the compiler to keep track of how many items
were in an array, rather than the user. I don't think we should require the
availability of this sort of runtime checking in the language, but most
people would welcome the option.

    Jon
    __@/

Author: osinski@lang9.cs.nyu.edu (Ed Osinski)
Date: 24 Sep 1993 20:29:05 GMT

In article <CDst5w.CKw@hatch.socal.com>, bj@hatch.socal.com (Brendan Jones) writes:
|>
|> The easiest (and I am sure most frequent) programming errors in
|> C++ and C programs would have to be:
|>
|>   1. array indexes straying out of bounds
|>   2. stray or null pointers
|>
|> There must be tens of thousands of people who have wasted hundreds
|> of hours each hunting these type of bugs down.  Think of all the
|> wasted time that adds up to!  It's about time the C++ language
|> specification was amended so these, the most common bugs, can
|> be trapped.

Maybe I'm being dense, but how does the specification prevent these from being
trapped?  I thought that all it said was that the result of such events was
undefined, which meant that the implementor was free to trap them and print a
helpful error message or even something more if he so desired.  That's my
understanding of the meaning of *undefined*.  Of course, I don't know if any
implementations do this kind of thing; certainly most of the big names don't
(eg. Borland, Microsoft, cfront, gnu), but this is independent of the language
specification.

|>  [ details omitted ]
|>
|> C and C++'s power is its flexibility, but that shouldn't stop it
|> from offering a few safety nets should we ask to work with them.

Agreed, but as I said above, I don't think the language spec. is the culprit
here.

|>
|> nuff said,
|> Brendan Jones.

--
---------------------------------------------------------------------
 Ed Osinski                  |
 Computer Science Department | "I hope life isn't a big joke,
 New York University         |  because I don't get it."
 E-mail:  osinski@cs.nyu.edu |                           Jack Handey
---------------------------------------------------------------------

Author: timur@seas.gwu.edu (Timur Tabi)
Date: Sat, 25 Sep 1993 04:07:50 GMT

In article <CDst5w.CKw@hatch.socal.com>,
Brendan Jones <bj@hatch.socal.com> wrote:
>The easiest (and I am sure most frequent) programming errors in
>C++ and C programs would have to be:
>
>  1. array indexes straying out of bounds
>  2. stray or null pointers
>
>There must be tens of thousands of people who have wasted hundreds
>of hours each hunting these type of bugs down.  Think of all the
>wasted time that adds up to!  It's about time the C++ language
>specification was amended so these, the most common bugs, can
>be trapped.

The C language was designed for people who know what they're doing
when they program.  It doesn't check array bounds because that
takes time.  It is the responsibility of the programmer to make
sure that his code is correct.  Advanced debuggers, linters,
and similar tools can be used to track down these kinds of bugs.

Don't flame C for being exactly what it is designed to be.  If you
don't like it, then go ahead and use Pascal and watch your software
run at half speed.  In the meantime, real programmers (like me :-)
will be happy using a computer language that doesn't restrict us
unnecessarily.

>C and C++'s power is its flexibility, but that shouldn't stop it
>from offering a few safety nets should we ask to work with them.

Efficiency is more C/C++'s strong point than flexibility.  Any
time a certain behavior (like array bound checking) would result
in a possible waste of CPU cycles, it is not included.

C is not for the timid.

--
------------------------------------------------------------------ Timur Tabi
Contributing Editor for "OS/2 Monthly"        Internet:    timur@seas.gwu.edu
Maintainer of the DOS-OS/2 Games List         Fidonet: Timur Tabi @ 1:109/347
                                              Bitnet:            if402c@gwuvm

Author: chased@rbbb.Eng.Sun.COM (David Chase)
Date: 26 Sep 1993 19:16:56 GMT

timur@seas.gwu.edu (Timur Tabi) writes:
>The C language was designed for people who know what they're doing
>when they program.  It doesn't check array bounds because that
>takes time.  It is the responsibility of the programmer to make
>sure that his code is correct.  Advanced debuggers, linters,
>and similar tools can be used to track down these kinds of bugs.

The C language was designed for people who THINK they know what
they're doing when they program.  It doesn't check array bounds
because these people are ignorant of years of work in optimizers that
removes and reduces those costs.  For reasons unfathomable to me,
these people prefer to ship a product late, buggy, and missing
features, instead of 10% slower, if that.

>Don't flame C for being exactly what it is designed to be.  If you
>don't like it, then go ahead and use Pascal and watch your software
>run at half speed.

Untrue.  Neither language wins all the benchmarks, and I've seen
situations where Modula-2 was twice as fast as C (C run through an
optimizer, versus a simple Modula-2 compiler).  Final efficiency of
the generated code often depends very much on the optimizer, and the
optimizer often depends upon things that "real programmers" don't seem
to understand.  If you REALLY want your code to go portably fast, you
will use Fortran.  It's easier for the optimizer to improve, and even
more macho than C (what, me read between the lines?)

>In the meantime, real programmers (like me :-) will be happy using a
>computer language that doesn't restrict us unnecessarily.

Suit yourself -- it's your life and your time.  I can hack in C and
C++ as well as the rest of the world, but I know full well what I'm
missing, because I've used other languages with those features.
People I know who've gone from Modula-2+, Modula-3, or Cedar Mesa say
it's not that bad -- coding in C or C++ only takes twice as long as it
ought to (yeah, I know, all my friends are incompetent boobs who don't
deserve to be let anywhere near a computer.  So am I.)

>Efficiency is more C/C++'s strong point than flexibility.  Any
>time a certain behavior (like array bound checking) would result
>in a possible waste of CPU cycles, it is not included.

Untrue.  C contains cycle-wasting behaviors, but you just don't know
about them.  In particular, C-as-practiced, as opposed to
C-as-specified, contains lots of cycle-wasting behaviors.  (NOBODY
writes strictly conforming programs, though ANSI C is much better than
pre-ANSI C.)  Furthermore, C++ as practiced is shot through with
cycle-wasting behaviors, and you haven't got a prayer of finding them
in your source code.  (I'm speaking as a code generator/optimizer
writer, programming in C++.)  One simple example -- "const" might be a
real help to compiler writers, IF there wasn't a non-trivial (i.e.,
non-ignorable) minority of the programmers who cast const into
non-const and then modify it.  Another example -- C++ doesn't have
garbage collection, which is reputed to be slower. Instead, when
things get hairy, people put reference-counting into their
constructors and destructors, which is not only slower than garbage
collection (usually), but also not thread-safe, unless you wrap a
mutex around the reference-count operations.  That's ok, you didn't
want your code to run 8 times faster on an 8-processor machine, anyway
(but Fortran can).
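A minimal sketch of the mutex-wrapped reference count described above. It assumes a C++11 compiler for std::mutex, and the class name RefCounted and its members are hypothetical, chosen only for illustration. The point is that every retain and release takes the lock, which is where the serialization across processors comes from.

```cpp
#include <cstddef>
#include <mutex>

// Hypothetical intrusive reference count; names are illustrative only.
class RefCounted {
public:
    RefCounted() : count_(1) {}

    // Every copy of a handle pays for a lock and an unlock here.
    void retain() {
        std::lock_guard<std::mutex> guard(lock_);
        ++count_;
    }

    // Returns true when the last reference is gone and the owner
    // should destroy the object.
    bool release() {
        std::lock_guard<std::mutex> guard(lock_);
        return --count_ == 0;
    }

    // Current number of references, read under the same lock.
    std::size_t count() {
        std::lock_guard<std::mutex> guard(lock_);
        return count_;
    }

private:
    std::mutex lock_;
    std::size_t count_;
};
```

Without the lock, two threads doing ++count_ at the same time can lose an update, which is the thread-safety problem referred to above.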

You may think C++ is fast because it has a "C" in its name and you
think C is fast, but it's not.

>C is not for the timid.

Fools rush in where angels fear to tread.

David Chase
Sun (speaking for myself, and in an exceptionally bad mood about C++
     and popular misunderstandings about programming languages.)

Author: mkohtala@lk-hp-11.hut.fi (Marko Kohtala)
Date: 26 Sep 93 20:13:15 GMT Raw View

In <mabql8INNii7@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes:

>The C language was designed for people who THINK they know what
>they're doing when they program.  It doesn't check array bounds
>because these people are ignorant of years of work in optimizers that
>removes and reduces those costs.

Although well said, I do think there is a place for a portable
assembler language like C.

>Untrue.  Neither language wins all the benchmarks, and I've seen
>situations where Modula-2 was twice as fast as C (C run through an
>optimizer, versus a simple Modula-2 compiler).

Modula-2 (which I do not know myself) might give the compiler freedom
to select algorithms for things that the C programmer must implement
himself. The programmer is often not as good a software designer as a
compiler designer is.

I agree, C and C++ have many drawbacks when used on programming tasks
that do not require very low-level control over how things are done.
Programmers are faced with many issues unimportant to the task they
are trying to accomplish, and novice programmers do not even know how
to cope with those issues.

Perhaps one should use another programming language for those tasks
instead of smashing an existing language which is good for many other
tasks as it is?
--
---
Marko.Kohtala@hut.fi, Marko.Kohtala@compart.fi, Marko.Kohtala@ntc.nokia.com
Student at (not representative of) the Helsinki University of Technology
(This is an information virus: if you know of it, you are infected.)

Author: cat@wixer.bga.com (Dr. Cat)
Date: Mon, 27 Sep 1993 08:13:48 GMT Raw View

Before this latest language flamefest goes too far, I'd like to just toss out
what my personal experience has shown me a language MUST have in order to be
usable for making games with:

   1) There must be a working version of it installed on your system.
   2) You must know how to program in the language.

Experience has shown that pretty much any language that meets both of those
criteria can be used to make games, and has been.  Further, careful
scientific study has shown that no amount of time spent pondering what
language to use will ever lead to a brand new game spontaneously appearing on
your hard drive, nor will any amount of arguing with other people about
languages.  Indeed, these arguing and pondering activities do not seem to
produce even small fragments of game source code.

        Dr. Cat

Author: rrowe@halcyon.com (Robin Rowe)
Date: 27 Sep 1993 10:48:11 -0700 Raw View

You mention two problems as being your concern:

 1. Array indices straying out of bounds.
 2. Stray or null pointers.

How serious a problem are these really? With array bounds, using an
asymmetrical boundary test will avoid most problems (i.e., always use
the < sign to compare on the upper bound). Using assert() will protect against
problems in more complex situations. If pointers are always set to zero
when not actually pointing to something and (where reasonable) tested
for zero before use then most bugs there will be avoided.
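The three habits above can be sketched as follows. This is only an illustration: the array size kSize and the function names sum_all, element_at, and value_or are made up for the example.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kSize = 10;   // hypothetical array size

// Asymmetrical test: the index runs over [0, kSize), so the upper
// bound is always compared with <, never <=.
int sum_all(const int (&values)[kSize]) {
    int total = 0;
    for (std::size_t i = 0; i < kSize; ++i) {
        total += values[i];
    }
    return total;
}

// assert() guards an index that was computed somewhere less obvious.
int element_at(const int (&values)[kSize], std::size_t index) {
    assert(index < kSize);
    return values[index];
}

// Pointers hold zero whenever they point at nothing, and are tested
// for zero before use.
int value_or(const int* p, int fallback) {
    return p != 0 ? *p : fallback;
}
```

Note that assert() compiles away when NDEBUG is defined, so the check in element_at costs nothing in a release build but also catches nothing there.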

Maybe if you gave some specific examples of the bugs you have
experienced, then ways to avoid them could be discussed. This would
make for practical solutions now, and perhaps even point out how the
language might be improved by building in some of the techniques in
the future.

     Robin

Author: osinski@panini.cs.nyu.edu (Ed Osinski)
Date: 27 Sep 93 21:52:02 GMT Raw View

In article <27skieINNarf@grumpy.symantec.com>, Kostya Vasilyev <Kostya@Symantec.com> writes:
|> In article <CDst5w.CKw@hatch.socal.com> Brendan Jones, bj@hatch.socal.com
|> writes:
|> >There must be tens of thousands of people who have wasted hundreds
|> >of hours each hunting these types of bugs down.  Think of all the
|> >wasted time that adds up to!  It's about time the C++ language
|> >specification was amended so these, the most common bugs, can
|> >be trapped.
|>
|> >C and C++'s power is its flexibility....
|>
|> Exactly, it has power and flexibility for anyone to implement what you're
|> asking for without modifying the language.
|>
|> It's very easy to implement bounds checking in arrays in C++. Just
|> implement arrays as classes, define operator[], and add any checking
|> code you like!  This requires no changes to the C++ language, and you
|> have complete flexibility and independence as to what kind of checking
|> is performed, how errors are reported, when it is turned off, etc.

I beg to differ.  This is far easier said than done.  Remember that you can do
all sorts of things to raw arrays that are hard to duplicate in a class.  For
example, you can take the address of an array element and treat it as a pointer to
a smaller portion of the original array:

 int a[10];
 int *p = &a[4];  // p points to a 6-element sub-array

In order to support this with safety checking, the subscript operator for an
Array<T> cannot return a T&, but an object of some helper class ElemRef<T>, and
this would have to be rewritten as:

 Array<int> a(10);
 ElemRef<int> p = &a[4];

If nothing else, this would require many changes to the code to achieve
run-time checking.  Even without further problems, this seems a high price to
pay for what many consider simple run-time checking that exists in other
languages.  I understand Ada is an example.
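A rough sketch of what such a helper class might look like, to show how much machinery is involved. The member functions of ElemRef<T> and Array<T> below are assumptions made for the example, not an existing library, and copying of Array is simply left out.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical checked handle to one element of an Array<T>.
template <class T>
class ElemRef {
public:
    ElemRef(T* base, std::size_t size, std::size_t index)
        : base_(base), size_(size), index_(index) {}

    // Reading and writing the element both go through the bounds check.
    operator T() const { return slot(); }
    ElemRef& operator=(const T& value) { slot() = value; return *this; }
    // Assigning one handle to another copies the element, like T& would.
    ElemRef& operator=(const ElemRef& other) {
        slot() = other.slot();
        return *this;
    }

    // &a[4] yields another checked handle instead of a raw T*.
    ElemRef operator&() const { return *this; }

    // p[i] is relative to the element p refers to, as with a raw pointer.
    ElemRef operator[](std::size_t offset) const {
        return ElemRef(base_, size_, index_ + offset);
    }

private:
    T& slot() const {
        if (index_ >= size_) throw std::out_of_range("ElemRef: out of bounds");
        return base_[index_];
    }
    T* base_;
    std::size_t size_;
    std::size_t index_;
};

template <class T>
class Array {
public:
    explicit Array(std::size_t size) : data_(new T[size]()), size_(size) {}
    ~Array() { delete[] data_; }

    // Returns a checked handle rather than a T&.
    ElemRef<T> operator[](std::size_t index) {
        return ElemRef<T>(data_, size_, index);
    }

private:
    Array(const Array&);             // copying left out of the sketch
    Array& operator=(const Array&);
    T* data_;
    std::size_t size_;
};
```

With this, p[5] on a handle obtained from &a[4] reaches a[9] and is accepted, while p[6] throws. The handle never converts to int&, so it cannot be passed to a function that takes a reference.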

Now the question of how to pass an element of an array to a function expecting
a reference arises.

The original code would look like this:

 void inc (int& i)
 {
    i++;
 }

I don't see how this code could easily be rewritten to use a completely safe
Array class.  A conversion to (or operation returning) int& leaves the door
open to an assignment to an unchecked int pointer.  On the other hand, not
having such an operation/conversion means that the int cannot be used as an
lvalue in this case, I think.  In effect, it seems to be impossible to define
an Array class that has the same capabilities as a built-in array and yet is
completely "safe" in that out-of-bounds references and similar errors are
guaranteed to be caught at run-time.

However, as I said in a previous post, I don't think that the language
definition precludes implementations from doing complete run-time checking of
array references, etc.  It's just that few or no implementors have done so.

|>
|> --------------------------------------------------------------------------
|> Kostya Vasilyev, SYMANTEC Corporation, Bedrock group
|> Cytomax Junkie  10201 Torre Avenue
|>    Cupertino, CA 95014
|>    (408) 446-7165
|>    eMail: Kostya_Vasilyev_at_SYMCU-DEV@Symantec.com
|> --------------------------------------------------------------------------

--
---------------------------------------------------------------------
 Ed Osinski                  |
 Computer Science Department | "I hope life isn't a big joke,
 New York University         |  because I don't get it."
 E-mail:  osinski@cs.nyu.edu |                           Jack Handey
---------------------------------------------------------------------

Author: bj@hatch.socal.com (Brendan Jones)
Date: Thu, 23 Sep 1993 08:34:43 GMT Raw View

In article <27n5i3$il7@charm.magnus.acs.ohio-state.edu> esova@magnus.acs.ohio-state.edu (Edward R Sova) writes:
>>The UNIX world has a great utility called PURIFY which is designed for just
>>this problem. You link it in to your code and it watches your memory
>>usage - using memory you have not allocated, reading memory you have
>>not written to, allocating memory but not freeing it (memory leaks).
>>
>>I don't think there is a DOS version - anyone know if there is a good
>>alternative?
>>
>I think that brings us back to BoundsChecker2.0

The easiest to make (and I am sure the most frequent) programming errors in
C++ and C programs would have to be:

  1. array indexes straying out of bounds
  2. stray or null pointers

There must be tens of thousands of people who have wasted hundreds
of hours each hunting these types of bugs down.  Think of all the
wasted time that adds up to!  It's about time the C++ language
specification was amended so these, the most common bugs, can
be trapped.

This is particularly sad with C++ on the Intel platforms, for the
80186 and better have a BOUND instruction which can quickly check
an array subscript against its bounds.  This sort of mechanism
should be implemented against arrays, and a similar mechanism
against dynamically allocated arrays too.  (Imagine how much
time would have been saved if this was in the specifications
from the outset.  I know Pascal does this, and many people think
Pascal goes too far and ends up binding you with red tape, but
runtime checking on arrays isn't a bad idea!)

As for stray pointers, I'm open to ideas here.  One option could be
to force the compiler to initialise pointers as 0 or some impossible
magic number, and check that a pointer is not this magic number
before using it.  Of course, you'd want to be able to turn this
option off for "fully debugged" code.  Other options, such as
associating type information with allocated memory blocks could
be considered.
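The zero-initialise-and-check idea can be approximated today with a small wrapper class, sketched below. The name CheckedPtr and the PTR_NO_CHECKS switch are hypothetical, and a compiler-level version would of course not need the wrapper at all.

```cpp
#include <stdexcept>

// Hypothetical switch: define PTR_NO_CHECKS for "fully debugged" code.
template <class T>
class CheckedPtr {
public:
    CheckedPtr() : p_(0) {}          // never starts out stray
    CheckedPtr(T* p) : p_(p) {}

    T& operator*() const { check(); return *p_; }
    T* operator->() const { check(); return p_; }
    bool is_null() const { return p_ == 0; }

private:
    // Trap at the point of use instead of corrupting memory later.
    void check() const {
#ifndef PTR_NO_CHECKS
        if (p_ == 0) throw std::logic_error("CheckedPtr: null dereference");
#endif
    }
    T* p_;
};
```

This only catches the zero case. A pointer to memory that has already been freed still looks valid to the wrapper, which is where the type-information idea would have to come in.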

Magical programs that hide and try to spot bad memory references
are fighting the symptoms.  They aren't going to the heart of the
problem.

C and C++'s power is its flexibility, but that shouldn't stop it
from offering a few safety nets should we ask to work with them.

nuff said,
Brendan Jones.

Author: elan@tasha.cheme.cornell.edu (Elan Feingold)
Date: 23 Sep 1993 16:07:19 GMT Raw View



> C and C++'s power is its flexibility, but that shouldn't stop it
> from offering a few safety nets should we ask to work with them.
>
> nuff said,
> Brendan Jones.

Agreed!  The lcc compiler has a "check for NULL pointer dereferencing" option.
The point is that even if the checks are made in software rather than in
hardware, and the resulting code runs 10 times slower, you can have two
versions of a program: the debugging version and the commercial version.

I am finishing up a memory checker library that writes a full log on memory
usage, checks for invalid frees, and can detect an illegal memory access that
ran beyond (either above or below) a memory region.  The code using the
debug library runs slower, but it is very much worth it as it catches a lot
of bugs.  All I have to do is take out the -DDEBUG and everything disappears
and is reduced to malloc()/free().
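A much-reduced sketch of the general technique, not the library itself: pad each block with guard bytes and verify them on free. All names here (debug_alloc, debug_check, debug_free, MEM_ALLOC, MEM_FREE) are made up for the example, and the logging is left out.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical layout: [header: block size][guard][user bytes][guard]
static const std::size_t kHeader = 16;      // holds the size, keeps alignment
static const std::size_t kGuard = 16;       // guard bytes on each side
static const unsigned char kFill = 0xA5;    // pattern written into the guards

void* debug_alloc(std::size_t n) {
    unsigned char* raw = static_cast<unsigned char*>(
        std::malloc(kHeader + kGuard + n + kGuard));
    if (raw == 0) return 0;
    std::memcpy(raw, &n, sizeof n);                          // remember the size
    std::memset(raw + kHeader, kFill, kGuard);               // guard below
    std::memset(raw + kHeader + kGuard + n, kFill, kGuard);  // guard above
    return raw + kHeader + kGuard;
}

// Returns true if both guard regions are still intact.
bool debug_check(const void* user) {
    const unsigned char* p = static_cast<const unsigned char*>(user);
    const unsigned char* raw = p - kGuard - kHeader;
    std::size_t n;
    std::memcpy(&n, raw, sizeof n);
    for (std::size_t i = 0; i < kGuard; ++i) {
        if (raw[kHeader + i] != kFill) return false;   // ran below the block
        if (p[n + i] != kFill) return false;           // ran above the block
    }
    return true;
}

void debug_free(void* user) {
    if (user == 0) return;
    if (!debug_check(user)) std::abort();   // overrun detected
    std::free(static_cast<unsigned char*>(user) - kGuard - kHeader);
}

// With -DDEBUG the program uses the checked versions; without it,
// everything reduces to plain malloc()/free().
#ifdef DEBUG
#define MEM_ALLOC(n) debug_alloc(n)
#define MEM_FREE(p) debug_free(p)
#else
#define MEM_ALLOC(n) std::malloc(n)
#define MEM_FREE(p) std::free(p)
#endif
```

Guard bytes only catch writes that land just outside the block, and only at the time of the check, so this finds overruns after the fact rather than at the faulting statement.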

Elan

--
---------------------------------------------------------------------------
|  Elan Feingold       |                                       |
|  CS/EE Depts.        |                          |
|  Cornell University  |     ( .sig currently under construction )     |
|  Ithaca NY 14850     |                        |
---------------------------------------------------------------------------

Author: David James Alexander Hanley <U34465@uicvm.uic.edu>
Date: Thu, 23 Sep 1993 12:10:53 CDT Raw View

  In C++ you can define your own array type, and include bounds checking.
Then, when you have the final version, simply remove the bounds checking
from the abstract array type, or change the references to regular arrays,
and you're home free!

  To clarify, you can overload the [] operator so the code can look like this:

ARRAY <int> beasts(10);
.....
beasts[5]=beasts[2];
int *i = &beasts[1];

and it all works.
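One possible shape for such a class, as a sketch only. The template body below is an assumption, since only the usage is given above; it uses assert() for the check so that defining NDEBUG removes the bounds checking in the final version.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical checked array; the check vanishes when NDEBUG is defined.
template <class T>
class ARRAY {
public:
    explicit ARRAY(std::size_t size) : data_(new T[size]()), size_(size) {}
    ~ARRAY() { delete[] data_; }

    // Checked subscript: returns a plain T&, so existing code compiles.
    T& operator[](std::size_t index) {
        assert(index < size_);
        return data_[index];
    }

    std::size_t size() const { return size_; }

private:
    ARRAY(const ARRAY&);             // copying left out of the sketch
    ARRAY& operator=(const ARRAY&);
    T* data_;
    std::size_t size_;
};
```

Because operator[] returns T&, the expression &beasts[1] is an ordinary int*, and anything done through that pointer afterwards is not checked.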

dave