Topic: Computational complexity of algorithms in the standard library


Author: Christopher Eltschka <celtschk@physik.tu-muenchen.de>
Date: 1998/06/27
Jerry Coffin wrote:

[...]

> For example, in sorting collections, we presently have two fundamental
> abstractions: stable and unstable sorting.  Despite this, Matt Austern
> notes the presence of at least 5 sorting algorithms exposed in the
> standard library, and suggests that at least a couple more might be in
> order.
>
> With all due respect to Matt (and that's a great deal), I'd suggest
> that this is the wrong approach.  Instead of adding still more
> algorithms, with specifications on each, we should reduce the number
> of exposed algorithms to two, the number of abstractions being
> represented.
>
> From there, we leave it to the compiler to decide what algorithm will
> work best in a given situation.  The compiler generally has FAR more
> basis for choosing an algorithm than we do -- about all we can base
> decisions upon (portably) is the number of items in the collection.
> The compiler knows that, but also (at least potentially) knows things
> like how important locality of reference is on THIS particular
> machine.  Likewise, when given a really _large_ number of elements to
> sort, it might want to do a quick comparison of speed of comparison
> vs. swapping of elements.

[Note: I'm using the SGI STL reference, so I may cite algorithms
which are not in the standard STL. However, this doesn't affect
my point.]

The compiler may know something about locality of reference, but
it doesn't know anything about *your* needs. Do you really need
the whole container sorted (sort, stable_sort)? Or do you just
need to know that all elements left of some position are sorted
(partial_sort)? Or do you just need the nth element to be correct,
with all smaller elements to its left and all greater elements to
its right (nth_element)? Or that all elements fulfilling a given
condition (say x<5) come before those not fulfilling it
(partition, stable_partition)? Or maybe all you need is to always
get the biggest element first (make_heap, push_heap, pop_heap).
Maybe later you want the complete sequence to be sorted, but want
to take advantage of the work already done (sort_heap). Maybe you
already have a sorted sequence and just want to merge in another
already sorted range of elements without destroying the order
(inplace_merge).
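
To make these distinctions concrete, here is a minimal sketch (my
own illustration, not taken from the standard or the SGI docs).
The data and the cut-off point at position 10 are arbitrary, and
each call is shown for the guarantee it gives, not as a realistic
program:

#include <algorithm>
#include <vector>

bool less_than_five(int x) { return x < 5; }

int main()
{
    std::vector<int> v;
    for (int i = 0; i < 20; ++i)
        v.push_back((7 * i + 3) % 20);       // some unsorted data
    std::vector<int>::iterator mid = v.begin() + 10;

    // Whole range sorted (order of equal elements unspecified).
    std::sort(v.begin(), v.end());

    // Only [begin, mid) is guaranteed sorted, holding the 10
    // smallest elements; the rest is in unspecified order.
    std::partial_sort(v.begin(), mid, v.end());

    // *mid is the element a full sort would put there; everything
    // to its left is <= it, everything to its right is >= it.
    std::nth_element(v.begin(), mid, v.end());

    // Elements fulfilling the condition come before those that
    // don't; neither group is sorted.
    std::partition(v.begin(), v.end(), less_than_five);

    // Heap: the biggest element is always available at the front.
    std::make_heap(v.begin(), v.end());
    std::pop_heap(v.begin(), v.end());  // biggest moves to the back
    v.pop_back();
    v.push_back(42);
    std::push_heap(v.begin(), v.end()); // re-establish the heap

    // Later the heap can be turned into a fully sorted sequence,
    // reusing the work already done.
    std::sort_heap(v.begin(), v.end());

    // Merge a second, already sorted range appended at the end
    // without destroying the existing order.
    std::vector<int> extra(3, 7);
    std::vector<int>::size_type old_size = v.size();
    v.insert(v.end(), extra.begin(), extra.end());
    std::inplace_merge(v.begin(), v.begin() + old_size, v.end());

    return 0;
}

Each of these calls does strictly less work than a full sort
precisely because the caller has said how little ordering is
actually needed.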

This is what you have to tell the compiler, and it is something
compilers generally can't deduce from the source code.

But even providing interfaces for specific sort algorithms like
insertion_sort, quick_sort, etc. (which at least the SGI STL doesn't
do) would make sense, since you might have special knowledge about
the typical input data of your program.
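
For illustration only -- insertion_sort below is not a standard
algorithm, and the interface is just a guess at what such an
exposed algorithm could look like. A caller who knows the input is
almost sorted could then pick it deliberately:

#include <iterator>

// Hypothetical interface: NOT part of the standard library.
// A good choice when the caller knows the input is already nearly
// sorted -- knowledge the implementation can't get from the source.
template <class BidirectionalIterator>
void insertion_sort(BidirectionalIterator first,
                    BidirectionalIterator last)
{
    for (BidirectionalIterator i = first; i != last; ++i)
    {
        typename std::iterator_traits<BidirectionalIterator>::value_type
            value = *i;
        BidirectionalIterator j = i;
        // Shift elements greater than 'value' one step to the right.
        while (j != first)
        {
            BidirectionalIterator prev = j;
            --prev;
            if (!(value < *prev))
                break;
            *j = *prev;
            j = prev;
        }
        *j = value;
    }
}

On nearly sorted data this does close to one comparison per
element, while a general-purpose sort typically still spends on
the order of N log N comparisons -- but only the programmer knows
the data will look like that.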

[...]
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]





Author: jcoffin@taeus.com (Jerry Coffin)
Date: 1998/06/25
Rather than continuing the (more or less unproductive) point-by-point
refutation, I'm going to post my own observations on levels
of abstraction.  (Just as a warning, I tend to abbreviate
"implementation" as "compiler."  I do realize that most of what's
being discussed is really in the library.)

At an _extremely_ low level of abstraction, everything necessary to
make something work, and particularly to make it efficient, comes from
the programmer.  E.g. an assembler does virtually NOTHING that isn't
specified by the programmer.

If you need to achieve absolute maximum efficiency on a particular
machine, this is the way to go: given sufficient time and effort to
put into it, a really good programmer will consistently beat even the
best compilers that we've come up with yet.

However, this has a problem: the code quickly becomes obsolete.  Even
if it continues to work on newer machines, the decisions that lead to
maximum efficiency usually end up being wrong on later machines, and
the code is no longer as fast as even a decent compiler could produce.

As we raise the level of abstraction, we give more and more control
over what happens to the compiler (or interpreter, or whatever) and
allow it to make decisions based on knowledge of its environment
rather than assuming that the programmer knows everything.

However, in the case of the complexity requirements in the standard,
we've refused to give the compiler control of things it really does
know better than we do, at least in a portable way.

For example, in sorting collections, we presently have two fundamental
abstractions: stable and unstable sorting.  Despite this, Matt Austern
notes the presence of at least 5 sorting algorithms exposed in the
standard library, and suggests that at least a couple more might be in
order.

With all due respect to Matt (and that's a great deal), I'd suggest
that this is the wrong approach.  Instead of adding still more
algorithms, with specifications on each, we should reduce the number
of exposed algorithms to two, the number of abstractions being
represented.
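
In code, the two abstractions amount to nothing more than the two
calls sketched below (Record and by_key are made up for the
example); under this proposal everything else -- quicksort,
mergesort, heapsort, introsort -- would be an implementation detail
hidden behind them:

#include <algorithm>
#include <string>
#include <vector>

struct Record {
    std::string name;
    int         key;
};

bool by_key(const Record& a, const Record& b) { return a.key < b.key; }

void sort_both_ways(std::vector<Record>& records)
{
    // Unstable abstraction: records with equal keys may end up in
    // any order; the implementation picks whatever algorithm it
    // considers best for this machine and input.
    std::sort(records.begin(), records.end(), by_key);

    // Stable abstraction: records with equal keys keep their
    // original relative order.
    std::stable_sort(records.begin(), records.end(), by_key);
}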