Topic: STL and sorted disk based list


Author: John Max Skaller <maxtal@suphys.physics.su.oz.au>
Date: 1995/11/27
Raw View
Robert Buren <erarobu@era-t.ericsson.se> wrote:
>(This is mailed and posted to comp.std.c++)
>
>I agree with everything you say here. However, I would really
>appreciate it if you could expand on the quote below:
>
>John Max Skaller wrote:
>>
>>   For example, how would you do a graphics library
>> after seeing STL? [The same way STL is done]. How about
>> doing a matrix library with optimisation opportunities
>> for parallel execution? [The same way as STL]
>
>Could you give just a tiny example of what you mean here?
>I think STL is great, but I feel that I can still only imagine
>STL as a set of container utilities. How would you make an
>STL-like graphics library?

One could do several things. For example, consider an image
as an 2 dimensional array of pixels. What kinds of navigations
are required for two dimensions?

For image processing, perhaps the notion of "iterator"
is replaced by a notion of "sampler", which takes a
weighted average near a point.

Exactly how to do this would require study of the
application domain, and the existing algorithms
and data structures.


--
John Max Skaller               voice: 61-2-566-2189
81 Glebe Point Rd              fax:   61-2-660-0850
GLEBE NSW 2037                 email: maxtal@suphys.physics.oz.au
AUSTRALIA                      email: skaller@maxtal.com.au



[ comp.std.c++ is moderated.  Submission address: std-c++@ncar.ucar.edu.
  Contact address: std-c++-request@ncar.ucar.edu.  The moderation policy
  is summarized in http://dogbert.lbl.gov/~matt/std-c++/policy.html. ]






Author: John Max Skaller <maxtal@suphys.physics.su.oz.au>
Date: 1995/11/19
Raw View
ralphs@tesser.com (Ralph Shnelvar) wrote:
>
>I haven't even peeked at the STL but am wondering if there is support
>for a disk based sorting and/or indexing algorithms.
>
>That is, is there support for sorting monster arrays.

YES. STL sort is a completely generic algorithm. It can
sort "anything".

However, STL does not come pre-packaged with a
"Monster_Disk_Array" container. but you can write
one yourself, and if you follow the requirements
for such containers, STL sort will sort it,
reverse it, permute it or copy it.

[Note some algorithms might require some local storage.
It _should_ be documented how much if the amount
is not fixed. if it isn't we'd better complain
because programmers need to know about space
as well as time requirements]


--
John Max Skaller               voice: 61-2-566-2189
81 Glebe Point Rd              fax:   61-2-660-0850
GLEBE NSW 2037                 email: maxtal@suphys.physics.oz.au
AUSTRALIA                      email: skaller@maxtal.com.au
---
[ comp.std.c++ is moderated.  Submission address: std-c++@ncar.ucar.edu.
  Contact address: std-c++-request@ncar.ucar.edu.  The moderation policy
  is summarized in http://dogbert.lbl.gov/~matt/std-c++/policy.html. ]





Author: John Max Skaller <maxtal@suphys.physics.su.oz.au>
Date: 1995/11/21
Raw View
matt@cyclops5.berkeley.edu (Matt Austern) wrote:
>
>The good news, though, is that the STL isn't a container class library
>in any conventional sense; it isn't even a container, iterator, and
>algorithm library either, although that comes closer.  What it really
>is is a set of requirements that containers, iterators, and algorithms
>must satisfy if they are to work together, and a fairly large but
>deliberately incomplete set of predefined containers, iterators, and
>algorithms.

   I agree STRONGLY with this characterisation. In fact,
I'd venture to do one step further (as I usual do :-)

   STL isn't even a set of requirements (to be met by various
components). It's a FRAMEWORK for creating such requirements,
for ANY kind of generic component. (See Stepanov's papers,
available from various WEB servers)

  TO explain: it is impossible to divorce C++ from STL,
especially the template facilities. What Stepanov, Musser and Lee
have done is provided the bootstrap, namely the notion of
sequences. The framework is implicit, but it is a DOCUMENTATION
STANDARD, not a piece of software, or even specs for such software,
or even specs for that category of software: its a specification
(by example) of how to specify generic library architectures.

  For example, how would you do a graphics library
after seeing STL? [The same way STL is done]. How about
doing a matrix library with optimisation opportunities
for parallel execution? [The same way as STL]

  None of this would have been possible in C++ without
Bjarne Stroustrup's patient and persistent efforts to
upgrade the C++ language -- despite the difficulties
presented by the standardisation effort.

>If you look in the draft standard, by the way, you'll see no mention
>of the STL in the table of contents.  It's a very important part of
>the standard C++ class library, .....

  .. according to my interpretation the WHOLE Working Paper is
nothing but a particular component of the STL standard,
so you have it backwards -- ISO Standard C++ is (will be)
part of STL, not the other way around :-)

  STL makes a number of things clear not present in other C++ libraries:

* the way to handle errors (DONT, document what leads to them),

* the way to make things efficient (DON'T, document complexity requirements),

* the way to make the system meet the widest needs:
   (DONT, make it extensible, flexible, generic and reusable,
   DECOUPLE the components, and document the requirements for coupling
   them together),

* the way to make things comprehensible
  (DON'T, define the notions rigorously, and make them mathematically
   simple -- leave comprehension to book writers).


In ISO C++, we have the categories Matt mentions: iterators,
containers (which are used to source the iterators), and
algorithms (which do work on the values stored in containers
via the iterators). These three _categories_ form a fairly
minimal sequential processing model. They're NOT
enough to handle real world requirments, which include
GUI's, databases, real time interactive I/O, and other
things. But they show the way so that these things can be
decoupled into categories, analysed mathematically,
and specifications formulated, to allow interoperable
and reusable software components covering a larger
range of requirements to be developed.

Isn't that, more than anything else, what is needed
in the software industry?

--
John Max Skaller               voice: 61-2-566-2189
81 Glebe Point Rd              fax:   61-2-660-0850
GLEBE NSW 2037                 email: maxtal@suphys.physics.oz.au
AUSTRALIA                      email: skaller@maxtal.com.au



[ comp.std.c++ is moderated.  Submission address: std-c++@ncar.ucar.edu.
  Contact address: std-c++-request@ncar.ucar.edu.  The moderation policy
  is summarized in http://dogbert.lbl.gov/~matt/std-c++/policy.html. ]






Author: fjh@munta.cs.mu.OZ.AU (Fergus Henderson)
Date: 1995/11/21
Raw View
ark@research.att.com (Andrew Koenig) writes:

>ralphs@tesser.com (Ralph Shnelvar) writes:
>
>> I haven't even peeked at the STL but am wondering if there is support
>> for a disk based sorting and/or indexing algorithms.
>
>Sure.  All you have to do is write a class that meets the
>relevant iterator constraints and grants access to your
>disk file.  Then all the algorithms will work.

That misses the point a bit doesn't it?  To get reasonable efficiency,
wouldn't you want to use a different algorithm for disk based sorting
than for in-memory sorting?

--
Fergus Henderson              WWW: http://www.cs.mu.oz.au/~fjh
fjh@cs.mu.oz.au               PGP: finger fjh@128.250.37.3

[ comp.std.c++ is moderated.  Submission address: std-c++@ncar.ucar.edu.
  Contact address: std-c++-request@ncar.ucar.edu.  The moderation policy
  is summarized in http://dogbert.lbl.gov/~matt/std-c++/policy.html. ]






Author: wilson@cs.utexas.edu (Paul Wilson)
Date: 1995/11/22
Raw View
In article <9532600.2695@mulga.cs.mu.OZ.AU>,
Fergus Henderson <fjh@munta.cs.mu.OZ.AU> wrote:
>ark@research.att.com (Andrew Koenig) writes:
>
>>ralphs@tesser.com (Ralph Shnelvar) writes:
>>
>>> I haven't even peeked at the STL but am wondering if there is support
>>> for a disk based sorting and/or indexing algorithms.
>>
>>Sure.  All you have to do is write a class that meets the
>>relevant iterator constraints and grants access to your
>>disk file.  Then all the algorithms will work.
>
>That misses the point a bit doesn't it?  To get reasonable efficiency,
>wouldn't you want to use a different algorithm for disk based sorting
>than for in-memory sorting?

Not necessarily.  Given that processor and disk speeds keep getting
further apart, locality of reference is increasingly important
for "in memory" data structures, so the same principles apply to
file sorting as to in-memory sorting.   What really matters is
the size and of the data set and the patterns of access, not whether
it's file data or a heap-allocated data structure.  A sort routine
for a large data structure should have good locality if you want
it to have good and robust performance.  In a virtual memory environment,
you don't want to assume that random access is actually a good idea.

In principle, a p-store should be more efficient than using files,
because you don't have to translate formats to move things into ad
out of virtual memory, and because the operating system doesn't
have to cache recently-transferred stuff in both the file cache
and the virtual memory cache.  After all, it's kinda silly that we
have two parallel memory hierarchies, both consisting of RAM and
disk, and we copy back and forth between them when doing I/O.

This is the motivation behind our work with persistent object stores.
Rather than having to read and write your C++ data structures to
and from files, you get the effect of a virtual memory that lives
forever.  If you want to operate on it, just follow pointers and
page it in automatically.

This means, of course, that data structure designers have to write
good classes for large data structures.  E.g., we've implemented B+
trees as a C++ class, so that you can get the nice locality effects
of B+ trees without having to externalize all of your data and go
through a krufty file interface.  We want to make the B+ trees STL
compatible, so that you can replace another collection implementation
with B+ trees if you decide that's more appropriate, locality-wise.

The nice thing about persistent C++ is that you can plug in different
data structures without worrying so much whether they should be coded
to be stored in files or "in memory".  The same code works for either.

Of course, you *do* have to pay attention to locality.  Getting
rid of the file/memory distinction doesn't make deep problems go
away;  it just makes tedious annoying unnecessary problems go away
by getting rid of the so-called "impedance mismatch" between file
and in-memory data.

--
| Paul R. Wilson, Comp. Sci. Dept., U of Texas @ Austin (wilson@cs.utexas.edu)
| Papers on memory allocators, garbage collection, memory hierarchies,
| persistence and  Scheme interpreters and compilers available via ftp from
| ftp.cs.utexas.edu, in pub/garbage (or http://www.cs.utexas.edu/users/wilson/)

---
[ comp.std.c++ is moderated.  Submission address: std-c++@ncar.ucar.edu.
  Contact address: std-c++-request@ncar.ucar.edu.  The moderation policy
  is summarized in http://dogbert.lbl.gov/~matt/std-c++/policy.html. ]





Author: Eugene Lazutkin <eugene@int.com>
Date: 1995/11/23
Raw View
: In article <48h0vg$8i6@meg.tesser.com> ralphs@tesser.com (Ralph Shnelvar)
: writes:

: > I haven't even peeked at the STL but am wondering if there is support
: > for a disk based sorting and/or indexing algorithms.
: >
: > That is, is there support for sorting monster arrays.

You can define appropriate iterators for you disk-based data.  After
that you can use QuickSort. :-)  I think it doesn't solve your problem
from the practical point of view.

If you are going to sort a huge array of disk-based data, you should
use something like balanced multiway merge with presorted sequences.
You have to deal with the allocation and the maintanance of temporary
storage.  You have to combine memory-based sorting algorithm with
disk-based.  You have to choose the "right" strategy for that.

STL doesn't support these things directly. One of possible solutions:
define special algorithm for such task, implement your own container,
solve the problem of additional storage maintanence (necessary for
multiway merge), pick up the good strategy of combination of different
approaches (or how to give a hint for your sorting algorithm about the
choosing of such strategy).

STL doesn't solve all programer's problems. We have to face it.  But it
defines the good foundation for good programming.

Eugene Lazutkin eugene@int.com

---
[ comp.std.c++ is moderated.  Submission address: std-c++@ncar.ucar.edu.
  Contact address: std-c++-request@ncar.ucar.edu.  The moderation
  policy is summarized in http://dogbert.lbl.gov/~matt/std-c++/policy.html. ]





Author: Robert Buren <erarobu@era-t.ericsson.se>
Date: 1995/11/23
Raw View
(This is mailed and posted to comp.std.c++)

I agree with everything you say here. However, I would really
appreciate it if you could expand on the quote below:

John Max Skaller wrote:
>
>   For example, how would you do a graphics library
> after seeing STL? [The same way STL is done]. How about
> doing a matrix library with optimisation opportunities
> for parallel execution? [The same way as STL]

Could you give just a tiny example of what you mean here?
I think STL is great, but I feel that I can still only imagine
STL as a set of container utilities. How would you make an
STL-like graphics library?

Regards,

Robert

--
     Robert Buren
     erarobu@era-t.ericsson.se
---
[ comp.std.c++ is moderated.  Submission address: std-c++@ncar.ucar.edu.
  Contact address: std-c++-request@ncar.ucar.edu.  The moderation policy
  is summarized in http://dogbert.lbl.gov/~matt/std-c++/policy.html. ]





Author: kuehl@uzwil.rz.uni-konstanz.de (Dietmar Kuehl)
Date: 1995/11/23
Raw View
Hi,
Robert Buren (erarobu@era-t.ericsson.se) wrote:
: John Max Skaller wrote:
: >
: >   For example, how would you do a graphics library
: > after seeing STL? [The same way STL is done]. How about
: > doing a matrix library with optimisation opportunities
: > for parallel execution? [The same way as STL]
:
: Could you give just a tiny example of what you mean here?
: I think STL is great, but I feel that I can still only imagine
: STL as a set of container utilities. How would you make an
: STL-like graphics library?

I'm not that much concerned with graphic rather with graphs (I'm
currently constructing a [small] "STL-like" algorithm library for flow
problems). Thus the example my not be a real abstraction for
graphics... Apart from this, it would take far to long to explain the
details I have in mind while writing the scetch below.

The main part is something like identifying the common and essential
properties of the objects manipulated (viewed from the algorithms). In
case of a graphics library this could be something like a "screen": The
screens of the graphics library becomes what are the containers in the
current STL. The screens may differ in how the coordinates are
represented, how lines are drawn, and how drawing style (color, line
width, etc.) are choosen: The algorithms are not interested at these
details at all. To make such a screen useful some sort of accessor
objects are used e.g. something like a "LOGO-turtle": A turtle has a
current position and a direction it is looking. It can turn and it can
move forward in the given direction. A turtle is the equivalent to the
STL iterators. There may be different classes of turtles e.g.  simple
ones which can only move forward on a line (the "input/output
iterators) and others (more complex ones) which can move along a spline
(the random access iterators). However, the "turtles" are the objects
the algorithms work on. Finally, there can be algorithms taking
different turtles which can draw quadrangles, triangles, circles,
characters,... in different variations.

Such a "screen" abstraction together with a set of algorithms can
easily be used to implement output for different devices, e.g.
plotters, postscript-printers, rastered displays, or an in-memory
representation. It may even be used for some application is was
never intended to be used for e.g. moving a roboter.

While the graphics library could be useful I see more chances that
there will be a STL-like matrix library soon: For matrices there could
be a need to have a more precise representation of the values than
'double', the matrix can store 'int's instead of doubles, or some
interval arithmetic may be desirable. In addition the matrices can
easily vary in their representation (columnwise/rowwise,
sparse/full,...).

By identifying the abstraction used by the algorithms these libraries
can become very reusable: You can apply the algorithms to everything
which fullfills some minimal requirements. By indirection (the
iterators in STL) you can even apply algorithms to data structures
which where not designed with the given algorithms in mind!

PS: This discussion seems to become somewhat off-topic: Maybe it should
be moved somewhere else where it fits better than comp.std.c++...
--
dietmar.kuehl@uni-konstanz.de
http://www.informatik.uni-konstanz.de/~kuehl
I am a realistic optimist - that's why I appear to be slightly pessimistic
---
[ comp.std.c++ is moderated.  Submission address: std-c++@ncar.ucar.edu.
  Contact address: std-c++-request@ncar.ucar.edu.  The moderation policy
  is summarized in http://dogbert.lbl.gov/~matt/std-c++/policy.html. ]





Author: matt@cyclops5.berkeley.edu (Matt Austern)
Date: 1995/11/17
Raw View
In article <48h0vg$8i6@meg.tesser.com> ralphs@tesser.com (Ralph Shnelvar) writes:

> I haven't even peeked at the STL but am wondering if there is support
> for a disk based sorting and/or indexing algorithms.
>
> That is, is there support for sorting monster arrays.

No.  The closest thing the STL has to what you're looking for is its
istream_iterator and ostream_iterator.  I don't think they'd really
be appropriate for this task, though.

The good news, though, is that the STL isn't a container class library
in any conventional sense; it isn't even a container, iterator, and
algorithm library either, although that comes closer.  What it really
is is a set of requirements that containers, iterators, and algorithms
must satisfy if they are to work together, and a fairly large but
deliberately incomplete set of predefined containers, iterators, and
algorithms.

In other words: if you define iterators that point into data files
(depending on your design, and on your OS's file I/O capabilities, you
might choose to use input iterators, forward iterators, or even random
access iterators), then you can use all of the STL algorithms that
make sense for iterators of that category.

If you look in the draft standard, by the way, you'll see no mention
of the STL in the table of contents.  It's a very important part of
the standard C++ class library, so the original STL manual has been
distributed into various parts of the draft standard.  The most
important places to look are Chapter 20 (General Utilities), Chapter
23 (Containers), Chapter 24 (Iterators), and Chapter 25 (Algorithms).
Other parts of the standard library are influenced by the STL too,
though; strings, for example, are now STL containers.
--
  Matt Austern                             He showed his lower teeth.  "We
  matt@physics.berkeley.edu                all have flaws," he said, "and
  http://dogbert.lbl.gov/~matt             mine is being wicked."

---
[ comp.std.c++ is moderated.  Submission address: std-c++@ncar.ucar.edu.
  Contact address: std-c++-request@ncar.ucar.edu.  The moderation policy
  is summarized in http://dogbert.lbl.gov/~matt/std-c++/policy.html. ]





Author: wilson@cs.utexas.edu (Paul Wilson)
Date: 1995/11/18
Raw View
In article <DI8rpv.H5x@research.att.com>,
Andrew Koenig <ark@research.att.com> wrote:
>Distribution:
>In article <48h0vg$8i6@meg.tesser.com> ralphs@tesser.com (Ralph Shnelvar) writes:
>
>> I haven't even peeked at the STL but am wondering if there is support
>> for a disk based sorting and/or indexing algorithms.
>
>Sure.  All you have to do is write a class that meets the
>relevant iterator constraints and grants access to your
>disk file.  Then all the algorithms will work.
>--
>    --Andrew Koenig
>      ark@research.att.com

A cleaner way to do it is to make C++ persistent, and then use normal
"in memory" data structures to do your sorting.  Of course, you need
to pick data structures and algorithms that have good locality.

Mumit Khan has used our Texas persistent store (an efficient, high-tech,
but fairly portable p-store) for Persistent GNU STL.  Unfortunately,
GNU STL's current allocator scheme breaks orthogonal persistence, but if
you want all of your data structures to be persistent anyway, that's OK.

The next release of Texas will contain better support for STL.  (The
current release has trouble with nested classes, and requires a little
hand-fiddling to make it work.  The next release will handle nested
classes semiautomatically---you'll have to provide a list of header
files relevant to nested classes so that we can correctly generate
type descriptor information for them.)

--
| Paul R. Wilson, Comp. Sci. Dept., U of Texas @ Austin (wilson@cs.utexas.edu)
| Papers on memory allocators, garbage collection, memory hierarchies,
| persistence and  Scheme interpreters and compilers available via ftp from
| ftp.cs.utexas.edu, in pub/garbage (or http://www.cs.utexas.edu/users/wilson/)


---
[ comp.std.c++ is moderated.  Submission address: std-c++@ncar.ucar.edu.
  Contact address: std-c++-request@ncar.ucar.edu.  The moderation policy
  is summarized in http://dogbert.lbl.gov/~matt/std-c++/policy.html. ]





Author: ark@research.att.com (Andrew Koenig)
Date: 1995/11/18
Raw View
Distribution:
In article <48h0vg$8i6@meg.tesser.com> ralphs@tesser.com (Ralph Shnelvar) writes:

> I haven't even peeked at the STL but am wondering if there is support
> for a disk based sorting and/or indexing algorithms.

Sure.  All you have to do is write a class that meets the
relevant iterator constraints and grants access to your
disk file.  Then all the algorithms will work.
--
    --Andrew Koenig
      ark@research.att.com


---
[ comp.std.c++ is moderated.  Submission address: std-c++@ncar.ucar.edu.
  Contact address: std-c++-request@ncar.ucar.edu.  The moderation policy
  is summarized in http://dogbert.lbl.gov/~matt/std-c++/policy.html. ]