Topic: Trees, Boost, GSoC, and the Standard


Author: "Bernhard Reiter" <ockham@gmx.net>
Date: Thu, 9 Nov 2006 10:15:35 CST
Raw View
The post-Portland mailing contains N2101, "Hierarchical Data Structures
and Related Concepts for the C++ Standard Library", which is the result
of an ongoing effort of mine started in the course of Google Summer of
Code(tm) 2006 to develop a "Generic Tree Container" for Boost, and my
project mentor, Ren    Rivera. It mainly seeks to reunite "explicit"
trees as used for modelling hierarchical data, and balanced (binary)
trees.

One major conceptual result is an abstraction departing from node-based
implementations towards the concept of a cursor which is then
recurringly used in the context of hierarchies. Note that the current
state of the implementation for Boost is still a work in progress;
thus, the proposal is probably not bullet-proof yet. We felt however
that early submission (so that this would still be eligible for TR2)
with the underlying concepts pretty much readily worked out would be
worthwhile. While leaving implementational details to individually
required balanced binary trees such as AVL, red-black, splay trees,
treaps, but also B and B* trees, their interface is currently specified
to model the sequence requirements closely (as opposed to the
underlying hierarchical structures). The reason for this is that a way
was looked for to provide balanced (inorder-invariant) tree structures

* without exposing the underlying tree
* nor requiring specific associative container related features (such
as sorting or extraction mechanisms).

More implementation inspired experience is required in order to see how
to circumvent shortcomings of this approach, again particularly in the
important context of associative containers (roughly spoken, the idea
of the sequence approach is using a combination of lower_bound and
insert in order to achieve sorted storage of the elements, just like
this would also work for any of the currently present linear STL
containers). Inherent shortcomings of this approach are due to the fact
that in the underlying tree, the position yielded by a lower_bound
query (which, if successful, when dereferenced, is supposed to yield
the first value "not less than" the query) is not identical to the
position where order-preserving insertion would have to take place
(which is usually a leaf node or cursor) such that balanced tree
iterators might need to contain pointers to both these relevant
positions (updating which might cause some significant overhead).
Second, structures such as splay trees require rebalancing not only
after modifying (insert/erase), but also after non-modifying (lookup)
operations. Consequently, calling insert after lower_bound duplicates
this rebalancing in a redundant way.

Other than that, the B[*]-trees also mentioned are at the moment even
conceptually more academic in that they are allocator-relying (memory)
containers without anything secondary storage related specified.
Separation into multiple layers (nary, multiway and b_[star_]tree) the
former two of which are compatible with the hierarcy/cursor model is
yet hoped to prove useful with potential development of B[*]-trees with
secondary storage relevance. (I'm very unexperienced in that field, so
any thoughts on that are very welcome.) Tries are another eventually
intended addition.

Find proposal WG21/N2101=J16/06-0171 at
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2101.html
The relevant part of the Boost GSoC wiki is at
https://boost-consulting.com:8443/trac/soc/wiki/tree
The current (buggy!) state of the implementation can be downloaded via
svn from
https://www.boost-consulting.com:8443/svn/main/boost/soc/2006/tree/trunk
and browsed online at
https://boost-consulting.com:8443/trac/soc/browser/boost/soc/2006/tree/trunk

Note that documentation is currently out of date due to some major
changes during writing the proposal; it is recommended to refer to the
proposal (with the above notes in mind) instead.

Any comments, especially with regard to the issues as described above,
are highly welcome.

Bernhard Reiter


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]