Thread

Topic: std::tr1::unordered_set and the complexity of iterator::operator++

Author: caj@cs.york.ac.uk (Chris)
Date: Sat, 22 Jan 2005 03:47:41 GMT Raw View

Joaqu=EDn M L=F3pez Mu=F1oz wrote:
> I'm reading the documentation for std::tr1 hashed containers
> presented at
>=20
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1456.html
>=20
> and found that both single- and double-linked lists are allowed.
>=20
> On single-linked lists implementations I've taken a look at,
> like for instance that of GCC...

While GCC contains a hashed container, it is an old pre-standard one and=20
therefore might not meet all the requirements of the standard. The next=20
version of gcc (4.0) will contain support for some but not all of tr1,=20
and may or may no contain a correct hashed container.

Chris

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "=?iso-8859-1?B?Sm9hcXXtbiBNIEzzcGV6IE118W96?=" <joaquin@tid.es>
Date: Sat, 22 Jan 2005 13:21:57 CST Raw View

>
> Yes, it is possible to have amortized contant time iterators to a
hash
> container implemented with single-linked lists (constant time as you
> mean it, i.e. with very low loading).  It isn't a matter of singly vs

> doubly linked lists.  It is a choice of:
>
>
> list_type<T>                   list_;
> vector<list_type<T>::iterator> buckets_;
>
> or
>
> struct node
> {
>     T     entry_;
>     node* next_;
>     node* prev_; // ?
> };
> vector<node*> buckets_;

Just to make sure, in case #1 you mean all elements
are linked in the same list, while in #2 each bucket has
its own list. Correct?

>
> Either design can be forward only or bidirectional.  They both have
> advantages and disadvantages, especially when viewed in a low load
> factor condition.
>
> The former has constant time iteration even in a low load condition,
but
> suffers from insert and erase getting expensive (linear).  The latter
is
> just the opposite (again only for a low load condition).
>

I'd say the former case can achieve constant time
iteration and insertion/deletion in the case of a
bidirectional list. Iteration is obvious, as for insertion,
the procedure is:

a. find the bucket (constant time)
b. if it already has elements, link with the first (constant time)
c. if it is empty, link the element with end().

And deletion is obvious too. Case #1 with a singly linked
list cannot perform constant time deletion, as you point out.

To sum it up, the situation is like this:

A: one singly-linked list:
constant iteration, linear insertion, linear deletion
B: one bidirectional list:
constant iteration, constant insertion, constant deletion
C: one singly-linked list per bucket:
linear iteration, constant insertion, constant deletion
D: one bidirectional list per bucket:
linear iteration, constant insertion, constant deletion

So this all boils down to the root of my original post, wich
is that (IMHO) complexity guarantees imposed bt std::tr1
rule out singly-linked implementations. Comments?
--
Joaqu   n M L   pez Mu   oz
Telef   nica, Investigaci   n y Desarrollo


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: hinnant@metrowerks.com (Howard Hinnant)
Date: Mon, 24 Jan 2005 18:17:36 GMT Raw View

In article <1106414013.954620.21420@c13g2000cwb.googlegroups.com>,
 "Joaqu   n M L   pez Mu   oz" <joaquin@tid.es> wrote:

> >
> > Yes, it is possible to have amortized contant time iterators to a
> hash
> > container implemented with single-linked lists (constant time as you
> > mean it, i.e. with very low loading).  It isn't a matter of singly vs
>
> > doubly linked lists.  It is a choice of:
> >
> >
> > list_type<T>                   list_;
> > vector<list_type<T>::iterator> buckets_;
> >
> > or
> >
> > struct node
> > {
> >     T     entry_;
> >     node* next_;
> >     node* prev_; // ?
> > };
> > vector<node*> buckets_;
>
> Just to make sure, in case #1 you mean all elements
> are linked in the same list, while in #2 each bucket has
> its own list. Correct?

Yes.  Don't forget though that in #1, buckets_ points to the beginning
of each bucket in list_.  And in a low load condition, you're going to
have buckets_.size() much greater than list_.size().  Consider where all
of the buckets_[i] are going to point.  Remember that to find the number
of elements in a bucket, you compute distance(buckets_[i],
buckets_[i+1]), at least that has been my assumption.

> > Either design can be forward only or bidirectional.  They both have
> > advantages and disadvantages, especially when viewed in a low load
> > factor condition.
> >
> > The former has constant time iteration even in a low load condition,
> but
> > suffers from insert and erase getting expensive (linear).  The latter
> is
> > just the opposite (again only for a low load condition).
> >
>
> I'd say the former case can achieve constant time
> iteration and insertion/deletion in the case of a
> bidirectional list. Iteration is obvious, as for insertion,
> the procedure is:
>
> a. find the bucket (constant time)
> b. if it already has elements, link with the first (constant time)
> c. if it is empty, link the element with end().

How do you tell if the bucket has elements, and if so, how many
elements?  If your answer is to look at the adjacent buckets_[i]
iterator, then that means adjacent empty buckets_ iterators must be
fixed up during insert/erase.  In a low load condition, the number of
adjacent empty buckets_ grows.

> B: one bidirectional list:
> constant iteration, constant insertion, constant deletion

Code it up.  I did, and did not achieve what you claim.  Perhaps you are
seeing something I overlooked.  I suppose you could do something like:

list_type<T>                   list_;
vector<pair<list_type<T>::iterator, size_type> > buckets_;

So that you explicitly store the number of elements in each bucket.  I
really haven't looked at the option in detail.  I was not anxious to add
that extra space overhead, just to deal with extremely low loading
conditions.  My reasoning is that a hash container exists to out perform
the tree-based containers in a finely tuned environment.  If you start
making the hash containers fool-proof, you at the same time erode their
reason for existing by adding time or space overhead.

I.e. I absolutely do not want to penalize the coder who has done his
homework, and established a (near) perfect distribution and loading.

> So this all boils down to the root of my original post, wich
> is that (IMHO) complexity guarantees imposed bt std::tr1
> rule out singly-linked implementations. Comments?

I propose:

> that
> in the interest of standardizing a spec that does not mandate such
> choices, that complexity be specified for the hash containers, assuming
> an even distribution with a load factor of 1.

-Howard

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "=?iso-8859-1?B?Sm9hcXXtbiBNIEzzcGV6IE118W96?=" <joaquin@tid.es>
Date: Mon, 24 Jan 2005 17:43:29 CST Raw View

Howard Hinnant wrote:
> In article <1106414013.954620.21420@c13g2000cwb.googlegroups.com>,
>  "Joaqu   n M L   pez Mu   oz" <joaquin@tid.es> wrote:
> >
> > I'd say the former case can achieve constant time
> > iteration and insertion/deletion in the case of a
> > bidirectional list. Iteration is obvious, as for insertion,
> > the procedure is:
> >
> > a. find the bucket (constant time)
> > b. if it already has elements, link with the first (constant time)
> > c. if it is empty, link the element with end().
>
> How do you tell if the bucket has elements, and if so, how many
> elements?  If your answer is to look at the adjacent buckets_[i]
> iterator, then that means adjacent empty buckets_ iterators must be
> fixed up during insert/erase.  In a low load condition, the number of

> adjacent empty buckets_ grows.

Well, I guess this is our point of mutual misunderstanding.
The implementation I've got in mind is like:

list_type<T> list_;
vector<pair<list_type<T>::iterator>,list_type<T>::iterator> > buckets_;

where each bucket stores iterators to the first and the
last (*not one past the last*) element. This way, inserting/erasing
an element in a bucket is an operation local to the bucket,
no need to update adjacent buckets.
If the bucket is empty, its pointers are both end() (for
instance.)
For this to work, it is essential that the ranges stored
in a bucket are coded in a [front,back] fashion, instead of
[begin,end) (note the brackets.)

>
> > B: one bidirectional list:
> > constant iteration, constant insertion, constant deletion
>
> Code it up.  I did, and did not achieve what you claim.  Perhaps you
are
> seeing something I overlooked.

I've coded it for a library of mine called Boost.MultiIndex,
and achieved the complexity bounds I talk about, though
admittedly I need the additional pointer per bucket you
(reasonably enough) are afraid about.

Peeping into Dinumware's implementation, I found that they
use a std::list plus one pointer per bucket, like you propose.
Yet, Matt Austerns says in his TR1 proposal (quote):

"The space overhead for singly linked lists is N + B words, where
N is the number of elements and B is the bucket count, and the
space overhead for doubly linked lists is 2N + 2B."

So, he seems to be thinking on a 2 pointers per bucket
implementation, though I found none in actual use.

> I.e. I absolutely do not want to penalize the coder who has done his
> homework, and established a (near) perfect distribution and loading.
>
> > So this all boils down to the root of my original post, wich
> > is that (IMHO) complexity guarantees imposed bt std::tr1
> > rule out singly-linked implementations. Comments?
>
> I propose:
>
> > that
> > in the interest of standardizing a spec that does not mandate such
> > choices, that complexity be specified for the hash containers,
assuming
> > an even distribution with a load factor of 1.
>

This is very reasonable, since I'm growing more concerned
that complexity guarantees in the TR1 paper are too
fuzzily defined, as is apparent from the discussion we're having.

(Thanks for bearing with me)
--
Joaqu   n M L   pez Mu   oz
Telef   nica, Investigaci   n y Desarrollo

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "=?iso-8859-1?B?Sm9hcXXtbiBNIEzzcGV6IE118W96?=" <joaquin@tid.es>
Date: Thu, 20 Jan 2005 22:33:42 CST Raw View

I'm reading the documentation for std::tr1 hashed containers
presented at

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1456.html

and found that both single- and double-linked lists are allowed.

On single-linked lists implementations I've taken a look at,
like for instance that of GCC, iterator::operator++ complexity
is on average equal to 1/f where f is the load factor of the
hash table (no of elements / bucket count.) Since the bucket
count typically grows by a constant factor G on each rehashing
operation, it is simple to see that, for an ever increasing
number of elements, 1/f is bounded by G. The problem is that the
implementations I've seen do not shrink the bucket array when
erasing elements, so it seems to me that the complexity of
iterator::operator++, in a sense, does not comply with the
"amortized constant" bound imposed by the standard. Comments?
--
Joaqu   n M L   pez Mu   oz
Telef   nica, Investigaci   n y Desarrollo


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: pcarlini@suse.de (Paolo Carlini)
Date: Fri, 21 Jan 2005 17:23:27 GMT Raw View

Joaqu=EDn M L=F3pez Mu=F1oz wrote:
> I'm reading the documentation for std::tr1 hashed containers
> presented at
>=20
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1456.html
>=20
> and found that both single- and double-linked lists are allowed.
>=20
> On single-linked lists implementations I've taken a look at,
> like for instance that of GCC,

..
                                          The problem is that the
> implementations I've seen do not shrink the bucket array when
> erasing elements, so it seems to me that the complexity of
> iterator::operator++, in a sense, does not comply with the
> "amortized constant" bound imposed by the standard. Comments?

Well, your premise is flawed: at this time Gcc does *not* provide
yet (neither pretends to!) tr1::unordered containers!! Indeed,
in include/tr1 you will only find code for tuple (already quite
stable) and type_traits (still needs some work).

Paolo.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "=?iso-8859-1?B?Sm9hcXXtbiBNIEzzcGV6IE118W96?=" <joaquin@tid.es>
Date: Fri, 21 Jan 2005 13:12:37 CST Raw View

Paolo Carlini wrote:
>
> Well, your premise is flawed: at this time Gcc does *not* provide
> yet (neither pretends to!) tr1::unordered containers!! Indeed,
> in include/tr1 you will only find code for tuple (already quite
> stable) and type_traits (still needs some work).
>

Yes, you're right. Let me restate then my question, as
it has more to do with the constraints imposed by
the implementation of such a container (be it standard or
not): Is it possible to have amortized contant time
iterators to a hash container implemented with
single-linked collision lists? My previous analysis,
if correct, shows that the obvious implementation does
not achieve this.
--
Joaqu   n M L   pez Mu   oz
Telef   nica, Investigaci   n y Desarrollo


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: hinnant@metrowerks.com (Howard Hinnant)
Date: Sat, 22 Jan 2005 02:55:55 GMT Raw View

In article <1106331144.529768.93980@f14g2000cwb.googlegroups.com>,
 "Joaqu   n M L   pez Mu   oz" <joaquin@tid.es> wrote:

> Yes, you're right. Let me restate then my question, as
> it has more to do with the constraints imposed by
> the implementation of such a container (be it standard or
> not): Is it possible to have amortized contant time
> iterators to a hash container implemented with
> single-linked collision lists? My previous analysis,
> if correct, shows that the obvious implementation does
> not achieve this.

Yes, it is possible to have amortized contant time iterators to a hash
container implemented with single-linked lists (constant time as you
mean it, i.e. with very low loading).  It isn't a matter of singly vs
doubly linked lists.  It is a choice of:

list_type<T>                   list_;
vector<list_type<T>::iterator> buckets_;

or

struct node
{
    T     entry_;
    node* next_;
    node* prev_; // ?
};
vector<node*> buckets_;

Either design can be forward only or bidirectional.  They both have
advantages and disadvantages, especially when viewed in a low load
factor condition.

The former has constant time iteration even in a low load condition, but
suffers from insert and erase getting expensive (linear).  The latter is
just the opposite (again only for a low load condition).

So the choice boils down to:  what do you value more in a hash
container:  iteration or insert/erase?

Both designs have similar characteristics with a perfect hash function
and distribution (evenly distributed, load factor 1).  I propose, that
in the interest of standardizing a spec that does not mandate such
choices, that complexity be specified for the hash containers, assuming
an even distribution with a load factor of 1.

-Howard

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]