Topic: Behavior of "max_load_factor" in TR1's unordered containers
Author: petebecker@acm.org (Pete Becker)
Date: Mon, 13 Feb 2006 19:07:42 GMT
Scott Meyers wrote:
> Calling max_load_factor with some value z sets the maximum load factor
> for the hash table, but its constant-time complexity implies that no
> rehash is allowed at that point. When is it expected that such a rehash
> will take place? The logical time is the next insert, but the
> worst-case times for find and equal_range mean it could occur when those
> are called, too. (It doesn't seem possible after an erase, because
> erase isn't allowed to invalidate iterators to other than the erased
> elements.) Can somebody clarify what users are likely to be able to
> expect here? Specifically, once a maximum load factor is set that is
> below the current load factor, when should users expect the rehashing to
> take place? And where is that specified?
It's not specified anywhere. That allows for full rehashing on the next
insert or erase and for incremental rehashing. (Can't rehash on find,
etc., because they're not allowed to invalidate iterators, although I
can't offhand find that requirement.)
>
> Second, I note that the load factor passed to max_load_factor is only a
> hint. Can someone clarify why it's a hint and not a mandate (modulo
> floating point behavioral weirdnesses like catastrophic cancellation,
> etc.)?
>
Again, incremental rehashing is the immediate issue, but more generally,
allowing flexibility for implementations. Besides, with a degenerate
hash function that puts everything into one bucket, rehashing can change
the average number of elements per bucket, but doesn't actually
accomplish anything.
One of the biggest problems with hashed containers is that people will
expect the requirements to say something. They're low-level containers
which really shouldn't be standardized. There are too many ways to
customize them, and the only real way to write rigorous, useful
requirements is to specify the exact implementation.
--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: wade@stoner.com
Date: Mon, 13 Feb 2006 13:29:58 CST
Howard Hinnant wrote:
> My personal preference would be that the following is a class invariant:
>
> size()/bucket_count() <= max_load_factor()
It would be nice to allow the implementation to replace size() with
count_of_unique_keys() for multi...set or map. Repeatedly putting the
same key into some hash_multimap implementations is an O(N*N)
operation, because insertion is O(number of empty buckets adjacent to
the bucket being updated), and the standard requires that O(N) buckets
be present (even when only one is in use).
---
Author: usenet@aristeia.com (Scott Meyers)
Date: Wed, 15 Feb 2006 23:59:43 GMT
Pete Becker wrote:
> Scott Meyers wrote:
>> Second, I note that the load factor passed to max_load_factor is only
>> a hint. Can someone clarify why it's a hint and not a mandate (modulo
>
> Again, incremental rehashing is the immediate issue, but more generally,
> allowing flexibility for implementations. Besides, with a degenerate
> hash function that puts everything into one bucket, rehashing can change
> the average number of elements per bucket, but doesn't actually
> accomplish anything.
But calling rehash is guaranteed to respect max_load_factor, so we have a
situation where
container.max_load_factor(z);
need not do anything, even after subsequent insertions, but calling
container.rehash();
must arrange for container.size()/container.bucket_count() < z. So the value
passed to max_load_factor is a hint until rehash is called, at which point it
becomes a requirement.
Scott
---
Author: dave@boost-consulting.com (David Abrahams)
Date: Thu, 16 Feb 2006 19:01:24 GMT
usenet@aristeia.com (Scott Meyers) writes:
> Pete Becker wrote:
>> Scott Meyers wrote:
>>> Second, I note that the load factor passed to max_load_factor is
>>> only a hint. Can someone clarify why it's a hint and not a mandate
>>> (modulo
>> Again, incremental rehashing is the immediate issue, but more
>> generally, allowing flexibility for implementations. Besides, with a
>> degenerate hash function that puts everything into one bucket,
>> rehashing can change the average number of elements per bucket, but
>> doesn't actually accomplish anything.
>
> But calling rehash is guaranteed to respect max_load_factor, so we have a
> situation where
>
> container.max_load_factor(z);
>
> need not do anything, even after subsequent insertions, but calling
>
> container.rehash();
>
> must arrange for container.size()/container.bucket_count() < z. So the value
> passed to max_load_factor is a hint until rehash is called, at which point it
> becomes a requirement.
That's a weird way of looking at it. I wouldn't call it a hint at
all; that just complicates the issue and makes the behavior seem more
mysterious than it is. The container is free to rehash in response to
other operations like insertion (the user needs to know that anyway),
and the requirement is that the max_load_factor will be respected by
the next rehash operation.
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
---
Author: Scott Meyers <usenet@aristeia.com>
Date: Thu, 16 Feb 2006 15:36:46 CST
David Abrahams wrote:
> That's a weird way of looking at it. I wouldn't call it a hint at
> all; that just complicates the issue and makes the behavior seem more
> mysterious than it is. The container is free to rehash in response to
> other operations like insertion (the user needs to know that anyway),
> and the requirement is that the max_load_factor will be respected by
> the next rehash operation.
I don't think that's what TR1 says. It says that the argument passed to
max_load_factor is a hint, and the complexity constraints imply that no rehash
is possible at that point, so setting a new z below the current z means that
your new z must be ignored at that point. If you then do an insert, there may
(or may not) be an incremental rehash, but that need not bring the load factor
under the z you specified. This may continue for several insertions.
As TR1 reads now, the only way you are guaranteed that your load factor request
will be respected is if you (or the implementation) explicitly call rehash. At
least that's how it looks to me.
Scott
---
Author: petebecker@acm.org (Pete Becker)
Date: Fri, 17 Feb 2006 06:59:57 GMT
Scott Meyers wrote:
>
> As TR1 reads now, the only way you are guaranteed that your load factor
> request will be respected is if you (or the implementation) explicitly
> call rehash. At least that's how it looks to me.
>
That's how it is. As Dave said, the container is free to rehash in other
operations, as well. It is not required to. That allows for, among other
things, incremental rehashing. If you really want to force your load
factor you must call rehash.
--
Pete Becker
Roundhouse Consulting, Ltd.
---
Author: petebecker@acm.org (Pete Becker)
Date: Fri, 17 Feb 2006 15:42:06 GMT
Scott Meyers wrote:
>
> I don't think that's what TR1 says. It says that the argument passed to
> max_load_factor is a hint, and the complexity constraints imply that no
> rehash is possible at that point, so setting a new z below the current z
> means that your new z must be ignored at that point. If you then do an
> insert, there may (or may not) be an incremental rehash, but that need
> not bring the load factor under the z you specified. This may continue
> for several insertions.
>
TR1 does say it's a "hint", but that's a poor choice of word. It's not a
hint, because it imposes an absolute requirement: after a call to
rehash, the load factor must be less than or equal to the most recently
specified value.
--
Pete Becker
Roundhouse Consulting, Ltd.
---
Author: usenet@aristeia.com (Scott Meyers)
Date: Sat, 11 Feb 2006 01:54:28 GMT
Calling max_load_factor with some value z sets the maximum load factor for the
hash table, but its constant-time complexity implies that no rehash is allowed
at that point. When is it expected that such a rehash will take place? The
logical time is the next insert, but the worst-case times for find and
equal_range mean it could occur when those are called, too. (It doesn't seem
possible after an erase, because erase isn't allowed to invalidate iterators to
other than the erased elements.) Can somebody clarify what users are likely to
be able to expect here? Specifically, once a maximum load factor is set that is
below the current load factor, when should users expect the rehashing to take
place? And where is that specified?
Second, I note that the load factor passed to max_load_factor is only a hint.
Can someone clarify why it's a hint and not a mandate (modulo floating point
behavioral weirdnesses like catastrophic cancellation, etc.)?
Thanks,
Scott
---
Author: howard.hinnant@gmail.com (Howard Hinnant)
Date: Sat, 11 Feb 2006 18:19:28 GMT
In article <11uqadodhp0b246@corp.supernews.com>,
usenet@aristeia.com (Scott Meyers) wrote:
> Calling max_load_factor with some value z sets the maximum load factor
> for the hash table, but its constant-time complexity implies that no
> rehash is allowed at that point. When is it expected that such a
> rehash will take place? The logical time is the next insert, but the
> worst-case times for find and equal_range mean it could occur when
> those are called, too. (It doesn't seem possible after an erase,
> because erase isn't allowed to invalidate iterators to other than the
> erased elements.) Can somebody clarify what users are likely to be
> able to expect here? Specifically, once a maximum load factor is set
> that is below the current load factor, when should users expect the
> rehashing to take place? And where is that specified?
>
> Second, I note that the load factor passed to max_load_factor is only a hint.
> Can someone clarify why it's a hint and not a mandate (modulo floating point
> behavioral weirdnesses like catastrophic cancellation, etc.)?
I believe (someone please correct me if I'm wrong), that
max_load_factor(z) is semantics-free (and thus useful only if writing to
a particular implementation).
My personal preference would be that the following is a class invariant:
size()/bucket_count() <= max_load_factor()
However, you cannot have that as a class invariant AND have
max_load_factor(z) do something meaningful in constant time (for a
decreasing value of max_load_factor()).
Imho, allowing max_load_factor(z) to make max_load_factor() <
size()/bucket_count(), even if only till the next insert or erase, is a
poor interface design. So in the case that z < size()/bucket_count() we
are left with one of:
1. Ignore z completely.
2. Set max_load_factor() == size()/bucket_count() (or just barely
greater).
3. Allow max_load_factor(z) to increase bucket_count() to maintain
max_load_factor() >= size()/bucket_count(), which in turn violates the
constant complexity requirement.
Imho, choice 3 is the highest quality choice, but at the cost of not
conforming to TR1.
-Howard
---
Author: usenet@aristeia.com (Scott Meyers)
Date: Mon, 13 Feb 2006 00:09:56 GMT
Howard Hinnant wrote:
> I believe (someone please correct me if I'm wrong), that
> max_load_factor(z) is semantics-free (and thus useful only if writing to
> a particular implementation).
That would be consistent with the passed z being only a hint.
> My personal preference would be that the following is a class invariant:
>
> size()/bucket_count() <= max_load_factor()
>
> However you can not have that as a class invariant, AND have
> max_load_factor(z) do something meaningful in constant time (for a
> decreasing value of max_load_factor()).
It seems to me that the logical thing is to get rid of the hint part and change
the complexity to constant if the specified load factor is greater than or equal
to size()/bucket_count(), and linear otherwise.
Does anybody know why the current specification is as it is? The hint and
constant time complexity didn't get there by accident, I'm sure.
Scott
---