Topic: std::set searching
Author: marg64@yahoo.com (Vahan Margaryan)
Date: Mon, 13 May 2002 17:30:48 GMT Raw View
"Garry Lancaster" <glancaster@ntlworld.com> wrote in message news:<lldC8.520$1r.51735@news8-gui.server.ntli.net>...
> Garry Lancaster:
> Without an additional traits mechanism you won't get
> a graceful performance degradation: if you use an
> any order, linear complexity comparison predicate with
> a find function expecting a same ordering, log time
> predicate it wouldn't work correctly. I wouldn't really
> advocate a traits mechanism here: this solution we
> are devising is complex enough already.
Yes, you are right, it just won't work correctly. But again, I would
like to use the same analogy - if you provide an incorrect predicate
as the second parameter to set (as I have done several times) - it
eventually leads to a crash. Let this be another case, where the user
must be careful in specifying the predicate.
> 1. Pointless. If the two produce the same ordering why not
> just use one of them, as with my KeyAccessor suggestion?
They don't produce the _same_ ordering. There is an asymmetric
relation between these predicates. A sequence sorted by the sort
predicate is also sorted by the search predicate, but the opposite is
not necessarily true. Thus, if we have lexicographically sorted
sequence of words, they are also sorted by their first letter, but the
opposite is not true.
> 2. Difficult to get right. Programmers using the standard
> library have various responsibilities of course, but
> part of designing a good interface for them to use is
> to prevent likely mistakes from being made.
I realize that difficulty is something subjective, but in the cases I
see as frequent (the OP's case, for instance), writing the predicate
seems quite easy to me.
> Has anyone got a good example where you need
> multiple different predicates with the same ordering?
> I thought about your field1/field2 subkey search idea for
> a while until it occurred to me that std::set::lower_bound
> already allows such a search:
Yes, you are right, lower_bound works. I have been using it in such
cases. Again, this is perhaps a matter of personal preference, but I
tend to dislike that solution.
lower_bound is a little bit less efficient than a specialized
predicate would be. After all, I have a search tree data structure,
and it feels strange that I have to do something hack-like (and less
efficient) to perform a perfectly valid search in that tree. Also, for
lower_bound I need a 'minimum' value for the second field, which is
not always convenient.
-Vahan
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: jpotter@falcon.lhup.edu (John Potter)
Date: Thu, 9 May 2002 18:46:50 GMT Raw View
On Thu, 9 May 2002 08:48:30 GMT, "Anthony
Williams"<anthwil@nortelnetworks.com> wrote:
> "Garry Lancaster" <glancaster@ntlworld.com> wrote in message
> news:lldC8.520$1r.51735@news8-gui.server.ntli.net...
> > Has anyone got a good example where you need
> > multiple different predicates with the same ordering?
Consider a set<pair<T, T> > where there is a non-trivial and
non-reversable monotonic mapping from first to second.
P1.first < p2.first ==> p1.second < p2.second. The set is ordered
by second also, but requires a different predicate.
> As I pointed out before, std::lower_bound (the algorithm, not the std::set
> member) provides the necessary functionality, but would be very inefficient
> unless optimized for std::set::iterators. This is (currently) QoI, but it
> would be nice if the Standard guaranteed such an optimized version for
> std::set and std::map --- for such containers, the elements cannot be
> otherwise than in-order by the comparison function of the container, and
> std::lower_bound requires that it is in-order by the specified comparison
> function, so the two orderings _must_ be compatible, otherwise undefined
> behaviour ensues.
#include <whole_standard_library>
We may be close to that now anyway. In that case it would not be much
of a change.
> Also, I note that the std::binary_search set of algorithms only require
> forward iterators. Performing binary search with such iterators would be
> very inefficient in terms of iterator usage --- searching a range defined
> with forward iterators would require "traversing" twice as many elements as
> there are in the range, even though the comparison is only applied log(n)
> times, so for large ranges the traversal would dominate (do you fancy
> iterating through 200000000 elements, just to do 8/9 comparisons?).
My tests with list show that average time of lower_bound is one fourth
of find. A better O(N). However, using a list for searching is just
plain stupid.
> For
> set/map iterators this can be even worse --- unless each node in the binary
> tree contains the count of nodes on the left and right (thus requiring extra
> storage), implementing std::advance involves traversing the tree to find the
> next element, then the one after that, and so on (quick back-of-an-envelope
> calculation says I have to visit n log N nodes to advance n places in a tree
> with N elements => 2N log N in total. I hope its really less than that).
It is. With N nodes, there are N - 1 branches and each is traversed
twice in a complete traversal. On average, advance n visits 2n nodes.
> However, since the comparator must yield the same sort order as the
> container's comparator, you can just use the binary tree properties, and
> visit log N nodes.
In conclusion, algorithms designed to work on vector still work on any
thing which looks like a sequence. They just take forever. Let's admit
that STL is a total failure and rewrite every algorithm for every
container.
> I see this as related to std::sort requiring random access iterators --- you
> can implement an efficient (in terms of comparisons) sort algorithm for
> forward and bi-directional iterators, but the iterator house keeping would
> be inefficient, or the algorithm would require additional storage (copy to a
> vector, sort, copy back, for example).
There is a version of quick sort which works well with forward
iterators. It is much slower than std::sort of the vector but is
faster on a vector than using std::sort of a deque. It, of course,
can go O(N*N) on bad data.
> Surely it is up to the user to decide
> whether or not this is appropriate (and besides, we provide std::list.sort,
> and list is the only sortable (i.e. sequence) container for which
> non-random-access iterators are provided)
We just may see slist soon. There are several other algorithms which
are also members of list because the general algorithm does not work
well.
Algorithms + data structures == programs. General algorithms do not
work well on all data structures. The STL provides plug and play
but may not be very efficient. Stable sort of vector requires a
large space overhead or becomes much slower. On a list, it has very
little space overhead and is fast.
I think that we agree that adding binary search members which take a
predicate to the associatives is the best patch for the subject problem.
John
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Fri, 10 May 2002 15:27:09 GMT Raw View
Garry Lancaster:
> > > Has anyone got a good example where you need
> > > multiple different predicates with the same ordering?
John Potter:
> Consider a set<pair<T, T> > where there is a non-trivial and
> non-reversable monotonic mapping from first to second.
> P1.first < p2.first ==> p1.second < p2.second. The set is ordered
> by second also, but requires a different predicate.
That would certainly fit the bill, but I wonder
how common it is.
[snip]
> I think that we agree that adding binary search members which take a
> predicate to the associatives is the best patch for the subject problem.
I'm almost convinced now although I still think
the potential for accidental misuse is quite
high.
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: 10 May 2002 18:36:41 GMT Raw View
Garry Lancaster:
> |> I am sure it is possible to do better if we are prepared to tweak
> |> std::set a little (see my reply to Vahan Magaryan's post).
James Kanze:
> I wouldn't really consider this as tweaking std::set, but rather
> proposing something new.
It's quite a modest change to std::set in terms of lines
of code. Furthermore, the default behaviour of the modified
template - before you override any of the template parameter
defaults - is just the same as the current std::set.
> I like the idea, but I think it definitly
> needs more work (at least in specifying).
Absolutely. It's just a first stab at an elegant solution.
> I'm particularly bothered by the fact that you cannot make
> the parts participating in the key const in some way.
It would be good to fix this but the problem is nothing
new: it affects the current std::set anyway.
Other than making the whole value_type const, which is
overly restrictive, I don't think the container can be formulated
to prevent changes to the key. This will continue to be down to
the discipline of the author of the value_type class and/or
the caller, just as it is now.
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
[ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: marg64@yahoo.com (Vahan Margaryan)
Date: Sat, 11 May 2002 12:16:57 GMT Raw View
jpotter@falcon.lhup.edu (John Potter) wrote in message news:<3cda4185.3086630@news.earthlink.net>...
>
> In conclusion, algorithms designed to work on vector still work on any
> thing which looks like a sequence. They just take forever. Let's admit
> that STL is a total failure and rewrite every algorithm for every
> container.
I think STL wasn't supposed to provide a single version of algorithm
to work in all cases. It depends on the algorithm. There are
algorithms (such as for_each) that don't use randomness of iterator.
STL makes sure these can be used with random iterators too - and yes,
these are generic algorithms that (almost?) don't depend on the
specific data structure. There are algorithms (such as sort) that work
well with random 'and better' iterators. Those need to be rewritten
for containers that don't have random iterators.
My observation is that using functions like 'advance' and 'distance'
to allow 'weaker' iterators with 'stronger' algorithms usually means
suboptimal performance, although in some cases they're ok (such as
when better performance is impossible).
> I think that we agree that adding binary search members which take a
> predicate to the associatives is the best patch for the subject problem.
>
> John
>
I definitely do :)
-Vahan
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Thu, 2 May 2002 17:08:03 GMT Raw View
James Kuyper Jr:
> > It's not implemented, because std::set<> isn't meant to be used that
> > way. If somedata plays no part in your operator<(), then
> > std::map<sometype, somedata> would seem more appropriate than
> > std::set<C>.
Vahan Margaryan:
> Say, I have a class Customer in my program, which is sorted based on
> its 'name_' member, and holds lots of other information. You suggest
> that I initiate a redesign, and move name_ out of Customer, because
> set doesn't support this bit of functionality? I don't believe it is a
> correct approach. I would instead hold a dummy Customer and assign its
> name_ to the item being searched. Anyway, this is an inconvenience,
> and not an imaginary one.
Absolutely. I've been bitten by it myself.
> The following document describes a solution to a similar problem with
> binary search. I would imagine that something similar could be
> proposed for set.
http://std.dkuug.dk/jtc1/sc22/wg21/docs/papers/2001/n1313.html
For std::set, how about something like the following?
(Best read whilst consulting the current definition of
std::set in the standard.)
std::set searching
==================
Solution:
1. Rename std::set's Key template parameter as T and update the
rest of the template so that Key=>T.
2. Add an extra template parameter, KeyAccessor, to the std::set
template. This must provide
(a) A type called key_type.
(b) A static function called get_key that takes a value_type or const
reference to value_type as its only parameter and returns a key_type
or a const reference to a key_type. The return value of the function
need only be guaranteed to be valid for the lifetime of the object
passed to the function.
The KeyAccessor arguement default is std::key_accessor<T>,
where std::key_accessor is defined so:
namespace std
{
template <typename T>
struct key_accessor
{
typedef T key_type;
static const key_type& get_key(const T& in) { return in; }
};
}
In other words, by default T and key_type are the same type and
get_key simply returns its unprocessed parameter. This ensures
the behaviour of the modified std::set is the same, by default, as
the current std::set.
KeyAccessor could become std::set's third template parameter,
shifting Allocator to the fourth position. This is considered
elegant since there is an unwritten convention throughout the
standard library that wherever an Allocator template parameter
is used it is the final template parameter. However, depending
on exactly how standard library version compatibility is
addressed (to be decided) this may break existing code that
overrides the Allocator default, so conceivably an exception could
be made and KeyAccessor placed after the Allocator parameter.
3. Add a key_accessor_type typedef to std::set as an alias for
KeyAccessor.
4. Change std::set's key_type typedef so that it is an alias for
KeyAccessor::key_type.
5. When doing comparisons internally between two value_types
or between a value_type and a key_type, std::set will use
key_accessor::get_key to obtain a key from the value_types.
Thoughts?
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: James Kanze <kanze@alex.gabi-soft.de>
Date: Thu, 2 May 2002 20:05:53 GMT Raw View
"James Kuyper Jr." <kuyper@wizard.net> writes:
|> > Why is find not implemented so that you could use any type for
|> > comparison if only you have the right operators? The way I code
|> > (and yes - the problem might be there), it would be a nice
|> > convenience.
|> It's not implemented, because std::set<> isn't meant to be used
|> that way. If somedata plays no part in your operator<(), then
|> std::map<sometype, somedata> would seem more appropriate than
|> std::set<C>.
That's my opinion as well, but I know that others think differently.
Having recently implemented a project based on a relational database,
I can understand their way of thinking.
On the other hand, I don't think that std::set can be bent to conform
to the database image.
--=20
James Kanze mailto:kanze@gabi-soft.de
Conseils en informatique orient=E9e objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelh=FCttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: 3 May 2002 18:52:58 GMT Raw View
Garry Lancaster:
> |> The problem really is that neither std::set nor std::map elegantly
> |> supports the situation where you want the key to be (a small) part
> |> of the value object being searched for rather than a separate
> |> object.
James Kanze:
> I've used std::map a lot for this.
> |> If you use set, you end up having to construct and pass a whole
> |> value for the set::find rather than just the key part, which is
> |> increasingly inefficient as the complexity of the value part
> |> relative to the key part increases.
>
> |> If you use map, you are forced to artificially separate the key
> |> and data into different objects.
> No. You're forced to duplicate the key part.
In truth we could take either approach.
Logically the key is part of the value. Yet in both cases
we are forced to create a separate key object. As you
point out there is nothing to stop it remaining a part
of the value as well.
> |> To combine the two into the single object you really wanted you
> |> have to use something like std::pair.
>
> |> This is a problem I have come across a number
> |> of times.
>
> |> It seemed to me this was what the OP was getting at.
> Probably, but having successfully used both std::set
> and std::map in such cases, I fail to see where it is a
> problem.
You seem to understand the issue and its current
workarounds/solutions. You just don't want to call
it a "problem". The reason I do is because I don't like
any of the current solutions much.
> If I use std::map, I get some duplication of data.
Maybe. More importantly, a separate key object is
needed.
> If I use std::set, I have to provide special constructors for the
> member type, so that just the key part is initialized.
Careful: the key part may be the only part *explicitly*
initialized, but any non-POD class members will still be
default-initialized, so we still pay for that. We are also
limited by the fact that the dtor, copy constructor and
copy assignment operator must still work correctly:
any members they expect to have particular values must
be given those values.
Aside from that, it just doesn't feel right. It feels like
bending the class to make it do something it
shouldn't really be doing.
> Both solutions work.
Do you really consider either of these elegant solutions?
I am sure it is possible to do better if we are prepared
to tweak std::set a little (see my reply to Vahan Magaryan's
post).
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
[ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: marg64@yahoo.com (Vahan Margaryan)
Date: Fri, 3 May 2002 20:02:54 GMT Raw View
"Garry Lancaster" <glancaster@ntlworld.com> wrote in message news:<nM8A8.1730$sB6.28949@news8-gui.server.ntli.net>...
> Solution:
> [snip]
This is one possible solution. The solution I was considering was to
provide templatized family of 'find' functions in set. Something like
this:
//within set
template<class UnaryPred>
iterator find(const UnaryPred& p);
UnaryPred takes an element of a set and specifies whether it's less,
greater or equal to the object being searched. Note that equality
needs to be indicated specially, so UnaryPred::operator() probably
needs to return int.
The reasons I was leaning more towards this solution, as opposed to
yours, are:
-no need to add template parameters to the set class.
-no need to wrap the key-part of the object into a special class (when
the key part consists of more than a single data member).
-the ability to perform 'subkey' search. For instance, say you have a
set of C objects, that are sorted first by field1, then by field2.
Sometimes you might want to find 'just any object with field1 equal to
X'. You don't know (and don't care) what value field2 might take. By
throwing in a special predicate, this can be achieved.
What do you think?
-Vahan
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: Jim Melton <jim.melton@lmco.com>
Date: 5 May 2002 00:55:21 GMT Raw View
"Garry Lancaster" <glancaster@ntlworld.com> wrote in message
news:2ksA8.388$jE6.26117@news8-gui.server.ntli.net...
>
> Garry Lancaster:
> > |> The problem really is that neither std::set nor std::map elegantly
> > |> supports the situation where you want the key to be (a small) part
> > |> of the value object being searched for rather than a separate
> > |> object.
>
> James Kanze:
> > I've used std::map a lot for this.
The OP specifically asked about searching a set, but what about searching a
generic container? A set implies a sort order, and as such can suggest an
optimal access path -- optimal that is, when the key to search is identical
to the sort key. When this condition is not met, we are back to the generic
search question.
It seems that we are dangerously close to database indexing and query
optimization strategies. If a database (arbitrary collection of data) does
not have an appropriate index for a particular query, it must resort to a
linear search of the table ("container"). If it does have an appropriate
index, the index can be searched in a binary or hash or whatever manner for
orders-of-magnitude performance improvement.
A std::map is nothing more than an indexed collection of data.
> > |> If you use set, you end up having to construct and pass a whole
> > |> value for the set::find rather than just the key part, which is
> > |> increasingly inefficient as the complexity of the value part
> > |> relative to the key part increases.
> >
> > |> If you use map, you are forced to artificially separate the key
> > |> and data into different objects.
>
> > No. You're forced to duplicate the key part.
Indexing always adds memory. (A sorted container, such as std::set, can be
considered an index that requires no additional memory, but that imposes a
computation penalty to sort the data in the first place). Optimal search
strategies are never free.
> You seem to understand the issue and its current
> workarounds/solutions. You just don't want to call
> it a "problem". The reason I do is because I don't like
> any of the current solutions much.
>
> > If I use std::map, I get some duplication of data.
>
> Maybe. More importantly, a separate key object is
> needed.
>
> > If I use std::set, I have to provide special constructors for the
> > member type, so that just the key part is initialized.
Bear in mind that your constructed value must sort lexically in the same
manner as the other elements of the set (that is, your sort is really on
some subset of the attributes of the value and your query ("find") is over
the same subset.
> > Both solutions work.
>
> Do you really consider either of these elegant solutions?
>
> I am sure it is possible to do better if we are prepared
> to tweak std::set a little (see my reply to Vahan Magaryan's
> post).
How about something like std::find_if()? I don't have a reference in front
of me, but can't you define your predicate function to match whatever
criteria you choose? This is syntactically more compact than iterating over
the container, if not any more efficient.
--
<disclaimer>
Opinions posted are those of the author.
My company doesn't pay me enough to speak for them.
</disclaimer>
--
Jim Melton
Software Architect, Fusion Programs
Lockheed Martin Astronautics
(303) 971-3846
[ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: pkl@mailme.dk (Peter Koch Larsen)
Date: 5 May 2002 19:56:03 GMT Raw View
Hi
My original post was for some reason delayed and i came under the
impression that it was lost. I now come back to see a flurry of
activity on this subject, which is good. Let me comments on some of
the interesting posts so far.
James Kanze proposes i either use a map or a set with no default
constructor. I do know about both approaches and tend to think, that
map is the cleanest approach since this will remove the need for me to
have the default key-only constructor. Still i will need to be able to
extract the key from the containing class.
In the actual case, i get these data from a relational dbms so James
did get it right in that my original design does reflect relational
thinking.
James Kuyper suggests that std::set was not meant to be used that way.
This may well be so, but in that case C++ seems to be lacking an
associative container, at least if you share mine and Garry Lancasters
view that neither the current set or the current map solution is
elegant. I might be to relational here, but does that offend anyone?
Garry Lancaster shares my opinion that the current solution is not as
elegant as could be. He also proposes a solution which i believe i can
grasp. What i do not understand, however is why the searching methods
(find, lower_bound, upper_bound) should insist on taking a parameter
to the element. My question is: why can't you just template that
parameter and allow for anything, that compares with the element?
Would it break existing code? I do not think so but then perhaps it's
just my ignorance or whatever that blinds me.
If it did break existing code then what about adding those three
searching functions (naming them e.g. find_key, lower_bound_key,
upper_bound_key) and muddle along in that direction. Is there a
dangerous beast hidden somewhere here?
My job has kept me busy lately; if not i would have liked to play with
my current set-implementation to check if either approach could be
useful. This is way down my things-to-pursue-list right now, however.
Kind regards
Peter Koch Larsen
[ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Mon, 6 May 2002 07:53:08 GMT Raw View
Vahan Margaryan:
> This is one possible solution. The solution I was considering was to
> provide templatized family of 'find' functions in set. Something like
> this:
>
> file://within set
> template<class UnaryPred>
> iterator find(const UnaryPred& p);
>
> UnaryPred takes an element of a set and specifies whether it's less,
> greater or equal to the object being searched. Note that equality
> needs to be indicated specially, so UnaryPred::operator() probably
> needs to return int.
>
> The reasons I was leaning more towards this solution, as opposed to
> yours, are:
> -no need to add template parameters to the set class.
>
> -no need to wrap the key-part of the object into a special class (when
> the key part consists of more than a single data member).
>
> -the ability to perform 'subkey' search. For instance, say you have a
> set of C objects, that are sorted first by field1, then by field2.
> Sometimes you might want to find 'just any object with field1 equal to
> X'. You don't know (and don't care) what value field2 might take. By
> throwing in a special predicate, this can be achieved.
>
> What do you think?
I think we have to be careful allowing std::set::find
to use a different sorting criteria to std::set::insert.
This could easily break the guarantee that the
ordering produced by both will be the same. This
guarantee is necessary to achieve std::set::find's
impressive logarithmic time performance. If the
sort orders are not the same you get linear time
performance, the same as the general std::find
algorithm.
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: 6 May 2002 14:58:51 GMT Raw View
Jim Melton wrote:
>
> "Garry Lancaster" <glancaster@ntlworld.com> wrote in message
> news:2ksA8.388$jE6.26117@news8-gui.server.ntli.net...
> >
> > Garry Lancaster:
> > > |> The problem really is that neither std::set nor std::map elegantly
> > > |> supports the situation where you want the key to be (a small) part
> > > |> of the value object being searched for rather than a separate
> > > |> object.
> >
> > James Kanze:
> > > I've used std::map a lot for this.
>
> The OP specifically asked about searching a set, but what about searching a
> generic container? A set implies a sort order, and as such can suggest an
> optimal access path -- optimal that is, when the key to search is identical
> to the sort key. When this condition is not met, we are back to the generic
> search question.
>
> It seems that we are dangerously close to database indexing and query
> optimization strategies. If a database (arbitrary collection of data) does
> not have an appropriate index for a particular query, it must resort to a
> linear search of the table ("container"). If it does have an appropriate
> index, the index can be searched in a binary or hash or whatever manner for
> orders-of-magnitude performance improvement.
>
> A std::map is nothing more than an indexed collection of data.
>
> > > |> If you use set, you end up having to construct and pass a whole
> > > |> value for the set::find rather than just the key part, which is
> > > |> increasingly inefficient as the complexity of the value part
> > > |> relative to the key part increases.
> > >
> > > |> If you use map, you are forced to artificially separate the key
> > > |> and data into different objects.
> >
> > > No. You're forced to duplicate the key part.
>
> Indexing always adds memory. (A sorted container, such as std::set, can be
> considered an index that requires no additional memory, but that imposes a
The requirements on the complexities of the member functions and on the
validity of iterators, pointers, and references effectively mean that
std::set cannot be implemented as flat container that is kept in sorted
order. It can be implemented using some variant of a binary tree, but
then the order of the elements of the set is stored in the node pointers
that connect the tree, which IS additional memory.
There is, however, no intrinsic necessity that key values be duplicated
in a std::map; it would have been feasible for the standard to define a
container with a significantly different interface, that kept both the
key and the data together in a single user-defined type, allowing code
that must access both the key and the data to be a member function of
that type.
[ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: marg64@yahoo.com (Vahan Margaryan)
Date: 6 May 2002 14:58:52 GMT Raw View
Jim Melton <jim.melton@lmco.com> wrote in message news:<aav38o$hhb2@cui1.lmms.lmco.com>...
>
> The OP specifically asked about searching a set, but what about searching a
> generic container? A set implies a sort order, and as such can suggest an
> optimal access path -- optimal that is, when the key to search is identical
> to the sort key.
Well, not strictly identical. If the search key is a 'subkey' of the
sort key, in the sense that any sequence sorted by the sort key is
also sorted by the search key (but not necessarily the opposite), you
can still perform an optimal search. For instance, if you have
lexicographically sorted strings, you can efficiently find a string
that starts with 'q'. I mention this because I proposed (in a previous
post to this thread) to have a member function in std::set (like the
find_if you mention below) that can take 'subkey' predicates and
perform an optimal search.
>
> Indexing always adds memory. (A sorted container, such as std::set, can be
> considered an index that requires no additional memory, but that imposes a
> computation penalty to sort the data in the first place).
In the case mentioned by the OP, how does std::set impose a
computation penalty as compared with std::map?
>
> How about something like std::find_if()? I don't have a reference in front
> of me, but can't you define your predicate function to match whatever
> criteria you choose? This is syntactically more compact than iterating over
> the container, if not any more efficient.
I suggest not arbitrary criteria (that kind of searching can be
performed via std::find_if), but a criteria that can be searched for
optimally within a sorted sequence (as mentioned above).
-Vahan
[ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: James Kanze <kanze@alex.gabi-soft.de>
Date: 6 May 2002 16:58:03 GMT Raw View
"Garry Lancaster" <glancaster@ntlworld.com> writes:
|> > |> To combine the two into the single object you really wanted
|> > |> you have to use something like std::pair.
|> > |> This is a problem I have come across a number of times.
|> > |> It seemed to me this was what the OP was getting at.
|> > Probably, but having successfully used both std::set and
|> > std::map in such cases, I fail to see where it is a problem.
|> You seem to understand the issue and its current
|> workarounds/solutions. You just don't want to call it a
|> "problem". The reason I do is because I don't like any of the
|> current solutions much.
It's not a "problem" in the sense that there are easily implemented
solutions.
Obviously, when you need something like this, having a standard
solution is even better than having to implement one yourself,
regardless of how easy it is. On the other hand, there are literally
millions of problems; is this one more needing of a standard solution
than the others? I'm not convinced, but I'll admit that my experience
has been limited.
|> > If I use std::map, I get some duplication of data.
|> Maybe. More importantly, a separate key object is needed.
Which may or may not be a problem. If there is a possibility of
modifying objects in the map, in arbitrary ways, there is a definite
possibility of the key object no longer being equal to the object it
indexes. This is a real problem, and must be avoided in one way or
another. Of course, if you try and change the key part of an object
in a set, the results are undefined behavior, with possibly
disasterous consequences, so this isn't necessarily a solution either.
|> > If I use std::set, I have to provide special constructors for
|> > the member type, so that just the key part is initialized.
|> Careful: the key part may be the only part *explicitly*
|> initialized, but any non-POD class members will still be
|> default-initialized, so we still pay for that.
True. When I did this, my objects contains mostly strings, so I was
doing some work for nothing. It wasn't enough to show up in the
profile, however, so I didn't worry about it.
|> We are also limited by the fact that the dtor, copy constructor
|> and copy assignment operator must still work correctly: any
|> members they expect to have particular values must be given those
|> values.
Good point. In my case, I actually used sets of pointers -- the
identity of the objects in the set was important, and the objects
didn't support copy (although all of the elements did).
|> Aside from that, it just doesn't feel right. It feels like bending
|> the class to make it do something it shouldn't really be doing.
Yes and no. In my case, the only reason the class existed was to be
contained in these sets, so have special static "factory" functions to
create the keys didn't bother me. But in the general case, you are
right.
|> > Both solutions work.
|> Do you really consider either of these elegant solutions?
They work.
|> I am sure it is possible to do better if we are prepared to tweak
|> std::set a little (see my reply to Vahan Magaryan's post).
I wouldn't really consider this as tweaking std::set, but rather
proposing something new. I like the idea, but I think it definitly
needs more work (at least in specifying). I'm particularly bothered
by the fact that you cannot make the parts participating in the key
const in some way.
--
James Kanze mailto:kanze@gabi-soft.de
Conseils en informatique orient e objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelh ttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
[ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: marg64@yahoo.com (Vahan Margaryan)
Date: Mon, 6 May 2002 20:01:01 GMT Raw View
"Garry Lancaster" <glancaster@ntlworld.com> wrote in message news:<v_OA8.1825$9W4.921649@news6-win.server.ntlworld.com>...
>
> I think we have to be careful allowing std::set::find
> to use a different sorting criteria to std::set::insert.
> This could easily break the guarantee that the
> ordering produced by both will be the same. This
> guarantee is necessary to achieve std::set::find's
> impressive logarithmic time performance. If the
> sort orders are not the same you get linear time
> performance, the same as the general std::find
> algorithm.
>
The requirement would be that any sequence sorted according to the set
predicate should be also sorted according to the predicate passed to
the template find.
True, if the requirement is broken, a performance degradation will
occur. However, for simple cases like the one showed by the OP,
writing the correct predicate is trivial. For more 'advanced' cases,
let users be careful.
This is somewhat similar to the situation with the set predicate - in
complex cases, it's possible to get it wrong and have a predicate that
doesn't impose total order on the target elements. So users need to be
careful...
-Vahan
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: marg64@yahoo.com (Vahan Margaryan)
Date: Wed, 1 May 2002 20:51:42 GMT Raw View
"James Kuyper Jr." <kuyper@wizard.net> wrote in message news:<3CCE0169.79C234B0@wizard.net>...
>
> It's not implemented, because std::set<> isn't meant to be used that
> way. If somedata plays no part in your operator<(), then
> std::map<sometype, somedata> would seem more appropriate than
> std::set<C>.
>
Say, I have a class Customer in my program, which is sorted based on
its 'name_' member, and holds lots of other information. You suggest
that I initiate a redesign, and move name_ out of Customer, because
set doesn't support this bit of functionality? I don't believe it is a
correct approach. I would instead hold a dummy Customer and assign its
name_ to the item being searched. Anyway, this is an inconvenience,
and not an imaginary one.
The following document describes a solution to a similar problem with
binary search. I would imagine that something similar could be
proposed for set. http://std.dkuug.dk/jtc1/sc22/wg21/docs/papers/2001/n1313.html
-Vahan
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: 1 May 2002 21:57:05 GMT Raw View
Hyman Rosen wrote:
>
> Peter Koch Larsen wrote:
> > sometype key;
> > std::set< c >::iterator i = find(key);
>
> This will work fine if you define an operator==
> which takes an element type and a key type.
Such an operator==() won't do any good. The function std::set<C>::find()
is defined as taking an argument of type const C&. Passing it 'key'
won't work, in the absences of an implicit conversion from 'sometype' to
'C'. He's trying to avoid the expensive construction of a 'C' object;
defining such a conversion wouldn't exactly achieve that result.
[ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: 1 May 2002 21:57:04 GMT Raw View
> Peter Koch Larsen wrote:
> > std::set< C > cset;
> > bool operator <(const C c,const sometype& rhs) const
> > { return c.key < rhs; }
> > .....
> > sometype key;
> > std::set< c >::iterator i = find(key);
I think that last line should have been:
std::set< C >::iterator i = cset.find(key);
^ ^^^^^
Hyman Rosen:
> This will work fine if you define an operator==
> which takes an element type and a key type.
I don't believe it will. First, by default the operator
concerned is <, not == [1] (and this code is using the
default). Second, std::set::find's parameter is a const
reference to what this discussion is terming an
element type [2]. In this case it is const C&. Passing
anything that is not a C (or implicitly convertible to
a C) will be a compile error.
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
NOTES:
1. A test for equality being concocted by
!( lhs < rhs ) && !( rhs < lhs )
Incidentally, it is hard to see how operator==
could ever be a viable comparison operator
for the purposes of sorting and finding within
a set without behaving rather oddly.
2. And what the standard, confusingly in view of
the terminology adopted here, terms a key_type.
[ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: 1 May 2002 21:57:04 GMT Raw View
Edward Diener:
> The key/data mapping concept, which is what you
> really want, is better implemented using a std::map<>.
> This would solve your problem below where you
> want to find some data based on a key.
The problem really is that neither std::set nor
std::map elegantly supports the situation where
you want the key to be (a small) part of the value
object being searched for rather than a separate
object.
If you use set, you end up having to construct and
pass a whole value for the set::find rather than
just the key part, which is increasingly inefficient
as the complexity of the value part relative to the
key part increases.
If you use map, you are forced to artificially separate
the key and data into different objects. To combine
the two into the single object you really wanted you
have to use something like std::pair.
This is a problem I have come across a number
of times.
It seemed to me this was what the OP was getting at.
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
[ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: James Kanze <kanze@alex.gabi-soft.de>
Date: 2 May 2002 15:41:06 GMT Raw View
"Garry Lancaster" <glancaster@ntlworld.com> writes:
|> Edward Diener:
|> > The key/data mapping concept, which is what you really want, is
|> > better implemented using a std::map<>. This would solve your
|> > problem below where you want to find some data based on a key.
|> The problem really is that neither std::set nor std::map elegantly
|> supports the situation where you want the key to be (a small) part
|> of the value object being searched for rather than a separate
|> object.
I've used std::map a lot for this.
|> If you use set, you end up having to construct and pass a whole
|> value for the set::find rather than just the key part, which is
|> increasingly inefficient as the complexity of the value part
|> relative to the key part increases.
|> If you use map, you are forced to artificially separate the key
|> and data into different objects.
No. You're forced to duplicate the key part.
|> To combine the two into the single object you really wanted you
|> have to use something like std::pair.
|> This is a problem I have come across a number
|> of times.
|> It seemed to me this was what the OP was getting at.
Probably, but having successfully used both std::set and std::map in
such cases, I fail to see where it is a problem.
If I use std::map, I get some duplication of data.
If I use std::set, I have to provide special constructors for the
member type, so that just the key part is initialized.
Both solutions work. Most of the time, I find the map solution
cleaner; the only time I used the std::set solution was when mapping
to a relational database.
--
James Kanze mailto:kanze@gabi-soft.de
Conseils en informatique oriente objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
[ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: pkl@mailme.dk (Peter Koch Larsen)
Date: Tue, 7 May 2002 15:51:32 GMT Raw View
"Garry Lancaster" <glancaster@ntlworld.com> wrote in message news:<v_OA8.1825$9W4.921649@news6-win.server.ntlworld.com>...
> Vahan Margaryan:
>
[proposal of templatized find]
>
> I think we have to be careful allowing std::set::find
> to use a different sorting criteria to std::set::insert.
> This could easily break the guarantee that the
> ordering produced by both will be the same. This
> guarantee is necessary to achieve std::set::find's
> impressive logarithmic time performance. If the
> sort orders are not the same you get linear time
> performance, the same as the general std::find
> algorithm.
I do agree, that find can only use the same sort-criteria as
set::insert. But it is (in MY proposal at least) not my intention to
change the sort-order.
When searching with "my" templatized set::find-proposal, it would be
the programmers responsibility to assert that the same sort-order
should result if the new comparison was used.
Also, even if finds templated search did violate that order, the
result would not compromise std::set: the search-order when inserting
would not change.
Kind regards
Peter Koch Larsen
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: Jim Melton <jim.melton@lmco.com>
Date: 8 May 2002 15:56:15 GMT Raw View
"Vahan Margaryan" <marg64@yahoo.com> wrote in message
news:44d2ad1a.0205051140.76bb5021@posting.google.com...
>
> Jim Melton <jim.melton@lmco.com> wrote in message
news:<aav38o$hhb2@cui1.lmms.lmco.com>...
> >
> > The OP specifically asked about searching a set, but what about
searching a
> > generic container? A set implies a sort order, and as such can suggest
an
> > optimal access path -- optimal that is, when the key to search is
identical
> > to the sort key.
>
> Well, not strictly identical. If the search key is a 'subkey' of the
> sort key, in the sense that any sequence sorted by the sort key is
> also sorted by the search key (but not necessarily the opposite), you
> can still perform an optimal search. For instance, if you have
> lexicographically sorted strings, you can efficiently find a string
> that starts with 'q'. I mention this because I proposed (in a previous
> post to this thread) to have a member function in std::set (like the
> find_if you mention below) that can take 'subkey' predicates and
> perform an optimal search.
If a container of items is sorted lexically using a string key (e.g.,
std::less<std::string>), then finding items that "start with 'q'" is still
using a search identical to the sort key. Try finding items with 'q' as the
third letter, or based on some other value. Now your set is essentially
unsorted.
> > Indexing always adds memory. (A sorted container, such as std::set, can
be
> > considered an index that requires no additional memory, but that imposes
a
> > computation penalty to sort the data in the first place).
>
> In the case mentioned by the OP, how does std::set impose a
> computation penalty as compared with std::map?
The computational penalty is in determining the correct location in which to
insert each item. Since the set must always be in a sorted state, each
insertion involves some amount of processing to ensure this constraint is
met. Compare with std::vector<T>::push_back().
--
<disclaimer>
Opinions posted are those of the author.
My company doesn't pay me enough to speak for them.
</disclaimer>
--
Jim Melton
Software Architect, Fusion Programs
Lockheed Martin Astronautics
(303) 971-3846
[ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: "Garry Lancaster" <glancaster@ntlworld.com>
Date: Wed, 8 May 2002 19:54:20 GMT Raw View
Garry Lancaster:
> > I think we have to be careful allowing std::set::find
> > to use a different sorting criteria to std::set::insert.
> > This could easily break the guarantee that the
> > ordering produced by both will be the same. This
> > guarantee is necessary to achieve std::set::find's
> > impressive logarithmic time performance. If the
> > sort orders are not the same you get linear time
> > performance, the same as the general std::find
> > algorithm.
Vahan Margaryan:
> The requirement would be that any sequence sorted
> according to the set predicate should be also sorted
> according to the predicate passed to the template find.
>
> True, if the requirement is broken, a performance
> degradation will occur.
Without an additional traits mechanism you won't get
a graceful performance degradation: if you use an
any order, linear complexity comparison predicate with
a find function expecting a same ordering, log time
predicate it wouldn't work correctly. I wouldn't really
advocate a traits mechanism here: this solution we
are devising is complex enough already.
I suspect that permitting users to supply two different
sort criteria to std::set and requiring that they both
produce the same ordering will usually be:
1. Pointless. If the two produce the same ordering why not
just use one of them, as with my KeyAccessor suggestion?
2. Difficult to get right. Programmers using the standard
library have various responsibilities of course, but
part of designing a good interface for them to use is
to prevent likely mistakes from being made.
Has anyone got a good example where you need
multiple different predicates with the same ordering?
I thought about your field1/field2 subkey search idea for
a while until it occurred to me that std::set::lower_bound
already allows such a search:
struct value
{
int field1;
int field2;
bool operator<(const value& rhs) const
{
if( field1 == rhs.field1 )
{
return field2 < rhs.field2;
}
else
{
return field1 < rhs.field1;
}
}
};
void foo()
{
const int lowest_field2_value = 0;
std::set<value> set1;
fill_set_with_various_values( set1 );
value subkey_search_key = { 2, lowest_field2_value };
// Find any value with field1 = 2.
std::set<value>::iterator it = set1.lower_bound( subkey_search_key );
if( it == set1.end() || it->field1 != 2 )
{
std::cout << "No value with field1 = 2.\n"
}
else
{
std::cout << "Found a value with field1 = 2.\n";
}
}
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: "Anthony Williams"<anthwil@nortelnetworks.com>
Date: Thu, 9 May 2002 08:48:30 GMT Raw View
"Garry Lancaster" <glancaster@ntlworld.com> wrote in message
news:lldC8.520$1r.51735@news8-gui.server.ntli.net...
> Has anyone got a good example where you need
> multiple different predicates with the same ordering?
> I thought about your field1/field2 subkey search idea for
> a while until it occurred to me that std::set::lower_bound
> already allows such a search:
.... but requires creating an instance of the std::set value type which
contains only the subkey. This affects the overall usage of such a type,
such as requiring an additional constructor --- it might not even be
possible to construct such an object (depending on the details of the
implementation)
As I pointed out before, std::lower_bound (the algorithm, not the std::set
member) provides the necessary functionality, but would be very inefficient
unless optimized for std::set::iterators. This is (currently) QoI, but it
would be nice if the Standard guaranteed such an optimized version for
std::set and std::map --- for such containers, the elements cannot be
otherwise than in-order by the comparison function of the container, and
std::lower_bound requires that it is in-order by the specified comparison
function, so the two orderings _must_ be compatible, otherwise undefined
behaviour ensues.
Also, I note that the std::binary_search set of algorithms only require
forward iterators. Performing binary search with such iterators would be
very inefficient in terms of iterator usage --- searching a range defined
with forward iterators would require "traversing" twice as many elements as
there are in the range, even though the comparison is only applied log(n)
times, so for large ranges the traversal would dominate (do you fancy
iterating through 200000000 elements, just to do 8/9 comparisons?). For
set/map iterators this can be even worse --- unless each node in the binary
tree contains the count of nodes on the left and right (thus requiring extra
storage), implementing std::advance involves traversing the tree to find the
next element, then the one after that, and so on (quick back-of-an-envelope
calculation says I have to visit n log N nodes to advance n places in a tree
with N elements => 2N log N in total. I hope its really less than that).
However, since the comparator must yield the same sort order as the
container's comparator, you can just use the binary tree properties, and
visit log N nodes.
I see this as related to std::sort requiring random access iterators --- you
can implement an efficient (in terms of comparisons) sort algorithm for
forward and bi-directional iterators, but the iterator house keeping would
be inefficient, or the algorithm would require additional storage (copy to a
vector, sort, copy back, for example). Surely it is up to the user to decide
whether or not this is appropriate (and besides, we provide std::list.sort,
and list is the only sortable (i.e. sequence) container for which
non-random-access iterators are provided)
Anthony
--
Anthony Williams
Software Engineer, Nortel Networks Optical Components Ltd
The opinions expressed in this message are not necessarily those of my
employer
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: Hyman Rosen <hyrosen@mail.com>
Date: 29 Apr 2002 21:49:25 GMT Raw View
Peter Koch Larsen wrote:
> sometype key;
> std::set< c >::iterator i = find(key);
This will work fine if you define an operator==
which takes an element type and a key type.
[ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: "Edward Diener" <eldiener@earthlink.net>
Date: 29 Apr 2002 21:49:26 GMT Raw View
The key/data mapping concept, which is what you really want, is better
implemented using a std::map<>. This would solve your problem below where
you want to find some data based on a key.
"Peter Koch Larsen" <pkl@mailme.dk> wrote in message
news:61c84197.0204182320.12802623@posting.google.com...
> I have several times had a class in where comparison between two
> elements is done by comparing key:
> struct C
> {
> sometype key;
> somedata d;
> bool operator <(const C& rhs) const { return key < rhs.key; }
> };
> Often, i have wanted to keep elements in this class in a set - and all
> is well until i have wanted to do a search based on a specific key.
> What I would like to be able to do is something like:
> std::set< C > cset;
> bool operator <(const C c,const sometype& rhs) const { return c.key <
> rhs; }
> .....
> sometype key;
> std::set< c >::iterator i = find(key);
> but this is unfortunately not valid. I have to create a C with the
> appropriate key and use that value for searching and this could well
> be an expensive solution.
> Why is find not implemented so that you could use any type for
> comparison if only you have the right operators? The way I code (and
> yes - the problem might be there), it would be a nice convenience.
[ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Tue, 30 Apr 2002 18:51:12 GMT Raw View
Peter Kock Larson wrote:
> Hi all
>
> I have several times had a class in where comparison between two
> elements is done by comparing key:
>
> struct C
> {
> sometype key;
> somedata d;
> bool operator <(const C& rhs) const { return key < rhs.key; }
> };
>
> Often, i have wanted to keep elements in this class in a set - and all
> is well until i have wanted to do a search based on a specific key.
> What I would like to be able to do is something like:
>
> std::set< C > cset;
> bool operator <(const C c,const sometype& rhs) const { return c.key <
> rhs; }
> .....
> sometype key;
> std::set< c >::iterator i = find(key);
>
> but this is unfortunately not valid. I have to create a C with the
> appropriate key and use that value for searching and this could well
> be an expensive solution.
>
> Why is find not implemented so that you could use any type for
> comparison if only you have the right operators? The way I code (and
> yes - the problem might be there), it would be a nice convenience.
It's not implemented, because std::set<> isn't meant to be used that
way. If somedata plays no part in your operator<(), then
std::map<sometype, somedata> would seem more appropriate than
std::set<C>.
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: pkl@mailme.dk (Peter Koch Larsen)
Date: 29 Apr 02 13:01:00 GMT Raw View
Hi all
I have several times had a class in where comparison between two
elements is done by comparing key:
struct C
{
sometype key;
somedata d;
bool operator <(const C& rhs) const { return key < rhs.key; }
};
Often, i have wanted to keep elements in this class in a set - and all
is well until i have wanted to do a search based on a specific key.
What I would like to be able to do is something like:
std::set< C > cset;
bool operator <(const C c,const sometype& rhs) const { return c.key <
rhs; }
.....
sometype key;
std::set< c >::iterator i = find(key);
but this is unfortunately not valid. I have to create a C with the
appropriate key and use that value for searching and this could well
be an expensive solution.
Why is find not implemented so that you could use any type for
comparison if only you have the right operators? The way I code (and
yes - the problem might be there), it would be a nice convenience.
Kind regards
Peter Koch Larsen
[ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]