Topic: STL binary functions on heterogenous types


Author: phalpern@truffle.ma.ultranet.com (Pablo Halpern)
Date: 1996/11/23
Raw View
mcg@wheezy.CS.Berkeley.EDU (Michael C. Greenspon) wrote:

>The elegance and practicality of STL is wearing off in my mind as we use
>it for various mundane implementation tasks.

Part of the elegance of the STL is is extensibility. I was able to fix
your problem (below) by defining several new, re-usable components,
which I'll describe. Although the solution is overkill for this one
problem, the re-usability of the solution will prevent having to
re-invent it for similar problems in the future.

>                                              Here's a simplification of
>one common task-- keeping a collection of pointers to named/tagged objects
>in a container, where there are comparison operators defined on the object
>for retrieval--
>
>struct obj {
>   obj(const string& name);
>   const string& getName() const;
>   bool operator==(const string& n) { return getName() == n; }
>};
>
>vector<obj*> objs;
>
>Now I'd want to do something conceptually like--
>
>obj* find(const string& name) {
>   return find ( objs.begin(), objs.end(), name );
>}

[ much text deleted ]

You correctly point out that the standard function objects, like
equal_to<T> force both arguments to be of the same type, even though
that is not always what is wanted. I suspect the reason for this is that
90% of the time, both arguments are the same. Defining equal_to<T1,T2>
would require users to type something like equal_to<int,int>, which
could be annoying. I solved this problem by very easily creating a
function object called het_equal_to<T1,T2> for comparing heterogeneous
arguments. (You might also want het_less_than<>, het_greater_than<>,
etc.)

You also point out some "missing" function objects and adaptors in the
libary. In particular, you cannot use the standard predicate function
objects when one or both of the arguments are pointers. ObjectSpace's
"solution" to the pointer problem is not, IMO, well conceived.

My approach to your problem was to define the following set of function
adaptors:

  dereference_arg(operation) // dereferences arg of unary operation
  dereference_1st(operation) // dereferences 1st arg of binary operation
  dereference_2nd(operation) // dereferences 2nd arg of binary operation
  dereference_both(operation) // dereferences both args of operation

Using everything together, I re-wrote your find function as follows
(believe it or not, this is real, tested, working code).

obj* find(const string& name)
{
  return *find_if(objs.begin(), objs.end(),
           bind2nd(dereference_1st(het_equal_to<obj, string>()), name));
}

*** OR ***

obj* find(const string& name)
{
  return *find_if(objs.begin(), objs.end(),
           dereference_arg(bind2nd(het_equal_to<obj, string>(), name)));
}

I also tried creating a simpler adaptor called dereference<> which was
applied to the argument object type, instead of to the function object.
I was uneasy about its use of subtle conversions between pointers and
non-pointers and it turns out my concern was valid. If obj::operator==()
is implemented as a member function instead of a friend function, then
my "simpler" adaptor would not work. Here is what the code would look
like if it worked:

// NOTE: this function uses the not-very-good dereference<> adaptor. It
// does not work if obj::operator==() is a member function (instead of a
// friend).
obj* find(const string& name)
{
  return *find_if(objs.begin(), objs.end(),
              bind2nd(het_equal_to<dereference<obj>, string>(), name));
}

These adaptors should work equally well with smart pointer types,
although I haven't tested that. I can provide you with source to these
(quite simple) adaptors and function objects if you email me. They could
be made part of the standard, but part of the beauty of STL is that it
provides a minimal set of components which can be extended and combined.
Perhaps some of the components I have described (or variants of them)
*should* be part of the standard, but it is too late now. Fortunately,
these are only libraries, not language concepts, so you can program them
yourself, or get them from 3rd parties.

>
>This is a greatly simplified example that I hope starts to illustrate some
>practical problems we're running into. Perhaps it's just our ignorance or
>low blood sugar? Or can someone sugggest a nice way to apply the STL
>pattern for this situation? Perhaps a templatized predicate for this case,
>but a general one? We don't want to have to redefine it for every sort of
>'named' class, every type of 'name', or instance of a storage scheme  (raw
>pointer, auto_ptr, double-deref, computed-index-deref, etc.)

Of course, if the problem were as simple as you stated, it would have
been easier to implement your find() function either by defining a
special-purpose predicate object by not using standard algorithm at all.
If, on the other hand, the same issues come up repeatedly, then it is
worth extending the STL (or obtaining an extension). It's not perfect,
but I haven't seen any other container class libraries that do a better
job at generalizing the vast array of possibilities you mentioned.

>The next step of complexity involves, e.g., adding a 'named' object (say
>it's a string again) to an associative container like a hash set, but
>wanting to compare(retrieve) using a char* (to avoid consing up a string
>~100,000 times a second.) Hashset won't work because it requires the
>stored object to be compared against an instance of the same class; we'd
>have to cons up a surrogate containing a name string just to do the
>compare when we really just want to use the char* we already have to do
>the lookup. Nor do we want to use a hash map and store two objects where
>we really only need to store one (would still have to turn the char* into
>a string for the key comparison.) I think the tree-based associative
>containers have the same problem. So again we're dealing with trying to
>use heterogenous comparison operators, where the right and left args are
>different types and perhaps involve some ops like a pointer deref to get
>to the actual object on which the comparison is defined. Perhaps someone
>has a better generalization of the problem and suggestions? (It's late.)

This problem occurs with the regular map and set classes as well as the
hashed containers. (Hashed containers are not part of the draft
standard, BTW.) I tried to think of a really general way that this
problem could be fixed in the STL without unduly complicating the
current interface but ran out of ideas. There are clearly several ways
to skin this cat; the problem is in avoiding having to redefine all of
the iterator classes, etc. for your special version of map. Good luck.


-------------------------------------------------------------
Pablo Halpern                   phalpern@truffle.ultranet.com

I am self-employed. Therefore, my opinions *do* represent
those of my employer.
---
[ comp.std.c++ is moderated.  To submit articles: try just posting with      ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu         ]
[ FAQ:      http://reality.sgi.com/employees/austern_mti/std-c++/faq.html    ]
[ Policy:   http://reality.sgi.com/employees/austern_mti/std-c++/policy.html ]
[ Comments? mailto:std-c++-request@ncar.ucar.edu                             ]





Author: mcg@wheezy.CS.Berkeley.EDU (Michael C. Greenspon)
Date: 1996/11/10
Raw View
The real question was and still is: must the binary operator templates
(equal_to<T>, greater<T>, etc.) be declared over only one type T? Is there
a formal reason they couldn't be (perhaps additionally) declared over
heterogenous types, i.e. equal_to<T,U>??

In article <55tko6$ejo@decaxp.harvard.edu>, ken@digitas.org (Ken Shan) wrote:

> Michael C. Greenspon (mcg@wheezy.CS.Berkeley.EDU) wrote:
> > Now I'd want to do something conceptually like--
> > obj* find(const string& name) {
> >    return find ( objs.begin(), objs.end(), name );
> > }
>
> I think for this you really want to use find_if(), and forget about
> characterizing the comparison between an object's name with a string.
> Think of it this way:  The name of an object is just one of the many
> attributes of the object; find() is supposed to be used for true
> object *identity* while find_if() is to be used for searching for
> attributes (predicates), in this case, the predicate of having the
> specified name.  Of course, it might be nice to be able to have helper
> template classes out of which a "generic attribute comparison
> predicate" can be composed, but that's less important and doesn't
> require a fundamental change to STL -- just an addition.

whoops, I was a bit too cavalier in my simplifying the problem statement
by typing 'find' instead of 'find_if' even though the solution idea
further down used a compose'd predicate. This really wasn't the point but
ok I agree with your rationale for equality vs. attribute comparison
despite my perhaps obsolete habit of wanting to overload op== for the
object 'name'.

>
> > The next step of complexity involves, e.g., adding a 'named' object (say
> > it's a string again) to an associative container like a hash set, but
> > wanting to compare(retrieve) using a char* (to avoid consing up a string
> > ~100,000 times a second.) Hashset won't work because it requires the
> > stored object to be compared against an instance of the same class; we'd
> > have to cons up a surrogate containing a name string just to do the
> > compare when we really just want to use the char* we already have to do
> > the lookup....

> I think I agree with you here.  I believe the "right" way of doing
> associative containers is to expose the "pair" interface to the
> outside also, i.e., I should be able to supply a "pseudo-pair class"
> to the set<> (or multiset<>) template, which provides member functions
> first() and second() (*not* member variables first and second, which
> IMHO should be private to pair<> anyway), and the usual overloads.

Yes, good idea. Glad someone read this far! I've been a bit uncomfortable
with the idea of a canonicalization like naked members of pair being used
as the api standard.

What we've done for hashset is to parameterize it over both a key and
value type, with the KeyOfValue operator defined as ident<Key,Value> which
means the value must define a conversion operator that returns the key.
This is just a hack but we've got applications (not standard libraries) to
write so it's expedient. Again I point out this is a problem with the
associative container interfaces generally i.e. map, set, multiset,
multimap etc. not just the hashtable (unordered) flavors.

--Michael
---
[ comp.std.c++ is moderated.  To submit articles: try just posting with      ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu         ]
[ FAQ:      http://reality.sgi.com/employees/austern_mti/std-c++/faq.html    ]
[ Policy:   http://reality.sgi.com/employees/austern_mti/std-c++/policy.html ]
[ Comments? mailto:std-c++-request@ncar.ucar.edu                             ]





Author: mcg@wheezy.CS.Berkeley.EDU (Michael C. Greenspon)
Date: 1996/11/07
Raw View
The elegance and practicality of STL is wearing off in my mind as we use
it for various mundane implementation tasks. Here's a simplification of
one common task-- keeping a collection of pointers to named/tagged objects
in a container, where there are comparison operators defined on the object
for retrieval--

struct obj {
   obj(const string& name);
   const string& getName() const;
   bool operator==(const string& n) { return getName() == n; }
};

vector<obj*> objs;

Now I'd want to do something conceptually like--

obj* find(const string& name) {
   return find ( objs.begin(), objs.end(), name );
}

which of course won't work because we're storing a vector of pointers not
references to the objects. The implementation we're using (from
ObjectSpace) also provides some "pointer deref" operator functions, such
as "equals_to_p", which compares its deref'd args using op==. This doesn't
help either however. Nor do bind1st/compose/etc do much good. The basic
problem is that the binary_function equal_to is only templatized over one
type. Couldn't this be declared--

template <class T, class U>
struct equal_to : public binary_function<T,U,bool> {
   bool operator() (const T&, const U&) const;
};

instead of requiring the right and left args to both be type T&??

Also I want to avoid defining a conversion like const string&() on the
obj. In this simple example that could work but it would be a source of
error in a more involved case. I'd rather have explicit comparators using
op== on the class or globally.  If equal_to was over two types I could do
something like

   find ( objs.begin(), objs.end(),
      bind2nd (
         compose2 (
             equal_to< obj, string >(),
               deref< obj >(), ident< string >() ),
          name ));

assuming I define a deref<T> template for *.
Is this progress??

Of course I can punt and write

   vector<objs*>::iterator i;
   for ( i = objs.begin(); i != objs.end(); ++i )
      if (**i == name ) break;
   return *i;

but that isn't very elegant is it, and it doesn't make use of the
'reusable' find() algorithm.

This is a greatly simplified example that I hope starts to illustrate some
practical problems we're running into. Perhaps it's just our ignorance or
low blood sugar? Or can someone sugggest a nice way to apply the STL
pattern for this situation? Perhaps a templatized predicate for this case,
but a general one? We don't want to have to redefine it for every sort of
'named' class, every type of 'name', or instance of a storage scheme  (raw
pointer, auto_ptr, double-deref, computed-index-deref, etc.)

The next step of complexity involves, e.g., adding a 'named' object (say
it's a string again) to an associative container like a hash set, but
wanting to compare(retrieve) using a char* (to avoid consing up a string
~100,000 times a second.) Hashset won't work because it requires the
stored object to be compared against an instance of the same class; we'd
have to cons up a surrogate containing a name string just to do the
compare when we really just want to use the char* we already have to do
the lookup. Nor do we want to use a hash map and store two objects where
we really only need to store one (would still have to turn the char* into
a string for the key comparison.) I think the tree-based associative
containers have the same problem. So again we're dealing with trying to
use heterogenous comparison operators, where the right and left args are
different types and perhaps involve some ops like a pointer deref to get
to the actual object on which the comparison is defined. Perhaps someone
has a better generalization of the problem and suggestions? (It's late.)

Thanks,
--Michael


[ comp.std.c++ is moderated.  To submit articles: try just posting with      ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu         ]
[ FAQ:      http://reality.sgi.com/employees/austern_mti/std-c++/faq.html    ]
[ Policy:   http://reality.sgi.com/employees/austern_mti/std-c++/policy.html ]
[ Comments? mailto:std-c++-request@ncar.ucar.edu                             ]





Author: ken@digitas.org (Ken Shan)
Date: 1996/11/07
Raw View
Michael C. Greenspon (mcg@wheezy.CS.Berkeley.EDU) wrote:
> Now I'd want to do something conceptually like--
> obj* find(const string& name) {
>    return find ( objs.begin(), objs.end(), name );
> }

I think for this you really want to use find_if(), and forget about
characterizing the comparison between an object's name with a string.
Think of it this way:  The name of an object is just one of the many
attributes of the object; find() is supposed to be used for true
object *identity* while find_if() is to be used for searching for
attributes (predicates), in this case, the predicate of having the
specified name.  Of course, it might be nice to be able to have helper
template classes out of which a "generic attribute comparison
predicate" can be composed, but that's less important and doesn't
require a fundamental change to STL -- just an addition.

> The next step of complexity involves, e.g., adding a 'named' object (say
> it's a string again) to an associative container like a hash set, but
> wanting to compare(retrieve) using a char* (to avoid consing up a string
> ~100,000 times a second.) Hashset won't work because it requires the
> stored object to be compared against an instance of the same class; we'd
> have to cons up a surrogate containing a name string just to do the
> compare when we really just want to use the char* we already have to do
> the lookup. Nor do we want to use a hash map and store two objects where
> we really only need to store one (would still have to turn the char* into
> a string for the key comparison.) I think the tree-based associative
> containers have the same problem. So again we're dealing with trying to
> use heterogenous comparison operators, where the right and left args are
> different types and perhaps involve some ops like a pointer deref to get
> to the actual object on which the comparison is defined. Perhaps someone
> has a better generalization of the problem and suggestions? (It's late.)

I think I agree with you here.  I believe the "right" way of doing
associative containers is to expose the "pair" interface to the
outside also, i.e., I should be able to supply a "pseudo-pair class"
to the set<> (or multiset<>) template, which provides member functions
first() and second() (*not* member variables first and second, which
IMHO should be private to pair<> anyway), and the usual overloads.
For the situation above, for example, the first() member function
would simply be implemented as second()->name() or something like
that.

--
blue | Ken; Shan, Chung-chieh; Sian7, Tiong1-kiat8; ccshan@fas.harvard.edu.
 ()  | Your code today becomes the mind tomorrow:  Your plan its means,
 /\  | your dream its ends, your ideal its elegance.  Hack on.
---
[ comp.std.c++ is moderated.  To submit articles: try just posting with      ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu         ]
[ FAQ:      http://reality.sgi.com/employees/austern_mti/std-c++/faq.html    ]
[ Policy:   http://reality.sgi.com/employees/austern_mti/std-c++/policy.html ]
[ Comments? mailto:std-c++-request@ncar.ucar.edu                             ]





Author: Matt Austern <austern@sgi.com>
Date: 1996/11/08
Raw View
(I'm crossposting to comp.lang.c++.moderated, and directing followups
to that group, because my reply has very little to do with C++
standardization.)

mcg@wheezy.CS.Berkeley.EDU (Michael C. Greenspon) writes:

> The elegance and practicality of STL is wearing off in my mind as we use
> it for various mundane implementation tasks. Here's a simplification of
> one common task-- keeping a collection of pointers to named/tagged objects
> in a container, where there are comparison operators defined on the object
> for retrieval--
>
> struct obj {
>    obj(const string& name);
>    const string& getName() const;
>    bool operator==(const string& n) { return getName() == n; }
> };
>
> vector<obj*> objs;
>
> Now I'd want to do something conceptually like--
>
> obj* find(const string& name) {
>    return find ( objs.begin(), objs.end(), name );
> }

As Michael points out, find() can't be used for this task: it compares
for equality, and he's not interested in equality.

However, find_if can be used.  Find_if is an algorithm that's much
like find, but it uses an arbitrary user-supplied predicate instead
of just testing for equality.  In this case you could just write
your own predicate, something like this.
    struct equal_name : public unary_function<string, bool> {
       equal_name(const string& s) : str(s) {}
       bool operator()(const obj& o) { return o.getName() == str; }
       string str;
    };

Alternatively, you could build the predicate out of predefined
function objects: equal_to<string>, binder2nd, unary_compose, and
mem_fun.  The second approach is, I suppose, more elegant, but it's
also a bit more obscure.

In any case, this is all quite general: just as the STL includes both
find and find_if, so all STL algorithms and containers that perform
equality or magnitude comparisons have a version where users can
specify their own test.  So, for example, you can sort a range in
reverse order by providing greater<T> as the comparison function to
sort() instead of using the default, which is less<T>.  Similarly, you
can provide an alternate comparison function for set and map, or an
alternate hash function for hash_set and hash_map.

      [ Send an empty e-mail to c++-help@netlab.cs.rpi.edu for info ]
      [ about comp.lang.c++.moderated. First time posters: do this! ]
---
[ comp.std.c++ is moderated.  To submit articles: Try just posting with your
                newsreader.  If that fails, use mailto:std-c++@ncar.ucar.edu
  comp.std.c++ FAQ: http://reality.sgi.com/austern/std-c++/faq.html
  Moderation policy: http://reality.sgi.com/austern/std-c++/policy.html
  Comments? mailto:std-c++-request@ncar.ucar.edu
]