Thread

Topic: Pointer comparisons and templates

Author: jbn@lulea.trab.se (Johan Bengtsson)
Date: 16 Dec 92 16:35:41 GMT Raw View

Dag Bruck (dag@bellman.control.lth.se) wrote:
: In <comp.std.c++> maxtal@extro.ucc.su.OZ.AU (John MAX Skaller) writes:
: > You could always write the progam in terms of cmp(T,T),
: >not <. The it would work for pointers too, you would just
: >have to define cmp(T,T) for each type, including pointers,

: I think that would be a serious effort, considering how many different
: types of pointers you have in an application.

: > template<class T> int cmp(T a1, T a2){return a1<a2;}
: > int cmp(void* a1, void* a2){return ptrcmp(a1,a2)<0;}

: I don't think this one flies.  For example,

I suggested a similar template in a previous response.  Jerry Schwarz
pointed out in private mail that it doesn't work.  Only _exact_ type
matches on functions are preferred over function templates, so all
pointers except void* would end up calling the template.

:  int* p;
:  int* q;
:  cmp(p, q);

: probably causes a template overloading ambiguity.

Well, no.  Since there is no trivial conversion from int* to void*,
the template is preferred.  The _standard_ conversion int* --> void*
is not considered.

: You have to
: explicitly define a cmp() for every type of pointer you have, I think.

So it seems.  And in practice it would be too error-prone to do that.

Jerry also said (I hope he doesn't mind me posting it):
>
> You could write
>
>  template<class T> inline int isless(T* a, T* b) {
>       return ptrcmp(a,b) ;
>         }
>
> but then you have duplicate definitions of isless(int*,int*),
> which isn't allowed.

So this is a general limitation for C++ templates then:
Using templates you can handle one unbounded set of types, but never
two unbounded sets, at least not with the same template name.

Dag's orignal claim that it is impossible to create templates
that use "<" for all objects and ptrcmp() for all pointers seems
to be valid.

Using a bit of trickery, it is possible to create a template that
works for pointers (ptrcmp) and objects (<), _except_ for classes
that define automatic conversion to pointers (Dag found that one).
I tested this, but won't describe the details here.  Mail me if
you are interested.

: > The real problem cannot be solved at all, namely, for
: >a user define < or cmp, ensuring it is a total order.

This is a good point (IMHO).

--
--------------------------------------------------------------------------
| Johan Bengtsson, Telia Research AB, Aurorum 6, S-951 75 Lulea, Sweden  |
| Johan.Bengtsson@lulea.trab.se; Voice:(+46)92075471; Fax:(+46)92075490  |
--------------------------------------------------------------------------

Author: jimad@microsoft.com (Jim Adcock)
Date: 16 Dec 92 20:20:54 GMT Raw View

In article <1992Dec15.164854.13070@ucc.su.OZ.AU> maxtal@extro.ucc.su.OZ.AU (John MAX Skaller) writes:
| I'm not sure we've all agreed on the relationship, if any,
|between < and ptrcmp, but I think we agree that
|
| "there will be a library function ptrcmp(void*,void*)
| which defines a total order"
|
|unless you can convince us the < is worth more than
|making assumptions about all future machine architectures.

This is silly. "We all agree unless you prove otherwise" ????

Agreement is agreement. If people agree, then they agree. If
they don't agree, they don't agree. No proof is involved. Standards
arise around areas of general agreement. Things that people can't
or won't agree upon don't get standardized. Frequently compromise
is involved in reaching agreement, such that people at both extremes
of some position give up something in order to reach a statement that
all can agree to.

I suggest that OODBMS writers need to be consulted on this matter,
since in practice I think they are the most likely to be heavily
affected.

Again, you do not improve portability by making conforming C++
implementations unavailable on some systems. You do not improve
portability by insisting on functionality that cannot be efficiently
implemented on some systems. You get portabilitiy by writing
your code in a manner that can be affordably implemented on all
systems that you are interested in porting to.

I believe OODBMSs, lisp machine, OOArchitectures, etc, should be
systems we are interested in porting C++ code to.

Author: maxtal@extro.ucc.su.OZ.AU (John MAX Skaller)
Date: Thu, 17 Dec 1992 21:41:38 GMT Raw View

In article <1992Dec15.222952.17059@lth.se> dag@bellman.control.lth.se (Dag Bruck) writes:
>
>The first error I made, and the fact that nobody found it, is a strong
>indication that specialization of templates is a real complication.
>

 Partly because the type system is not so hot either.
When you instantiate a template

 int i;
 f(i);

what get generated? f(int)? f(const int)? f(int&)? f(const int&)?
(and 4 more cases involving volatile)

IMHO all 8 of these functions should be considered redefintions:

 f(int);
 f(int&); // error, redefinition
 f( .. ) // as above for all the 6 other cases


--
;----------------------------------------------------------------------
        JOHN (MAX) SKALLER,         maxtal@extro.ucc.su.oz.au
 Maxtal Pty Ltd, 6 MacKay St ASHFIELD, NSW 2131, AUSTRALIA
;--------------- SCIENTIFIC AND ENGINEERING SOFTWARE ------------------

Author: dag@bellman.control.lth.se (Dag Bruck)
Date: Mon, 14 Dec 1992 07:58:53 GMT Raw View

In the recent discussion on pointer comparisons, Fergus Hendersson
suggested adding a standard library function ptrcmp() instead of
extending the definitions of "<" and ">" for pointers.  I think we
also agreed on a reasonable relationship between ptrcmp() and the
relational operators.

While I think ptrcmp() is better than nothing, I still think extending
the definition of the relational operators is far better.  Consider
this template class (the code may be wrong, I'm just inventing an
almost real example for the purpose of discussion):

 template <class T>
 class Tree {
 private:
  struct Node {
   T element;
   Node* left;
   Node* right;
   Node(const T& e) : element(e),left(0),right(0) {}
   ~Node() { delete left; delete right; }
  };
  Node* root;
  T& InsertElement(Node*& root, const T& element);
 public:
  Tree() : root(0) {}
  ~Tree() { delete root; }
  T& Insert(const T& e) { return InsertElement(root, e); }
 };

Most of the template class Tree is a skeleton around InsertElement()
which is the function that actually does the job:

 template <class T>
 T& Tree<T>::InsertElement(Node*& root, const T& element)
 {
  if (root == 0) {
   root = new Node(element);
   return root->element;
  }
  if (element < root->element)
   return InsertElement(root->left, element);
  if (root->element < element)
   return InsertElement(root->right, element);
  return root->element;
 }

Note that a copy of the element argument is inserted into the tree,
and that InsertElement() always returns a reference to this copy.

Now, here's the problem: this implementation works fine for all
element types that have an "operator < ()" including plain built-in
types and many user-defined types.  However, it is not guaranteed to
work with pointer elements for reasons we have discussed before.

They way around this problem is to make a specialization for pointers:

 void*& Tree<void*>::InsertElement(Node*& root, const void*& element)
 {
  if (root == 0) {
   root = new Node(element);
   return root->element;
  }
  if (ptrcmp(element, root->element) < 0)  // ****
   return InsertElement(root->left, element);
  if (ptrcmp(root->element, element) < 0)  // ****
   return InsertElement(root->right, element);
  return root->element;
 }

The changes are marked with ****, and the function signature is also
changed, of course.  We must also make a specialized Tree class for
pointers to do the appropriate type casting to/from void*.

 template <class T>
 class PtrTree : private Tree<void*> {
 public:
  PtrTree() {}
  ~PtrTree {}
  T*& Insert(const T*& e)
   { return (T*&) Tree<void*>::Insert(e); }
 };

I think this covers it all, but I'm not quite sure if the derivation
should be public instead.  In any case, here are a few pros and cons:

1.  It can be done with a minimal change to the language (adding a
standard library function).  It can also be done in a non-portable way
without the library function.

2.  In the general case we cannot use the trick of converting to a
long integer, but it could be used in the specialization for void*.

3.  The programmer using class Tree must remember to use a different
class for pointers.  There is no type checking that prevents the user
from using Tree<Foo*> -- it will compile fine but produce the wrong
results on some machines.

4.  The implementor and maintainer of similar classes must remember to
handle pointers in a special way.  The public interface of each data
structure must be replicated, and each class requires at least a
specialization for void*.

I regard (3) is biggest problem, in particular as there is no way (as
far as I know) to diagnose the problem.   It may work fine on some
machines, and then break when you port the code to a new machine or a
new memory model on the same machine.

I think (4) is a lesser problem because you only have to get it right
every time you write or modify a general data structure, not every
time you use it.  I think the handling of pointers adds a significant
complexity to a rather simple data structure "right out of the
textbook."

Any comments or suggestions would be welcome.  I would be particularly
happy if you could show that I have described a problem that in fact
does not exist if you use a suitable programming style.

Dag Bruck
--
Department of Automatic Control  E-mail: dag@control.lth.se
Lund Institute of Technology
P. O. Box 118    Phone: +46 46-104287
S-221 00 Lund, SWEDEN   Fax:    +46 46-138118

Author: fjh@munta.cs.mu.OZ.AU (Fergus James HENDERSON)
Date: Tue, 15 Dec 1992 08:50:09 GMT Raw View

dag@bellman.control.lth.se (Dag Bruck) writes:

>In the recent discussion on pointer comparisons, Fergus Hendersson
>suggested adding a standard library function ptrcmp() instead of
>extending the definitions of "<" and ">" for pointers.  I think we
>also agreed on a reasonable relationship between ptrcmp() and the
>relational operators.
>
>While I think ptrcmp() is better than nothing, I still think extending
>the definition of the relational operators is far better.
[...]
>3.  The programmer using class Tree must remember to use a different
>class for pointers.  There is no type checking that prevents the user
>from using Tree<Foo*> -- it will compile fine but produce the wrong
>results on some machines.
[...]
>I regard (3) is biggest problem, in particular as there is no way (as
>far as I know) to diagnose the problem.   It may work fine on some
>machines, and then break when you port the code to a new machine or a
>new memory model on the same machine.

The problem is that there are two different sorts of "ordering" operations
that are useful in different circumstances. The first is comparing
pointers to objects in the same array, the second is comparing arbitrary
pointers. On some architectures, the former is much faster than the latter.

Extending the definition of the relational operators would make many programs
considerably less efficient, break some conformant but not strictly conformant
programs, and would create more problems than it would solve, IMHO.

Although ptrcmp() is not a perfect solution as far as template data-structures
go, the alternative would also have difficulties -- what if I want a Tree
of pointers all of which _are_ in the same array, and efficiency is
very important? In that case, Tree<Foo*> is exactly what I want.

--
Fergus Henderson             fjh@munta.cs.mu.OZ.AU
This .signature virus is a self-referential statement that is true - but
you will only be able to consistently believe it if you copy it to your own
.signature file!

Author: jbn@lulea.trab.se (Johan Bengtsson)
Date: 15 Dec 92 01:14:31 GMT Raw View

Dag Bruck (dag@bellman.control.lth.se) wrote:
: In the recent discussion on pointer comparisons, Fergus Hendersson
: suggested adding a standard library function ptrcmp() instead of
: extending the definitions of "<" and ">" for pointers.  I think we
: also agreed on a reasonable relationship between ptrcmp() and the
: relational operators.

: While I think ptrcmp() is better than nothing, I still think extending
: the definition of the relational operators is far better.  Consider
: this template class (the code may be wrong, I'm just inventing an
: almost real example for the purpose of discussion):

[ class template needing total ordering on objects _and_ pointers deleted ]

: Now, here's the problem: this implementation works fine for all
: element types that have an "operator < ()" including plain built-in
: types and many user-defined types.  However, it is not guaranteed to
: work with pointer elements for reasons we have discussed before.

: They way around this problem is to make a specialization for pointers:

Dag, there is another way:

Have the class template rely on a function template isless() instead,
with a specialization for pointers:

template<class T> inline int isless(T a, T b)
{ return a < b; }

inline int isless( const void* a, const void* b )
{ return ptrcmp(a,b) < 0; }

That ought to do it.  NOTE: I don't yet have a compiler that does
function templates, so this is untested.  Perhaps someone with a
more real compiler can check that this really works?

--
--------------------------------------------------------------------------
| Johan Bengtsson, Telia Research AB, Aurorum 6, S-951 75 Lulea, Sweden  |
| Johan.Bengtsson@lulea.trab.se; Voice:(+46)92075471; Fax:(+46)92075490  |
--------------------------------------------------------------------------

Author: maxtal@extro.ucc.su.OZ.AU (John MAX Skaller)
Date: Tue, 15 Dec 1992 16:48:54 GMT Raw View

In article <1992Dec14.075853.3399@lth.se> dag@bellman.control.lth.se (Dag Bruck) writes:
>In the recent discussion on pointer comparisons, Fergus Hendersson
>suggested adding a standard library function ptrcmp() instead of
>extending the definitions of "<" and ">" for pointers.  I think we
>also agreed on a reasonable relationship between ptrcmp() and the
>relational operators.

 I'm not sure we've all agreed on the relationship, if any,
between < and ptrcmp, but I think we agree that

 "there will be a library function ptrcmp(void*,void*)
 which defines a total order"

unless you can convince us the < is worth more  than
making assumptions about all future machine architectures.

>
>Now, here's the problem: this implementation works fine for all
>element types that have an "operator < ()" including plain built-in
>types and many user-defined types.

 No...see below...

>However, it is not guaranteed to
>work with pointer elements for reasons we have discussed before.

 So clearly it doenst work with arbitrary user defined types
either for precisely the same reason. The user defined type
need not be a total order.

 Imagine my suprise when qsort failed! I couldnt
understand it until I realised by definition of < was wrong!

 The only way to fix this is to have a proper assertion
mechanism so you  can state that < for type T is a total order.
Gee,   T::operator< could return anything!

>
>3.  The programmer using class Tree must remember to use a different
>class for pointers.  There is no type checking that prevents the user
>from using Tree<Foo*> -- it will compile fine but produce the wrong
>results on some machines.

>I regard (3) is biggest problem, in particular as there is no way (as
>far as I know) to diagnose the problem.   It may work fine on some
>machines, and then break when you port the code to a new machine or a
>new memory model on the same machine.

 There are no constraints on templates that will ensure anything
except the existance of operator <.

 You could always write the progam in terms of cmp(T,T),
not <. The it would work for pointers too, you would just
have to define cmp(T,T) for each type, including pointers,

 template<class T> int cmp(T a1, T a2){return a1<a2;}
 int cmp(void* a1, void* a2){return ptrcmp(a1,a2)<0;}

>
>Any comments or suggestions would be welcome.  I would be particularly
>happy if you could show that I have described a problem that in fact
>does not exist if you use a suitable programming style.

 The problem you mention might be solved with 'cmp'.

 The real problem cannot be solved at all, namely, for
a user define < or cmp, ensuring it is a total order.
Except of course by paper and pencil proof for each case.

--
;----------------------------------------------------------------------
        JOHN (MAX) SKALLER,         maxtal@extro.ucc.su.oz.au
 Maxtal Pty Ltd, 6 MacKay St ASHFIELD, NSW 2131, AUSTRALIA
;--------------- SCIENTIFIC AND ENGINEERING SOFTWARE ------------------

Author: jimad@microsoft.com (Jim Adcock)
Date: 15 Dec 92 20:01:12 GMT Raw View

In article <1992Dec14.075853.3399@lth.se> dag@bellman.control.lth.se (Dag Bruck) writes:
|  if (element < root->element)
|   return InsertElement(root->left, element);
|  if (root->element < element)
|   return InsertElement(root->right, element);
|  return root->element;
| }
|
|Note that a copy of the element argument is inserted into the tree,
|and that InsertElement() always returns a reference to this copy.
|
|Now, here's the problem: this implementation works fine for all
|element types that have an "operator < ()" including plain built-in
|types and many user-defined types.  However, it is not guaranteed to
|work with pointer elements for reasons we have discussed before.

I disagree.  Consider an implementation that fully implements a "strict"
interpretation of IEEE doubles.  Such an implementation fails to "work"
for exactly the same reason as the pointer implementations in question.
Namely your implementation assumes "<" always returns an ordering,
but neither IEEE doubles, PC pointers, nor user implemented classes
may interpret "<" in such a manner.  In the case of PC pointers all
pointers must be to the same array for your code to work.  In the
case of a strict implementation of IEEE numbers there better not be
any NANs presented to this code.  In the case of user implemented
classes, god only knows what "<" might mean -- I/O an element up
the wazoo, for all we know!

This is not an issue of "<" at all, rather it is an issue of templates.
Templates in C++ are the equivalent to "accidental reuse" in Smalltalk.
If the syntax fits, you've got a template class, no matter that the
semantics don't work.  No matter how slow you make pointers, you still
are left with this template "accidental reuse" problem.

======

"Trust the programmer."

"Make it fast, even if it is not guaranteed to be portable."

-- The ANSI-C Rationale

Author: dag@bellman.control.lth.se (Dag Bruck)
Date: Tue, 15 Dec 1992 22:29:52 GMT Raw View

In <comp.std.c++> maxtal@extro.ucc.su.OZ.AU (John MAX Skaller) writes:
> You could always write the progam in terms of cmp(T,T),
>not <. The it would work for pointers too, you would just
>have to define cmp(T,T) for each type, including pointers,

I think that would be a serious effort, considering how many different
types of pointers you have in an application.

> template<class T> int cmp(T a1, T a2){return a1<a2;}
> int cmp(void* a1, void* a2){return ptrcmp(a1,a2)<0;}

I don't think this one flies.  For example,

 int* p;
 int* q;
 cmp(p, q);

probably causes a template overloading ambiguity.  You have to
explicitly define a cmp() for every type of pointer you have, I think.

> The real problem cannot be solved at all, namely, for
>a user define < or cmp, ensuring it is a total order.

That is true, of course, for many applications.

I would also like to point out a few errors in the code for class Tree
in an earlier posting.  The specializations for void* in a few places
says "const void*&" which should be "void* const&".  Another error is
that Insert() returns a T&, which should be a const T&.

The first error I made, and the fact that nobody found it, is a strong
indication that specialization of templates is a real complication.

    -- Dag