Topic: Tagged Pointers for C++


Author: maxtal@extro.ucc.su.OZ.AU (John MAX Skaller)
Date: Fri, 7 Aug 1992 01:54:22 GMT
Raw View
Below is a first draft informal proposal for adding tagged pointers
to C++. I have posted to comp.object because some readers there may
have experience with tagged pointers.

Tagged pointers have been developed specifically to cope with
heteromorphism, and as a sound alternative to downcasting or
user supplied Run-Tim-Type-Information in those cases where
heterogeneity is desired.

      TAGGED POINTERS FOR HETEROMORPHISM
      ----------------------------------


1) Description
--------------


A tagged pointer is a pointer to an object which is one
of several discrete and declared types, together with
a tag or discriminant indicating which type the pointer
actually points to.

Tagged pointers are fully statically type safe,
the only run-time errors are those normally associated
with pointers. In addition, corrupt tag values can be detected.
Tagged pointers are also highly efficient.

This informal proposal is expected to be complemented by
a proposal for tagged unions. Tagged pointers and tagged unions
are similar but separate concepts, differing principally on
the location of the type information.

2) Declaration
--------------

The syntax for tagged pointer declaration is given by

   type:
     ordinary-type
     based-type-list

   ztype:
     type
     0

   type-list:
     ztype, type-list
     ztype

   het-type-list:
     [type-list]

   based-type-list
     ordinary-type het-type-list

   tag-type-list:
     het-type-list
     based-type-list

   tagged-pointer-declarator
     tag-type-list pointer-list

   pointer-list:
     *identifier, pointer-list
     *identifier


   abstract-tagged-pointer-type:
     tagged-pointer-declarator *

Some examples are

  [int, double, complex] *p;
  f(base[derived1,derived2]);
  person[employee[permanent,casual],staff[manager,sales]]* z;
  [const char, void, 0] *s, **y;

The form

  [T1, T2, ..., Tn]

represents a set of arbitrary types, whereas the form

  B[D1, D2, ..., Dn]

requires that B be a class and D1..Dn derived from it publically,
it is otherwise equivalent to

  [B, D1, D2, ..., Dn]


The symbol 0 implies assignment or initialisation from
generic 0 is allowed.

3) Default initialisation
-------------------------

The pointer of a tagged pointer will by default be initialised to 0,
and the tag field set to indicate that the tagged pointer
has not been initialised.


4) Assignment and initialisation
--------------------------------

Assignment or initialisation of a tagged pointer by a pointer of
type T1*, T2*, ... or Tn* will set the pointer part of the tagged
pointer to the same value, and set the tag field to the corresponding
value. The tag is set even if the pointer is 0.

If the source pointer p is not one of the listed types, but can be
converted unambiguously to one of those types, then this is done
before the assignment or initialisation takes place.

The same rules as for function overloading are used, in particular.
the assignment is unambiguous if, and only if, the compiler would be
able to resolve a call f(p) to

        f(T1*);
        f(T2*);
        ...
        f(Tn*);

unambiguously.

If there is no unambiguous conversion, a compiler error is generated.
If the type void is one of the tag types, there is always at least
one conversion, to void*, that can be done.


5) Using a tagged pointer as an rvalue
--------------------------------------

A tagged pointer can be used as an rvalue if, and only if,
each and every one of the tag pointer types can be used
where the tagged pointer is used.

When a tagged pointer is used as an rvalue, the compiler
generates code for each case and the appropriate case is
selected at run-time using a switch on the type tag,
or some equivalent code. In particular, assignment of
one tagged pointer to another of the same type is covered by
this rule, but clearly no switch is required.

The expression in which a tagged pointer is used must
have the same return type for all cases.


6) Operators
------------

The delete operator can be applied to a tagged pointer.
The selected object is deleted.

The ++ and -- operators can be applied to a reference to a tagged pointer,
the result is the reference to the tagged pointer before of after
incrementing or decrementing.

An integral type can be added to or subtracted from a tagged pointer.
The assignment form may also be used.

Tagged pointers may be compared for equality or inequality.


7) Explicit type switch
-----------------------

Given a tagged pointer, one may wish to execute code dependent
on the tag. The syntax chosen here resembles that of the proposed
exception handling mechanism rather than the flawed switch statement.


  select(t)
  type()       { ... }  // uninitialised
  type(0)      { ... }  // explicit 0
  type(T1* p1) { ... }
  type(T2* p2) { ... }
  ....
  type(Tn* pn) { ... }
  type(void*)  { ... } // void* or none of he above

The select statement implies an argument with pointer types
T1..Tn. Such a tagged pointer type must be constructible,
that is, the types must be distinct as usual.
A void pointer is also allowed.

The tagged pointer t need not be of the type implied by the
select statement, however is must possible for it to be
be implictly converted to such a type. If not, a compiler
error is generated.

As such, such a list is always exhaustive and unambiguous
or a compiler error is generated by the implied conversion.

At run-time, the type of the tagged pointer t is examined,
the appropriate pointer variable initialised from the tagged
pointer, and the following block executed. Processing
continues after the last type statement. A break  may be used
inside the type block to exit prematurely.


7) Tagged pointers and templates
--------------------------------

A tagged pointer type can be used as a template argument just like
any other type. The generated template instances have a single
tagged pointer argument.

Generic code inside the function will be expanded to
copy with all the cases using the tagged pointer mechanism.

The use of a tagged pointer in selecting a family of functions
suggests templates my be useful for generating these instances.
This is more efficient than generating a single instance
for each tagged pointer type, since common operations will
be shared as object code.

To facilitate this case use of a type list creates a template instance
function for each type in the argument tagged pointer typelist.


  template<[class T]*, class U> f(T t, U u){ return *t+u; }

  [A,B,C]* abc;

  f(abc,i);

generates the instance functions

  f(A*,int);
  f(B*,int);
  f(C*,int);

(and of course the switch on the appropriate calls).
A subsequent call

  [B,C,D]* bcd;

  f(bcd,i);

will now only generate

  f(D*,int);

since

  f(B*,int);
  f(C*,int);

have already been generated.


8) Discussion -- comparison with polymorphism
----------------------------------------------

Tagged pointers are basically a generic facility, which,
especially using in conjunction with templates, provide
facilites for heteromorphism in the same way virtual
functions and inheritance provide facilities for
polymorphism.

Tagged pointers use an indexing mechanism where virtual
functions use indirection. Both are type safe and
highly efficient.

In polymorphism, the interfaces are identical, and each
actual object 'isA' base object. Base classes are subject
to arbitrary extension by derivation.

In heteromorphism, the interfaces are similar in some respects,
whereas the objects merely 'like' each other.
Tagged pointers can only refer to a discrete predeclared set
of types, extension requires the declaration be modified.

Tagged pointers support source code reuse whereas
polymorphism supports object code reuse.

The use of tagged pointer declarations of the form


  Base[Derived1, Derived2]* b12;

suggests that tagged pointers can be used to combine
polymorphism and heteromorphism without downcasting or
run-time type information. The call

  b12->print();

where print is not virtual, will call the derived class print
routines if the pointer is tagged for the derived classes,
and the base class print routine otherwise.

Had print been virtual, the exact print routine would have been
called.

More detailed special case handling is achieved with a selector:

  select(b12)
  type(Base *b) { b->print(); }
  type(Derived1 *d1) { d1->print(20); }
  type(Derived2 *d2) { d2->print("Title"); }


9) Example. Poor mans multiple dispatch
---------------------------------------

Multiple dispatch over finite cases is automatically
implemented by tagged pointers.

The code

  [double, float] *p1, *p2;
  f(p1,p2);

is equivalent to

  struct tp
  {
   enum tag {Double, Float};
   union
   {
     double *dp;
     float  *fp;
   };
  };

  tp p1,p2;

  switch(p1.tag)
  {
   case Double:
     switch(p2.tag)
     {
      case Double: f(p1.dp,p2.dp); break;
      case Float:  f(p1.dp,p2.fp); break;
     };
   break;

   case Float:
     switch(p2.tag)
     {
      case Double: f(p1.fp,p2.dp); break;
      case Float:  f(p1.fp,p2.fp); break;
     };
   break;
  };









--
;----------------------------------------------------------------------
        JOHN (MAX) SKALLER,         maxtal@extro.ucc.su.oz.au
 Maxtal Pty Ltd, 6 MacKay St ASHFIELD, NSW 2131, AUSTRALIA
;--------------- SCIENTIFIC AND ENGINEERING SOFTWARE ------------------