Topic: RTTI In Strongly Typed Languages

Author: carroll@sde.mdso.vf.ge.com (Carroll James)
Date: Thu, 6 May 1993 02:06:25 GMT

                          RTTI In  Strongly
                           Typed Languages
                          =================

I have recently read an article in C++ Report March/April edition on Run
Time Type Identification (RTTI) that I found rather disturbing.
There seems to be a misunderstanding about the fundamental differences
between a strongly type checked OO language and a loosely type checked or
symbolic language. (Examples of OO languages, and the category each belongs
to, can be found at the bottom of this post.)

One of the most crucial and significant skills in OO design and
implementation is to be able to look at a system and define the
abstractions that are going to make up that system and the subsystems
built around those abstractions. In a strongly typed language this MEANS
the inheritance hierarchy and the behavior defined for the base classes.
Appropriately defining these abstractions is what gives flexibility and
extensibility to the system being developed. Allowing RTTI in a strongly
typed system completely defeats the advantages of correct abstractions.
It's analogous to adding a 'goto' statement to a structured language.

As background I need to define and explain what is meant by abstraction in
both strongly typed languages and loosely typed languages. To start with
let me define what I mean by polymorphism. In object oriented
software design, the ability to send a message (or call a method) to an
object without knowing exactly what the object is allows for polymorphism.
In this way we can take a set of 'objects' and send the same message to
each of them, and they will behave appropriately based on what they really
are. This form of polymorphism allows the handling of objects abstractly,
i.e. it allows me to have any 'object' and pass a message to it.

As an example let's take the following (very C++ish) pseudo code:

 object = GetNextObject();
 object  -> Message();

If we are handling 'object' abstractly, then when we write the line 'object ->
Message();' (meaning send the message 'Message' to 'object'), exactly what
happens depends on what 'object' refers to. If this code is executed more
than once, completely different things may happen depending on what 'object'
refers to at the time, i.e. two completely different pieces of code may
execute for the same call to 'Message()'. This is, of course, polymorphism
(where two different objects can be treated the same way, yet respond
according to their own characteristics).

There is a significant difference between strongly typed languages
and loosely typed languages with respect to polymorphism and inheritance,
which causes very significant differences in object oriented approaches
to solving problems.

In a strongly type checked language this polymorphism is intimately tied
to inheritance. In order for the above example to work, 'object' must be
typed and that type must be able to respond to the message 'Message()'.
For a strongly type checked language, 'object' must refer to (be typed as)
some "parent" class of a hierarchy that defines 'Message()', which is
implemented by child classes. This is HOW polymorphism, and "abstraction",
is accomplished in a strongly type checked language, i.e. via inheritance
and reimplementation of a parent class's defined methods. At run
time exactly which 'Message()' will be called depends on exactly
which "subclass" of this common "parent" class 'object' happens
to be referring to. There are many references one can go to to find out
how strongly type checked languages use this coupling of inheritance
and polymorphism.
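
A minimal sketch of that coupling in modern C++ (C++14 or later, so newer
than this thread). The class names Base, First and Second are hypothetical
and chosen only for illustration; the point is that 'Message()' must be
declared on the parent type before the call will compile.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical parent class: it declares Message() so that callers can
// use any subclass through a Base pointer or reference.
class Base {
public:
    virtual ~Base() = default;
    virtual std::string Message() const = 0;
};

class First : public Base {
public:
    std::string Message() const override { return "First::Message"; }
};

class Second : public Base {
public:
    std::string Message() const override { return "Second::Message"; }
};

// The caller knows only the static type Base. Which override runs is
// chosen at run time from the dynamic type of each object.
std::vector<std::string> SendToAll(
    const std::vector<std::unique_ptr<Base>>& objects) {
    std::vector<std::string> replies;
    for (const auto& object : objects) {
        replies.push_back(object->Message());
    }
    return replies;
}
```

Passing SendToAll a vector holding one First and one Second yields one reply
from each override, in order, without SendToAll naming either subclass.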

However, in loosely typed languages, polymorphism is decoupled from inheritance
(or maybe this should be stated in reverse, i.e. inheritance is decoupled
from polymorphism). By this I mean that if the example above were implemented
in a loosely typed language it would work regardless of whether or not each
object being referred to by 'object' was inherited from a common parent
class that defines 'Message()'. Here the only requirement is that 'Message()'
be defined on any object that 'Message()' is called on.

This difference even cuts to the heart of the multiple inheritance debate.
How does this relate to Multiple Inheritance? In a strongly typed object
oriented language, Polymorphic Abstraction (i.e. being able to send common
messages to different objects without knowing the specific object type,
but referring to them in a General sense) can be equated with Inheritance.
Therefore, in these languages, in order to "Abstract" in two different
directions you NEED multiple inheritance. My usual example is Transportation,
and Animals. Suppose I have a system that deals with 'Animals' in an
abstract sense. For this example I will use 'Eat' as the common thing that
all 'Animals' do. I also have a system (set of classes) that deals with
'Transportation' in an abstract manner, say, 'Move'. Therefore I have a
set of classes that treat ANY 'Animal' by passing the message 'Eat' to it
and another set of classes that handle ANY piece of 'Transportation' by
sending the Message 'Move' to it.

If I want an Animal that 'IS' also a piece of Transportation I
MUST inherit from both 'Animal' and 'Transportation' to define, say, a
'Horse' that can be used by both systems. I would multiply inherit from
'Animal' and redefine 'Eat' and ALSO inherit from 'Transportation' and
redefine 'Move'. The ability to do this has always been the main argument
for Multiple Inheritance.
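
The Horse example might be sketched in C++ as follows. The class and method
names come from the text above; the free functions Feed and Dispatch are
hypothetical stand-ins for the two systems, and the returned strings are
invented for illustration.

```cpp
#include <string>

class Animal {
public:
    virtual ~Animal() = default;
    virtual std::string Eat() const = 0;
};

class Transportation {
public:
    virtual ~Transportation() = default;
    virtual std::string Move() const = 0;
};

// Horse must inherit from both abstractions to be usable by both systems.
class Horse : public Animal, public Transportation {
public:
    std::string Eat() const override { return "Horse eats hay"; }
    std::string Move() const override { return "Horse gallops"; }
};

// Stand-in for the system that handles ANY Animal.
std::string Feed(const Animal& animal) { return animal.Eat(); }

// Stand-in for the system that handles ANY piece of Transportation.
std::string Dispatch(const Transportation& vehicle) { return vehicle.Move(); }
```

A single Horse object can be passed to both Feed and Dispatch; neither
function knows about the other abstraction.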

However, in a loosely typed language I can define Horse which is NOT inherited
from either 'Animal' or 'Transportation' but that does define 'Eat' and
'Move' message responses, and Horse will now work in either system, because
responding to the messages (Polymorphism) is not coupled with generalization
(Inheritance).
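
C++ has no run-time message lookup of this kind, but templates give a rough
compile-time analogue, sketched below under that assumption. It is not the
same mechanism as in Smalltalk or CLOS: the check that 'Eat' and 'Move' exist
happens when the template is instantiated, not when the message is sent.
Feed and Dispatch are hypothetical names.

```cpp
#include <string>

// No base classes at all: Horse simply provides Eat() and Move().
class Horse {
public:
    std::string Eat() const { return "Horse eats hay"; }
    std::string Move() const { return "Horse gallops"; }
};

// Works for any type T that has a suitable Eat() member,
// whatever T does or does not inherit from.
template <typename T>
std::string Feed(const T& animal) { return animal.Eat(); }

// Works for any type T that has a suitable Move() member.
template <typename T>
std::string Dispatch(const T& vehicle) { return vehicle.Move(); }
```

Here the only requirement on Horse is the one stated above: the message must
be defined on the object it is sent to.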

If a developer finds himself with an object, and he needs to ask what
type the object is, and based on the response, will carry out an
appropriate action, there is a problem in the abstraction hierarchy (there
is in general one exception to this, which, if anyone is really interested,
I will gladly discuss with them). In a strongly typed language where 'types'
and 'classes' are synonymous, this is like admitting that there is a
problem with the abstraction because there is a subclass that IS being
handled through a particular abstraction yet CANNOT be handled through
that abstraction appropriately. Before getting into fixing this problem,
I would like to illustrate some of the effects on the main advantage of OO.

The big advantage that this aforementioned abstraction allows is
flexibility and extensibility of a system developed with it. By handling
objects on abstract levels it allows the system to grow through the
definition of new subclasses (of the abstractions) without an effect
on the inner workings of the system. For example: if I have a system that
deals with 'Transportation' objects by sending appropriate messages to
them, then if I get in a new mode of transportation that was not considered
when the system was designed, adding this information becomes as easy
as adding a new subclass. Suppose I write a simulator of Transportation
objects. In this simulator I have 'Cars' and 'Boats'. Later I decide
to add 'Planes' to my system. If the abstractions were appropriately
identified and defined, then adding 'Planes' to the system is done by simply
defining the class 'Plane' to have 'Transportation' as an abstraction.
There will be no effect on the simulation code.
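
As a sketch of that claim, assuming a hypothetical Simulate function and a
single 'Move' message (the returned strings are invented for illustration):

```cpp
#include <memory>
#include <string>
#include <vector>

class Transportation {
public:
    virtual ~Transportation() = default;
    virtual std::string Move() const = 0;
};

class Car : public Transportation {
public:
    std::string Move() const override { return "Car drives"; }
};

class Boat : public Transportation {
public:
    std::string Move() const override { return "Boat sails"; }
};

// Added later. Nothing in Simulate() below had to change to accept it.
class Plane : public Transportation {
public:
    std::string Move() const override { return "Plane flies"; }
};

// The simulator depends only on the Transportation abstraction.
std::vector<std::string> Simulate(
    const std::vector<std::unique_ptr<Transportation>>& fleet) {
    std::vector<std::string> log;
    for (const auto& vehicle : fleet) {
        log.push_back(vehicle->Move());
    }
    return log;
}
```

Simulate was written in terms of Transportation only, so a fleet containing a
Plane runs through the same loop as one containing just Cars and Boats.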

However, if in the simulation code I decide that I need some type specific
code so that I basically have pseudo code that looks like:

 if (object -> IsA("Car")) { ..... }
 else if (object -> IsA("Boat")) { .... }

then when I add the 'Plane' I need to go back and change my system to
specifically recognize this new type. In this case, why bother abstracting at
all? If every object could be asked its type at runtime, as in RTTI, then
abstraction would not be necessary. HOWEVER, none of the benefits of flexibility
and extensibility demonstrated by this example would be realized either.
So much for flexibility; strike that one from the "Benefits of OO" list.
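
The maintenance problem can be sketched with dynamic_cast, the checked cast
that standard C++ later adopted. The function Describe and its return values
are hypothetical; 'IsA' in the pseudo code above is approximated here by a
null test on the cast result.

```cpp
#include <string>

class Transportation {
public:
    virtual ~Transportation() = default;
};

class Car : public Transportation {};
class Boat : public Transportation {};
class Plane : public Transportation {};  // added after Describe() was written

// Type-specific branching: every new subclass needs a new branch here.
std::string Describe(const Transportation& object) {
    if (dynamic_cast<const Car*>(&object)) {
        return "drive";
    } else if (dynamic_cast<const Boat*>(&object)) {
        return "sail";
    }
    // Plane falls through, because nobody went back and edited this function.
    return "unhandled";
}
```

The compiler accepts a Plane without complaint, and the missing case only
shows up at run time as the "unhandled" result.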

In a loosely typed language the type (or position in a hierarchy) is
independent of the methods that can be called on it. Therefore code
that looks like what's above will not exist. RTTI is then basically an
attempt to gain some of the benefits of a loosely typed language in
a strongly typed language. However, these fundamentally different approaches
to the design and implementation of an OO problem don't overlap that
easily.

In my experience, which is fairly extensive, yet mainly C++, I have seen
the amazing benefits of an appropriately constructed abstraction hierarchy
as well as the major problems with inflexibility of incorrectly identified
and implemented abstractions that were "fixed" with the kludge of RTTI.

I hope that the people considering language support for such a mechanism
will realize that it's analogous to the 'goto' of structured programming
(some of you Fortran66 programmers may scoff at that) in defeating the
advantages of the paradigm.



___________________________________________________________________________
         __                                          |
        /  \ =========   Jim Carroll                 |
       /   |             carroll@sde.mdso.vf.ge.com  |
      /   /                                          |
     /   (   ________                                |
    |    (                                           |
   <|----------                                      |
    |    (   ________                                |
     \   (                                           |
      \   \                                          |
       \   |                                         |
        \__/ =========                               |
                                                     |
___________________________________________________________________________


PS Sorry about the digression into multiple inheritance, it's just a little
peeve of mine.


Strongly Typed OO Languages    |  Loosely Typed OO Languages
________________________________________________________________
 Ada9x                         |  CLOS
 C++                           |  Objective-C under NextStep
 GNU - Objective-C ?           |  Smalltalk

NON OO Languages: Ada, C (sorry, no polymorphism and inheritance => NON OO)

Author: detlefs@src.dec.com (Dave Detlefs)
Date: Thu, 6 May 93 17:33:30 GMT

[In a previous message, Jim Carroll gave a number of reasons why one
should not use RTTI, and worries about its being added to C++.  I try
to calm his fears.]


Jim --

If it's any consolation, I think Stroustrup is pretty much in
agreement with most of your points.  Have you read "Run Time Type
Identification for C++" by Stroustrup and Lenkov, in the last Usenix
C++ conference?  It is full of admonitions like "you almost never need
to use this!  Don't use this to write switch-like statements where you
could have used methods!"  but still has a number of convincing
examples for where the use of RTTI is appropriate.  The other
observation they make is that essentially all the major class
libraries have found a need for building in some sort of ad-hoc
library-specific RTTI convention because instances of these convincing
examples arise in practice.  Therefore, it would be best to build it
into the language.  (Note further that other than C++, I know of no
other object-oriented language that does not allow run-time type
queries; a partial list of those that do: Clu, Modula-3, Trellis,
CLOS, Dylan.  A brief scan of the Eiffel book does not reveal whether
or not it has this feature.)

So I would recommend reading that article -- it will calm any fears
about Dr. Stroustrup abandoning the path of object-oriented purity
:-)  I hope that programmers consider the caveats pointed out by Jim
and in the Stroustrup/Lenkov article before using RTTI.

Dave

Author: maxtal@physics.su.OZ.AU (John Max Skaller)
Date: Fri, 7 May 1993 20:02:56 GMT

In article <1993May6.173330.22968@src.dec.com> detlefs@src.dec.com (Dave Detlefs) writes:
>
>[In a previous message, Jim Carroll gave a number of reasons why one
>should not use RTTI, and worries about its being added to C++.  I try
>to calm his fears.]
>
>If it's any consolation, I think Stroustrup is pretty much in
>agreement with most of your points.

 It's no consolation. Unfortunately Bjarne doesn't write all
the C++ programs in the world.

>Have you read "Run Time Type
>Identification for C++" by Stroustrup and Lenkov, in the last Usenix
>C++ conference?

 I have. How many programmers out there in the real world
will have read that? How many will follow the advice given
there even if they had read it?

 I think C++ has a major hole in it: there is no
convenient support for 'discriminated unions'. Without
that support, people *will* use RTTI to do the job because
there is no other viable alternative. They *will*.
Even *I* will. Because the idiom for a discriminated
union is ugly, huge, incomprehensible, and not really
that safe.

>It is full of admonitions like "you almost never need
>to use this!  Don't use this to write switch-like statements where you
>could have used methods!"

 What other methods can you use conveniently for
heterogeneous aggregates?

>but still has a number of convincing
>examples for where the use of RTTI is appropriate.  The other
>observation they make is that essentially all the major class
>libraries have found a need for building in some sort of ad-hoc
>library-specific RTTI convention because instances of these convincing
>examples arise in practice.

 I guess most of the uses of RTTI in libraries are wrong.
Either the library should have been designed with mixin technology,
or discriminated unions should have been used.

 But mixins are a relatively immature idea in C++,
and discriminated unions haven't even been conceived yet.

>So I would recommend reading that article -- it will calm any fears
>about Dr. Stroustrup abandoning the path of object-oriented purity
>:-)  I hope that programmers consider the caveats pointed out by Jim
>and in the Stroustrup/Lenkov article before using RTTI.

 They will be considered and ignored in many cases
because no alternative has been suggested. And then,
because the syntax for accessing RTTI has been deliberately
made ugly and inconvenient, we will get an even worse
maintenance nightmare than we have now (when at least
virtual bases inhibit downcasting).

 I suggested that

 select(x)
 {
  type(D1* d1) { .. }
  type(D2* d2) { .. }
  ..
 }

was a cleaner syntax for typecasing (and is also safe,
whereas a checked cast is not).

The response indeed suggested that Stroustrup has not abandoned
statics in favour of dynamics .. I suspect Bjarne took
on the issue partly to make sure only the most minimal support
for dynamic typing was provided.

 Now I suggest that the Standard ought to do one
of two things:

 a) Adopt my nice clean syntax for dynamic type casing.
 b) Adopt a proposal for discriminated unions.

If, however,

 c) do nothing

is adopted by default, the result will almost certainly
be widespread use of hacked, unconstrained RTTI. IMHO.
Even *I* will use RTTI for typecasing if (b) is not adopted.

Therefore, I urge the readers of this group to consider
that we need discriminated unions, and to consider
how best to extend the language to allow them.

I suggest the 'select' statement above is in fact the correct
form of use for discriminated unions. But there are
many other issues to resolve. Not the least of
which is the formation of a "killer argument" against (c),
and a powerful argument in favour of (b).

(There is nothing wrong with type selection of a finite
number of pre-declared types (IMHO). But RTTI allows
*unconstrained* downcasting, and it is an invasive
technique as well.)
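
For comparison, selection over a finite, pre-declared set of types can be
sketched with std::variant and std::visit, which were standardized in C++17,
long after this thread. This is not the 'select' syntax proposed above, only
an illustration of the closed-set idea. D1 and D2 are placeholder names taken
from the proposal; their members, the alias Either, and the helper names
Visitor and Select are hypothetical.

```cpp
#include <string>
#include <variant>

struct D1 { int value; };
struct D2 { std::string text; };

// The set of alternatives is closed: an Either holds a D1 or a D2, nothing else.
using Either = std::variant<D1, D2>;

// One overload per alternative. If an alternative is added to Either and
// no overload matches it, the std::visit call below fails to compile.
struct Visitor {
    std::string operator()(const D1& d1) const {
        return "D1 holding " + std::to_string(d1.value);
    }
    std::string operator()(const D2& d2) const {
        return "D2 holding " + d2.text;
    }
};

// Dispatches on whichever alternative x currently holds.
std::string Select(const Either& x) {
    return std::visit(Visitor{}, x);
}
```

Unlike a chain of checked casts, a forgotten case here is reported at compile
time, and the types involved need no common base class.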

--
        JOHN (MAX) SKALLER,         INTERNET:maxtal@suphys.physics.su.oz.au
 Maxtal Pty Ltd,      CSERVE:10236.1703
        6 MacKay St ASHFIELD,     Mem: SA IT/9/22,SC22/WG21
        NSW 2131, AUSTRALIA