Thread

Topic: Downcasting from virtual base case

Author: jvsb@ra.alcbel.be (Johan Vanslembrouck)
Date: 30 Nov 93 08:16:38 GMT

This article answers to remarks made by
 <1993Nov26.142447.2591@sco.COM> (Simon Tooke)
 <754336008snx@trmphrst.demon.co.uk> (Nikki Locke)
 <KANZE.93Nov29130011@slsvhdt.us-es.sel.de> (James Kanze)
on the article:
 <1993Nov25.154139@ra.alcbel.be>

------------------------------------------------------------------------

In <1993Nov25.154139@ra.alcbel.be> jvsb@ra.alcbel.be (Johan Vanslembrouck)
(i.e. myself) writes:

>>Using virtual inheritance ...

>>class L { ... };
>>class A : virtual public L { ... };
>>class B : virtual public L { ... };
>>class C : public A, public B { ... };

>>,however, the following layout is used:

>>      ----------------
>>     |    A part      |
>>     |                |
>>     |      ----------------------
>>     |----------------|           |
>>     |    B part      |           |
>>     |                |           |
>>     |       ---------------------|
>>     |----------------|           |
>>     |    C part      |           |
>>     |                |           |
>>     |----------------|           |
>>     |   A+B's L part |<----------
>>     |                |
>>      ---------------

>>The common L part has moved from the top to the bottom of the
>>memory layout. Calculating the offset of a derived class part given
>>a pointer to the virtual base class part is considered too difficult
>>by the ARM
>>(p.227: "Casting from a virtual base class to a derived class
>>is disallowed to avoid requiring an implementation to maintain
>>pointers to enclosing objects").
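As a minimal sketch of the situation described above (the member names and helper functions are illustrative assumptions, not part of the original article): converting a derived pointer to its virtual base is always allowed, and both paths through C reach the same shared L part, but the reverse `static_cast` is rejected by the compiler.

```cpp
#include <cassert>
#include <cstddef>

struct L { int l; };
struct A : virtual L { int a; };
struct B : virtual L { int b; };
struct C : A, B { int c; };

// Byte distance from the start of an A part to the L part it refers to.
// The A* -> L* conversion is always allowed; the compiler emits whatever
// lookup its layout requires.
std::ptrdiff_t l_offset_from(A* a) {
    L* l = a;  // derived-to-virtual-base conversion
    return reinterpret_cast<char*>(l) - reinterpret_cast<char*>(a);
}

// Both inheritance paths reach the one shared L subobject of a C.
bool shares_one_l(C& c) {
    return static_cast<L*>(static_cast<A*>(&c)) ==
           static_cast<L*>(static_cast<B*>(&c));
}

// The reverse direction does not compile:
//   A* back = static_cast<A*>(l);   // error: L is a virtual base of A
```

The value returned by `l_offset_from` is implementation-defined. On layouts like the one drawn above it can differ between a standalone A and the A part of a C, which is why a fixed compile-time offset cannot be used for the downcast.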

In <1993Nov26.142447.2591@sco.COM>, Simon Tooke replies:

>The problem here is that the compiler would have to know at compile time to
>insert a pointer to the derived class in the base class.  This would entail
>knowledge of exactly how many classes are going to derive from this base class
>AT COMPILE TIME - clearly infeasible.  Especially if the base class were a PODS
>inherited from C or FORTRAN.

Is this correct? Inserting pointers to derived classes in the C struct
generated for the base class is not the right thing to do.
Extra pointers should be inserted in the struct generated for the
derived class, before (or after) the base class part.

>>Another disadvantage of this approach is that it is impossible to create
>>a derived class object at the location of a virtual base class object
>>*without* explicitly copying the contents of the base class object;
>>the virtual base class data members are moved to another location.

In <1993Nov26.142447.2591@sco.COM>, Simon Tooke writes:

>Since there is no guarantee the sizes are the same, this won't work often
>anyways.

You're right. I only want it to work in those cases where it works
for nonvirtual base classes as well. However, the size itself should
not be the problem, because in most cases you can reserve space which is
large enough to contain any derived class object. The (re)location of the
derived class parts is the problem. And in case of

class L { ... };
class A : public L { ... };
class B : virtual public L { ... };

the L part is at a different location in the L and A class layout on
the one hand, and the B class layout on the other hand. Simply making
a base class virtual instead of nonvirtual may corrupt code without the
compiler complaining about it.

>>And why not introduce a pointer at the end of each base class part to the
>>immediate derived class parts? Too much pointer overhead?
>>Only in case of very small classes, I think.
>>An advantage of this approach could be that an implementation
>>could choose to locate the different parts at different places in memory
>>instead of using contiguous memory.

In <1993Nov26.142447.2591@sco.COM>, Simon Tooke writes:

>Ah, non-contiguous classes - propose it at the next X3J16/WG21 gathering.

Please send me an invitation. Unfortunately,
I don't consider non-contiguous classes as a requirement.

------------------------------------------------------------------------

>> The layout for the virtual inheritance case looks like:
>>
>>       -----------------
>>      |  A+B's L part  |<---------- <------
>>      |                |           |       |
>>      |----------------|           |       |
>>      |       ---------------------|       |
>>      |    A part      |           |       |
>>      |                |           |       |
>>      |----------------|           |       |
>>      |       ---------------------        |
>>      |    B part      |                   |
>>      |                |                   |
>>      |----------------|                   |
>>      |       -----------------------------|
>>      |       -----------------------------
>>      |    C part      |
>>      |                |
>>       ----------------

In  <754336008snx@trmphrst.demon.co.uk>, Nikki Locke writes:

>OK, now let's introduce a new class

>class D : public B, public A { ... }; // Different order -> different layout

>and the following code ...

>void function(int i)
>{
>  C c;
>  D d;
>  L *l = i ? c : d;
>  B* b = (B *)l;  // Cast virtual base to derived
>
>  // ...
>}

>Please explain how your layout would enable the compiler to output code
>for the last (commented) line.

You have a similar problem for nonvirtual base classes.
This is my solution for nonvirtual base classes:

void function(int i)
{
 C c;
 D d;
 L* l = i ? (L*) (B*) &c : (L*) (B*) &d; // have to mention
      // whose L part
 B* b = (B*) l;
}

For virtual base classes, you will have to write:

void function(int i)
{
 C c;
 D d;
 L* l = i ? (L*) (B*) &c : (L*) (B*) &d; // casting to (B*)
      // is redundant in
      // case of virtual
      // inheritance
 B* b = i ? (B*) (C*) l : (B*) (D*) l; // does not
      // work in current
      // implementation
}

When going from the common L base class to the derived A or B class,
you have to go via the "least common derived class" of A and B,
which is either C or D.
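For comparison, the nonvirtual case above can be written with named casts. This is only a sketch: the member names and the helpers `l_of_b` and `b_of_l` are made up for illustration. With nonvirtual bases the B part sits at a fixed offset from its own L part, so `static_cast` is enough, provided the L really belongs to a B.

```cpp
#include <cassert>

struct L { int l; };
struct A : L { int a; };
struct B : L { int b; };
struct C : A, B { int c; };   // contains two L parts
struct D : B, A { int d; };   // same bases, other order

// Upcast: name the intermediate class to say whose L part is meant.
template <class Most>
L* l_of_b(Most& obj) {
    return static_cast<L*>(static_cast<B*>(&obj));
}

// Downcast from a nonvirtual base: a fixed offset, so static_cast suffices.
// Only valid if l really is the L part of a B.
B* b_of_l(L* l) {
    return static_cast<B*>(l);
}
```

The same `b_of_l` works for both C and D objects, because the offset between a B and its own L part does not depend on where the B sits in the enclosing object.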

------------------------------------------------------------------------

|> This is my alternative, which, I believe, is more intuitive and might
|> make downcasting easier. The common L part is again at the top,
|> and a pointer from the derived class part to the base class part
|> is inserted BEFORE the derived class part.

|>       -----------------
|>      |   A+B's L part |<----------
|>      |                |           |
|>      |----------------|           |
|>      |       ---------------------|
|>      |    A part      |           |
|>      |                |           |
|>      |----------------|           |
|>      |       ---------------------
|>      |    B part      |
|>      |                |
|>      |----------------|
|>      |    C part      |
|>      |                |
|>       ----------------

In <KANZE.93Nov29130011@slsvhdt.us-es.sel.de>, James Kanze writes:

>This would also be a legal implementation.  But I don't see where it
>would help in the downcasting problem.  Given an L* (and no other
>declarations except L and B), how can the compiler know where B is
>relative to L.

Well, you will have to tell the compiler that the B you want is part
of a larger object, being a C:

 B* bp = (B*) (C*) lp;

The compiler itself isn't smart enough to know it.
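One way to picture the proposed layout is to spell it out by hand with plain structs. This is a hypothetical emulation, not what any compiler generates; the names `L_part`, `A_part`, `B_part`, `C_obj` and `to_l` are invented for the sketch. It shows why naming the enclosing type C makes the downcast a matter of fixed offsets.

```cpp
#include <cassert>

struct L_part { int l; };
struct A_part { L_part* to_l; int a; };  // pointer to L sits before A's data
struct B_part { L_part* to_l; int b; };

struct C_obj {
    L_part l;   // shared L part at the top
    A_part a;
    B_part b;
    int c;
    C_obj() : l{0}, a{&l, 0}, b{&l, 0}, c(0) {}
    C_obj(const C_obj&) = delete;             // internal pointers must not be copied
    C_obj& operator=(const C_obj&) = delete;
};

// Counterpart of (C*) lp: the L part is the first member of a
// standard-layout struct, so both addresses coincide.
C_obj* c_of_l(L_part* lp) { return reinterpret_cast<C_obj*>(lp); }

// Counterpart of (B*) (C*) lp: once the enclosing type is named,
// the B part is a fixed distance away.
B_part* b_of_l(L_part* lp) { return &c_of_l(lp)->b; }
```

As in the cast above, nothing checks that `lp` really points into a `C_obj`; the caller has to know.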

>It would also require pointers to members to contain negative values,
>which could conceivably cause problems for some systems.

A pointer to a member is not a regular pointer, but a struct with
3 fields, 1 being a real pointer, and one field used as the offset
to the pointer (ARM p.158-161).
This offset can be positive or negative (I hope), but the result
of adding (subtracting) this offset to the real pointer must yield
a genuine pointer value of course.

|> I would use the same mechanism for nonvirtual inheritance, like in:

|>       -----------------
|>      |   A's L part   |<----------
|>      |                |           |
|>      |----------------|           |
|>      |       ---------------------
|>      |    A part      |
|>      |                |
|>      |----------------|
|>      |   B's L part   |<----------
|>      |                |           |
|>      |----------------|           |
|>      |       ---------------------
|>      |    B part      |
|>      |                |
|>      |----------------|
|>      |    C part      |
|>      |                |
|>       ----------------

In <KANZE.93Nov29130011@slsvhdt.us-es.sel.de>, James Kanze writes:

>And increase the size of all of the derived classes.

>Note that in the above, it is perfectly legal to instantiate an A
>alone.  And of course, such an A must be identical to an A which is a
>base class of C.

>Now I have a number of classes which are just one pointer long (and
>contain no virtual functions).  I often use derivation to add to their
>functionality.  (This is not really subclassing in an OO sense, but is
>very practical none the less.)  Are you proposing to double the size
>of my class for this?

Why not? This makes me think of a political party that was very happy
because it doubled its votes. From 1 percent to 2 percent.
Or, if you have 1 dollar and you get another one, then you have doubled
your fortune.

In <KANZE.93Nov29130011@slsvhdt.us-es.sel.de>, James Kanze writes:

>While this idea might win votes from memory manufacturers, it is in
>direct conflict with the rule: you don't pay for what you don't use.
>Of course, it is not forbidden as an implementation.  But what
>advantages does it bring in return for its price.

So, I hope to get some positive reactions from memory manufacturers. This
can't be the problem, because memory is getting cheaper all the time.
The major advantage is to have a line-up of the layout of virtual
and non-virtual inheritance structures. Some people (like me) have
to know this layout to implement a generic mechanism for streaming
and unstreaming objects.

|> And why not introduce a pointer at the end of each base class part to the
|> immediate derived class parts?

In <KANZE.93Nov29130011@slsvhdt.us-es.sel.de>, James Kanze writes:

>Principally, I suppose because the compiler cannot know how many
>immediately derived class parts it will have.  For that matter, in the
>case of virtual base classes, it is not at all clear what "immediately
>derived class" means.

As mentioned above, it is not the intention to insert those pointers
in the struct generated for a base class, but only in the derived
class before the base class part.

|> An advantage of this approach could be that an implementation
|> could choose to locate the different parts at different places in memory
|> instead of using contiguous memory.

>Hmmm.  And how do you propose implementing pointers to members in this
>case?

That may be a problem. But I don't insist on this feature.

-----------------------------------------------------------------------
Johan Vanslembrouck - SE99                 Tel    : +32 3 2407739
Alcatel Bell Telephone                     Telex  : 72128 Bella B
Francis Wellesplein  1                     Fax    : +32 3 2409932
B-2018 Antwerp                             e-mail : jvsb@ra.alcbel.be
Belgium
-----------------------------------------------------------------------

Author: kanze@us-es.sel.de (James Kanze)
Date: 30 Nov 1993 13:09:39 GMT

In article <1993Nov30.091638@ra.alcbel.be> jvsb@ra.alcbel.be (Johan
Vanslembrouck) writes:

|> This article answers to remarks made by
|>  <1993Nov26.142447.2591@sco.COM> (Simon Tooke)
|>  <754336008snx@trmphrst.demon.co.uk> (Nikki Locke)
|>  <KANZE.93Nov29130011@slsvhdt.us-es.sel.de> (James Kanze)
|> on the article:
|>  <1993Nov25.154139@ra.alcbel.be>

|> ------------------------------------------------------------------------

|> In <1993Nov25.154139@ra.alcbel.be> jvsb@ra.alcbel.be (Johan Vanslembrouck)
|> (i.e. myself) writes:

As the original article was already reasonably long, and it is getting
longer with each response, I will just quote and respond to a few
specific issues.

|> |> This is my alternative, which, I believe, is more intuitive and might
|> |> make downcasting easier. The common L part is again at the top,
|> |> and a pointer from the derived class part to the base class part
|> |> is inserted BEFORE the derived class part.

|> |>       -----------------
|> |>      |   A+B's L part |<----------
|> |>      |                |           |
|> |>      |----------------|           |
|> |>      |       ---------------------|
|> |>      |    A part      |           |
|> |>      |                |           |
|> |>      |----------------|           |
|> |>      |       ---------------------
|> |>      |    B part      |
|> |>      |                |
|> |>      |----------------|
|> |>      |    C part      |
|> |>      |                |
|> |>       ----------------

|> In <KANZE.93Nov29130011@slsvhdt.us-es.sel.de>, James Kanze writes:

|> >This would also be a legal implementation.  But I don't see where it
|> >would help in the downcasting problem.  Given an L* (and no other
|> >declarations except L and B), how can the compiler know where B is
|> >relative to L.

|> Well, you will have to tell the compiler that the B you want is part
|> of a larger object, being a C:

|>  B* bp = (B*) (C*) lp;

|> The compiler itself isn't smart enough to know it.

I'll bet a lot of programmers aren't either.  This looks like an
invitation to hard-to-find run-time bugs.

On the other hand, you can already do:

 B* bp = dynamic_cast< B* >( lp ) ;

At least according to the working papers.

This solves the problem, at a small runtime expense.  And I suspect
that in a typical application, there will be no need for extra
information in the object itself.  All of the necessary information
will be in the virtual function table.

Have you read the proposal for RTTI and dynamic_cast?  It occurs to me
that this will probably accomplish everything you want.
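A minimal sketch of the `dynamic_cast` route, assuming L is given a virtual destructor so that it is polymorphic (the cast requires that); the member names and the helper `b_of_l` are illustrative only.

```cpp
#include <cassert>

struct L { virtual ~L() {} };           // polymorphic, as dynamic_cast requires
struct A : virtual L { int a = 0; };
struct B : virtual L { int b = 0; };
struct C : A, B { int c = 0; };
struct D : B, A { int d = 0; };         // same bases, other order

// Downcast from the virtual base. The run-time type decides where the B
// part is; the result is a null pointer if *l has no B part at all.
B* b_of_l(L* l) {
    return dynamic_cast<B*>(l);
}
```

The caller does not have to say whether the L belongs to a C or a D, and a wrong guess shows up as a null pointer instead of a silently wrong address.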

|> >It would also require pointers to members to contain negative values,
|> >which could conceivably cause problems for some systems.

|> A pointer to a member is not a regular pointer, but a struct with
|> 3 fields, 1 being a real pointer, and one field used as the offset
|> to the pointer (ARM p.158-161).
|> This offset can be positive or negative (I hope), but the result
|> of adding (subtracting) this offset to the real pointer must yield
|> a genuine pointer value of course.

Actually, this is not the case.  What you are referring to is a
*possible* implementation for pointers to member functions.  Most of
the actual implementations I know of (Cfront excepted) use an 'int'
for a pointer to data member, and a 'void (*)()' as a pointer to
member function.

But I acknowledge that any implementation *will* have to be able to
handle negative offsets anyway, although maybe not in this context,
since it is legal to add a negative value to a pointer.
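To make the distinction concrete, here is a small sketch of the two kinds of pointer to member in use. The struct `S` and the helper names are assumptions for illustration; how each pointer is represented internally, and how large it is, is left to the implementation.

```cpp
#include <cassert>
#include <cstddef>

struct S {
    int x;
    int y;
    int twice_y() const { return 2 * y; }
};

// Read a data member through a pointer to data member.
int get_field(const S& s, int S::*field) { return s.*field; }

// Call a member function through a pointer to member function.
int call_method(const S& s, int (S::*fn)() const) { return (s.*fn)(); }

// Implementation-defined sizes; these constants only report them.
constexpr std::size_t data_ptr_size = sizeof(int S::*);
constexpr std::size_t func_ptr_size = sizeof(int (S::*)() const);
```

Comparing `data_ptr_size` and `func_ptr_size` on a given compiler gives a hint whether it uses a plain offset, a plain function pointer, or a larger structure.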

|> |> I would use the same mechanism for nonvirtual inheritance, like in:

|> |>       -----------------
|> |>      |   A's L part   |<----------
|> |>      |                |           |
|> |>      |----------------|           |
|> |>      |       ---------------------
|> |>      |    A part      |
|> |>      |                |
|> |>      |----------------|
|> |>      |   B's L part   |<----------
|> |>      |                |           |
|> |>      |----------------|           |
|> |>      |       ---------------------
|> |>      |    B part      |
|> |>      |                |
|> |>      |----------------|
|> |>      |    C part      |
|> |>      |                |
|> |>       ----------------

|> In <KANZE.93Nov29130011@slsvhdt.us-es.sel.de>, James Kanze writes:

|> >And increase the size of all of the derived classes.

|> >Note that in the above, it is perfectly legal to instantiate an A
|> >alone.  And of course, such an A must be identical to an A which is a
|> >base class of C.

|> >Now I have a number of classes which are just one pointer long (and
|> >contain no virtual functions).  I often use derivation to add to their
|> >functionality.  (This is not really subclassing in an OO sense, but is
|> >very practical none the less.)  Are you proposing to double the size
|> >of my class for this?

|> Why not? This makes me think of a political party that was very happy
|> because it doubled its votes. From 1 percent to 2 percent.
|> Or, if you have 1 dollar and you get another one, then you have doubled
|> your fortune.

This is not quite relevant.  As a simple example, my current project
requires >250 Megabytes of heapspace.  Not exactly little.  And this,
although most of the objects involved are 24 bytes or less.  Of
course, the bigger objects are assembled out of these little objects
(a typical case, I would think).  Save me 4 bytes for each little
object, and I will gain almost 50 Megabytes.  On the other hand, each
of these little objects has an inheritance hierarchy of at least 4
layers, often more.  Add four pointers to each object, and I would
need around 400 Megabytes, instead of 250.
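The rough figures can be checked with a back-of-the-envelope calculation. The sketch below assumes that every object is exactly 24 bytes (the stated upper bound), that a pointer is 4 bytes, and that a megabyte is 10^6 bytes; all constant names are made up.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t kHeapBytes    = 250000000;  // "250 Megabytes", taken as 250 * 10^6
constexpr std::int64_t kObjectBytes  = 24;         // assumed: every object at the 24-byte bound
constexpr std::int64_t kPointerBytes = 4;          // assumed from "Save me 4 bytes"
constexpr std::int64_t kLayers       = 4;          // one extra pointer per inheritance layer

// Number of objects that fit in the heap under these assumptions.
constexpr std::int64_t kObjects = kHeapBytes / kObjectBytes;

// Bytes gained by removing one pointer from every object.
constexpr std::int64_t kSavedPerPointer = kObjects * kPointerBytes;

// Heap size after adding one pointer per layer to every object.
constexpr std::int64_t kWithFourPointers =
    kHeapBytes + kObjects * kPointerBytes * kLayers;
```

Under these assumptions one pointer per object is worth about 42 megabytes and four extra pointers bring the total to about 417 megabytes. Objects smaller than 24 bytes mean more objects and push both numbers up, which fits the "almost 50" and "around 400" quoted above.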

|> In <KANZE.93Nov29130011@slsvhdt.us-es.sel.de>, James Kanze writes:

|> >While this idea might win votes from memory manufacturers, it is in
|> >direct conflict with the rule: you don't pay for what you don't use.
|> >Of course, it is not forbidden as an implementation.  But what
|> >advantages does it bring in return for its price.

|> So, I hope to get some positive reactions from memory manufacturers. This
|> can't be the problem, because memory is getting cheaper all the time.
|> The major advantage is to have a line-up of the layout of virtual
|> and non-virtual inheritance structures. Some people (like me) have
|> to know this layout to implement a generic mechanism for streaming
|> and unstreaming objects.

In my experience, streaming does not have to know the layout.  In
fact, it cannot in general use the layout per se; what do you do about
classes which allocate memory to store variable length data (a
string class, for example)?  Classes which contain pointers (a linked
list element, for example)?  Streaming is an entirely different (and
very complex) issue.
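As a small illustration of streaming without any knowledge of the layout, the sketch below writes and reads a class member by member. The type `Record`, the functions `save_record` and `load_record`, and the text format are all assumptions chosen for the example; a real streaming mechanism also has to deal with pointers, versioning and polymorphism.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

struct Record {
    int id;
    std::string name;   // owns heap memory; its characters are not inside Record
};

// Write each member by value, with a length prefix for the string.
void save_record(std::ostream& out, const Record& r) {
    out << r.id << ' ' << r.name.size() << ' ' << r.name;
}

// Rebuild the members one by one; nothing depends on Record's memory layout.
bool load_record(std::istream& in, Record& r) {
    std::size_t n = 0;
    if (!(in >> r.id >> n)) return false;
    in.get();                         // skip the single separator after the length
    r.name.resize(n);
    if (n > 0) in.read(&r.name[0], static_cast<std::streamsize>(n));
    return static_cast<bool>(in);
}
```

Copying the raw bytes of a `Record` would only save the string's internal pointer, not its characters, which is the problem with variable length data mentioned above.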
--
James Kanze                             email: kanze@us-es.sel.de
GABI Software, Sarl., 8 rue du Faisan, F-67000 Strasbourg, France
Conseils en informatique industrielle --
                   -- Beratung in industrieller Datenverarbeitung