Topic: ARC++ and Exportable Classes


Author: jones@cais.com (Ben Jones)
Date: Thu, 24 Feb 1994 11:11:11 -0500 (EST)
This is the first in a series of articles which will explore various
extensions to the C++ language.  These extensions are implemented in
an experimental preprocessor called ARC++ which takes the extended C++
syntax and generates ANSI C++.  Those who wish to try out the ideas
presented here may obtain a copy of ARC++ for the PC, Mac,
Sparcstation, or Iris by anonymous ftp from "arcfos1.arclch.com".
Please go to the directory "/pub" and download the file "arc.READ_ME"
for instructions.

              EXPORTABLE CLASSES AND PACKAGES
              ===============================

                        Ben Jones
              (c) 1994 ARSoftware Corporation
                 jones@jameson.arclch.com


INTRODUCTION
============

C++ forces us to manually split our class definitions into interface
and implementation parts.  While some might say that this is just
good programming technique, I would argue that it represents a
maintenance nightmare:

* It causes a large number of unnecessary recompilations during the
course of development.

* It makes it harder to make your code self-documenting.

* It forces us to do manually what could be very easily automated
by computer.

A solution is "Exportable Classes and Packages" implemented in a
language extension called "ARC++".


C++ SPLIT-UP CLASS DEFINITIONS
==============================

In C++, we have to manually split our class declarations into the
interface (which is usually in a header file) and implementation
(which is usually in a separate file).


Header and Source
-----------------

For example, in order to create a class called "X" we might create a
header file "X.h":

    #ifndef X_h         // Prevent "X.h" from being included twice
    #define X_h
    class X
    {
      private:
        int x,y,z;      // Private members are visible to user
        static int sd1; // even though they can't be accessed.
        int s1(...);    // Private functions declared here though
        int s2(...);    // of no interest to the user of class "X"
      public:
        int f(int a=100,int b=200); // Default arguments declared here
        void clz() { z = 0; }       // Inline function
        static int i,j,k;
        X();                        // Constructor declared
    };
    #endif

Separately, we define a source file "X.cc" containing things which
are not allowed in the declaration:

    #include "X.h"          // Include class declaration
    X::X()
    {
        x = 0;              // Member initializations must be done
        y = 0;              // in the constructor
        z = 100;
    }
    int X::sd1 = 100;       // Static initializations must be done here
    int X::i = 1;
    int X::j = 2;
    int X::k = 3;
    int X::s1(...) { ... }  // Internal support functions are implemented
    int X::s2(...) { ... }  // here as needed
    int X::f(int a,int b) { ... }

Note that each member defined in this source file must be qualified
by "X::".  Modules which use class "X" must include "X.h":

    #include "X.h"
    X x1,x2;


Makefiles
---------

A Makefile would indicate that "X.h" is a dependency for each
module which includes it:

    P: X.o Y.o Z.o
      CC -o P X.o Y.o Z.o
    X.o: X.h X.cc
      CC -c X.cc
    Y.o: X.h Y.cc
      CC -c Y.cc
    Z.o: X.h Z.cc
      CC -c Z.cc

If "X.h" is edited, all modules which include it are recompiled
whether or not the change actually affects them.

Suppose that in the course of development, new private functions
and/or static data are added to class "X".  They are of no interest to
users of class "X" yet they must be added into the class declaration in
the header file, needlessly triggering recompilation of all modules
which include "X.h".

This is true even for C++ systems which automatically figure out
dependencies.  They work by scanning the source files for #include
directives and then recompile modules whose object files are older
than the included header files.
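
The rule these tools apply can be sketched in a few lines of ordinary
C++.  This is only an illustration, not the logic of any particular
tool: the names "needs_rebuild" and "demo_rebuild" are hypothetical,
and timestamps are assumed to be plain integers where a larger value
means a newer file.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper.  Returns true when the object file is older
// than any header it includes.  Timestamps are plain integers here
// (larger means newer); a real tool would read them from the files.
bool needs_rebuild(long object_time, const std::vector<long>& header_times)
{
    for (long t : header_times)
        if (t > object_time)
            return true;    // Header is newer: recompile regardless
    return false;
}

// Usage: an object built at time 10 is stale once a header is
// touched at time 12, whatever the edit to that header was.
void demo_rebuild()
{
    assert(needs_rebuild(10, {5, 12}));
    assert(!needs_rebuild(10, {5, 9}));
}
```

The function never looks at what changed in a header, only at when it
changed, which is why a new private member forces every client to be
recompiled.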


Duplicate Class Declarations
----------------------------

One way to prevent unwanted recompilations would be to declare class
"X" twice.  That is, the full declaration of class X shown above would
be moved from "X.h" into "X.cc", and "X.h" would keep a copy with
anything not needed to establish the interface stripped out, leaving:

    #ifndef X_h
    #define X_h
    class X
    {
      private:
        int x,y,z;
      public:
        int f(int a=100,int b=200);
        void clz() { z = 0; }       // Inline function
        static int i,j,k;
        X();
    };
    #endif

In a much more complex class, this technique is very dangerous.  All
it would take to introduce a mysterious problem would be to have a
virtual function declared out of order in the two separate places.
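
The failure is hard to show directly, since a program with two
differing declarations of one class has undefined behavior.  The
sketch below therefore simulates it with a hand-built table of
function pointers.  Everything in it (the table, the slot names, the
functions "draw" and "erase") is hypothetical, and real virtual
function tables are laid out in compiler-specific ways.

```cpp
#include <cassert>
#include <cstring>

// Hand-built stand-in for a virtual function table.
typedef const char* (*Slot)();
static const char* draw()  { return "draw";  }
static const char* erase() { return "erase"; }

// Order used by the implementation, whose copy of the class declares
// draw() first and erase() second.
static const Slot table[] = { draw, erase };

// Slot numbers used by a client whose copy of the class declares the
// same two functions in the opposite order.
enum ClientSlot { CLIENT_ERASE = 0, CLIENT_DRAW = 1 };

// The client believes it is calling draw().
const char* client_calls_draw() { return table[CLIENT_DRAW](); }

void demo_mismatch()
{
    // The call compiles and links cleanly but reaches erase().
    assert(std::strcmp(client_calls_draw(), "erase") == 0);
}
```

Nothing in the build reports the mismatch, which is what makes the
duplicated declaration so dangerous.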


ARC++ INTEGRATED CLASS DEFINITION
=================================

In any case, all of this is manual labor which could just as easily be
automated.  ARC++ provides for a self-exporting class which generates
its own interface.


Single Integrated Source
------------------------

The above example would be written as follows:

    export class X              // Indicate that "X" is of interest to
    {                           // other modules
      private:
        int x=0,y=0,z=100;      // Members will have these initial values
        static int sd1 = 100;   // Static members may also be initialized
        int s1(...) { ... }     // Private support functions
        int s2(...) { ... }     // defined as needed
      public:
        int f(int a=100,int b=200)  // Public function completely defined
            { ... }
        inline void clz()       // Explicitly inline function
            { z = 100; }
        static int i=1,j=2,k=3; // Static members initialized
    };

The "export" keyword tells the compiler to generate an interface so
that other modules can use this class.  This interface does not
necessarily have to be a text file.  The current implementation
of ARC++ does generate such a file, "X.ph", which contains:

    class X
    {
      private:
        int x,y,z;
      public:
        int f(int a=100,int b=200);
        void clz() { z = 100; }
        static int i,j,k;
        X();
    };

Initializations are stripped out.  Member initializations are put
into the constructor, which is generated in the defining module if
necessary.  Function bodies are stripped out unless they were
explicitly inline.  Only private members which contribute to the size
of the object are output.  Private functions and private static
members are not output at all.


Import
------

A module which uses class "X" simply imports it:

    import X;
    X x1,x2,x3;

or references it as a base class, importing it implicitly:

    class Y: public X
        { ... };

An import is only performed once.


Dependencies
------------

A module which exports "X" needs to be compiled before any module
which uses "X" so that a header file will exist.  A Makefile
could show this:

    P: X.o Y.o Z.o
      arcc -o P X.o Y.o Z.o
    X.o: X.acc
      arcc -c X.acc
    X.ph: X.o
    Y.o: X.ph Y.acc
      arcc -c Y.acc
    Z.o: X.ph Z.acc
      arcc -c Z.acc

That is, the compilation of X causes X.ph to be generated.  Since Y and
Z import X, they need to have X.ph as a dependency.  ARC++ only
regenerates X.ph if it is going to be different.  Thus Y and Z are only
recompiled if the public or protected part of X changes.
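
The "only regenerate if different" step is what keeps the timestamp of
X.ph stable.  How ARC++ does this internally is not described here;
the following is a minimal sketch of one way to get the same effect in
standard C++.  The function name "write_if_changed" is hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: rewrite "path" only when "text" differs from
// what the file already holds, so that an unchanged interface leaves
// the file's modification time alone.  Returns true if it wrote.
bool write_if_changed(const std::string& path, const std::string& text)
{
    assert(!path.empty());
    std::ifstream in(path.c_str(), std::ios::binary);
    if (in)
    {
        std::ostringstream old;
        old << in.rdbuf();          // Read the existing contents
        if (old.str() == text)
            return false;           // Identical: do not touch the file
    }
    in.close();
    std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
    out << text;
    return true;
}
```

Because an untouched file keeps its old date, an ordinary make sees
nothing newer than Y.o and Z.o and leaves them alone.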


Automatic Make
--------------

This makefile looks a bit more complicated for ARC++ because of the
need to worry about generated files.  For large programs with lots of
exported classes, this could be really complicated.  Fortunately, ARC++
does not require makefiles.  The user merely supplies a list of
modules to a program.  ARC++ figures out the dependencies
automatically just like some C++ systems do.  Not only does it look for
#include directives but also for export, import, and class
declarations (which implicitly import).
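
As a rough picture of the scanning involved, here is a small sketch in
standard C++ which collects the names a module imports.  It is not the
ARC++ scanner: the function name "imported_names" is hypothetical, and
it recognizes only the simplest spelling, "import Name;" at the start
of a line.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical scanner: returns the names found in "import Name;"
// lines.  Base classes and #include directives are not handled.
std::vector<std::string> imported_names(const std::string& source)
{
    std::vector<std::string> names;
    std::istringstream lines(source);
    std::string line;
    while (std::getline(lines, line))
    {
        std::istringstream words(line);
        std::string keyword, name;
        if (words >> keyword >> name && keyword == "import")
        {
            if (name[name.size() - 1] == ';')
                name.erase(name.size() - 1);    // Drop trailing ';'
            names.push_back(name);
        }
    }
    return names;
}

// Usage: the module shown under "Import" depends on class X only.
void demo_scan()
{
    std::vector<std::string> n = imported_names("import X;\nX x1,x2,x3;\n");
    assert(n.size() == 1 && n[0] == "X");
}
```

Each name found would then become a dependency on the corresponding
generated header, such as "X.ph".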


FUTURE IMPROVEMENTS
===================

Currently, modules which import a generated header need to be
recompiled if anything in the public and protected part of the
interface changes.  This leaves the developer free to add private
static members and private member functions without triggering
recompilation.  Things can be improved even further.


Detecting Significant Changes
-----------------------------

Actually, the only changes which really need to trigger recompilation
are the following:

* Any visible members were removed.

* Any virtual functions were added in a way which changed the
positions of existing functions in the virtual function table.

* Any members were added which change the size of the object or
change the positions of existing members within the object.

A date could be added to the generated header (as a comment line)
which says when these kinds of changes last occurred.  ARC++ could
check this date to see if dependent modules need to be recompiled.

This means that nested classes, non-virtual functions, virtual
functions added at the end of the class, static members, and non-
static members which fill in gaps in the alignment of data can all be
added to an exportable class without triggering recompilation and
without the user having to worry about when this happens.
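
These claims about object layout can be checked in standard C++ with
sizeof and offsetof.  The struct names below are hypothetical, and the
gap-filling case depends on the platform: it is only checked when int
is aligned to at least two bytes, which is typical but not guaranteed.

```cpp
#include <cassert>
#include <cstddef>

struct Before        { char tag; int value; };

struct WithStatics                      // Adds a static member and a
{                                       // non-virtual function
    char tag; int value;
    static int count;
    int twice() const { return 2 * value; }
};
int WithStatics::count = 0;

struct WithGapFiller { char tag; char flag; int value; };  // Uses padding
struct Grown         { char tag; int value; int extra; };  // Really grows

void demo_layout()
{
    // Statics and non-virtual functions take no space in the object.
    assert(sizeof(WithStatics) == sizeof(Before));
    assert(offsetof(WithStatics, value) == offsetof(Before, value));

    // A member which fits into alignment padding moves nothing.
    if (alignof(int) >= 2)
    {
        assert(sizeof(WithGapFiller) == sizeof(Before));
        assert(offsetof(WithGapFiller, value) == offsetof(Before, value));
    }

    // An ordinary new data member does change the size.
    assert(sizeof(Grown) > sizeof(Before));
}
```

Only the last kind of change would need to trigger recompilation of
importing modules.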


Precompiled Headers
-------------------

The syntax of import and export also lends itself very nicely to
generating "precompiled headers".  Nothing says that the generated
header has to be a text file.


Minimum Object Size
-------------------

Another refinement would be to allow a minimum size to be
specified for the private part of a class:

    export class X
    {
      private[32]:
        int x,y,z;
        int w,u,v;
    };

In the generated header, all private members are replaced by one or
more anonymous arrays which reserve space.  Only members referenced by
public or protected inline functions need to be named.  Other private
members could be added and subtracted without affecting the generated
header as long as they did not exceed the declared minimum size for
the private section.

The compiler could even have a switch which enables/disables the
size specified for private so that after development is finished, any
unused space could be removed easily.
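
Since this feature is only proposed, there is no real output to show.
The following is a guess, in standard C++, at the shape such a
generated header might take and at the check the defining module
would need.  The names "kReserved", "PrivatePart" and "reserved" are
hypothetical, and the sketch assumes that six ints fit in 32 bytes.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kReserved = 32;       // The figure from "private[32]"

// Known only to the defining module: the real private members.
struct PrivatePart
{
    int x,y,z;
    int w,u,v;
};

// A guess at the shape importing modules would see.
class X
{
  private:
    alignas(std::max_align_t) unsigned char reserved[kReserved];
  public:
    X() : reserved() { }                // Zero the reserved block
};

// The check the defining module would make on every compile.
static_assert(sizeof(PrivatePart) <= kReserved,
              "private members exceed the reserved size");

void demo_reserved()
{
    assert(sizeof(X) >= kReserved);     // Clients always see the full block
}
```

Members can come and go inside "PrivatePart" without changing what
importing modules see, until the static_assert fails.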


PACKAGES
========

Having an exportable class, even one containing nested classes, does
not completely get rid of the need for header files.  What is needed
is a way to deal with any kind of global symbol.

Recently, the ANSI committee approved an extension to C++ which
allows a "namespace" to be defined:

    namespace N
    {
        int s1,s2;
        int f() { ... }
    };

Everything inside the braces which would have been globally
defined is now associated with the name "N".  This protects it from
conflicting with other groups of symbols.  A given scope can directly
reference symbols in namespace "N" by saying:

    using namespace N;
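
For readers who want to try this, here is a small complete example.
In C++ as standardized, the using-directive is spelled
"using namespace N;".  The initial values and the function names
"qualified" and "unqualified" are placeholders chosen for
illustration.

```cpp
#include <cassert>

namespace N
{
    int s1 = 1, s2 = 2;
    int f() { return s1 + s2; }
}

// Without a using-directive, every name must be qualified.
int qualified() { return N::f(); }

// With one, the names of N can be used directly in this scope.
int unqualified()
{
    using namespace N;
    return f() + s1;
}

void demo_namespace()
{
    assert(qualified() == 3);
    assert(unqualified() == 4);
}
```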


Classes and Namespaces
----------------------

This is a desirable thing to have but classes already provide this
capability.  Static members and static member functions have
external linkage which incorporates the name of the class:

    class N
    {
        static int s1,s2;
        static int f();
    };

If N is inherited by some other class, the static members are also
directly accessible.
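
This already works in standard C++, as the following sketch shows.
The derived class "D", the function "g" and the initial values are
hypothetical, and the members are made public so that they can be
reached from outside.

```cpp
#include <cassert>

class N
{
  public:
    static int s1,s2;
    static int f() { return s1 + s2; }
};
int N::s1 = 10;
int N::s2 = 20;

class D: public N
{
  public:
    int g() const { return f() + s1; }  // No "N::" needed in here
};

void demo_static_inherit()
{
    D d;
    assert(N::f() == 30);               // Qualified from outside
    assert(d.g() == 40);                // Unqualified from inside D
}
```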

A "namespace" cannot be inherited in the current ANSI specification
and seems to have no relation whatsoever to the concept of a class.


Packages in ARC++
-----------------

ARC++ provides a concept called a "package" which is very similar
to "namespace" but it goes much further:

* A package can have public, protected, and private members.

* A package can be exported.

* A package can be inherited.

* A package can inherit.

* A package can be converted to a class merely by calling it a class
instead of a package.  This will make the transition from a
non-object-oriented scheme to an object-oriented one trivial.


Package vs Class
----------------

In ARC++ a package is simply a class in which all members are
static by default.  Anything that is declared static stays that way.
When a package is exported, only public and protected members
are output to the generated header.


Inheritance
-----------

A package can inherit another package:

    package N: M { ... };

This is equivalent to saying:

    namespace N
    {
        using namespace M;
        ...
    };

A package can also inherit a class, struct, or union.  The effect is to
make the inherited item an anonymous member of the package.

Just to round out this capability, ARC++ allows unions to be inherited
by classes.


Converting a Package to a Class
-------------------------------

Because a package can inherit a class, it is very easy to change a
package into a class.  Merely substitute class for package and
put in a public access specifier (or if you are really lazy, substitute
struct).  Suddenly you have the ability to make instances of the
(formerly) global data of the package.

Since ARC++ automatically creates a constructor to account for
initialized members, the transformation is complete.
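
In plain C++ terms, the change amounts to the following.  The
"Counter" example is hypothetical and both versions are written out
by hand; in ARC++ the second would come from the first by changing
one keyword.

```cpp
#include <cassert>

// Roughly what a package stands for: one shared copy of the data.
class CounterPackage
{
  public:
    static int n;
    static int next() { return ++n; }
};
int CounterPackage::n = 0;

// After the substitution: each instance has its own data, and a
// constructor supplies the initial value.
struct CounterClass
{
    int n;
    CounterClass() : n(0) { }
    int next() { return ++n; }
};

void demo_convert()
{
    assert(CounterPackage::next() == 1);
    assert(CounterPackage::next() == 2);    // Shared state keeps counting

    CounterClass a, b;
    assert(a.next() == 1);
    assert(b.next() == 1);                  // b is independent of a
}
```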


Member Function Pointers
------------------------

There is only one other obstacle to transforming a package into a
class.  Pointer to member function syntax is different.  Not only
different but ugly.  ARC++ remedies this problem as follows:

* A member function may be assigned to a member function pointer
without qualifying it.

* A member function may be called through a member function pointer
without the use of the ".*" or "->*" operators.  That is, if
parameters are given to a member function pointer, "this->*" is
assumed.

Thus any manipulations of function pointers originally designed
without classes in mind are handled almost transparently:

    export package X         // If we substitute "class" for
    {                        //  "package" we get:
      public:
        int (X::*pf)(int a);
        int f(int a) { ... }
        int f1()
            {
            pf = f;          //      pf = &X::f;
            ...
            pf(100);         //      (this->*pf)(100);
            }
    };

The declaration of "pf" looks like a member function pointer
declaration but in the context of a package, it would merely check
for "f" being a member of "X".
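
For reference, the standard C++ spelling from the right-hand comments
can be compiled as it stands.  The body of "f" and the return value of
"f1" are placeholders added so that the result can be observed.

```cpp
#include <cassert>

class X
{
  public:
    int (X::*pf)(int a);
    int f(int a) { return a + 1; }      // Placeholder body
    int f1()
    {
        pf = &X::f;                     // ARC++ shorthand: pf = f;
        return (this->*pf)(100);        // ARC++ shorthand: pf(100);
    }
};

void demo_member_pointer()
{
    X x;
    assert(x.f1() == 101);
    assert((x.*(x.pf))(1) == 2);        // Same pointer used from outside
}
```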

We could make it so that an ordinary function pointer would become a
member function pointer in the context of a class but that might break
existing C++ code.  A future version might provide this feature and
then let the user disable it with a compiler switch.


CLASS AND PACKAGE EXTENSION
===========================

It is not necessary to have a class or package completely defined
in a single module.  If the body of a function is not specified in an
exported class, it may still be specified in a separate module as in C++:

    import X;
    int X::f(int a) { ... }
    int X::f1(int a,int b) { ... }

ARC++, however, goes a step further and allows a name space to be
re-entered and extended at any time.  Suppose that the declaration of
class X included some functions without bodies:

    class X++                   // Re-enter the scope of "X"
    {
        static int s1;          // Add private static data
        int f1(int a) { ... }   // Define the body of "f1" here
      public:
        int f2(int a,int b) { ... } // Define a new function here
    };

The ++ operator applied to a class name indicates that a previously
defined class is to be extended.  Only static members, non-virtual
functions, and the bodies of previously declared functions may be
defined here.  Anything else would change the size of the object or
the virtual function dispatch table.


Future Enhancements
-------------------

Currently, a class extension cannot be exported.

An enhancement would be to make extensions exportable so that new
functions and static members could be made available to importing
modules without having to go back and edit the original exporting
module.  This would work in conjunction with the enhancement that
would allow public and protected functions and static members to be
added without triggering recompilation in the automatic make.  The
generated header file would simply have extension sections marked as
having been generated from the modules which exported extensions.


SUMMARY
=======

The exportable class/package has immediate advantages for coding and
maintenance.

* The programmer needs to maintain only a single source file to define
a class.

* It is easy to comment this source file and there is no need to refer
to a separate file for additional documentation.

* There is no need to clutter the program with a lot of class name
qualifiers.

* It is unambiguous as to the module which owns the class declaration,
the virtual function table, and the initial values of static members.

* It eliminates the need for preprocessor directives such as #include
and the #ifndef guards around header files.

* It lends itself nicely to the concept of "pre-compiled header files"
since there is no requirement that the interface generated by the
"export" directive must be a text file.

* It provides namespace separation in a manner consistent with object
oriented programming.

* It allows constructors and initializations to be given inline, making
for more intuitive class definitions.

* It eases the transition to object-oriented programming.