Topic: Idea for C++ Modules


Author: Christopher Eltschka <celtschk@physik.tu-muenchen.de>
Date: 1999/06/16
I thought a bit about how modules could be done in C++.

- Fundamental points:

* A module is intended to describe a whole library. It is not
  meant as a direct replacement for header files (although one
  could use it as such). Of course, there's nothing wrong
  with a library being built from more than one module, or
  even with main programs which consist of more than one module.
  However, the rules are written with a "write module, then use
  module" workflow in mind. Circular dependencies are difficult
  (and in some cases forbidden, so that the initialisation order
  can be defined). Essentially, the requirement "first compile
  module, then use module" means that a circular dependency can
  only be introduced after both modules exist (at least as
  skeletons of their public definitions). The rules are designed
  with the concept of a single library in mind.

* There is only one new keyword: "module". This is achieved
  by reusing the keywords "using" and "public". The "using"
  notation is IMHO quite natural; however, the reuse of
  "public" could be questioned; maybe it's better to replace
  it with another keyword.

* The rules are designed not to change the meaning of any
  currently legal C++ program (except for those using an
  identifier "module", of course). If they do, it's a bug.
  This also means that a conforming compiler could implement
  them even today by just renaming the keyword "module"
  to a reserved name like __module. No conforming program
  should see any difference.

* The modules bring two additions:
  - First, an additional barrier against name clashes: two
    identical names in different modules don't create a link
    ambiguity. An error occurs only if both modules are used
    in the same translation unit and that translation unit
    tries to access the name. I've chosen the rule that names
    declared in the using module hide the ones imported from
    used modules; maybe an error should occur instead.
  - Second, a mechanism to define the order of initialisation,
    together with a way to exit early if initialisation
    fails. It's up to the library writer to decide whether
    the library needs this feature. There is no way for
    the user of the library to override this; maybe such
    a way should be introduced. Initialisation is triggered
    by a module main function; since modules with a main
    function may not use each other circularly, they form
    a directed acyclic graph, so defining the initialisation
    order is easy. Modules without main don't need this
    restriction.

* Module names live in a separate namespace, independent of
  the program's namespaces. That is, a module name that is not
  prefixed by the keyword "module" has no special meaning
  and can be used independently of the module mechanism.
  Still, I'd expect a module to put its contents into a
  namespace of the same name, so that names from module
  foobar also live in namespace foobar; however, this is not
  required. Module names would probably be mapped to filenames
  in some implementation-defined way, to access the module
  interface file produced by compiling a module.

* While I've put some thought into it, I haven't thought
  it through completely, so there are surely many points
  I haven't considered. After all, this is just discussion
  material; it may even have a hidden serious flaw, or not
  be implementable that way. It's just how I think it
  could be done.


- Defining modules:

Any translation unit is part of exactly one module.
The module the translation unit is part of is indicated by
writing

module Name;

into the file. Before this statement, there may be nothing
but whitespace and preprocessor directives. The keyword "module"
must be the first token of the translation unit after the
preprocessing phases are finished.

The module name is not inserted into the global namespace,
so it can be used for other names in the module. (A good idea
would be:

module Name;

namespace Name
{
...
}

to put all declarations of the module into a namespace of
the same name).

Every declaration in a module is private by default (i.e.
only translation units which belong to the same module can
see the names). A declaration at global or namespace scope can be
made public (i.e. visible to other modules) by prefixing the
declaration with public. (The idea is that the compiler gathers
those public declarations into a "module interface file" when the
module is compiled.) More than one declaration can be published at
the same time by using a "public block":

public
{
  class X { ... };
  void foo(X*);
}

Macros cannot be exported from modules. If a module relies on
macros, those have to be supplied with an extra header file
(which probably also uses the module; see below).

Note that this use of public doesn't interfere with the other
uses, since currently global/namespace scope symbols simply
cannot be declared public.

The definition of a function as public is allowed; however,
it results in the same behaviour as declaring it as public
and defining it separately. (It may, however, allow an
implementation to inline the function across module
boundaries by copying the definition, probably in a tokenized
form, to the module interface file. The as-if rule must be
preserved, of course.)
If a type is defined public, the whole type definition
is visible when the module is used; declaring the type public but
defining it non-public only publishes the forward declaration.
If any name inside a namespace is made public, the namespace
itself is public as well. Namespaces themselves cannot be
made public (a namespace without any public name in it would
be of no use anyway). Names in unnamed namespaces and names
with internal linkage may not be declared public.
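
To make the publishing rules concrete, here is a small model of what a compiler might gather into the module interface file. It is only a sketch under assumptions: the `Decl` and `Exported` structs and the `build_interface` function are hypothetical names for illustration, and the model covers just three rules from the text (private by default, public type definition versus forward declaration only, no publishing of internal-linkage names).

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of one global/namespace-scope declaration in a module.
struct Decl {
    std::string name;       // e.g. "X" or "foo"
    bool is_type;           // class, struct, union or enum
    bool declared_public;   // a declaration carries "public"
    bool defined_public;    // the definition carries "public"
    bool internal_linkage;  // static, or inside an unnamed namespace
};

// One entry of the (hypothetical) module interface file.
struct Exported {
    std::string name;
    bool full_definition;   // true only for publicly defined types
};

std::vector<Exported> build_interface(const std::vector<Decl>& decls) {
    std::vector<Exported> out;
    for (const Decl& d : decls) {
        if (!d.declared_public && !d.defined_public)
            continue;  // private by default: not exported
        if (d.internal_linkage)
            throw std::invalid_argument(d.name + ": internal linkage cannot be public");
        // A publicly defined type exports its whole definition; a type that is
        // only declared public exports just "class X;". A function exports
        // only its declaration either way (possible inlining aside).
        out.push_back({d.name, d.is_type && d.defined_public});
    }
    return out;
}
```

In this model a function's public definition produces the same interface entry as a public declaration, which mirrors the "same behaviour" rule above.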

Each module may have its own main function. If a module has
a main function, it is initialized before any code using that
module is executed. That means the non-local static variables
of the module are initialized according to the usual C++ rules
(i.e., inside a single module, the usual static initialisation
order problems remain), except that every translation unit of
the module must be initialized before the module's main function
gets called. The module's main function is called directly after
all of the module's translation units have been initialized. If
it returns 0 or EXIT_SUCCESS, initialisation of the other, not yet
initialized modules and finally execution of the program's main
function proceed. If the module's main function returns anything
except 0 or EXIT_SUCCESS, exit is called with the return value.
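
The start-up sequence just described can be approximated in today's C++ by a runtime loop that calls each module main in turn and stops at the first failure. This is a sketch under assumptions: `ModuleMain`, `run_module_mains` and the idea of a pre-ordered list of registered module mains are hypothetical, and the function returns the failing status instead of calling `std::exit` so that it can be tested.

```cpp
#include <cstdlib>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: one entry per module that has a module main function,
// already listed in dependency order (used modules first).
using ModuleMain = std::pair<std::string, std::function<int()>>;

// Calls each module main, assuming that module's translation units are
// already initialized. Returns EXIT_SUCCESS if every module main succeeds;
// otherwise returns the first failing status, which a real runtime would
// pass to std::exit without running any later module main or ::main.
int run_module_mains(const std::vector<ModuleMain>& mains,
                     std::vector<std::string>* ran = nullptr) {
    for (const auto& m : mains) {
        if (ran) ran->push_back(m.first);   // record which mains were called
        int status = m.second();
        if (status != 0 && status != EXIT_SUCCESS)
            return status;                  // stop: initialisation failed
    }
    return EXIT_SUCCESS;
}
```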

If a module has no main function, initialisation proceeds
on the basis of translation units, following the current
initialisation rules of C++. Especially, unlike modules with
main, modules without main are not initialized in one block,
and initialisation can be interleaved with the initialisation
of other modules.

This way the module writer can decide if automatic
initialisation before entering the program's main function
is needed by simply writing or not writing a module main
function (even if it is just empty - i.e. implicit return 0).


- Using modules

A module is used by writing

using module Name;

at global scope. The module must be compiled before any
translation unit which uses it (however, if the module is
modified, recompiling its users is only necessary if its
interface changed). Repeating a using statement for the
same module has no effect.

This imports all public names of a given module into the current
translation unit. It doesn't create a namespace for the module;
if the module's contents should be in a namespace, the module must
define the namespace itself (as shown above).

A module A uses another module B if any of module A's
translation units uses module B.

Two modules which both have a main function may not use
each other circularly (directly or indirectly). That is,
the following is not allowed:

module A;
using module B;

int main() {}

and

module B;
using module A;

int main() {}

Note that common types, functions and variables can be put
into a third module that is used by both. If circularity
cannot be avoided entirely, one should consider combining
the two modules into one (since they are so closely related).
Another simple solution is to let only one of them
have a module main function. The other one can still be
initialized by an init function called from the first
module's main function. Still, IMHO such circular
dependencies are best avoided entirely.
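
The restriction can be checked mechanically from the "uses" relation. The sketch below uses hypothetical names (`Graph`, `reaches`, `circular_main_pair`) and reports a pair of modules with main functions that reach each other, directly or through other modules; it says nothing about how a real compiler or linker would collect that relation.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation: module name -> modules it uses.
using Graph = std::map<std::string, std::vector<std::string>>;

// Depth-first search: can 'from' reach 'to' through one or more "uses" edges?
bool reaches(const Graph& g, const std::string& from, const std::string& to) {
    std::set<std::string> seen;
    std::vector<std::string> stack{from};
    while (!stack.empty()) {
        std::string m = stack.back();
        stack.pop_back();
        auto it = g.find(m);
        if (it == g.end()) continue;
        for (const std::string& used : it->second) {
            if (used == to) return true;
            if (seen.insert(used).second) stack.push_back(used);
        }
    }
    return false;
}

// Returns the first pair of distinct modules with main that use each other
// (directly or indirectly), or a pair of empty strings if there is none.
std::pair<std::string, std::string>
circular_main_pair(const Graph& g, const std::set<std::string>& has_main) {
    for (const std::string& a : has_main)
        for (const std::string& b : has_main)
            if (a < b && reaches(g, a, b) && reaches(g, b, a))
                return {a, b};
    return {};
}
```

With the A/B example above both modules have a main, so the pair (A, B) is reported; dropping the main from either one makes the same graph acceptable.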

Entities imported from a module are considered different
from entities defined in the current module, or imported
from another module. That is, the following code is ill-formed:

// file A.cc
module A;

public class X; // only forward declaration

public void foo(X*);

// file main.cc

using module A;

class X {}; // this does _not_ define the forward-declared type
            // from module A, but defines a completely unrelated
            // type class X

int main()
{
  foo((X*)0); // Error: type mismatch
  foo(0);     // Ok: 0 converts to (module A)X*.
}
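
Today's C++ cannot express this directly, but the effect can be imitated by imagining that each module's entities live in a hidden namespace named after the module. In the sketch, `module_A` is a hypothetical stand-in for module A's separate link-name space, not part of the proposed syntax; it shows that module A's `X` and main.cc's `X` are unrelated types, and that a null pointer constant still converts to module A's `X*`.

```cpp
#include <type_traits>

namespace module_A {        // hypothetical stand-in for module A's entities
    class X;                // only forward-declared, as in A.cc
    inline int foo(X* p) { return p == nullptr ? 0 : 1; }
}

class X {};                 // main.cc's own, completely unrelated X

static_assert(!std::is_same<X, module_A::X>::value,
              "main.cc's X is not module A's X");
static_assert(!std::is_convertible<X*, module_A::X*>::value,
              "so passing a main.cc X* to module A's foo is a type mismatch");

int call_with_null() {
    return module_A::foo(0);   // OK: 0 converts to module_A::X*
}
```

Unlike the proposal, this imitation needs explicit qualification; an unqualified `X` would not automatically prefer the current module's name.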

However, there's no mechanism to resolve name conflicts between
modules; that's what namespaces are for. Instead, names in the current
module hide names in the used modules, and identical names imported
from different modules into the _same_ translation unit give
an ambiguity error if the name is actually used. (As
usual, for functions the signatures are taken into account
as well.)
Templates imported from a module may be specialized (explicitly
or partially) if the specialisation involves a type that is
neither part of the module the template was imported from nor
part of any module that module uses.
Namespaces of the same name are also different, but for name
lookup other than Koenig lookup they behave as if they were united
into one namespace of that name. However, adding to a module's
namespace from outside the module itself is not possible, except
for specialisations of templates. Example:

// file A.cc
module A;
namespace NS
{
  public void foo(int) {}
}

// file B.cc
module B;
namespace NS
{
  public void foo(double) {}
  public template<class T> void foo(T*) {}
  public template<class T> struct X {};
}

// file main.cc
using module A;
using module B;

void bar()
{
  NS::foo(3); // calls module A: NS::foo(int);
  NS::foo(1.5); // calls module B: NS::foo(double);
  NS::foo((int*)0); // calls module B: NS::foo<int>(int*);
}

void baz()
{
  using namespace NS;
  foo(3); // calls module A: NS::foo(int);
  foo(1.5); // calls module B: NS::foo(double);
  foo((int*)0); // calls module B: NS::foo<int>(int*);
}

void NS::foo(int); // Error: no NS::foo to forward declare

namespace NS // a different namespace!
{
  void foo(int); // Declaration of a *new* function
}

class A {};
template<> struct NS::X<A> { int x; };
 // but this is an explicit specialisation of module B's NS::X,
 // which is allowed
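
For ordinary (non-Koenig) lookup, the "as if united" behaviour resembles what today's C++ gives when using-declarations pull the overloads of two separate namespaces into one. In the sketch, `modA_NS` and `modB_NS` are hypothetical stand-ins for the two modules' `NS`, and each overload returns a tag so the chosen function can be checked. It imitates only the lookup results of `bar()` and `baz()`, not the rule that main.cc cannot add to the modules' namespaces.

```cpp
namespace modA_NS {                          // stand-in for module A's NS
    inline int foo(int) { return 1; }
}
namespace modB_NS {                          // stand-in for module B's NS
    inline int foo(double) { return 2; }
    template<class T> int foo(T*) { return 3; }
}

namespace NS {                               // the united view seen by main.cc
    using modA_NS::foo;                      // foo(int)
    using modB_NS::foo;                      // foo(double) and foo<T>(T*)
}

int lookup_int()     { return NS::foo(3); }     // picks modA_NS::foo(int)
int lookup_double()  { return NS::foo(1.5); }   // picks modB_NS::foo(double)
int lookup_pointer() {
    using namespace NS;                         // as in baz()
    return foo(static_cast<int*>(nullptr));     // picks modB_NS::foo<int>(int*)
}
```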


Names in different modules have different link names, so
the following does work:

// file A.cc
module A;
public int x=1;

// file B.cc
module B;
public int x=2;

// file foo.cc
using module A;
int foo()
{
  return x; // returns x from module A
}

// file main.cc
#include <iostream>

using module B;

int foo(); // defined in foo.cc

int main()
{
  std::cout << x << std::endl; // uses x from module B
  std::cout << foo() << std::endl;
}

Module A's x and module B's x don't interfere despite having
the same name, since they have different link names (similar
to variables in unnamed namespaces; however the latter cannot
be used across different translation units). A conflict only
arises if one translation unit uses both modules and tries to
access x.

All modules with a main function used by a given module are
guaranteed to be initialized before any code of the using
module is executed. The order in which the used modules are
initialized is unspecified, except where it is determined by
dependencies between the used modules (i.e. if one of the
used modules itself uses another of the used modules).
Note that this rule can be followed strictly since
circular dependencies between modules with main function
are not allowed.
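
Because the modules with main functions form a directed acyclic graph, an order that respects this rule can be computed with an ordinary topological sort. This sketch uses Kahn's algorithm on a hypothetical `Uses` map that, by assumption, already lists for each module with main the other modules with main it depends on (directly or through modules without main). The proposal leaves the order of independent modules unspecified; the sketch simply takes ready modules in alphabetical order so the result is deterministic.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical input: module with main -> modules with main it depends on.
using Uses = std::map<std::string, std::set<std::string>>;

// Kahn's algorithm: a module is initialized only after everything it uses.
std::vector<std::string> init_order(const Uses& uses) {
    std::map<std::string, int> pending;                     // unfinished deps
    std::map<std::string, std::vector<std::string>> users;  // reverse edges
    for (const auto& entry : uses) {
        pending.insert({entry.first, 0});   // no-op if already present
        for (const std::string& dep : entry.second) {
            ++pending[entry.first];
            pending.insert({dep, 0});
            users[dep].push_back(entry.first);
        }
    }
    std::set<std::string> ready;            // sorted, hence deterministic
    for (const auto& p : pending)
        if (p.second == 0) ready.insert(p.first);
    std::vector<std::string> order;
    while (!ready.empty()) {
        std::string m = *ready.begin();
        ready.erase(ready.begin());
        order.push_back(m);                 // run m's module main here
        for (const std::string& u : users[m])
            if (--pending[u] == 0) ready.insert(u);
    }
    if (order.size() != pending.size())
        throw std::logic_error("circular dependency between modules with main");
    return order;
}
```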

Using modules is not transitive; that is, if a translation
unit of module A uses module B, and module B uses module C,
the names of module C are not visible in the translation unit
of module A unless the translation unit uses module C explicitly.
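
A minimal model of this visibility rule, assuming a hypothetical `Exports` table of each module's public names: a translation unit sees its own names plus the public names of the modules it names in its own using statements, and nothing from the modules those modules use. Name hiding between the current module and used modules is deliberately left out of the sketch.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: public names exported by each module.
using Exports = std::map<std::string, std::set<std::string>>;

// Names a translation unit can see: its own names plus the public names of
// the modules it uses *directly*. Modules used by those modules contribute
// nothing unless they also appear in 'direct_uses'.
std::set<std::string> visible_names(const std::set<std::string>& own,
                                    const std::vector<std::string>& direct_uses,
                                    const Exports& exports) {
    std::set<std::string> result = own;
    for (const std::string& m : direct_uses) {
        auto it = exports.find(m);
        if (it != exports.end())
            result.insert(it->second.begin(), it->second.end());
    }
    return result;
}
```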

- The main module

There's one special module, the main module. This module
has no name and contains the program's main function.
A file is part of the main module if it doesn't contain a

module Name;

line. Unlike other modules, the main module must contain a
main function; however, unlike a module main function, the
program's main function may be entered before the static
variables in the main module's translation units are
initialized; instead, the current C++ initialisation rules apply.
All programs written according to the current standard
would therefore be considered to be entirely in the main
module. Thanks to the special rules for this module,
their semantics are not modified at all.

Note that the module initialisation rules above guarantee
that the main module's main function runs after all other
module main functions, and only if all required module
initialisations succeeded.


- Standard modules

To make sure that every module works with the same standard
classes, those should be in special modules as well.
For each standard header which adds non-macro names, there
should be a standard module with the same name. Using that
module should have the same effect as including the header,
except that no macros are defined. The headers should be
implemented as if they contained a using module Name, followed by
the appropriate defines. For example, the header <cstdio> would
behave as if it contained

using module cstdio;
#ifndef EOF
#define EOF (-1)
#endif
// other defines needed for <cstdio>

Note that the special rules for namespace std follow
automatically from the module namespace rules (if not,
I got those rules wrong - they are meant to do so).


Now the field for destructing this suggestion is open ;-)
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]