Topic: Order of Initialisation


Author: maxtal@physics.su.OZ.AU (John Max Skaller)
Date: Mon, 11 Oct 1993 17:25:14 GMT
In my opinion, the order of initialisation problem is one that
needs to be solved: it will become particularly important
with multiple large class libraries supported by namespaces,
especially ones with lots of complex templates.

Ideally, the linker/compiler can do a complete dependency
analysis on individual initialisations, but I think
that a REQUIREMENT for that in the Standard may create
two problems:

 1) It's hard to do; I'm not sure it's even possible
    due to aliasing problems.
    So many compilers will get it wrong.

 2) It's likely to be VERY slow even if it's possible.

On the other hand, I am not sure that leaving the order of
initialisation unspecified is acceptable either, and the
de facto specification (in order of definition within a
translation unit, no order between translation units) is not
good enough, I think. Do you agree?
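
A minimal sketch of the de facto rule, assuming a single translation
unit. The names Tracked and next_sequence are illustrative only. Within
one file the order follows the order of definition; if 'first' and
'second' were defined in different files, nothing would fix their
relative order.

```cpp
// Zero-initialised statically, before any dynamic initialisation runs.
static int next_sequence = 0;

struct Tracked {
    int sequence;                          // position in the initialisation order
    Tracked() : sequence(next_sequence++) {}
};

// Same translation unit: dynamically initialised in order of definition.
Tracked first;
Tracked second;

// Returns true when the observed order matches the order of definition.
bool initialised_in_definition_order() {
    return first.sequence == 0 && second.sequence == 1;
}
```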

I have a possible, somewhat hacked, and imperfect solution.
I don't claim it's ideal or solves all problems. It definitely
solves *some* problems. It also requires an extension,
and if it's not considered soon we may be stuck either
with no solution, or with delays in acceptance of the Standard
while a solution is retrofitted.

Here is an outline of my idea:

1) You can write:

 module fred { .. }
 module joe depends on fred { .. }

in a file. After pre-processing, each module is treated
exactly as a separate translation unit.
(Sorry, these are NOT real modules, MODULA or Ada style)

The names of the modules are not visible to the program.
(Well, they could be accessed by a __MODULE__ macro I suppose).
In particular, a module is NOT a namespace.

 module max {
  int i;
  max::i++; // error, max not known
 }
 module tal depends on max {
  extern int i;
  i++; // fine, refers to 'i' in module max
 }

The module names must be unique per program. They correspond
to the module names in a DOS object code library file,
and represent the smallest loadable unit.

An immediate advantage is you can create a library with
100 separately loadable functions in it, from a single
source file.

Wrapping module statements around things acts as an
impenetrable barrier, guaranteeing isolation
from context. Use of modules helps deprecate use
of the preprocessor and the file system, and provides
a mechanism in the C++ language proper to deal
with 'translation units'.

The 'depends on ' clause indicates an initialisation dependency.

Separate statements can be written:

 module A depends on B, C;
 module X depends on B, Y;

and I envisage a file of these per program to resolve
order not intrinsic to the modules themselves.
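
As a rough sketch of how a linker might turn such 'depends on'
statements into an initialisation order, here is a depth-first
topological sort. The DependencyMap representation and the function
names are assumptions made for illustration; nothing here is part of
the proposal itself, and the sketch assumes there are no cycles.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: module name -> modules it "depends on".
using DependencyMap = std::map<std::string, std::vector<std::string>>;

// Schedule `name` after everything it depends on (depth-first, post-order).
static void visit(const std::string& name, const DependencyMap& deps,
                  std::set<std::string>& done, std::vector<std::string>& order) {
    if (!done.insert(name).second) return;      // already scheduled
    auto it = deps.find(name);
    if (it != deps.end())
        for (const std::string& needed : it->second)
            visit(needed, deps, done, order);
    order.push_back(name);
}

// Returns one order in which every module follows the modules it depends on.
// Assumes the dependency graph is acyclic.
std::vector<std::string> initialisation_order(const DependencyMap& deps) {
    std::set<std::string> done;
    std::vector<std::string> order;
    for (const auto& entry : deps)
        visit(entry.first, deps, done, order);
    return order;
}
```

For the two statements above, B and C are scheduled before A, and B and
Y before X.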

To make this work for templates, we need to force
them to be instantiated somewhere. There is a separate
proposal for this in its second revision already,
basically:

 template list<complex>;

forces instantiation 'here'. These forced instantiations
do not affect overloading. I imagine them in modules
in the same file (or same sort of file)
as the 'extra' module dependencies.

Some extra issues: code not in explicit modules
is wrapped in a module with a compiler generated unique
name. (The file name would be a good place to start :-)

Modules can be nested, but this is mainly because
a file is a module, and an explicit module defined
in a file is thereby automatically nested,
so it has to be allowed. A nested module is exactly
like a non-nested one: the compiler can move a nested
module around (after preprocessing) because of
the isolation guarantee.

 // file FRED.CC
 int i;
 module bertha {
  extern int i; // refers to i in module 'FRED'
  module nested {
   i++; // error: i not declared
  }
 }

Circular dependencies should cause link-time diagnostics
to be issued.
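
A sketch of how such a diagnostic could be detected, assuming the
dependencies are held as a map from module name to the modules it
depends on (a hypothetical representation). It uses the usual
three-state depth-first search: reaching a module that is still "in
progress" means the dependencies loop back on themselves.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical representation: module name -> modules it "depends on".
using DependencyMap = std::map<std::string, std::vector<std::string>>;

enum class Mark { Unvisited, InProgress, Done };

// Returns true if a cycle is reachable from `name`.
static bool reaches_cycle(const std::string& name, const DependencyMap& deps,
                          std::map<std::string, Mark>& marks) {
    Mark& mark = marks[name];                   // new entries start as Unvisited
    if (mark == Mark::InProgress) return true;  // came back to a module on the current path
    if (mark == Mark::Done) return false;       // already checked
    mark = Mark::InProgress;
    auto it = deps.find(name);
    if (it != deps.end())
        for (const std::string& needed : it->second)
            if (reaches_cycle(needed, deps, marks)) return true;
    mark = Mark::Done;
    return false;
}

// Returns true if any module depends, directly or indirectly, on itself.
bool has_circular_dependency(const DependencyMap& deps) {
    std::map<std::string, Mark> marks;
    for (const auto& entry : deps)
        if (reaches_cycle(entry.first, deps, marks)) return true;
    return false;
}
```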

The module idea gives the programmer some control over
granularity, and, by making the dependencies
explicit and the modules fairly big (compared to
individual initialisations) should allow a fast,
reliable, mechanical solution to the order of
initialisation problem --- with the bulk
of the responsibility on the programmer's shoulders.

Of course, I hate giving programmers responsibility.
But in this case it might be an advantage compared
with denying them a sensible mechanism to take control
of at least some problems.

That is, I'd rather be responsible than helpless.

Disadvantages
-------------

It doesn't solve all problems.

Module name pollution. (There are a number of possible solutions
to this)

Confusion with namespaces.

It's yet another extension.

Comment
-------

The code above is probably already well formed:
no one said a 'translation unit' had to be equal to the
native operating system's concept of a file,
and no one said that you can't control the linker
with some statements: these things are implementation
defined now, and I just gave an example implementation.

There is a possible advantage to portability, however,
in Standardising this, or some, mechanism.

There is also the potential *disadvantage* that it restricts
future solutions and restricts implementors.

Of course, that's exactly the point: the lack of constraints
on order of initialisation is exactly the problem.

In effect, what I've defined could be done with pre-processing
statements, except they affect the front, rather than the
back, end of the system.  I'd rather not extend the pre-processor
when effort has been put into deprecating it (eg #define).


--
        JOHN (MAX) SKALLER,         INTERNET:maxtal@suphys.physics.su.oz.au
 Maxtal Pty Ltd,      CSERVE:10236.1703
        6 MacKay St ASHFIELD,     Mem: SA IT/9/22,SC22/WG21
        NSW 2131, AUSTRALIA




Author: daniels@biles.com (Brad Daniels)
Date: Mon, 11 Oct 1993 19:44:58 GMT
< init order proposal requiring syntax extension deleted >

I wrote up a proposal on this topic a few weeks ago and sent copies to
Bjarne Stroustrup and Peter (argh...  what's peju@research.att.com's
last name?) and haven't received comment back yet.  I also sent a copy
to some of the folks at DEC, but I really have no idea what's happening.

For any who are interested, here is the proposal:
-----------------

   Initialization Order and Nonlocal Static Objects:
 A Proposed Modification to the X3J16 Working Document


        Brad Daniels

    Biles and Associates
   6161 Savoy Dr, Ste 500
         Houston, TX  77036  USA
       E-mail: daniels_b@biles.com


1.  Description

    The current version of the working document has the following to
    say about the order of initialization of static variables in
    multiple compilation units (section 3-4, paragraph 5):

 "The initialization of nonlocal static objects in a translation
 unit is done before the first use of any function or object
 defined in that translation unit.  Such initializations may be
 done before the first statement of main() or deferred to any
 point in time before the first use of a function or object
 defined in that translation unit.  ...  No further order is
 imposed on the initialization of objects from different
 translation units..."

    The condition in the first sentence above is often not satisfiable,
    in that the constructor of a static object is often implemented in
    the same compilation unit as the object, meaning that it needs to
    be constructed before it can call its constructor.  It seems
    reasonable to assume, therefore, that the intent is to make the
    restriction apply only to use of any object or function defined in
    a translation unit which does not occur during construction of a
    nonlocal static object in that translation unit.

    Most (if not all) existing implementations of C++ have made the
    further assumption that the restriction applies only to use of any
    object or function defined in a translation unit which does not
    occur during construction of _any_ nonlocal static object in
    _any_ compilation unit, in effect meaning that there is no
    ordering whatsoever for construction of nonlocal static objects in
    different compilation units if the implementation chooses to
    construct all such objects before entering main().  This choice of
    assumptions is supported by the discussion of problems with
    ordering of initialization in the Annotated Reference Manual, and
    is also quite understandable, given that it allows a simpler
    implementation, but it has severe consequences concerning the
    usability of nonlocal static data.  For this reason, section 3-4
    paragraph 5 should be modified to read as follows (changes are
    emphasized by underlining, but should not be underlined in the
    document):

 "The initialization of nonlocal static objects in a translation
 unit is done before the first use of any function or object
 defined in that translation unit _by any function or object
 defined in another translation unit_.  Such initializations may
 be done before the first statement of main() or deferred to any
 point in time before the first _such_ use of a function or object
 defined in that translation unit.  _If the construction of a
 nonlocal static object requires use of a function or a second
 object defined in another compilation unit, and that use in
 turn requires use of a function or object in the compilation
 unit which defines the first object, the behavior is undefined._
 ...  No further order is imposed on the initialization of
 objects from different translation units..."

    The above change is logically sufficient to require constructor
    ordering, though the committee may also wish to include words to
    the effect that use of a function or object in one translation unit
    during construction of an object in another translation unit is
    explicitly defined to be "use" for the purposes of initialization
    ordering.  The second sentence, while admittedly somewhat convoluted,
    explicitly states that "loops" between translation units result in
    undefined behavior even if there is no actual loop in data dependencies,
    thus simplifying implementation substantially.

2.  Motivations

    The common interpretation of the current text presents difficulties
    in the case where the constructor of a variable in one compilation
    unit directly or indirectly accesses a static object defined in
    another compilation unit.  Such an access is clearly "use" of an
    object in another compilation unit, meaning that the object must be
    initialized before its use.  Most implementations have avoided this
    concern by considering use during construction to be excluded from
    the provisions of section 3.4.

    Given the interpretation taken by existing implementations, there
    has been some support for modifying the language definition to
    conform to existing practice rather than require compilers to
    support the functionality which would be required by the proposed
    modification to the working document.  Indeed, it has been argued
    that the current state of affairs is intended, and that any
    implication that there should be cross-compilation-unit ordering of
    initialization is unintentional.  In particular, the use of the
    "init counter" hack in iostreams tends to support this contention.
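
    For readers unfamiliar with it, the init counter technique can be
    sketched roughly as follows.  This is a single-file illustration
    under stated assumptions, not the actual iostream source: Log,
    LogInit, the_log and Client are hypothetical names, and the parts
    marked as header and implementation file would normally live in
    separate files.

```cpp
#include <new>
#include <string>

// ---- what a header such as "log.h" (hypothetical) would contain ----
struct Log {
    std::string text;
    void append(const char* s) { text += s; }
};
Log& the_log();                  // valid once at least one LogInit exists

struct LogInit {                 // the "init counter" class
    LogInit();
    ~LogInit();
};
static LogInit log_init;         // one copy in every unit that includes the header

// ---- what the implementation file would contain ----
static int init_count;           // zero before any constructor runs
alignas(Log) static unsigned char log_storage[sizeof(Log)];
static Log* log_object;          // null until the first LogInit is constructed

// The first LogInit constructs the log; the last one destroyed tears it down.
LogInit::LogInit()  { if (init_count++ == 0) log_object = new (log_storage) Log(); }
LogInit::~LogInit() { if (--init_count == 0) log_object->~Log(); }
Log& the_log()      { return *log_object; }

// ---- a client unit: its statics are defined after log_init ----
struct Client {
    Client() { the_log().append("client constructed"); }
};
static Client client;

// Helper for inspecting what the client wrote during static initialization.
const std::string& logged_text() { return the_log().text; }
```

    The protection comes only from log_init being defined earlier in
    the same unit as the client object, which is why a unit that never
    includes the header gets no protection at all.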

    Regardless of the original intent, the current state of affairs
    makes nonlocal complex static objects an unsafe language feature.
    When implementing a class containing nonlocal static objects, there
    is absolutely no way, given the current state of affairs, to
    guarantee that no consumer will access those objects before they
    are initialized, since any consumer may define a nonlocal static
    object whose constructor uses those objects, either directly or
    through function calls.

    Even the "init counter" hack is not completely safe.  Consider the
    following case:

 a.h:
 class A {
   public:
     A();
 };
 ---
 a.C:
 #include <iostream.h>

 A::A() { cout << "A constructed\n"; }
 ---
 b.C:
 #include "a.h"

 class B {
     A x;
   public:
     B() {}
 };

 B v;
 ---

    Notice that in the above example, a.h does not include iostream.h,
    meaning that the iostream init counter class never gets referenced
    in b.C.  Thus, there is no guarantee that cout will be initialized
    before it is used in a.C.  The obvious work-around is to always
    include all of the headers used in an implementation file in the
    corresponding header file, but this solution is unwieldy at best,
    and sometimes impractical in situations where classes make many
    cross-references to one another, or when one wishes to restrict the
    visibility of e.g.  certain little-used or internal-only library
    routines.  Worse still, the error may not show up at all until a
    later change (such as a modified build procedure or additional new
    module) causes the implementation to change the order in which it
    calls initializers.

    Given the above problems, nonlocal static objects which are
    instances of classes with constructors are never safe, and even
    pointers to such objects initialized using init counter classes are
    not safe.  The semantics specified in the standard must be
    strengthened to clearly require that nonlocal static objects must
    be initialized before they are used, even if they are used, either
    directly or indirectly, from the constructor of an object in
    another compilation unit.

3.  Consequences

    The most serious consequence of this change to the working document
    is that its adoption would require modifications to most, if not
    all, existing C++ implementations.  The changes can be made,
    however, without breaking any existing programs, since the current
    assumption made by programs must be that there is no ordering to
    initialization of objects defined in different compilation units.

4.  Experience

    Unfortunately, there is (to my knowledge) no existing
    implementation of C++ which addresses this issue.  Other languages
    have addressed similar issues successfully, though reportedly only
    after much difficult consideration.  This proposal, however, is
    deliberately limited to access from outside a compilation unit, and
    explicitly addresses complications such as two constructors each
    using objects or functions in each other's compilation units by making
    such cross dependencies result in undefined behavior.  Implementors
    are, of course, free to explore the algorithmic complexities involved
    in allowing circular dependencies between modules where there is no
    actual data dependency, but there is no need to require them to do
    so.

4.1 Sample algorithm outline

    Each nonlocal static object will have an associated initialization
    flag and an initialization entry point.  The initialization entry
    point for an object will be an alias for its compilation unit's
    initialization routine.  The compilation unit's initialization
    routine will first test and set a compilation unit-wide
    initialization flag, then initialize all nonlocal static objects in
    the compilation unit, testing and setting the object's
    initialization flag before commencing with initialization of the
    object.  At the beginning of any function which directly accesses a
    nonlocal static object, the compiler includes code to check that
    object's initialization flag, and invoke the object's
    initialization routine if the flag is not set.  Any function which
    uses an inline function containing a direct reference to a nonlocal
    static object is considered to directly access that object.  Note
    that access here is considered to include taking a reference to an
    object, and for the purposes of this algorithm, any nonlocal static
    object initialized to contain a reference to another nonlocal
    static object must cause initialization code to be generated which
    will initialize the object it refers to.  In this way, we avoid
    aliasing problems.  Additionally, any function defined in a
    compilation unit must check the compilation unit-wide
    initialization flag, and call the compilation unit's initialization
    routine if it is not set.  Obviously, this check could replace the
    checks for the initialization flags for those nonlocal static
    variables which are defined in the same module in which they are
    accessed, since such compilation unit-internal references have the
    known effect of invoking that unit's initialization routine.  These
    compilation unit-wide checks are necessary to retain the semantics
    of ensuring that all nonlocal static objects be initialized before
    the first use of any object or function in a compilation unit.
    Since this requirement is vital to the implementation of much
    existing code (and especially the current init counter approach),
    changing the requirement is not feasible.  Also, the compilation
    unit-wide flag prevents use of local functions from interfering
    with initialization order within a compilation unit.

    Initialization proceeds as is commonly the case at present.  At
    link time, the implementation gathers the initialization routines
    from all modules being linked, and inserts code before main() to
    call each routine in turn, in no specific order.

    Any access to a nonlocal static object must occur either directly
    from inside some function, or indirectly through a pointer.  Either
    type of access, including access through a static pointer, will
    result in proper initialization of the objects using the method
    described above, before entry to the function making the accesses.
    This means that if the construction of a nonlocal static requires
    that another nonlocal static object be initialized, the other
    object and all nonlocal static objects defined in its compilation
    unit will be initialized before entry to the function which
    accesses the object.
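
    The outline above can be illustrated with a hand-written
    simulation of the code a compiler might generate.  This is only a
    sketch under assumptions: the two namespaces stand in for two
    compilation units, plain ints stand in for nonlocal static
    objects, and every name is hypothetical.

```cpp
#include <string>
#include <vector>

std::vector<std::string> trace;  // records the order in which objects get initialized

// ---- stands in for compilation unit A ----
namespace unit_a {
    bool unit_flag = false;      // compilation unit-wide initialization flag
    bool table_flag = false;     // per-object initialization flag
    int table = 0;               // stands in for a nonlocal static object

    void init() {                // the unit's initialization routine
        if (unit_flag) return;   // test ...
        unit_flag = true;        // ... and set
        if (!table_flag) { table_flag = true; table = 42; trace.push_back("A::table"); }
    }

    int lookup() {
        if (!unit_flag) init();  // check inserted at function entry
        return table;
    }
}

// ---- stands in for compilation unit B, whose object needs unit A ----
namespace unit_b {
    bool unit_flag = false;
    bool cache_flag = false;
    int cache = 0;

    void init() {
        if (unit_flag) return;
        unit_flag = true;
        if (!cache_flag) {
            cache_flag = true;
            cache = unit_a::lookup() + 1;   // forces unit A to initialize first
            trace.push_back("B::cache");
        }
    }
}

// Startup code: call every unit's routine in no particular order.
void run_static_initialization() {
    unit_b::init();              // deliberately the "wrong" order
    unit_a::init();              // already done by then; returns immediately
}
```

    Although unit B's routine is called first, the check at the entry
    to unit_a::lookup() causes A::table to be initialized before
    B::cache.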

    By maintaining a per-variable initialization flag, this algorithm
    should actually correctly handle most cases where two compilation
    units each refer to objects in the other, but analyzing the
    situation to determine precisely how well it does so is
    non-trivial.  The proposed modification to section 3-4, however,
    explicitly declares such situations to result in undefined
    behavior, so it is not necessary to analyze that case at this
    point.  It would also be feasible to have each per-variable
    initialization flag be an alias for the compilation unit-wide flag,
    but this approach would provide less functionality for no gain in
    run-time efficiency, and only a minor gain in space efficiency.

5.  Summary

    The modification proposed in this paper makes complex nonlocal
    static objects a safe, predictable feature of the language without
    substantially changing the language's requirements.  It achieves
    this simply by clarifying existing stated requirements on
    initialization ordering which have hitherto been subject to varied
    interpretation.  Further, it will not break existing code, and will
    tend to increase the robustness of new and existing applications
    and libraries written in C++ by eliminating an entire class of
    errors.  Finally, it can be implemented simply, though perhaps not
    trivially, since the implementation involves dependency graph
    searches to determine the presence of references to nonlocal static
    data.  The benefits of this modification far outweigh the risks
    associated with any change to the language definition requiring
    that existing implementations change.
--
----------------------------------------------------------------------
+ Brad Daniels   | "Let others praise ancient times;  +
+ Biles and Associates  |  I am glad I was born in these."   +
+ These are my views, not B&A's |   - Ovid(43 B.C - 17 A.D)    +