Topic: Order of Initialisation
Author: maxtal@physics.su.OZ.AU (John Max Skaller)
Date: Mon, 11 Oct 1993 17:25:14 GMT
In my opinion, the order of initialisation problem is one that
needs to be solved: it will become particularly important
with multiple large class libraries supported by namespaces,
especially ones with lots of complex templates.
Ideally, the linker/compiler can do a complete dependency
analysis on individual initialisations, but I think
that a REQUIREMENT for that in the Standard may create
two problems:
1) It's hard to do; I'm not sure it's even possible
due to aliasing problems.
So many compilers will get it wrong.
2) It's likely to be VERY slow even if it's possible.
On the other hand, I am not sure that leaving the order of
initialisation unspecified is acceptable either,
and the de facto specification (in order within a translation
unit, no order between translation units) is not
good enough either.
I think. Do you agree?
I have a possible, somewhat hacked, and imperfect solution.
I don't claim it's ideal or solves all problems. It definitely
solves *some* problems. It also requires an extension,
and if it's not considered soon we may be stuck either
with no solution, or delays in acceptance of the Standard
while a solution is retrofitted.
Here is an outline of my idea:
1) You can write:
    module fred { .. }
    module joe depends on fred { .. }
in a file. After pre-processing, each module is treated
exactly as a separate translation unit.
(Sorry, these are NOT real modules, MODULA or Ada style)
The names of the modules are not visible to the program.
(Well, they could be accessed by a __MODULE__ macro I suppose).
In particular, a module is NOT a namespace.
    module max {
        int i;
        max::i++; // error, max not known
    }
    module tal depends on max {
        extern int i;
        i++; // fine, refers to 'i' in module max
    }
The module names must be unique per program. They correspond
to the module names in a DOS object code library file,
and represent the smallest loadable unit.
An immediate advantage is you can create a library with
100 separately loadable functions in it, from a single
source file.
Wrapping module statements around things acts as an
impenetrable barrier, guaranteeing isolation
from context. Use of modules helps deprecate use
of the preprocessor and the file system, and provides
a mechanism in the C++ language proper to deal
with 'translation units'.
The 'depends on' clause indicates an initialisation dependency.
Separate statements can be written:
    module A depends on B, C;
    module X depends on B, Y;
and I envisage a file of these per program to resolve
order not intrinsic to the modules themselves.
To make this work for templates, we need to force
them to be instantiated somewhere. There is a separate
proposal for this in its second revision already,
basically:
    template list<complex>;
forces instantiation 'here'. These forced instantiations
do not affect overloading. I imagine them in modules
in the same file (or same sort of file)
as the 'extra' module dependencies.
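A minimal sketch of what forcing an instantiation 'here' looks like, using the explicit instantiation syntax of present-day standard C++ rather than the exact spelling quoted above. The names `Stack` and `sum_to` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical template, used only to have something to instantiate.
template <class T>
class Stack {
    std::vector<T> items_;
public:
    void push(const T& v) { items_.push_back(v); }
    T pop() {
        assert(!items_.empty());   // popping an empty stack is a caller bug
        T v = items_.back();
        items_.pop_back();
        return v;
    }
    bool empty() const { return items_.empty(); }
};

// Explicit instantiation definition: all members of Stack<int> are
// instantiated in this translation unit, whether or not they are used here.
template class Stack<int>;

// Small helper that exercises the instantiated members.
int sum_to(int n) {
    Stack<int> s;
    for (int i = 1; i <= n; ++i) s.push(i);
    int total = 0;
    while (!s.empty()) total += s.pop();
    return total;
}
```

Calling `sum_to(4)` pushes 1 through 4 and pops them again, so the instantiated members are actually used; the explicit instantiation itself has no effect on overload resolution.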
Some extra issues: code not in explicit modules
is wrapped in a module with a compiler generated unique
name. (The file name would be a good place to start :-)
Modules can be nested, but this is mainly because
a file is a module, and an explicit module defined
in a file is thereby automatically nested,
so it has to be allowed. A nested module is exactly
like a non-nested one: the compiler can move a nested
module around (after preprocessing) because of
the isolation guarantee.
    // file FRED.CC
    int i;
    module bertha {
        extern int i; // refers to i in module 'FRED'
        module nested {
            i++; // error: i not declared
        }
    }
Circular dependencies should cause link-time diagnostics
to be issued.
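One way a linker might turn 'depends on' clauses into an initialisation order is a depth-first topological sort that reports a cycle instead of producing an order. The sketch below is only an illustration of that idea, not part of the proposal; `DependencyMap`, `visit` and `initialisation_order` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Maps a module name to the modules named in its 'depends on' clause.
using DependencyMap = std::map<std::string, std::vector<std::string>>;

enum class Mark { Unvisited, InProgress, Done };

// Depth-first visit. Returns false if a circular dependency is reached.
static bool visit(const std::string& name, const DependencyMap& deps,
                  std::map<std::string, Mark>& marks,
                  std::vector<std::string>& order) {
    Mark& mark = marks[name];                      // new entries start as Unvisited
    if (mark == Mark::Done) return true;
    if (mark == Mark::InProgress) return false;    // back edge: a cycle
    mark = Mark::InProgress;
    auto it = deps.find(name);
    if (it != deps.end()) {
        for (const std::string& dep : it->second) {
            if (!visit(dep, deps, marks, order)) return false;
        }
    }
    mark = Mark::Done;
    order.push_back(name);                         // all dependencies are already listed
    return true;
}

// Fills `order` so that every module appears after the modules it depends on.
// Returns false (and leaves `order` empty) if the dependencies are circular.
bool initialisation_order(const DependencyMap& deps,
                          std::vector<std::string>& order) {
    order.clear();
    std::map<std::string, Mark> marks;
    for (const auto& entry : deps) {
        if (!visit(entry.first, deps, marks, order)) {
            order.clear();
            return false;
        }
    }
    assert(order.size() == marks.size());          // every module listed exactly once
    return true;
}
```

With the two statements shown earlier (A depends on B, C; X depends on B, Y), B and C come out before A, and B and Y before X. A pair such as fred depending on joe and joe depending on fred makes the function return false, which is where a link-time diagnostic would be issued.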
The module idea gives the programmer some control over
granularity, and, by making the dependencies
explicit and the modules fairly big (compared to
individual initialisations) should allow fast,
reliable, mechanical solution to the order of
initialisation problem --- with the bulk
of the responsibility on the programmer's shoulders.
Of course, I hate giving programmers responsibility.
But in this case it might be an advantage compared
with denying them a sensible mechanism to take control
of at least some problems.
That is, I'd rather be responsible than helpless.
Disadvantages
-------------
It doesn't solve all problems.
Module name pollution. (There are a number of possible solutions
to this)
Confusion with namespaces.
It's yet another extension.
Comment
-------
The code above is probably already well formed:
no one said a 'translation unit' had to be equal to the
native operating system's concept of a file,
and no one said that you can't control the linker
with some statements: these things are implementation
defined now, and I just gave an example implementation.
There is a possible advantage to portability, however,
in Standardising this, or some, mechanism.
There is also the potential *disadvantage* that it restricts
future solutions and restricts implementors.
Of course, that's exactly the point: the lack of constraints
on order of initialisation is exactly the problem.
In effect, what I've defined could be done with pre-processing
statements, except they affect the front, rather than the
back, end of the system. I'd rather not extend the pre-processor
when effort has been put into deprecating it (eg #define).
--
JOHN (MAX) SKALLER, INTERNET:maxtal@suphys.physics.su.oz.au
Maxtal Pty Ltd, CSERVE:10236.1703
6 MacKay St ASHFIELD, Mem: SA IT/9/22,SC22/WG21
NSW 2131, AUSTRALIA
Author: daniels@biles.com (Brad Daniels)
Date: Mon, 11 Oct 1993 19:44:58 GMT
< init order proposal requiring syntax extension deleted >
I wrote up a proposal on this topic a few weeks ago and sent copies to
Bjarne Stroustrup and Peter (argh... what's peju@research.att.com's
last name?) and haven't received comment back yet. I also sent a copy
to some of the folks at DEC, but I really have no idea what's happening.
For any who are interested, here is the proposal:
-----------------
Initialization Order and Nonlocal Static Objects:
A Proposed Modification to the X3J16 Working Document
Brad Daniels
Biles and Associates
6161 Savoy Dr, Ste 500
Houston, TX 77036 USA
E-mail: daniels_b@biles.com
1. Description
The current version of the working document has the following to
say about the order of initialization of static variables in
multiple compilation units (section 3-4, paragraph 5):
"The initialization of nonlocal static objects in a translation
unit is done before the first use of any function or object
defined in that translation unit. Such initializations may be
done before the first statement of main() or deferred to any
point in time before the first use of a function or object
defined in that translation unit. ... No further order is
imposed on the initialization of objects from different
translation units..."
The condition in the first sentence above is often not satisfiable,
in that the constructor of a static object is often implemented in
the same compilation unit as the object, meaning that it needs to
be constructed before it can call its constructor. It seems
reasonable to assume, therefore, that the intent is to make the
restriction apply only to use of any object or function defined in
a translation unit which does not occur during construction of a
nonlocal static object in that translation unit.
Most (if not all) existing implementations of C++ have made the
further assumption that the restriction applies only to use of any
object or function defined in a translation unit which does not
occur during construction of _any_ nonlocal static object in
_any_ compilation unit, in effect meaning that there is no
ordering whatsoever for construction of nonlocal static objects in
different compilation units if the implementation chooses to
construct all such objects before entering main(). This choice of
assumptions is supported by the discussion of problems with
ordering of initialization in the Annotated Reference Manual, and
is also quite understandable, given that it allows a simpler
implementation, but it has severe consequences concerning the
usability of nonlocal static data. For this reason, section 3-4
paragraph 5 should be modified to read as follows (changes are
emphasized by underlining, but should not be underlined in the
document):
"The initialization of nonlocal static objects in a translation
unit is done before the first use of any function or object
defined in that translation unit _by any function or object_
_defined in another translation unit_. Such initializations may
be done before the first statement of main() or deferred to any
point in time before the first _such_ use of a function or object
defined in that translation unit. _If the construction of a_
_nonlocal static object requires use of a function or a second_
_object defined in another compilation unit, and that use in_
_turn requires use of a function or object in the compilation_
_unit which defines the first object, the behavior is undefined._
... No further order is imposed on the initialization of
objects from different translation units..."
The above change is logically sufficient to require constructor
ordering, though the committee may also wish to include words to
the effect that use of a function or object in one translation unit
during construction of an object in another translation unit is
explicitly defined to be "use" for the purposes of initialization
ordering. The second sentence, while admittedly somewhat convoluted,
explicitly states that "loops" between translation units result in
undefined behavior even if there is no actual loop in data dependencies,
thus simplifying implementation substantially.
2. Motivations
The common interpretation of the current text presents difficulties
in the case where the constructor of a variable in one compilation
unit directly or indirectly accesses a static object defined in
another compilation unit. Such an access is clearly "use" of an
object in another compilation unit, meaning that the object must be
initialized before its use. Most implementations have avoided this
concern by considering use during construction to be excluded from
the provisions of section 3-4.
Given the interpretation taken by existing implementations, there
has been some support for modifying the language definition to
conform to existing practice rather than require compilers to
support the functionality which would be required by the proposed
modification to the working document. Indeed, it has been argued
that the current state of affairs is intended, and that any
implication that there should be cross-compilation-unit ordering of
initialization is unintentional. In particular, the use of the
"init counter" hack in iostreams tends to support this contention.
Regardless of the original intent, the current state of affairs
makes nonlocal complex static objects an unsafe language feature.
When implementing a class containing nonlocal static objects, there
is absolutely no way, given the current state of affairs, to
guarantee that no consumer will access those objects before they
are initialized, since any consumer may define a nonlocal static
object whose constructor uses those objects, either directly or
through function calls.
Even the "init counter" hack is not completely safe. Consider the
following case:
a.h:
    class A {
    public:
        A();
    };
---
a.C:
    #include <iostream.h>
    #include "a.h"
    A::A() { cout << "A constructed\n"; }
---
b.C:
    #include "a.h"
    class B {
        A x;
    public:
        B() {}
    };
    B v;
---
Notice that in the above example, a.h does not include iostream.h,
meaning that the iostream init counter class never gets referenced
in b.C. Thus, there is no guarantee that cout will be initialized
before it is used in a.C. The obvious work-around is to always
include all of the headers used in an implementation file in the
corresponding header file, but this solution is unwieldy at best,
and sometimes impractical in situations where classes make many
cross-references to one another, or when one wishes to restrict the
visibility of e.g. certain little-used or internal-only library
routines. Worse still, the error may not show up at all until a
later change (such as a modified build procedure or additional new
module) causes the implementation to change the order in which it
calls initializers.
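For readers unfamiliar with the "init counter" technique, the following is a minimal single-file sketch of the general idea, written in present-day C++. It is not the iostreams implementation; `Log`, `LogInit` and `the_log` are hypothetical names, and the two `LogInit` objects simulate two translation units that each include the library header.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <vector>

// Hypothetical stand-in for a library object such as cout.
struct Log {
    std::vector<std::string> lines;
    void write(const std::string& s) { lines.push_back(s); }
};

// Raw storage, a pointer and a counter: all of these are set up statically,
// before any constructor of a nonlocal static object runs.
alignas(Log) static unsigned char log_storage[sizeof(Log)];
static Log* log_ptr = nullptr;
static int init_count = 0;

Log& the_log() {
    assert(log_ptr != nullptr);   // fails if no LogInit has been constructed yet
    return *log_ptr;
}

// The init counter class: the first instance constructs the Log in place,
// the last instance to be destroyed tears it down again.
struct LogInit {
    LogInit()  { if (init_count++ == 0) log_ptr = new (log_storage) Log; }
    ~LogInit() { if (--init_count == 0) { log_ptr->~Log(); log_ptr = nullptr; } }
};

// In the real technique this definition lives in the library header, so each
// translation unit that includes the header gets its own LogInit object,
// placed ahead of that unit's other nonlocal statics.
static LogInit log_init_for_unit_1;
static LogInit log_init_for_unit_2;   // simulates a second including unit

// A static object defined after the LogInit object can use the log safely.
struct Widget {
    Widget() { the_log().write("Widget constructed"); }
};
static Widget widget;
```

The protection comes entirely from the `LogInit` object that the header places in each including unit. A unit such as b.C above, which never includes the header, contributes no such object, so nothing forces the library object to exist before that unit's statics are constructed.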
Given the above problems, nonlocal static objects which are
instances of classes with constructors are never safe, and even
pointers to such objects initialized using init counter classes are
not safe. The semantics specified in the standard must be
strengthened to clearly require that nonlocal static objects must
be initialized before they are used, even if they are used, either
directly or indirectly, from the constructor of an object in
another compilation unit.
3. Consequences
The most serious consequence of this change to the working document
is that its adoption would require modifications to most, if not
all, existing C++ implementations. The changes can be made,
however, without breaking any existing programs, since the current
assumption made by programs must be that there is no ordering to
initialization of objects defined in different compilation units.
4. Experience
Unfortunately, there is (to my knowledge) no existing
implementation of C++ which addresses this issue. Other languages
have addressed similar issues successfully, though reportedly only
after much difficult consideration. This proposal, however, is
deliberately limited to access from outside a compilation unit, and
explicitly addresses complications such as two constructors each
using objects or functions in each other's compilation units by making
such cross dependencies result in undefined behavior. Implementors
are, of course, free to explore the algorithmic complexities involved
in allowing circular dependencies between modules where there is no
actual data dependency, but there is no need to require them to do
so.
4.1 Sample algorithm outline
Each nonlocal static object will have an associated initialization
flag and an initialization entry point. The initialization entry
point for an object will be an alias for its compilation unit's
initialization routine. The compilation unit's initialization
routine will first test and set a compilation unit-wide
initialization flag, then initialize all nonlocal static objects in
the compilation unit, testing and setting the object's
initialization flag before commencing with initialization of the
object. At the beginning of any function which directly accesses a
nonlocal static object, the compiler includes code to check that
object's initialization flag, and invoke the object's
initialization routine if the flag is not set. Any function which
uses an inline function containing a direct reference to a nonlocal
static object is considered to directly access that object. Note
that access here is considered to include taking a reference to an
object, and for the purposes of this algorithm, any nonlocal static
object initialized to contain a reference to another nonlocal
static object must cause initialization code to be generated which
will initialize the object it refers to. In this way, we avoid
aliasing problems. Additionally, any function defined in a
compilation unit must check the compilation unit-wide
initialization flag, and call the compilation unit's initialization
routine if it is not set. Obviously, this check could replace the
checks for the initialization flags for those nonlocal static
variables which are defined in the same module in which they are
accessed, since such compilation unit-internal references have the
known effect of invoking that unit's initialization routine. These
compilation unit-wide checks are necessary to retain the semantics
of ensuring that all nonlocal static objects be initialized before
the first use of any object or function in a compilation unit.
Since this requirement is vital to the implementation of much
existing code (and especially the current init counter approach),
changing the requirement is not feasible. Also, the compilation
unit-wide flag prevents use of local functions from interfering
with initialization order within a compilation unit.
Initialization proceeds as is commonly the case at present. At
link time, the implementation gathers the initialization routines
from all modules being linked, and inserts code before main() to
call each routine in turn, in no specific order.
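The outline above can be approximated by hand to see how the flags interact. The sketch below is only an illustration under simplifying assumptions: two namespaces stand in for two compilation units, plain `int` variables stand in for nonlocal static objects, only the unit-wide flag is modelled (the per-object flags are omitted), and every name in it is hypothetical. In the proposal this code would be generated by the compiler, not written by the programmer.

```cpp
#include <cassert>

// Global sequence counter, used only to record the order of initialisation.
int init_sequence = 0;

namespace unit_a {
    bool unit_initialised = false;   // compilation unit-wide flag
    int  initialised_at = 0;         // position in the initialisation order
    int  table_size = 0;             // stands in for a nonlocal static object

    // The unit's initialisation routine: test and set the flag, then
    // initialise the unit's nonlocal statics.
    void init_unit() {
        if (unit_initialised) return;
        unit_initialised = true;
        table_size = 64;
        initialised_at = ++init_sequence;
    }

    // Every function defined in the unit checks the unit-wide flag on entry.
    int get_table_size() {
        init_unit();
        return table_size;
    }
}

namespace unit_b {
    bool unit_initialised = false;
    int  initialised_at = 0;
    int  buffer_bytes = 0;           // its initialiser uses a function in unit_a

    void init_unit() {
        if (unit_initialised) return;
        unit_initialised = true;
        buffer_bytes = 4 * unit_a::get_table_size();   // forces unit_a first
        initialised_at = ++init_sequence;
    }
}

// Start-up code inserted before main(): call each unit's routine in turn,
// in no specific order. Here the order is deliberately the "wrong" one.
void run_static_initialisers() {
    unit_b::init_unit();
    unit_a::init_unit();             // already done, returns immediately
    assert(unit_a::unit_initialised && unit_b::unit_initialised);
}
```

After `run_static_initialisers()` returns, `unit_a` has been initialised before `unit_b` even though `unit_b`'s routine was called first, because the call to `unit_a::get_table_size()` triggered `unit_a`'s routine on entry.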
Any access to a nonlocal static object must occur either directly
from inside some function, or indirectly through a pointer. Either
type of access, including access through a static pointer, will
result in proper initialization of the objects using the method
described above, before entry to the function making the accesses.
This means that if the construction of a nonlocal static requires
that another nonlocal static object be initialized, the other
object and all nonlocal static objects defined in its compilation
unit will be initialized before entry to the function which
accesses the object.
By maintaining a per-variable initialization flag, this algorithm
should actually correctly handle most cases where two compilation
units each refer to objects in the other, but analyzing the
situation to determine precisely how well it does so is
non-trivial. The proposed modification to section 3-4, however,
explicitly declares such situations to result in undefined
behavior, so it is not necessary to analyze that case at this
point. It would also be feasible to have each per-variable
initialization flag be an alias for the compilation unit-wide flag,
but this approach would provide less functionality for no gain in
run-time efficiency, and only a minor gain in space efficiency.
5. Summary
The modification proposed in this paper makes complex nonlocal
static objects a safe, predictable feature of the language without
substantially changing the language's requirements. It achieves
this simply by clarifying existing stated requirements on
initialization ordering which have hitherto been subject to varied
interpretation. Further, it will not break existing code, and will
tend to increase the robustness of new and existing applications
and libraries written in C++ by eliminating an entire class of
errors. Further, it can be implemented simply, though perhaps not
trivially, since the implementation involves dependency graph
searches to determine the presence of references to nonlocal static
data. The benefits of this modification far outweigh the risks
associated with any change to the language definition requiring
that existing implementations change.
--
----------------------------------------------------------------------
+ Brad Daniels | "Let others praise ancient times; +
+ Biles and Associates | I am glad I was born in these." +
+ These are my views, not B&A's | - Ovid(43 B.C - 17 A.D) +