Topic: Smart pointers and Stupid people (my reactions and a new idea)
Author: rfg@lupine.ncd.com (Ron Guilmette)
Date: 13 Jan 91 03:38:58 GMT
Now for my detailed comments on the "smart pointer problem" discussion
so far. (This is where it really starts to get biased! :-)
--------------------------------------------------------------------------
I believe that the concerns expressed by Andrew Ginter are, for the most
part, non-issues. I don't see where functions which return pointers
(either smart or stupid ones) need to cause us any special concerns.
Likewise for temporaries and expression evaluation ordering. The `this'
pointer is worthy of note only in that it must have type T* and thus,
the type T* should be unrestricted wherever `this' is accessible.
Taking the address of a data member of an object of type T need not cause
us any special concern either, because the value yielded by this use of the
unary & operator will be of some pointer-to-member-of-T type, which cannot
subsequently be used in isolation. Rather, any such pointer-to-member-of-T
may only be used in conjunction with honest-to-goodness pointers to
objects of the type T, and if these are maintained correctly then all will
work out just fine.
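To make the pointer-to-member point concrete, here is a minimal sketch. The member name `x` and the helper `read_through` are made up for illustration. Note that it covers the `&T::x` form; applying & to a member of a particular object (as in `&t.x`) yields an ordinary `int*` instead.

```cpp
#include <cassert>

struct T {
    int x;                        // placeholder data member (an assumption)
    explicit T(int v) : x(v) {}
};

// A pointer-to-member is useless by itself: it has to be applied
// to an actual object (or to a pointer to one) before it yields anything.
int read_through(const T& obj, int T::* member) {
    return obj.*member;
}
```

Calling `read_through(t, &T::x)` reads `t.x`; without the object `t` there is nothing for the pointer-to-member to refer to.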
Likewise, Tim Atkins' concerns are (I believe) misplaced. I don't think
that it is necessary to have *all* pointers to some type T be smart in
order for a program containing objects of type T to be useful. Quite
the contrary, it seems to me that for any pointed-at type (T) you may
want to use smart pointers (to T) in most places and you will absolutely
have to use stupid pointers to T in certain (limited) places. Additionally,
I don't see where low level implementation-specific details (e.g. the
code that cfront generates) need to enter into this discussion unless
cfront has bugs that become apparent when we are fiddling with smart
(or stupid) pointers. The issue of T*'s in registers also seems
unrelated, unless of course our garbage collector can be triggered into
action asynchronously (e.g. as the result of a signal). In that case,
it may be wise to declare all of our stupid pointers to be volatile
(so that we don't get into memory/register synchronization problems)
but that is all unrelated to the point of this discussion.
I believe that both Peter Grandi and Jim Adcock are saying that we need
to restrict the use of the type T* at run-time via run-time mechanisms.
If so, I disagree with both of them. I feel that we ought to be able
to do something at compile-time where the performance cost is not so high
as it is for things done at run-time.
Henry Cobb's idea to make all constructors for type T private is somewhat
similar to Marshall Cline's suggestion of nesting the declaration (and
definition) of the class T within a smart_pointer_to_T class. In both
cases, the idea seems to be to restrict the ability to create objects
of type T to some particular (limited) set of lexical scopes (all of
which are under complete control of the smart_pointer_to_T type).
To varying degrees, these two proposals solve the "smart pointer
problem" by making the type T unknown to the outside world. (In the case
of Henry's proposal, the whole program could at least say `sizeof(T)'
whereas in Marshall's proposal, even that would be illegal outside of
the encapsulating outer class.)
Anyway, these two proposals succeed by hiding the type T from those who
would attempt to use it directly, and by forcing such potential users
to ask for assistance from the smart_pointer_to_T type in order to
do anything (including creation and destruction) with an object of type T.
These solutions have definite merits, but there is a downside to hiding
the type T. (More on this later.)
The solution proposed by Bob Martin and (independently) also by Jeremy
Grodberg to allow the type T* to be treated like a class (which can
be declared and which can have member functions and operators defined
for it) is clever and I had myself considered it; however, I fear that
Bjarne will never like it. The reason? Well, it makes the language
"mutable" (in Stroustrup's terms). One early (and related) idea
which I had some time ago for solving this "smart pointer" problem
was to allow stuff like:
T*& operator= (T*, T*&);
T operator* (T*);
In effect, I wanted to let the user just redefine the meaning of = and
(unary) * for plain old pointer types. If you could do that, then you
could be in complete control of all operations done with stupid pointers.
That idea was almost the same as allowing:
class T* {
public:
T*& operator= (T*&);
T operator* ();
};
But in both cases, you are allowing the user to change the existing
meaning of things whose meaning is already well defined in the language
(e.g. the meaning of unary * when applied to a pointer type value).
Bjarne doesn't want to open that Pandora's box. I tend to feel that
this one important case (of pointer types) might warrant a bit of
"mutability" being allowed to stick its nose into the tent, but it
doesn't much matter what I think. I doubt that Bjarne will have any
part of it.
Of all of these ideas, I think that I like Marshall Cline's the best.
It certainly has good prospects of being implemented widely so that we
can all start to use it soon. After all, it relies only on features of
C++ which are already described in current drafts of the x3j16 working
documents! In effect, nested classes are already "in" the standard.
(I hope nobody in x3j16 kills me for having said that.)
Likewise, Henry Cobb's idea (to make all constructors for T private
and then to just make functions and classes which actually have to
create T's into friends of T) is a good solution which ought to work
even with current implementations.
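A minimal sketch of how Henry's arrangement might look follows. The reference-counting scheme, the `int` payload, and the member names (`value_`, `use_count`, and so on) are all assumptions made up for illustration; nothing in the discussion fixes those details.

```cpp
#include <cassert>

class smart_pointer_to_T;   // the handle type that controls every T

class T {
    friend class smart_pointer_to_T;   // only the handle may create or destroy a T
    int value_;                        // placeholder payload (an assumption)
    explicit T(int v) : value_(v) {}   // private constructor
    ~T() {}                            // private destructor
public:
    int value() const { return value_; }
};

class smart_pointer_to_T {
    T*   obj_;
    int* refs_;   // reference count shared by all copies of the handle
    void release() {
        if (--*refs_ == 0) { delete obj_; delete refs_; }
    }
public:
    explicit smart_pointer_to_T(int v) : obj_(new T(v)), refs_(new int(1)) {}
    smart_pointer_to_T(const smart_pointer_to_T& o)
        : obj_(o.obj_), refs_(o.refs_) { ++*refs_; }
    smart_pointer_to_T& operator=(const smart_pointer_to_T& o) {
        ++*o.refs_;   // bump first so that self-assignment is harmless
        release();
        obj_  = o.obj_;
        refs_ = o.refs_;
        return *this;
    }
    ~smart_pointer_to_T() { release(); }
    T* operator->() const { return obj_; }
    int use_count() const { return *refs_; }
};
```

With this in place, `smart_pointer_to_T p(7);` is the only way for outside code to get at a T, and `p->value()` reaches the object through the handle.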
I do see some problems with these two ideas however. First and foremost,
by using either of these approaches, I have to give up the ability
(which I would otherwise have) to simply declare an object of type T
as a storage-class `static' file-scope variable, or as a storage-class
`auto' variable (local to a function) or even as a member.
I don't like that one bit! Just because I want the use of T*'s to be
restricted does not mean that I also want to be restricted in
what I can do with a T. Gosh darn it! I want my cake and I want to
eat it too!
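The restriction can be checked mechanically. This sketch assumes a compiler new enough to provide `<type_traits>`, which is far newer than anything discussed here; the function names are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

class smart_pointer_to_T;   // hypothetical handle type, declared only

class T {
    friend class smart_pointer_to_T;
    T() {}                  // private constructor, as in Henry's scheme
public:
    ~T() {}
};

// From outside T and its friends, T cannot be default-constructed, so
// "static T t;", "T t;" in a function body, and a member of type T are all out.
static_assert(!std::is_default_constructible<T>::value,
              "outside code cannot declare a T object");

bool outsiders_can_declare_T() {
    return std::is_default_constructible<T>::value;
}

void check_locked_down() {
    assert(!outsiders_can_declare_T());
}
```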
Another problem with both Marshall's idea and with Henry's idea is that
they both require me to put the entire *definition* of the (controlled)
class T into header files where I don't even want it to be! That slows
down compilation unnecessarily (which irks me). For example, with Henry's
proposal, I have to put this into my header file:
smart_tp.h:
---------------------------------------------------------------
class T {
/* ... the complete definition of T ...*/
friend class smart_pointer_to_T;
};
class smart_pointer_to_T {
/* ... definition of smart_pointer_to_T ... */
};
---------------------------------------------------------------
Here, the definitions of both classes have to be scanned and compiled
for each .C file which includes "smart_tp.h". Many of these may not even
need to know *any* of the details of the definition of class T.
Likewise, for Marshall's proposal, I need:
smart_tp.h:
---------------------------------------------------------------
class smart_pointer_to_T {
class T {
/* ... the complete definition of T ...*/
};
/* ... definition of smart_pointer_to_T ... */
};
---------------------------------------------------------------
Which is equally wasteful of compile time.
Now somebody else was asking over in comp.std.c++ if it was legal to
incompletely declare a nested class, so that you could have (for example):
smart_tp.h:
---------------------------------------------------------------
class smart_pointer_to_T {
class T; /* incomplete declaration of T */
/* ... definition of smart_pointer_to_T ... */
};
---------------------------------------------------------------
and then later on in a different file:
complete_t.C:
---------------------------------------------------------------
#include "smart_tp.h"
class smart_pointer_to_T::T {
/* completion of type smart_pointer_to_T::T */
};
---------------------------------------------------------------
In my opinion, that would be "way cool" if you could do that, but I don't
think that it is legal. Furthermore, even if it is legal, it only
provides a way of eliminating one of my two objections to Marshall's
proposed solution to the "smart pointer problem". The other (more
important) objection still remains. You still couldn't declare T
objects all over the place. You could only create them where the
smart_pointer_to_T type would let you (probably only in the heap).
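For reference, here is a self-contained sketch of that split, squeezed into one listing. C++ as later standardized does permit a nested class to be declared inside its enclosing class and completed afterwards at namespace scope; whether any given 1991 compiler accepts it is a separate question. The `int` payload and the `value()` accessor are assumptions for illustration.

```cpp
#include <cassert>

// ---- what would live in smart_tp.h ----
class smart_pointer_to_T {
    class T;              // incomplete declaration of the nested type
    T* obj_;
    // Copying is left out of this sketch (declared private, never defined).
    smart_pointer_to_T(const smart_pointer_to_T&);
    smart_pointer_to_T& operator=(const smart_pointer_to_T&);
public:
    explicit smart_pointer_to_T(int v);
    ~smart_pointer_to_T();
    int value() const;
};

// ---- what would live in complete_t.C ----
class smart_pointer_to_T::T {
public:
    int value_;           // placeholder payload (an assumption)
    explicit T(int v) : value_(v) {}
};

smart_pointer_to_T::smart_pointer_to_T(int v) : obj_(new T(v)) {}
smart_pointer_to_T::~smart_pointer_to_T() { delete obj_; }
int smart_pointer_to_T::value() const { return obj_->value_; }
```

Files that include only the header part never see the body of T, which addresses the compile-time objection but not the one about where T objects may be declared.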
My initial proposal was intended to solve the "smart pointer problem"
while keeping the language "immutable", allowing declarations of T
objects in most places, and avoiding any need to have a complete
definition of the type T precede the definition of the type
smart_pointer_to_T.
I believe that my proposal did all that, but I'm now starting to wonder
if it was really such a hot idea after all.
My proposal simply provided a means for telling the compiler that (in
certain contexts) it should treat uses of type T* values as illegal
(thus forcing the user to use the smart pointer type in those contexts
instead).
Perhaps I grabbed the problem by the wrong end.
I now believe that it might be equally effective to simply make it
impossible to even generate a valid (non-null) stupid pointer-to-T
value in certain contexts. Obviously, if you can prevent valid
values of type T* from leaking out into some area then you don't
even need to worry about whether or not operations on T*'s are
restricted (over that area) or not.
Obviously, for a class type T, you can overload operator& (either as
a member function of T or as a global function taking a T&).
That right there puts you in control of most of the cases where a T*
could potentially be generated.
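A minimal sketch of the member-function form follows. The handle here is a bare, non-owning wrapper, and its interface (just `operator*`) is an assumption chosen to keep the example short.

```cpp
#include <cassert>

class smart_pointer_to_T;   // defined below

class T {
public:
    int value;              // placeholder payload (an assumption)
    explicit T(int v) : value(v) {}
    smart_pointer_to_T operator&();   // unary & now yields a handle, not a T*
};

class smart_pointer_to_T {
    T* raw_;                // the stupid pointer stays hidden in here
public:
    explicit smart_pointer_to_T(T* p) : raw_(p) {}
    T& operator*() const { return *raw_; }
};

inline smart_pointer_to_T T::operator&() {
    return smart_pointer_to_T(this);
}
```

With these definitions, `&t` for a `T t` has type `smart_pointer_to_T`, so `T* p = &t;` no longer compiles while `smart_pointer_to_T p = &t;` does.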
Unfortunately, there are others that you (currently) can't control.
As Jeremy Grodberg (jgro@lia.com) noted, the language rules currently say
that if you new() an array of objects of some type T, the global operator
new is invoked for this regardless of whether or not the type T has its
own class-specific operator new() defined. As a result, whenever you
new() an array of T, you'll get back a value of type T* even if you
would have preferred getting back a value of type smart_pointer_to_T.
This is one means by which unwanted (but valid and non-null)
values of type T* may leak into some context. This leakage is very bad
and it ought to be rectified by x3j16.
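The leak is easy to observe with a class that counts calls to its own operator new. This sketch assumes a class that defines only `operator new` and `operator delete` (no array forms); the counter and the function name are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct T {
    static int class_new_calls;   // how often T::operator new has run
    int value;
    T() : value(0) {}
    static void* operator new(std::size_t n) {
        ++class_new_calls;
        return ::operator new(n);
    }
    static void operator delete(void* p) { ::operator delete(p); }
};
int T::class_new_calls = 0;

// Allocates one T and then an array of T, and reports how many of
// those allocations went through the class-specific operator new.
int count_class_new_calls() {
    T::class_new_calls = 0;
    T* one  = new T;       // goes through T::operator new
    T* many = new T[3];    // bypasses it, and the result is a plain T*
    delete one;
    delete[] many;
    return T::class_new_calls;
}
```

Only the single-object allocation is counted, so the function returns 1.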
Also, there is one more leakage problem. Given some local or global
variable called `ta' of type array of T, the following expression
yields a value of type T* even if the type T has its own class-specific
operator& defined for it:
ta
That's it! The name of an array is generally converted (implicitly) into
a pointer to the zeroth member of that array. This implicit conversion
currently circumvents any class-specific operator& definition (if one
is present) for the class T and allows values of type T* to leak into
contexts where they may not be welcomed.
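The same kind of counter shows the array-name leak. In this sketch `operator&` simply returns `this` and bumps a counter, standing in for an operator that would return a smart pointer type; the names are hypothetical.

```cpp
#include <cassert>

struct T {
    static int address_of_calls;   // how often T::operator& has run
    int value;
    T() : value(0) {}
    // Stand-in for an operator& that would hand back a smart pointer.
    T* operator&() { ++address_of_calls; return this; }
};
int T::address_of_calls = 0;

// Obtains a pointer to the zeroth element twice: once via the array
// name and once via an explicit &, then reports the operator& count.
int count_operator_address_calls() {
    T::address_of_calls = 0;
    T ta[4];
    T* leaked = ta;         // implicit conversion: operator& is never consulted
    T* via_op = &ta[0];     // explicit &: goes through T::operator&
    assert(leaked == via_op);
    return T::address_of_calls;
}
```

Both pointers refer to the same element, yet only the explicit & is counted, so the function returns 1.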
If both of these unfortunate leaks in the language could be plugged, we
might be able to achieve really water-tight "safe" smart pointer types
just by overloading operator& (and having it yield a smart pointer type)
for some "controlled" type T.
Both leaks could be easily plugged while doing little harm to the existing
language.
For the first leak, it would be easy enough to say that a class-specific
operator new() for a class T is called whenever a single object *or* an
array of objects of type T is new'ed. Such operators could then be defined
by the user to return some smart pointer type.
For the second leak, we could simply redefine the semantics of "array-name"
(where "array-name" names an array of objects of some class type) to be
equivalent to invoking (implicitly) an applicable operator& (either member
or global) on the zeroth element of the array.
There now. That was simple, eh?
Note that by plugging these two leaks, we have not destroyed the user's
ability to declare objects of type T (or even objects of type T*)
but what we have done is to give the user all of the tools he/she needs
in order to ensure that no useful values of type T* (other than NULL)
ever leak into a given area (where they might be misused).