Topic: Weak refs suffice for virtual pruning - was C++ as PL/1


Author: Pete Becker <petebecker@acm.org>
Date: 1998/07/23
bbrunswick@my-dejanews.com wrote:
>
> Isn't it sufficient for the compiler to generate the vtbl filled with
> weak references, and then each virtual function call to use a symbol to
> get the offset in the vtbl that it should use. Those symbols get defined
> with the virtual function definition, meaning its only pulled in if needed.

If the compiler says "I'm calling Base::f(int)", how does the linker
know to link in Derived::f(int)?
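
To make the question concrete, here is a minimal sketch. The names Base,
Derived and call_f are hypothetical and chosen only for illustration. The
call site in call_f names nothing but Base::f(int), and could sit in a
translation unit that has never seen Derived, yet at run time it can reach
Derived::f(int).

```cpp
struct Base {
    virtual ~Base() = default;
    virtual int f(int x) { return x; }
};

struct Derived : Base {
    int f(int x) override { return x * 2; }
};

// The only symbol-level information at this call site is "Base::f(int)".
// Which override actually runs depends on the dynamic type of b.
int call_f(Base& b, int x) { return b.f(x); }
```

Passing a Derived object to call_f(d, 21) yields 42, while a plain Base
yields 21; nothing in call_f's object code distinguishes the two cases.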

>
> Pruning facets is also fairly straight forward isn't it, at the cost of some
> bloat in use_facet? We can simply fill the slots in the locale with dummy
> values, and have the appropriate specialisation of use_facet convert these to
> the real functions, so those functions are only pulled in if that use_facet
> call is ever made. I imagine the (inlined) use_facet call maybe checking if
> the pointer is a magic small integer (non-portable, but we are the system
> library) and if so looking the proper value up in a static table. (which
> pulls in the real functions)

Yes, you can do that. It's a bit trickier than it sounds, but it's what
the Dinkum library does.
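
A rough sketch of the "fill the slot on first use" idea follows. It is not
the Dinkum or VC++ code: ToyLocale, ToyFacet, ToyNumPunct, ToyCollate and
use_toy_facet are hypothetical names, and a null pointer stands in for the
"magic small integer" so that the sketch stays portable. The point is only
that code for a given facet is referenced from the matching use_toy_facet
instantiation and from nowhere else.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical facet base; not the real std::locale::facet.
struct ToyFacet {
    virtual ~ToyFacet() = default;
};

struct ToyNumPunct : ToyFacet {
    static constexpr std::size_t id = 0;
    char decimal_point() const { return '.'; }
};

struct ToyCollate : ToyFacet {
    static constexpr std::size_t id = 1;
    int compare(const std::string& a, const std::string& b) const {
        return a.compare(b);
    }
};

class ToyLocale {
public:
    // Slots start out empty (the "dummy value"). Nothing in this class
    // refers to any concrete facet type.
    ToyFacet* slot(std::size_t i) const { return slots_[i].get(); }
    void fill(std::size_t i, std::unique_ptr<ToyFacet> f) {
        slots_[i] = std::move(f);
    }

private:
    std::array<std::unique_ptr<ToyFacet>, 2> slots_;
};

// Only an instantiation of this template mentions Facet's constructor,
// so a facet that is never requested is never referenced.
template <class Facet>
const Facet& use_toy_facet(ToyLocale& loc) {
    if (loc.slot(Facet::id) == nullptr) {
        loc.fill(Facet::id, std::make_unique<Facet>());
    }
    return static_cast<const Facet&>(*loc.slot(Facet::id));
}
```

A program that only ever calls use_toy_facet<ToyNumPunct> leaves the
ToyCollate slot empty for the life of the locale.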

>
> If we do assembly level control of the library, then maybe something could be
> arranged to use weak refs again, as above.

Maybe. Where would you use them? And, of course, if you don't control
the linker and the linker on the platform that you're using doesn't
support weak references, this doesn't help.

--
Pete Becker
Dinkumware, Ltd.
http://www.dinkumware.com
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]





Author: "P.J. Plauger" <pjp@dinkumware.com>
Date: 1998/07/23
bbrunswick@my-dejanews.com wrote in article <6p4cc4$u9f$1@nnrp1.dejanews.com>...
> Isn't it sufficient for the compiler to generate the vtbl filled with
> weak references, and then each virtual function call to use a symbol to
> get the offset in the vtbl that it should use. Those symbols get defined
> with the virtual function definition, meaning its only pulled in if needed.

I'm shooting from the hip here, but it seems to me that all you can tell
by this ruse is whether *at least one* version of a virtual function gets
called, but not *which one*. So you have to load 'em all. Still, you do
eliminate a virtual that doesn't get called in any flavor.

> Pruning facets is also fairly straight forward isn't it, at the cost of some
> bloat in use_facet? We can simply fill the slots in the locale with dummy
> values, and have the appropriate specialisation of use_facet convert these to
> the real functions, so those functions are only pulled in if that use_facet
> call is ever made. I imagine the (inlined) use_facet call maybe checking if
> the pointer is a magic small integer (non-portable, but we are the system
> library) and if so looking the proper value up in a static table. (which
> pulls in the real functions)

See <xlocale> in VC++ V5.0. You will find a mechanism very similar to
what you've described. It has been available in production code since V4.2
shipped in 1996. I first described the technique in print in my monthly
column ``State of the Art: Too Much of a Good Thing,'' Embedded Systems
Programming, July 1996.

And it still doesn't solve all of the ``problems with locale and its facets.''

> If we do assembly level control of the library, then maybe something could be
> arranged to use weak refs again, as above.

Yeah, maybe something could be arranged. That's an engineering spec if
ever I read one.

> Please tell me my understanding is lacking and that these are actually hard
> problems... its worrying if so much argument is going on over things that can
> be worked around easily.

You have some good insights into some of the issues, as have a number of
other contributors to this protracted thread. But they are still hard problems
and they are still not worked around easily. More important, these were
problems *that did not occur* in the Language Formerly Known as C++ that
inspired the invention called Standard C++. It is the latter creature that
vendors of commercial compilers (and the vendors behind the scenes who
supply good libraries for them) are now wrestling with.

I have persisted in contributing to this thread to raise awareness of the
issues. It doesn't help to dismiss very real problems as nonexistent, or as
easily worked around (if only you sat down and actually thought about it).
It doesn't help to suggest that a handful of secret techniques will prove
within the year that the problems are all artifacts of ``immature''
implementations.

What does help are positive suggestions, like your idea of ``facets on
demand,'' even if it proved not to be original. I look forward to more such
contributions to this discussion.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com









Author: Hyman Rosen <hymie@prolifics.com>
Date: 1998/07/23
P.J. Plauger wrote:
> bbrunswick@my-dejanews.com wrote in article <6p4cc4$u9f$1@nnrp1.dejanews.com>...
> > Isn't it sufficient for the compiler to generate the vtbl filled with
> > weak references, and then each virtual function call to use a symbol to
> > get the offset in the vtbl that it should use. Those symbols get defined
> > with the virtual function definition, meaning its only pulled in if needed.
>
> I'm shooting from the hip here, but it seems to me that all you can tell
> by this ruse is whether *at least one* version of a virtual function gets
> called, but not *which one*. So you have to load 'em all. Still, you do
> eliminate a virtual that doesn't get called in any flavor.

This is where new linker behavior is needed, but the new behavior is simple.
We need to define a new kind of symbol, which I call a weak symbol. When
the linker has an unresolved reference to a symbol, and it finds a
corresponding weak symbol in a library object file, it loads that file as
normal. The only difference is that this does not cause the symbol to
become defined. If the weak symbol is defined in further library object
files, those objects are loaded as well.

Now, the compiler just needs to generate weak references in the vtable,
plus allocate a unique symbol for every virtual function, based on its
name and the class in which it is first declared. Whenever the compiler
generates a polymorphic call to a virtual function, it issues a reference
to the special symbol of that function. Whenever the compiler generates an
out-of-line implementation of a virtual function, it generates a weak
symbol definition of the special symbol of that function. That's all that's
required.
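
The proposed rule can be modelled with a toy library search. This is only a
sketch under stated assumptions: ObjectFile, toy_link, example_library and
the "name$call" style of special symbol are hypothetical, and no real
linker or object format is being described. An ordinary definition
satisfies a reference, so the search stops at the first one; a definition
of the proposed new kind loads its object but leaves the reference
unsatisfied, so every object that carries one is loaded.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a library member.
struct ObjectFile {
    std::string name;
    std::set<std::string> defines;       // ordinary definitions
    std::set<std::string> unsatisfying;  // the proposed new kind of symbol
    std::set<std::string> needs;         // strong references made by this object
};

// Returns the names of all objects pulled in by the initial references.
std::set<std::string> toy_link(const std::vector<ObjectFile>& library,
                               std::vector<std::string> pending) {
    std::set<std::string> loaded;
    std::set<std::string> seen;
    while (!pending.empty()) {
        const std::string sym = pending.back();
        pending.pop_back();
        if (!seen.insert(sym).second) continue;  // symbol already processed
        bool resolved = false;
        for (const ObjectFile& obj : library) {
            const bool ordinary = obj.defines.count(sym) != 0;
            const bool unsat = obj.unsatisfying.count(sym) != 0;
            if (!unsat && !(ordinary && !resolved)) continue;
            if (ordinary) resolved = true;  // unsatisfying ones never resolve
            if (loaded.insert(obj.name).second) {
                pending.insert(pending.end(), obj.needs.begin(), obj.needs.end());
            }
        }
    }
    return loaded;
}

// Two overrides of name() and one of count(), each in its own object.
std::vector<ObjectFile> example_library() {
    return {
        {"base_name.o",    {"Base::name"},    {"name$call"},  {}},
        {"derived_name.o", {"Derived::name"}, {"name$call"},  {}},
        {"base_count.o",   {"Base::count"},   {"count$call"}, {}},
    };
}
```

A program whose only polymorphic call is to name() starts the search with
{"name$call"}; both name() objects are loaded and base_count.o is not. The
weak references in the vtbl are deliberately absent from needs, since they
never pull anything in.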








Author: jcoffin@taeus.com (Jerry Coffin)
Date: 1998/07/23
In article <01bdb64d$f67d8480$8a1ec2d0@porky>, pjp@dinkumware.com
says...

[ ... ]

> You have some good insights into some of the issues, as have a number of
> other contributors to this protracted thread. But they are still hard problems
> and they are still not worked around easily. More important, these were
> problems *that did not occur* in the Language Formerly Known as C++ that
> inspired the invention called Standard C++. It is the latter creature that
> vendors of commercial compilers (and the vendors behind the scenes who
> supply good libraries for them) are now wrestling with.

I think this is a somewhat unfair characterization: the problems DID
occur with the languages formerly known as C++, but didn't occur as
often or with such drastic results.  The introduction of virtual
functions leads to at least the possibility of the problem arising.
The current design of the iostreams simply causes almost any
program using iostreams (at least as implemented in any library I've
seen yet) to include a great deal of code, which makes for a more or
less extreme example of the fundamental problem.

In a way, I consider that a good thing: if the potential for the same
problem existed, but was only rarely seen, the techniques for dealing
with it would remain obscure and most likely people would have to
invent new (and probably not very good) ways of trying to deal with it
every time it arose.  As-is, it's more or less incumbent upon the
vendors to come up with ways of dealing with it, so it'll be dealt
with reasonably efficiently when it arises.  Even if implementations
don't add new compiler/linker features to help, simply having library
code available to examine for the basic techniques will probably be
quite helpful for others who invent libraries with any sort of vague
similarity to the standard iostreams.


--
    Later,
    Jerry.

The Universe is a figment of its own imagination.





Author: rdamon@BeltronicsInspection.com (Richard Damon)
Date: 1998/07/23
bbrunswick@my-dejanews.com wrote:

>In article <6p063o$8kt$1@shell7.ba.best.com>,  ncm@nospam.cantrip.org (Nathan
>Myers) wrote:
>> Omitting unused virtual functions is an important tool for eliminating
>> some bloat, but weak references by themselves (as Pete likes to remind
>> us) are not sufficient to omit unused virtual functions; that takes a
>> linker extension.
>
>Surely that turns out not to be the case.
>
>Isn't it sufficient for the compiler to generate the vtbl filled with
>weak references, and then each virtual function call to use a symbol to
>get the offset in the vtbl that it should use. Those symbols get defined
>with the virtual function definition, meaning its only pulled in if needed.
>
Reference to Base::fun needs to force loading of Derived::fun, even though
usages of Base need never have seen Derived. You also need some form of reverse
hook which says if you load A you need to load B also.

--
richard_damon@iname.com (Redirector to my current best Mailbox)
rdamon@beltronicsInspection.com (Work Address)
Richad_Damon@msn.com (Just for Fun)








Author: Pete Becker <petebecker@acm.org>
Date: 1998/07/24
Richard Damon wrote:
>
> >
> Reference to Base::fun needs to force loading of Derived::fun, even though
> usages of Base need never have seen Derived. You also need some form of reverse
> hook which says if you load A you need to load B also.

That's right, but you have to be careful how you say it. It's easy for
the module that defines A to say that if A gets loaded B must also get
loaded. That's routine stuff. It's harder from the other end: the module
that defines B needs to be able to say that if A gets loaded, B must
also get loaded.

--
Pete Becker
Dinkumware, Ltd.
http://www.dinkumware.com





Author: phalpern@truffle.ma.ultranet.com (Pablo Halpern)
Date: 1998/07/24
Hyman Rosen <hymie@prolifics.com> wrote:

>This is where new linker behavior is needed, but the new behavior is simple.
>We need to define a new kind of symbol, which I call a weak symbol. When
>the linker has an unresolved reference to a symbol, and it finds a
>corresponding weak symbol in a library object file, it loads that file as
>normal. The only difference is that this does not cause the symbol to
>become defined. If the weak symbol is defined in further library object
>files, those objects are loaded as well.
>
>Now, the compiler just needs to generate weak references in the vtable,
>plus allocate a unique symbol for every virtual function, based on its
>name and the class in which it is first declared. Whenever the compiler
>generates a polymorphic call to a virtual function, it issues a reference
>to the special symbol of that function. Whenever the compiler generates an
>out-of-line implementation of a virtual function, it generates a weak
>symbol definition of the special symbol of that function. That's all that's
>required.

This is a start, but it is not enough.

The minimal condition for requiring the inclusion of a virtual function,
X::f() that is not called statically is:

  a. At least one object of class X is constructed AND
  b. f() is called polymorphically through a pointer or reference
     to X or one of its base classes.

Now, let's look at an example:

struct A            { virtual void f(); };
struct B : public A { virtual void f(); };
struct C : public B { virtual void f(); virtual void h(); };
struct D : public C { virtual void f(); };

void g(B& b) { b.f(); /* polymorphic call */ }
int main() { A a; B b; C c; g(c); }

Your solution would generate a special symbol for A::f() that we'll call
A::f_call. The implementations of A::f(), B::f(), C::f() and D::f()
would all provide a weak definition of A::f_call. The call to b.f() would
generate a reference to A::f_call, causing A::f(), B::f(), C::f() and
D::f() all to be linked in.

This would prevent C::h() from being linked in, thus saving some code
bloat. Depending on the program, it might save a lot of code bloat.
However, the minimal rules I proposed above say you can do better.  Our
program does not construct an object of type D, so it clearly doesn't
need D::f().  Also, we don't call f() through A or a base class of A, so
we don't need A::f(), either.

It seems our linker would have to be quite a bit smarter. Perhaps,
instead of weak symbol, we could have a compound symbol that represents
a boolean condition. Each portion of a boolean expression would be
represented by a simple symbol. That portion of the expression would
evaluate true if a reference to that symbol exists in the program. Only
if the whole expression evaluates true is the compound symbol linked in.
So, for our example we would have the following compound symbols:

 A::f = compound(A::f_vtbl && A::f_call)
 B::f = compound(B::f_vtbl && (B::f_call || A::f_call))
 C::f = compound(C::f_vtbl && (C::f_call || B::f_call || A::f_call))
 D::f = compound(D::f_vtbl && (D::f_call || C::f_call ||
                               B::f_call || A::f_call))

C::f_vtbl is referenced by C's vtbl.  B::f_call is referenced by the
call to b.f().  Thus C::f() is linked in. B::f() would also be linked in
using this logic. However, A::f() would not be linked in (because there
is no reference to A::f_call), nor would D::f() be linked in (because
there is no reference to D::f_vtbl).
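
The compound-symbol rule can be checked against the example with a small
model. This is only a sketch: compound_linked and example_references are
hypothetical names, and the set of referenced simple symbols is written out
by hand from the program above rather than computed by any tool.

```cpp
#include <set>
#include <string>
#include <vector>

// self_and_bases lists a class followed by its bases, most derived first,
// e.g. {"C", "B", "A"} for struct C : B. The override of f() in the first
// class is linked only if that class's f_vtbl symbol is referenced AND at
// least one f_call symbol of the class or of its bases is referenced.
bool compound_linked(const std::vector<std::string>& self_and_bases,
                     const std::set<std::string>& referenced) {
    if (self_and_bases.empty()) return false;
    if (referenced.count(self_and_bases.front() + "::f_vtbl") == 0) return false;
    for (const std::string& cls : self_and_bases) {
        if (referenced.count(cls + "::f_call") != 0) return true;
    }
    return false;
}

// main() constructs A, B and C, so their vtbls are emitted; g(B&) makes
// the only polymorphic call, and it goes through B.
std::set<std::string> example_references() {
    return {"A::f_vtbl", "B::f_vtbl", "C::f_vtbl", "B::f_call"};
}
```

With these references the model links B::f() and C::f() and leaves out
A::f() and D::f(), matching the outcome described above.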

The object format and linker logic could be simplified by assigning each
simple symbol a bit pattern and causing a compound symbol to be linked
only if all of its bits were set by ORing the references to its
component simple symbols together. Thus, all of the f_vtbl symbols could
be associated with the bit pattern, 0x1 and all of the f_call symbols
could be associated with 0xfffffffe.  Only if the required f_vtbl
reference and at least one of the required f_call references exist would
the result be 0xffffffff, and the function would be linked:

 B::f = compound(B::f_vtbl | B::f_call | A::f_call)
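
The bit-pattern encoding is easy to try out in isolation. In this sketch
the constants take the values given above, while the names kVtblBit,
kCallBits, kAllBits, combine and is_linked are hypothetical.

```cpp
#include <cstdint>
#include <initializer_list>

constexpr std::uint32_t kVtblBit  = 0x00000001u;  // every f_vtbl symbol
constexpr std::uint32_t kCallBits = 0xfffffffeu;  // every f_call symbol
constexpr std::uint32_t kAllBits  = 0xffffffffu;  // "all bits set"

// OR together the patterns of the component symbols that are actually
// referenced somewhere in the program.
std::uint32_t combine(std::initializer_list<std::uint32_t> referenced_patterns) {
    std::uint32_t acc = 0;
    for (std::uint32_t p : referenced_patterns) acc |= p;
    return acc;
}

// The compound symbol is linked only when every bit has been set.
bool is_linked(std::uint32_t acc) { return acc == kAllBits; }
```

A vtbl reference alone gives 0x1 and any number of call references alone
give 0xfffffffe; only the combination of both kinds reaches 0xffffffff.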

Finally, it should be noted that my algorithm would cause B::f() to be
linked in, even though it is not really needed. However, determining
that B::f() is not needed would require global flow analysis, and not
just a static set of rules. In a non-trivial program, the amount of
code-bloat saved by doing the global analysis is not likely to be
significant. IMHO, there are certainly better places to put your
optimization dollars.

-------------------------------------------------------------
Pablo Halpern                   phalpern@truffle.ultranet.com

I am self-employed. Therefore, my opinions *do* represent
those of my employer.





Author: AllanW@my-dejanews.com
Date: 1998/07/24
In article <6p4cc4$u9f$1@nnrp1.dejanews.com>,
  bbrunswick@my-dejanews.com wrote:
> In article <6p063o$8kt$1@shell7.ba.best.com>,  ncm@nospam.cantrip.org (Nathan
> Myers) wrote:
> > Omitting unused virtual functions is an important tool for eliminating
> > some bloat, but weak references by themselves (as Pete likes to remind
> > us) are not sufficient to omit unused virtual functions; that takes a
> > linker extension.
>
> Surely that turns out not to be the case.
>
> Isn't it sufficient for the compiler to generate the vtbl filled with
> weak references, and then each virtual function call to use a symbol to
> get the offset in the vtbl that it should use. Those symbols get defined
> with the virtual function definition, meaning its only pulled in if needed.

Sorry, no, this wouldn't do.

    // Base.H
    class Base {
        int i;
    public:
        virtual const char *name();
        virtual int count();
    };

    // Base.Cpp
    const char *Base::name() { return "Base"; }
    int Base::count() { return ++i; }

The virtual table has pointers to Base::name and Base::count, right?
Let's assume these are weak references.

    // Ref.H
    #include "Base.H"
    void ShowName(Base*);

    // Ref.Cpp
    #include <iostream>
    #include "Ref.H"
    void ShowName(Base *p) {
        std::cout << p->name() << std::endl;
    }

Now we're using Base::name.  Not directly, of course; we don't know
if it will call Base::name, or somethingelse::name (where somethingelse
is some class derived from Base).  So the compiled code looks up p's
vtbl, and uses the address stored there to call a name() function.

    // Main.Cpp
    #include <iostream>
    #include "Ref.H"
    int main() {
        Base b;
        ShowName(&b);
    }

Main creates a Base object.  But it doesn't call name(), so there's no
reason to include it in the link.  All it does is take the address of
the b object and pass it to the ShowName function.  What happens there
is up to that separately compiled program.

Now we link all these programs together.  Remember, the vtbl consists
of only weak references to Base::name() and Base::count().  There is
no hard reference, so Base::name() is not included in the executable.
main() calls ShowName(), which calls name() through p's vtbl.  Since
the reference was weak, the value is 0 and we jump to a routine in
ROM which formats the hard disk...
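
The failure can be imitated portably, without real weak references. In
this sketch ToyVtbl, NameFn, base_name and call_name_checked are
hypothetical names; a null function pointer plays the part of a weak
reference that the linker left unresolved. Compiled code for p->name()
performs no such check, which is exactly the problem; the checked call
below exists only so that the sketch itself stays well defined.

```cpp
using NameFn = const char* (*)();

const char* base_name() { return "Base"; }

// Stand-in for a vtbl whose slots are weak references: a slot whose
// target was never linked reads as a null pointer.
struct ToyVtbl {
    NameFn name;
};

// Returns the name, or nullptr when the slot was never filled in.
// A real virtual call would jump through the slot unconditionally.
const char* call_name_checked(const ToyVtbl& vt) {
    return vt.name != nullptr ? vt.name() : nullptr;
}
```

With ToyVtbl{&base_name} the call returns "Base"; with ToyVtbl{nullptr} it
reports the missing function instead of jumping to address 0.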

> Pruning facets is also fairly straight forward isn't it, at the cost of some
> bloat in use_facet? We can simply fill the slots in the locale with dummy
> values, and have the appropriate specialisation of use_facet convert these to
> the real functions, so those functions are only pulled in if that use_facet
> call is ever made. I imagine the (inlined) use_facet call maybe checking if
> the pointer is a magic small integer (non-portable, but we are the system
> library) and if so looking the proper value up in a static table. (which
> pulls in the real functions)

So, to avoid bloat, we'll take all the data that should have been in
the facets and put them into static tables, which we can then use to
initialize the facets.  Won't we end up with exactly the same data in
memory twice, once in the facet and the other in the static tables?

> If we do assembly level control of the library, then maybe something could be
> arranged to use weak refs again, as above.
>
> Please tell me my understanding is lacking and that these are actually hard
> problems... its worrying if so much argument is going on over things that can
> be worked around easily.

I'm positive that's the case for the first "solution" you gave.  I think
it may be for the second case as well.

> PS - Apologies if the formatting is mucked up, but deja-news seems to trash it
> if preview is used.







Author: john@mail.interlog.com (John R MacMillan)
Date: 1998/07/25
[ This isn't strictly C++ related, but getting our terminology
  straight is probably important for this discussion. ]

|We need to define a new kind of symbol, which I call a weak symbol.

You may wish to choose a different name, since many people already
use that to mean something else.  For example, an entry in an ELF
symbol table with binding STB_WEAK is usually called a weak symbol.

|the linker has an unresolved reference to a symbol, and it finds a
|corresponding weak symbol in a library object file, it loads that file as
|normal. The only difference is that this does not cause the symbol to
|become defined.

Perhaps such symbols could instead be called `unsatisfying'. :-)
--
To reply by mail, please remove "mail." from my address -- but please
send e-mail or post, not both








Author: bbrunswick@my-dejanews.com
Date: 1998/07/22
In article <6p063o$8kt$1@shell7.ba.best.com>,  ncm@nospam.cantrip.org (Nathan
Myers) wrote:
> Omitting unused virtual functions is an important tool for eliminating
> some bloat, but weak references by themselves (as Pete likes to remind
> us) are not sufficient to omit unused virtual functions; that takes a
> linker extension.

Surely that turns out not to be the case.

Isn't it sufficient for the compiler to generate the vtbl filled with
weak references, and then each virtual function call to use a symbol to
get the offset in the vtbl that it should use. Those symbols get defined
with the virtual function definition, meaning its only pulled in if needed.

Pruning facets is also fairly straight forward isn't it, at the cost of some
bloat in use_facet? We can simply fill the slots in the locale with dummy
values, and have the appropriate specialisation of use_facet convert these to
the real functions, so those functions are only pulled in if that use_facet
call is ever made. I imagine the (inlined) use_facet call maybe checking if
the pointer is a magic small integer (non-portable, but we are the system
library) and if so looking the proper value up in a static table. (which
pulls in the real functions)

If we do assembly level control of the library, then maybe something could be
arranged to use weak refs again, as above.

Please tell me my understanding is lacking and that these are actually hard
problems... its worrying if so much argument is going on over things that can
be worked around easily.

PS - Apologies if the formatting is mucked up, but deja-news seems to trash it
if preview is used.
