Topic: Instantiability of STL templates


Author: Kaba <none@here.com>
Date: Fri, 17 Feb 2006 14:39:20 CST
In the previous post I noted that the third-party library problem
is a fundamental and unbeatable problem for nsm. I have given up on
using nsm to control implicit instantiation and now
support N1448.

At the end of the previous post I suggested a hybrid of N1448 and nsm.
That was a wrong suggestion, because they have nothing to do with each
other anymore.

Nsm is only a way to write a template so that the interface and the
implementation are separated. There are two advantages to this
technique: avoiding code exposure in plug-ins and faster build times.
The latter applies to single-module programs also.

So, since the last challenge was 1-0 for you, I'd like to raise another
related issue:

Problem: Justify the one-file template model over the two-file template
model in nsm.

> OK.  Let's stick with the incorrect behaviour caused by data
> replication.  Incorrect behaviour is much more tangible than
> maintenance problems.
>
> > Our problem: Make it possible to avoid code/data replication across
> > modules.

> If the templated code that implements std::set has been changed, then
> presumably you have upgraded your standard library, in which case you
> would expect to have to recompile *everything* (unless the vendor has
> made a specific guarantee to the contrary).

Assuming only the template implementation code of <set> is changed, then
a simple rebuild of the core library is enough. This applies generally,
not only to the STL.

> > Let's forget the templates for a moment. Assume
> > you are creating the core-library which your plug-ins reference to do
> > things. You choose a finite set of normal classes ("tools") which the
> > plug-ins are to use. Now bring in templates. set<float> is just a normal
> > class, a tool selected to be used in a plugin.
>
> And it's an implementation detail of the plugin.  I don't want
> knowledge that the plugin uses a std::set<float> internally to be
> exposed any more than it needs to be.  In your example on page 6 (quite
> right -- not 7 as I said), you explicitly instantiate std::set<float>
> in application.cpp -- i.e. in the main executable.  This is what I mean
> by unnecessary coupling between the modules.  Under N1448, the solution
> would be to put an "extern template" declaration in the plugin header
> and an explicit instantiation in the plugin -- *not* in the main
> application as you state on page 7.

If you had 100 plugins and instantiated std::set in each of them,
would we not end up replicating code and getting incorrect behaviour? The
place to instantiate the code is in the core-library which the plugins
connect to.

I'll try to give a concrete example to clear this up:

sound.h
----------

template <typename T>
class Sound
{
public:
    void normalize();
    T data_[100];
};

template <typename T>
void Sound<T>::normalize()
{
// Do your thing..
}

plugin.h
--------

#include "sound.h"
void myFunction(Sound<int>& a);

plugin.cpp (in module 1)
----------
#include "plugin.h"

extern template class Sound<int>;

void myFunction(Sound<int>& a)
{
    // Filter the data, do something cool..
    // normalize() is non-const, so the parameter must not be const.
    a.normalize();
}

plugintools.cpp (in module 2)
---------------
#include "sound.h"

template class Sound<int>;

application.cpp (in module 2)
---------------
#include "plugin.h"
int main()
{
    Sound<int> a;
    myFunction(a);
    // Do something with a..
    a.normalize();
    return 0;
}

>
> > To quote page 8: "The set
> > of possible template instances is infinite inside a module, but only
> > finite between modules.".
>
> I don't understand this comment.  For many templates, including all the
> STL containers, they have an infinite number of possible instantiations
> (think std::set<int>, std::set<int*>, std::set<int**>,
> std::set<int***>, ...).  And in any piece of code, whether a function,
> class,  translation unit, or separately compiled module, only a finite
> number of these are actually used.  How does this observation help?

You need to instantiate the tools used by the plugins in the core-
library, so that they are not replicated in plugins. When you code
inside a module, you can use whatever instance of a template (infinite
possibilities) because it can be generated at compile time. But when you
code a plug-in, the core-library has already been built. Because code
replication leads to incorrect behaviour, you only have the set of
instances that the core-library delivers. That set must be predefined.
Thus: infinite choices inside module, finite choices between modules.

> Agreed.  When I used to program C++ on Windows I've certainly
> encountered that.  (Of course, one could argue that on a system where
> the linker does handle removing the duplicate symbols, the
> implementation of std::set should be careful not to cause this problem.
>  Perhaps recent versions of MSVC do?  But let's ignore that for now.)

Yes, recent versions of MSVC work nicely with this also, that is, they
do not replicate data.

> I would suggest solving it (per N1448) by adding
>
>   extern template class std::set<int>;
>
> to plugin.h and
>
>   template class std::set<int>;
>
> to plugin.cpp.  How would you see this being solved in "nsm".

Why would you instantiate the code in a plugin?

> > If we were talking about a general class, the decoupling would be the
> > gain.
>
> Decoupling isn't a gain per se.  The gain is what might result from the
> decoupling -- perhaps faster build times, or smaller executable sizes.
> But these have to be balanced against the potential losses -- slower
> code, or larger executable sizes.  (Depending on context, explicit
> instantiation may cause larger or smaller executables.  Certainly in the
> case when an instantiation is used in only one translation unit, I
> can't see how it can possibly have a beneficial effect.)

It is easy to imagine faster build times. And there should be no losses
compared to using the one-file approach. This example should give you an
idea:

myclass_decl.h
--------------

#include <set_decl>
#include <vector_decl>

class MyClass
{
public:
    void f();
private:
    set<int> a_;
    vector<float> b_;
};

myclass.h
---------

#include "myclass_decl.h"
#include <set>
#include <vector>

// Defined in a header, so marked inline to avoid multiple definitions.
inline void MyClass::f()
{
    // Do things with a_ and b_
}

application.cpp
---------------
#include "myclass.h"

int main()
{
    MyClass a;
    a.f();
    return 0;
}

With nsm we are able to delay the inclusion of the implementation to where
it is actually used. The deeper the inclusion trees, the greater the
advantage.

> > - N1448 in current STL has the disadvantage of code revealing even when
> > it is not needed
>
> This is perhaps true.  I wonder whether a better solution would be to
> allow "extern template" to be applied to an incomplete type, though?  I
> can't see any technical reason why this could not be made to work, and
> it would allow far greater decoupling.

(I am again concentrating on classes for brevity)

Umm.. You do need the interface to use the functions? And the size of
the type to use it as a member inside other types?

The class definition has it all, thus the separation in nsm.

--
Kalle Rutanen
http://kaba.hilvi.org

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Wil Evers <bouncer@dev.null>
Date: Sat, 18 Feb 2006 10:36:04 CST
Kaba wrote:

[snip]

> The idea of dynamic linker doing removal of replicated symbols is new to
> me (first mentioned by Wil Evers in another thread). It is a great idea.
> From my experience with visual studio, I see that unfortunately its
> linker does not do this (I could be missing a switch): it is easy to
> reproduce a situation where global data is replicated and causes
> incorrect behaviour (that is, no removal of duplicates).

I don't think you missed any switch; AFAIK, the Windows DLL loader - which
is an integral part of the OS - has no facilities for removing replicated
symbols.

> This raises an interesting question whether it should be the user's or
> linker's responsibility to avoid code/data replication across modules.

I guess it would be fair to say that the C++ current standard pretty much
requires *the implementation* to merge duplicate implicit template
instantiations.

- Wil






Author: Kaba <none@here.com>
Date: Sat, 18 Feb 2006 13:09:13 CST
> I don't think you missed any switch; AFAIK, the Windows DLL loader - which
> is an integral part of the OS - has no facilities for removing replicated
> symbols.

Hmm.. I think we are close to mixing up two kinds of linkers here. At
least I am.

1) The linker that comes with your compiler. It resolves the symbols
between translation units in a module. It also resolves static
dependencies between dynamic modules (through .lib files).

2) The linker that is provided by the OS. This actually places the
symbols in the memory space when a dynamic library is loaded.

It seems logical that the compiler linker (I can't make up a better
name..) could remove the duplicate symbols, because it has access to the
object files (.o), as well as the import libraries (.lib) of the other
modules (.dll). Was this what you meant previously?

> > This raises an interesting question whether it should be the user's or
> > linker's responsibility to avoid code/data replication across modules.
>
> I guess it would be fair to say that the C++ current standard pretty much
> requires *the implementation* to merge duplicate implicit template
> instantiations.

Right. But I have a feeling this is not practically possible. Imagine
two libraries that are developed independently as dlls. Then a user that
used them both would possibly have multiple copies of the same symbols.

Thinking out loud:

Maintaining ODR at compile time is hard with dynamic libraries: it
requires coordination between all the libraries that are used and that
are possibly used in the future. Between these libraries, a definition
must be placed in one and only one place. If you don't have control over
a part of the source code (behaves as a black box), then it is clearly
impossible.

The compiler linker is another place where duplicate symbols could be
removed. But even then it can only remove those that can be seen from the
.lib files. If not all .lib files are given, there may still be duplicate
symbols left.

The OS linker is the last place where duplicate symbols can be removed,
and possibly the first place where some dlls meet each other. I don't know
much about the internals of dlls, but I suspect this is hard.

--
Kalle Rutanen
http://kaba.hilvi.org






Author: "Bob Bell" <belvis@pacbell.net>
Date: Sat, 18 Feb 2006 21:15:31 CST
Wil Evers wrote:
> Kaba wrote:
> > This raises an interesting question whether it should be the user's or
> > linker's responsibility to avoid code/data replication across modules.
>
> I guess it would be fair to say that the C++ current standard pretty much
> requires *the implementation* to merge duplicate implicit template
> instantiations.

The current C++ standard doesn't make any requirement whatsoever on
programs using DLLs.

Bob






Author: Wil Evers <bouncer@dev.null>
Date: Sat, 18 Feb 2006 21:25:31 CST
Kaba wrote:

>> I don't think you missed any switch; AFAIK, the Windows DLL loader -
>> which is an integral part of the OS - has no facilities for removing
>> replicated symbols.
>
> Hmm.. I think we are close to mixing up two kinds of linkers here. At
> least I am.
>
> 1) The linker that comes with your compiler. It resolves the symbols
> between translation units in a module. It also resolves static
> dependencies between dynamic modules (through .lib files).

> 2) The linker that is provided by the OS. This actually places the
> symbols in the memory space when a dynamic library is loaded.

Right.  This is what I'd call the "dynamic linker" or "DLL loader".

> It seems logical that the compiler linker (I can't make up a better
> name..) could remove the duplicate symbols, because it has access to the
> object files (.o), as well as the import libraries (.lib) of the other
> modules (.dll). Was this what you meant previously?

No, that's not what I meant; what I meant was that the 'standard' dynamic
linkers on some OSs (in particular, ELF-based systems such as Linux and
Solaris) are capable of merging duplicate implicit template instantiations
at run time, right across shared library boundaries.

>> > This raises an interesting question whether it should be the user's or
>> > linker's responsibility to avoid code/data replication across modules.
>>
>> I guess it would be fair to say that the C++ current standard pretty much
>> requires *the implementation* to merge duplicate implicit template
>> instantiations.
>
> Right. But I have a feeling this is not practically possible. Imagine
> two libraries that are developed independently as dlls. Then a user that
> used them both would possibly have multiple copies of the same symbols.
>
> Thinking out loud:
>
> Maintaining ODR at compile time is hard with dynamic libraries: it
> requires coordination between all the libraries that are used and that
> are possibly used in the future. Between these libraries, a definition
> must be placed in one and only one place. If you don't have control over
> a part of the source code (behaves as a black box), then it is clearly
> impossible.

Well, it is certainly possible, because ELF does it.  However, you are right
that this model can (and does) lead to problems when the ODR is violated,
which is why ELF "compiler linkers" have facilities to force internal
linking and/or hide symbols from other modules.  This is often a practical
solution when ODR violations are to be expected.
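
A minimal sketch of the two facilities mentioned above, assuming GCC or
Clang on an ELF platform. The visibility attribute is a compiler extension
rather than standard C++, and the macro and function names here are
hypothetical, chosen only for illustration.

```cpp
#include <cassert>

// Assumption: GCC or Clang targeting an ELF platform. The visibility
// attribute is a compiler extension, not part of the C++ standard.
#if defined(__GNUC__)
#  define MODULE_LOCAL __attribute__((visibility("hidden")))
#else
#  define MODULE_LOCAL
#endif

// Hypothetical helper that a plugin wants to keep to itself. With hidden
// visibility the symbol is not exported from the shared object, so the
// dynamic linker cannot merge it with a same-named symbol from another
// module.
MODULE_LOCAL int pluginPrivateAnswer()
{
    return 42;
}

// Internal linkage is the other option: a name declared in an unnamed
// namespace is local to its translation unit.
namespace
{
    int translationUnitPrivateAnswer()
    {
        return 7;
    }
}
```

Both functions remain callable from within the module that defines them;
only their visibility to other modules changes.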

On the other hand, the semantics of such modules are clearly not covered by
the C++ standard: ODR violations lead to undefined behavior.

- Wil






Author: hyrosen@mail.com (Hyman Rosen)
Date: Sun, 19 Feb 2006 16:16:09 GMT
Bob Bell wrote:
> The current C++ standard doesn't make any requirement whatsoever on
> programs using DLLs.

Why do you say that? If an implementation claims to be conforming
while supporting DLLs, then the program must behave as the standard
specifies. In fact, I would think that 3.6.2/3 is there precisely
to support DLLs.






Author: bouncer@dev.null (Wil Evers)
Date: Sun, 19 Feb 2006 22:50:50 GMT
Bob Bell wrote:

> Wil Evers wrote:
>> Kaba wrote:
>> > This raises an interesting question whether it should be the user's or
>> > linker's responsibility to avoid code/data replication across modules.
>>
>> I guess it would be fair to say that the C++ current standard pretty much
>> requires *the implementation* to merge duplicate implicit template
>> instantiations.
>
> The current C++ standard doesn't make any requirement whatsoever on
> programs using DLLs.

The C++ standard does not distinguish between static and dynamic linking at
all; it does, however, place requirements on conforming implementations.

Some implementations are only conforming when static linking is used; others
allow me to use dynamic linking without breaking conformance.

- Wil






Author: none@here.com (Kaba)
Date: Sun, 19 Feb 2006 22:51:20 GMT
> No, that's not what I meant; what I meant was that the 'standard' dynamic
> linkers on some OSs (in particular, ELF-based systems such as Linux and
> Solaris) are capable of merging duplicate implicit template instantiations
> at run time, right across shared library boundaries.

Well, that's interesting. That should get rid of most problems. However,
I wonder if we are able to construct an example where this gives wrong
results? That is, how are the symbols recognized as being the same or
different?
For example, what if in two different shared libraries there is in both
a class with the same name and the same members, but different
implementations?

If the OS linker can do such a removal job, then clearly the compiler
linker could do part of this job, no?
As an input you give:
- the shared library A that the duplicate symbols are removed from
- the set of shared libraries B that A is compared to.
As an output you get:
- the shared library A' with duplicate symbols removed w.r.t. B.
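
The input/output description above can be sketched as a toy model, assuming
a library is reduced to nothing more than the set of symbol names it
defines. SymbolSet and removeDuplicates are hypothetical names; a real
linker works on object file formats, not on name sets.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Toy model only: a "library" is just the set of symbol names it defines.
using SymbolSet = std::set<std::string>;

// Returns A', the symbols of A that none of the libraries in B define.
SymbolSet removeDuplicates(const SymbolSet& a, const std::vector<SymbolSet>& b)
{
    SymbolSet result = a;
    for (const SymbolSet& other : b)
    {
        SymbolSet remaining;
        // Both ranges are sorted because std::set keeps its keys ordered.
        std::set_difference(result.begin(), result.end(),
                            other.begin(), other.end(),
                            std::inserter(remaining, remaining.end()));
        result.swap(remaining);
    }
    return result;
}
```

Because the model compares names only, it cannot tell apart two symbols
with the same name but different implementations, which is exactly the
case asked about above.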

--
Kalle Rutanen
http://kaba.hilvi.org






Author: kuyper@wizard.net
Date: Sun, 19 Feb 2006 16:53:30 CST
Bob Bell wrote:
> Wil Evers wrote:
> > Kaba wrote:
> > > This raises an interesting question whether it should be the user's or
> > > linker's responsibility to avoid code/data replication across modules.
> >
> > I guess it would be fair to say that the C++ current standard pretty much
> > requires *the implementation* to merge duplicate implicit template
> > instantiations.
>
> The current C++ standard doesn't make any requirement whatsoever on
> programs using DLLs.

The standard doesn't say anything about DLLs specifically, but every
requirement that it does contain applies equally well to any C++
program that uses DLLs without doing something that has undefined
behaviour. When the conditions for the ODR are met, it implies that
duplicate implicit template instantiations have to be merged, at least
if the program contains any code that could detect the failure to merge
them.






Author: "Bob Bell" <belvis@pacbell.net>
Date: Sun, 19 Feb 2006 19:21:33 CST
Hyman Rosen wrote:
> Bob Bell wrote:
> > The current C++ standard doesn't make any requirement whatsoever on
> > programs using DLLs.
>
> Why do you say that? If an implementation claims to be conforming
> while supporting DLLs, then the program must behave as the standard
> specifies. In fact, I would think that 3.6.2/3 is there precisely
> to support DLLs.

You're right, it's possible to create a standard-conforming
implementation with DLLs; all I meant was that the standard has nothing
to say one way or the other about DLLs, in the same way that it doesn't
have anything to say about multiple threads. I could be wrong, but I
don't think the standard has a lot to say about the OP's problem.

Bob






Author: Hyman Rosen <hyrosen@mail.com>
Date: Mon, 20 Feb 2006 01:49:00 CST
Kaba wrote:
> For example, what if in two different shared libraries there is in both
> a class with the same name and the same members, but different
> implementations?

Then incorporating both into a single program violates the
one-definition rule, and the program has undefined behavior.
The implementation may choose to try to do something sensible,
but the standard imposes no requirements here.






Author: Hyman Rosen <hyrosen@mail.com>
Date: Mon, 20 Feb 2006 01:48:52 CST
Bob Bell wrote:
> the standard has nothing to say one way or the other about DLLs,
> in the same way that it doesn't have anything to say about multiple
> threads.

I don't think the two are comparable. Multithreading is a different
program execution model - if I were to write
     void f() { extern int i; i = 1;
                while (i == 1) { }
                printf("not reached"); }
then standard C++ would be perfectly justified in eliminating the
test in the while loop, reasoning that it could never become false.
That is obviously not the case once multithreading is permitted.

On the other hand, DLLs are simply an implementation detail of
constructing a runnable program out of compilation units.






Author: "Bob Bell" <belvis@pacbell.net>
Date: Mon, 20 Feb 2006 10:13:10 CST
Hyman Rosen wrote:
> Kaba wrote:
> > For example, what if in two different shared libraries there is in both
> > a class with the same name and the same members, but different
> > implementations?
>
> Then incorporating both into a single program violates the
> one-definition rule, and the program has undefined behavior.
> The implementation may choose to try to do something sensible,
> but the standard imposes no requirements here.

Exactly the point I was trying to make. I couldn't have said it better
myself. In fact, I didn't. ;-)

Bob






Author: "Bob Bell" <belvis@pacbell.net>
Date: Mon, 20 Feb 2006 11:22:50 CST
Hyman Rosen wrote:
> Bob Bell wrote:
> > the standard has nothing to say one way or the other about DLLs,
> > in the same way that it doesn't have anything to say about multiple
> > threads.
>
> I don't think the two are comparable. Multithreading is a different
> program execution model - if I were to write
>      void f() { extern int i; i = 1;
>                 while (i == 1) { }
>                 printf("not reached"); }
> then standard C++ would be perfectly justified in eliminating the
> test in the while loop, reasoning that it could never become false.
> That is obviously not the case once multithreading is permitted.
>
> On the other hand, DLLs are simply an implementation detail of
> constructing a runnable program out of compilation units.

But they allow things that are outside the scope of the standard.
Plugins, for example, are usually implemented using DLLs that all
provide a function with a common name. Programs that do that are
officially exhibiting undefined behavior. As far as I could tell, it
was this kind of use case that led to this thread -- at least, the
problems with code/data replication being discussed are just as
relevant in a plugin-based architecture. I thought it was worth
pointing out that the standard doesn't have a lot to say about whether
a (compile-time or run-time) linking phase can remove duplicate
code/data definitions -- especially when, in a plugin architecture,
there are duplicates that you _don't_ want removed.

Bob






Author: bouncer@dev.null (Wil Evers)
Date: Tue, 21 Feb 2006 03:07:51 GMT
Bob Bell wrote:

> Hyman Rosen wrote:

>> Kaba wrote:
>> > For example, what if in two different shared libraries there is in both
>> > a class with the same name and the same members, but different
>> > implementations?
>>
>> Then incorporating both into a single program violates the
>> one-definition rule, and the program has undefined behavior.
>> The implementation may choose to try to do something sensible,
>> but the standard imposes no requirements here.
>
> Exactly the point I was trying to make. I couldn't have said it better
> myself. In fact, I didn't. ;-)

Right.  But that does not imply that the standard imposes no requirements on
programs that don't violate the ODR, but do use shared libraries/DLLs.

- Wil






Author: none@here.com (Kaba)
Date: Tue, 14 Feb 2006 21:43:13 GMT
By the standard, are all templates of STL explicitly instantiable with
types that conform to its concepts?

Consider:

template <typename T>
class set
{
public:
    int size()
    {
        return 5;
    }
private:
    void prohibitExplicitInstantiation()
    {
        T a;
        a.odd();
    }
};

This is an example of a template that is only explicitly instantiable
with types T that implement function odd. Although this is not required
when implicitly instantiating (if it is not called), it is required when
explicitly instantiating.
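
A minimal sketch of the difference, assuming the template is renamed to
toy_set (a hypothetical name, used here only to avoid confusion with
std::set). The explicit instantiation is left commented out because it is
expected not to compile for int.

```cpp
#include <cassert>

// Hypothetical stand-in for the class template in the post.
template <typename T>
class toy_set
{
public:
    int size()
    {
        return 5;
    }

private:
    // Only well-formed for types T that have a member function odd().
    void prohibitExplicitInstantiation()
    {
        T a;
        a.odd();
    }
};

// An explicit instantiation instantiates every member, including
// prohibitExplicitInstantiation(). Since int has no member odd(), a
// conforming compiler should reject the following line if uncommented.
// template class toy_set<int>;

// Implicit instantiation: only size() is instantiated, because only
// size() is used.
int implicitlyInstantiatedSize()
{
    toy_set<int> s;
    return s.size();
}
```

Calling implicitlyInstantiatedSize() compiles and runs even though int does
not implement odd().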

This is a question related to the support for dynamic libraries:
if templates of STL are not explicitly instantiable, then it is not
possible to use STL templates between modules in any other way than by
code/data replication.

--
Kalle Rutanen
http://kaba.hilvi.org






Author: richard@ex-parrot.com
Date: Thu, 16 Feb 2006 00:01:56 CST
Kaba wrote:
> By the standard, are all templates of STL explicitly instantiable with
> types that conform to its concepts?

No.  Explicitly instantiating or explicitly specialising standard
library templates can only be done if "the declaration depends on a
user-defined name" [17.4.3.1/1].  The rationale being that the library
may, perhaps for reasons of efficiency, choose to provide an optimized
specialisation for certain types and it is illegal to have both an
explicit specialisation and an explicit instantiation [14.7/5].  It is
also possible that the library containing the standard library might
itself contain an explicit instantiation of that template.  (While most
systems allow separated shared libraries to have duplicate explicit
instantiations, this is not guaranteed, and if you choose to link
statically against the standard library, you very likely will have
problems.)

(I can't actually find any language in the Standard suggesting it is
ever legal to explicitly instantiate an STL container; however this is
something that is done in real-world code and I can't imagine a vendor
disallowing any form of explicit instantiation of an STL template.)

> Consider:
>
> template <typename T>
> class set
> {
>     void prohibitExplicitInstantiation()
>     {
>         T a;
>         a.odd();
>     }
> };
>
> This is an example of a template that is only explicitly instantiable
> with types T that implement function odd. Although this is not required
> when implicitly instantiating (if it is not called), it is required when
> explicitly instantiating.

That is a potential problem.  I have seen this occur in two related
situations.  The first was with std::vector.  This does not require
that its value_type be EqualityComparable, however one Standard Library
I have used chooses to implement operator== on the vector by calling
down to a member function of the vector.  This resulted in only vectors
of EqualityComparable types being explicitly instantiable.  The second
occasion was with std::list which has a sort() member function that
requires its value_type to be LessThanComparable.
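
The std::list case can be sketched as follows, assuming a hypothetical
user-defined type Sample that provides no comparison operators. The
explicit instantiation is left commented out, since it is expected to fail
to compile on typical implementations.

```cpp
#include <cassert>
#include <list>

// Hypothetical user-defined type with no operator< and no operator==.
struct Sample
{
    int value;
};

// Explicit instantiation asks the compiler to instantiate every member of
// std::list<Sample>, including sort(), which compares elements with
// operator<. With the line below uncommented, compilation is expected to
// fail because Sample has no operator<.
// template class std::list<Sample>;

// Implicit instantiation only touches the members that are actually used.
int countSamples()
{
    std::list<Sample> samples;
    Sample first = {1};
    Sample second = {2};
    samples.push_back(first);
    samples.push_back(second);
    return static_cast<int>(samples.size());
}
```

As long as sort() is never called, the implicitly instantiated list works
for Sample without any comparison operator.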

> This is a question related to the support for dynamic libraries:
> if templates of STL are not explicitly instantiable, then it is not
> possible to use STL templates between modules in any other way than by
> code/data replication.

On many systems, many of the potential problems of having duplicate
copies of symbols go away by having the dynamic linker remove all but
one copy of the symbol.  If this does not happen, or if, as a result of
dynamically opening new libraries, the address of the symbol moves, it
can cause many problems.  (This issue of addresses moving is often
overlooked by people using GCC on Linux and thinking that these
problems only affect systems using the Windows DLL model.)

You're right that there are problems with the interaction of templates
(whether STL or otherwise)  and dynamically loaded code (whether in
Windows DLLs or Linux DSOs).  And with today's technology explicit
instantiation and N1448-style "extern template" declarations (and,
obviously, well designed interfaces to dynamically loaded modules) seem
the most plausible way of solving them.  Picking up on your comments in
other threads on this newsgroup, however, I don't see any merit in
splitting STL headers into interface (.h) / implementation (.hpp) pairs
as a way of solving these issues.  That is likely to just inconvenience
people in the 99% of cases when implicit instantiation is desirable.
Far better to have a mechanism (à la N1448) for suppressing it when
*that* is desired.

Richard Smith







Author: none@here.com (Kaba)
Date: Thu, 16 Feb 2006 17:10:57 GMT
Raw View
> No.  Explicitly instantiating or explicitly specialising standard
> library templates can only be done if "the declaration depends on a
> user-defined name" [17.4.3.1/1].  The rationale being that the library
> may, perhaps for reasons of efficiency, choose to provide an optimized
> specialisation for certain types and it is illegal to have both an
> explicit specialisation and an explicit instantiation [14.7/5].  It is
> also possible that the library containing the standard library might
> itself contain an explicit instantiation of that template.  (While most
> systems allow separated shared libraries to have duplicate explicit
> instantiations, this is not guaranteed, and if you choose to link
> statically against the standard library, you very likely will have
> problems.)

Thanks for a comprehensive answer.

> (I can't actually find any language in the Standard suggesting it is
> ever legal to explicitly instantiate an STL container; however this is
> something that is done in real-world code and I can't imagine a vendor
> disallowing any form of explicit instantiation of an STL template.)

Indeed, taking for example visual studio, nowadays the templates seem to
be explicitly instantiable and do not contain global variables, such
that they can be used with dlls. I recall that a few versions ago vs had
an implementation of map/set (using rb-tree) which had a global sentinel
node which resulted in problems (through data replication) between
modules.

> On many systems, many of the potential problems of having duplicate
> copies of symbols go away by having the dynamic linker remove all but
> one copy of the symbol.

The idea of dynamic linker doing removal of replicated symbols is new to
me (first mentioned by Wil Evers in another thread). It is a great idea.
From my experience with visual studio, I see that unfortunately its
linker does not do this (I could be missing a switch): it is easy to
reproduce a situation where global data is replicated and causes
incorrect behaviour (that is, no removal of duplicates).

This raises an interesting question whether it should be the user's or
linker's responsibility to avoid code/data replication across modules.

> If this does not happen, or if, as a result of
> dynamically opening new libraries, the address of the symbol moves, it
> can cause many problems.  (This issue of addresses moving is often
> overlooked by people using GCC on Linux and thinking that these
> problems only affect systems using the Windows DLL model.)

I do not understand what you mean by the address of a symbol moving.
Could you clarify on that?

> You're right that there are problems with the interaction of templates
> (whether STL or otherwise)  and dynamically loaded code (whether in
> Windows DLLs or Linux DSOs).  And with today's technology explicit
> instantiation and N1448-style "extern template" declarations (and,
> obviously, well designed interfaces to dynamically loaded modules) seem
> the most plausible way of solving them.  Picking up on your comments in
> other threads on this newsgroup, however, I don't see any merit in
> splitting STL headers into interface (.h) / implementation (.hpp) pairs
> as a way of solving these issues.  That is likely to just inconvenience
> people in the 99% of cases when implicit instantiation is desirable.
> Far better to have a mechanism (à la N1448) for suppressing it when
> *that* is desired.

Why inconvenience?

If you want, you can always use the file which contains both the
interface and implementation (ie <set>). It just gives you an option to
use an interface-only file (ie <set_decl> in plug-ins). It does not
change any working code to non-working.

I'd like to challenge this newsgroup to objectively reason about the
current model versus this new model. If we end up preferring the current
model, then we have again confirmed that it is the optimal one, and the
discussion has been an educational one. If we end up preferring the
new model, then we have made progress.

In my article at

http://kaba.hilvi.org/project/abstraction/nsm.pdf

I point out that the split into file pairs and N1448 are two
different ways to solve the same problem. This directly leads to the
question: is there any reason to prefer one over the other?

For additional information, I wrote a short summary of all dynamic
library problems I could think of in:

http://kaba.hilvi.org/project/abstraction/stldll.pdf

Here is the reasoning that my brain follows:
(nsm = natural separation model)

a) Redundancy

N1448: Template code must be revealed, if instantiated or not
nsm: Template code must be revealed, if instantiated

For a plugin system, if the code is not to be generated in the plug-in,
the template code is redundant. The creator of the system would of
course want to keep source code revealing minimal.

b) Logicality

N1448: If template code should not be instantiated, its use is
prohibited.
nsm: If template code should not be instantiated, it is not given.

c) Changes

N1448: No library changes. Language changes: one additional keyword. All
previous code still works.
nsm: Additional header files that do not contain implementation. No
language changes. All previous code still works (for example, <set>
still includes both class definitions and implementations).

d) Change of view point

N1448: No change.
nsm: In general, it is good practice to separate implementation from
interface. The new view point is that this is also good with templates
(and that export is not the right way to achieve this).

e) Ideality

The STL should clearly, implicitly or explicitly, promote the ideal
technique.
N1448, nsm: Answer this based on your judgement from the answers of the
previous points.

--
Kalle Rutanen
http://kaba.hilvi.org






Author: "Richard Smith" <richard@ex-parrot.com>
Date: Thu, 16 Feb 2006 14:01:15 CST
Raw View
Kaba wrote:
> > On many systems, many of the potential problems of having duplicate
> > copies of symbols go away by having the dynamic linker remove all but
> > one copy of the symbol.
>
> The idea of dynamic linker doing removal of replicated symbols is new to
> me (first mentioned by Wil Evers in another thread). It is a great idea.

It's standard practice in most modern UNIX environments for the dynamic
linker to commonize references to symbols.  (This isn't actually the
same as removing them -- I was being a little sloppy with my
terminology there.)  It means that you can guarantee that there will
only be one copy of static data across all dynamic libraries.

> From my experience with visual studio, I see that unfortunately its
> linker does not do this (I could be missing a switch): it is easy to
> reproduce a situation where global data is replicated and causes
> incorrect behaviour (that is, no removal of duplicates).

The Windows DLL model is a little different to the one more usually
used in UNIX environments.
With GCC on Linux every symbol behaves as if it were dllexported.
Because of this, having duplicate dllexported symbols (using Windows
terminology) is much more common than with Windows DLLs, so it is much
more important that the dynamic linker is able to handle this sanely.
In principle it ought to be possible for the linker to physically
remove the duplicate code, but I've not seen any mainstream linkers do
this.

> This raises an interesting question whether it should be the user's or
> linker's responsibility to avoid code/data replication across modules.

Presumably this is a detail of how dynamic libraries work on your
system.  On a typical UNIX platform, it's the dynamic linker; from my
recollection of working with Windows DLLs in the past, I guess it's the
developer's responsibility on that platform.

> > If this does not happen, or if, as a result of
> > dynamically opening new libraries, the address of the symbol moves, it
> > can cause many problems.  (This issue of addresses moving is often
> > overlooked by people using GCC on Linux and thinking that these
> > problems only affect systems using the Windows DLL model.)
>
> I do not understand what you mean by the address of a symbol moving.
> Could you clarify on that?

Sure.  It's not fundamentally any different from the issues that are
familiar to those who use Windows DLLs -- but it's an example that
applies to UNIX-style DSOs too.  The problem is if you have a symbol
(e.g. template instantiation, inline function, type info structure or
vtable) defined in several shared libraries, the address of that symbol
can be different in different parts of the code if the libraries are
loaded at different times -- effectively multiple copies of the symbol
exist.  For people familiar with Windows DLLs, this is entirely
expected, but with the UNIX-style DSOs this is really not supposed to
happen.  Arguably this is simply a bug in the commonly-available UNIX
tools, though the way the Linux C++ ABI is specified, this is the
defined behaviour.
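
To make the kind of symbol being discussed concrete, here is a minimal single-program sketch.  The template shared_slot() is hypothetical; it stands in for any template instantiation with static data.  Within one statically linked program the two callers are expected to see the same address; the problem described above is that, across shared libraries loaded at different times, that expectation may not hold.

```cpp
#include <cassert>

// Hypothetical template with function-local static data; each
// instantiation is a symbol that may be emitted in several modules.
template <typename T>
T* shared_slot() {
    static T slot = T();
    return &slot;
}

// Two unrelated callers, standing in for code in different libraries.
int* slot_seen_by_a() { return shared_slot<int>(); }
int* slot_seen_by_b() { return shared_slot<int>(); }

// Within a single program both callers refer to one object.
bool addresses_agree() {
    return slot_seen_by_a() == slot_seen_by_b();
}

void check_single_copy() {
    *slot_seen_by_a() = 5;
    assert(*slot_seen_by_b() == 5);  // a write through one is seen by the other
}
```

This sketch cannot reproduce the multi-library failure itself; it only shows the invariant that code tends to rely on.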

Effectively what I'm trying to demonstrate is that the problematic
interactions between dynamic libraries and templates do still exist
with the UNIX-model of DSOs, they're just much less frequently
encountered than with Windows DLLs.

I won't post an example as it would necessarily be quite long.  If
you're interested in a test case from a real-world example, I posted
one to the GCC mailing list a while ago:

  http://gcc.gnu.org/ml/gcc/2004-10/msg01118/weaksyms.tar.gz

> > I don't see any merit in
> > splitting STL headers into interface (.h) / implementation (.hpp) pairs
> > as a way of solving these issues.  That is likely to just inconvenience
> > people in the 99% of cases when implicit instantiation is desirable.
> > Far better to have a mechanism (à la N1448) for suppressing it when
> > *that* is desired.
>
> Why inconvenience?
>
> If you want, you can always use the file which contains both the
> interface and implementation (ie <set>). It just gives you an option to
> use an interface-only file (ie <set_decl> in plug-ins). It does not
> change any working code to non-working.

Remembering to put #include <set_decl> rather than #include <set> in
the plug-in's header is fine.  The problem is that every source file
that includes the plug-in's header must also only include <set_decl>,
otherwise all the internal declarations become visible and available
again.  And likewise with every header (including those of third-party
libraries) included by every source file that includes the plug-in's
header.  You end up requiring the whole program to shift from #include
<set> to #include <set_decl> just in order to suppress the implicit
instantiation of a few templates.

The alternative presented in N1448, and widely supported by current
compilers, is to put a few "extern template" declarations in the
plug-in's header; the rest of the project can then remain unchanged.
It has exactly the same effect, but it only requires
modifying the piece of code that requires that functionality.
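
A minimal single-file sketch of that N1448-style arrangement follows.  Registry is a hypothetical class template made up for illustration (a user-defined template is used here so that no standard library template is explicitly instantiated), and the comments mark which lines would live in the plug-in's header and which in one of its source files.  The "extern template" syntax was later adopted in C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical class template standing in for whatever the plug-in exposes.
template <typename T>
class Registry {
public:
    void add(const T& value) { items_.insert(value); }
    std::size_t size() const { return items_.size(); }
private:
    std::set<T> items_;
};

// Would live in the plug-in's header: ask the compiler not to
// implicitly instantiate Registry<int> in translation units that see it.
extern template class Registry<int>;

// Would live in exactly one source file of the plug-in: the single
// explicit instantiation that the declaration above refers to.
template class Registry<int>;

std::size_t registry_demo() {
    Registry<int> r;
    r.add(1);
    r.add(2);
    r.add(2);  // duplicate, ignored by std::set
    assert(r.size() > 0);
    return r.size();
}
```

Client code keeps including the full header and is otherwise unchanged, which is the property being argued for here.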

> I'd like to challenge this newsgroup to objectively reason the current
> model versus this new model.

OK.  Can we first agree, though, what *exactly* the particular problem
is that you are trying to solve?  In your article,

> http://kaba.hilvi.org/project/abstraction/nsm.pdf

you simply state that the duplicate symbols "create a maintenance
problem".  I don't see how this is the case.  You state in your
introduction that one aim of dynamic libraries is to allow the
implementations of libraries to change without requiring other
libraries to be rebuilt.  Assuming the public headers for the library
(and headers included by them) remain unchanged, then I can't see why
the presence of #include <set> (for example) should change what needs
to be rebuilt.

In particular, in the example on page 7 where plugin.cpp only includes
<set_decl>, I can't see what is gained by doing this.  In fact, to me
it seems you lose out by doing it -- you have now increased the
coupling between application.cpp and plugin.cpp as application.cpp now
needs to know that the plugin uses a std::set<float> in its
implementation.

Perhaps it would help if you could post an example where the current
standard behaviour (i.e. doing #include <set> where appropriate)
actually causes a problem?

> a) Redundancy
>
> N1448: Template code must be revealed, if instantiated or not
> nsm: Template code must be revealed, if instantiated

No.  Under N1448, the template function implementations *must* be
available if implicit instantiation is required, and may (though need
not) be available if implicit instantiation is not required.  As I
understand your "nsm" proposal, you are requiring that template
function implementations *must* be kept hidden if you want to avoid the
implicit instantiation.

How then do you cope with the following (using "nsm"):

  // plugin.h
  #include <set_decl>
  std::set<int> foo(); // Part of the public interface

  // plugin.cc
  #include "third-party-header.h"
  #include "plugin.h"
  std::set<int> foo() { return std::set<int>(); }

What happens if "third-party-header.h" is modified to #include <set>?
Your solution is just too brittle.

> For a plugin system, if the code is not to be generated in the plug-in,
> the template code is redundant. The creator of the system would of
> course want to keep source code revealing minimal.

Sure; decoupling source code is good up to a point.  But sticking to
the current example, what is gained by removing the implementation
details of std::set from the plug-in's header?

In some situations it may reduce code bloat by removing duplicate
copies of the symbol -- though in others, it will add to code bloat by
instantiating all the rarely-used member functions.  It may result in
faster compiles because of reduced dependencies, but with more
compilers supporting pre-compiled headers, this benefit is likely to be
small.

> b) Logicality
>
> N1448: If template code should not be instantiated, its use is
> prohibited.
> nsm: If template code should not be instantiated, it is not given.

Again, no.  In N1448 it doesn't matter whether it is provided: in your
solution it *must* not be provided.

> c) Changes
>
> N1448: No library changes. Language changes: one additional keyword. All
> previous code still works.

No new keyword. Both "extern" and "template" are already keywords.  It
just requires legalising a new combination of them.  And it should be
noted that most mainstream compilers already support this syntax --
it's just a case of standardising the status quo.

> nsm: Additional header files that do not contain implementation.

Headers that existing compilers do not currently have, whereas they do
usually support N1448.

Finally, it's worth remembering that most of the time neither N1448 nor
your "nsm" is desirable and the current behaviour is spot on.  Most
code is happy allowing the compiler to implicitly instantiate
templates, and this is unlikely to change.

--
Richard Smith







Author: none@here.com (Kaba)
Date: Fri, 17 Feb 2006 01:21:12 GMT
Raw View
> > This raises an interesting question whether it should be the user's or
> > linker's responsibility to avoid code/data replication across modules.
>
> Presumably this is a detail of how dynamic libraries work on your
> system.  On a typical UNIX platform, it's the dynamic linker; from my
> recollection of working with Windows DLLs in the past, I guess it's the
> developer's responsibility on that platform.

I actually meant what the standardized solution should be.

> I won't post an example as it would necessarily be quite long.  If
> you're interested in a test case from a real-world example, I posted
> one to the GCC mailing list a while ago:
>
>   http://gcc.gnu.org/ml/gcc/2004-10/msg01118/weaksyms.tar.gz

I'll take a look at it.

> Remembering to put #include <set_decl> rather than #include <set> in
> the plug-in's header is fine.  The problem is that every source file
> that includes the plug-in's header must also only include <set_decl>,
> otherwise all the internal declarations become visible and available
> again.  And likewise with every header (including those of third-party
> libraries) included by every source file that includes the plug-in's
> header.  You end up requiring the whole program to shift from #include
> <set> to #include <set_decl> just in order to suppress the implicit
> instantiation of a few templates.

I do not understand. If a source file includes the plug-in header, which
includes <set_decl> it still can include <set> if it wants to. <set>
includes <set_decl>, but the include guards take care that <set_decl>
doesn't define the class twice (note I have left the guards off in the
examples for brevity).
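
The guard arrangement described above can be sketched in a single file as follows.  Widget, WIDGET_DECL_H and WIDGET_H are hypothetical names standing in for std::set, <set_decl> and <set>; this is only an illustration of the layout, not how any shipping library is organised.

```cpp
#include <cassert>

// ---- contents of a hypothetical "widget_decl.h" (interface only) ----
#ifndef WIDGET_DECL_H
#define WIDGET_DECL_H
template <typename T>
class Widget {
public:
    T twice(T x) const;  // declared, not defined
};
#endif

// ---- contents of a hypothetical "widget.h" (interface + implementation) ----
#ifndef WIDGET_H
#define WIDGET_H

// "widget.h" includes "widget_decl.h" again; the guard is already
// defined, so the class is not defined a second time.
#ifndef WIDGET_DECL_H
#define WIDGET_DECL_H
template <typename T>
class Widget {
public:
    T twice(T x) const;
};
#endif

template <typename T>
T Widget<T>::twice(T x) const { return x + x; }
#endif

int widget_demo() {
    Widget<int> w;
    assert(w.twice(1) == 2);
    return w.twice(21);
}
```

A translation unit that stops after the first guarded section can name Widget<int> but has no definition of twice() to instantiate from.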

I'll discuss third-party libraries later below.

Regarding the spread of "_decl" files: assume "_decl" file is defined as
a file which does not contain any function/object definitions. If we
want to maintain this property, then each "_decl" file must only include
"_decl" files. The spread is only inside the set of "_decl" files.

Anyway, nothing more is required than what is required currently. We are
adding possibilities, not removing them.

> > I'd like to challenge this newsgroup to objectively reason the current
> > model versus this new model.
>
> OK.  Can we first agree, though, what *exactly* the particular problem
> is that you are trying to solve?  In your article,

Larger problem: make it possible to use STL templates across modules
portably and with correct behaviour

We will only concentrate on the correct behaviour. Incorrect behaviour
is caused by the data replication and code replication causes
maintenance problems:

Our problem: Make it possible to avoid code/data replication across
modules.

> you simply state that the duplicate symbols "create a maintenance
> problem".  I don't see how this is the case.  You state in your
> introduction that one aim of dynamic libraries is to allow the
> implementations of libraries to change without requiring other
> libraries to be rebuilt.  Assuming the public headers for the library
> (and headers included by them) remain unchanged, then I can't see why
> the presence of #include <set> (for example) should change what needs
> to be rebuilt.

If implementation is given, then it is possibly replicated. Assume only
code (no data) is replicated across plug-ins. Everything works right. If
the template that generated the code is changed, then the plug-ins would
have to be rebuilt. That is the maintenance problem. You'd want to place
the code in a core-library and let the plug-ins reference it. To do this
you would need either N1448 or nsm. Then changing the implementation
needs only a rebuild of the core-library (interface change is another
story).

> In particular, in example on page 7 where plugin.cpp only includes
> <set_decl>, I can't see what is gained by doing this.  In fact, to me
> it seems you lose out by doing it -- you have now increased the
> coupling between application.cpp and plugin.cpp as application.cpp now
> needs to know that the plugin uses a std::set<float> in its
> implementation.

You probably mean page 6. Let's forget the templates for a moment. Assume
you are creating the core-library which your plug-ins reference to do
things. You choose a finite set of normal classes ("tools") which the
plug-ins are to use. Now bring in templates. set<float> is just a normal
class, a tool selected to be used in a plugin. To quote page 8: "The set
of possible template instances is infinite inside a module, but only
finite between modules.". The coupling is always there. To remove it
means replicating code.

> Perhaps it would help if you could post an example where the current
> standard behaviour (i.e. doing #include <set> where appropriate)
> actually causes a problem?

plugin.h
--------

#include <set>
void myFunction(const std::set<int>& a);

plugin.cpp (in module 1)
----------

#include "plugin.h"
#include <set>

void myFunction(const std::set<int>& a)
{
// Do things with a...
}

application.cpp (in module 2)
---------------

#include "plugin.h"

int main()
{
std::set<int> a;
// Do things with a...
myFunction(a);
// Do things with a...
return 0;
}

If the set contains class variables and the dynamic linker does not
remove the duplicate copies of the set<int> functions/class variables in
module 1, then this example very probably causes a crash or at least
incorrect behaviour.

The problems caused by this were, some time ago, a common question in
comp.os.ms-windows.programmer.win32 (incorrect behaviour and crashes
while "nothing is done wrong").
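
The failure mode can be imitated inside a single program.  In the sketch below the extra template parameter Module is purely a simulation device: it gives each pretend "module" its own copy of the static sentinel, which is roughly what happens when a DLL and an executable each instantiate the same template and nothing merges the copies.  All names are hypothetical; this is not taken from any real library implementation.

```cpp
#include <cassert>

struct Node {
    Node* next;
};

// One sentinel per (T, Module) pair. Module simulates a separately
// linked binary that got its own copy of the template's static data.
template <typename T, int Module>
struct Sentinel {
    static Node node;
};

template <typename T, int Module>
Node Sentinel<T, Module>::node = { nullptr };

// "End of container" test as compiled into a given module.
template <int Module>
bool is_end(const Node* p) {
    return p == &Sentinel<int, Module>::node;
}

// An empty container created by a given module points at that
// module's sentinel.
template <int Module>
Node* make_empty() {
    return &Sentinel<int, Module>::node;
}

bool same_module_sees_end() {
    return is_end<1>(make_empty<1>());
}

bool other_module_sees_end() {
    return is_end<2>(make_empty<1>());
}
```

Here same_module_sees_end() holds while other_module_sees_end() does not: code in the second "module" never recognises the end of a container built by the first, which is the sort of misbehaviour that shows up as a crash or an endless loop.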

> > a) Redundancy
> >
> > N1448: Template code must be revealed, if instantiated or not
> > nsm: Template code must be revealed, if instantiated
>
> No.  Under N1448, the template function implementations *must* be
> available if implicit instantiation is required, and may (though need
> not) be available if implicit instantiation is not required.

If under N1448 you do not have the implementation, then that is nsm and
the "extern" has no meaning. "extern" has only meaning when the
implementation is present.

> As I
> understand your "nsm" proposal, you are requiring that template
> function implementations *must* be kept hidden if you want to avoid the
> implicit instantiation.

Yes.

> How then do you cope with the following (using "nsm"):
>
>   // plugin.h
>   #include <set_decl>
>   std::set<int> foo(); // Part of the public interface
>
>   // plugin.cc
>   #include "third-party-header.h"
>   #include "plugin.h"
>   std::set<int> foo() { return std::set<int>(); }
>
> What happens if "third-party-header.h" is modified to #include <set>?
> Your solution is just too brittle.

Well, you have just found a problem in nsm:) I have no answer to this.

> Sure; decoupling source code is good up to a point.  But sticking to
> the current example, what is gained by removing the implementation
> details of std::set from the plug-in's header?

If we were talking about a general class, the decoupling would be the
gain. But, because STL comes with every compiler, the implementation is
public anyway, so there is no gain. However, STL is a library which
shows people how things can be accomplished, and it should promote good
ways to do things.

> In some situations it may reduce code bloat by removing duplicate
> copies of the symbol -- though in others, it will add to code bloat by
> instantiating all the rarely-used member functions.  It may result in
> faster compiles because of reduced dependencies, but with more
> compilers supporting pre-compiled headers, this benefit is likely to be
> small.

The explicit instantiation is necessary to place the code in the core-
library. The duplicate removal is important, because correct programs
are not in general possible without it. These are not specific issues to
nsm, but to any solution.

> > c) Changes
> >
> > N1448: No library changes. Language changes: one additional keyword. All
> > previous code still works.
>
> No new keyword. Both "extern" and "template" are already keywords.  It
> just requires legalising a new combination of them.  And it should be
> noted that most mainstream compilers already support this syntax --
> it's just a case of standardising the status quo.

Oops, I don't know where the "additional keyword" came from. I meant a
language extension.

> Finally, it's worth remembering that most of the time neither N1448 nor
> your "nsm" is desirable and the current behaviour is spot on.  Most
> code is happy allowing the compiler to implicitly instantiate
> templates, and this is unlikely to change.

Agreed.

Now that you have trashed the nsm idea in itself, how about a hybrid of
N1448 and nsm?

- N1448 in current STL has the disadvantage of code revealing even when
it is not needed
- This revealing is not a concern in STL since the implementation is
revealed anyway.
- Systems that use plug-ins do not want to reveal template code
=> they actually need to use a hybrid of N1448 (to avoid replication)
and nsm (to avoid code revealing)
- STL promotes good ways to write code.
=> The example of STL leads to think that revealing the code is
mandatory.

Note the advantages of nsm:
* "_decl" files act as the next lightweight step up of the forward
declarations. You must have noticed it seems a pity that you can't
forward declare STL classes, when only for example a pointer is used.
* clear separation of interface and implementation

(By the way, I wrote this post two times and spent 2.5 hours on it. I
accidentally cancelled the first post after 1 hour of writing. This is
not healthy anymore:) )

--
Kalle Rutanen
http://kaba.hilvi.org






Author: dave@boost-consulting.com (David Abrahams)
Date: Fri, 17 Feb 2006 06:57:27 GMT
Raw View
richard@ex-parrot.com writes:

> (I can't actually find any language in the Standard suggesting it is
> ever legal to explicitly instantiate an STL container; however this is
> something that is done in real-world code and I can't imagine a vendor
> disallowing any form of explicit instantiation of an STL template.)

Oh, that's easy.  Explicitly instantiate basic_string<char> and you're
likely to get a mess of link errors because the vendor already does
that in the standard library.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com






Author: "Richard Smith" <richard@ex-parrot.com>
Date: Fri, 17 Feb 2006 09:41:47 CST
Raw View
Kaba wrote, quoting me:
> > Remembering to put #include <set_decl> rather than #include <set> in
> > the plug-in's header is fine.  The problem is that every source file
> > that includes the plug-in's header must also only include <set_decl>,
> > otherwise all the internal declarations become visible and available
> > again.  [...]
>
> I do not understand. If a source file includes the plug-in header, which
> includes <set_decl> it still can include <set> if it wants to. <set>
> includes <set_decl>, but the include guards take care that <set_decl>
> doesn't define the class twice (note I have left the guards off in the
> examples for brevity).

Concrete example.

  // plugin.h
  #include <set_decl>

  class plugin {
  public:
    void foo() {}
  private:
    std::set<int> s;
  };

Now presumably we've had to include <set_decl> because we *need* to
prevent implicit instantiation, perhaps because getting duplicate
copies of the symbols from std::set<int> will lead to incorrect
behaviour in the model of dynamic libraries that our system uses.  If
so, this causes a real problem.  Imagine client code that does the
following (perhaps with #includes occurring indirectly).

  #include <set>
  #include "plugin.h"

  // other stuff...

Suddenly we get implicit instantiation of std::set<int>, which, we are
assuming, leads to some form of incorrect behaviour.  In this case we
can get around it by reordering the headers, but this will not always
be possible.
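
The same situation can be written out as one translation unit.  Bag and Plugin are hypothetical stand-ins for std::set and the plug-in class, and the comments mark which header each part would come from.  The sketch only shows that, once the full definitions are visible, nothing in the client code stops implicit instantiation.

```cpp
#include <cassert>
#include <cstddef>

// --- what the full header (the <set> analogue) would contribute ---
template <typename T>
class Bag {
public:
    Bag() : count_(0) {}
    void add(const T&);
    std::size_t size() const;
private:
    std::size_t count_;
};

template <typename T>
void Bag<T>::add(const T&) { ++count_; }

template <typename T>
std::size_t Bag<T>::size() const { return count_; }

// --- what plugin.h would contribute; its own #include of the
// declaration-only header is a no-op because Bag is already known ---
class Plugin {
public:
    void foo() { bag_.add(1); }
    std::size_t stored() const { return bag_.size(); }
private:
    Bag<int> bag_;
};

// --- client code: the compiler is free to implicitly instantiate
// Bag<int>::add and Bag<int>::size in this translation unit ---
std::size_t client_demo() {
    Plugin p;
    p.foo();
    p.foo();
    assert(p.stored() > 0);
    return p.stored();
}
```

The program builds and runs without any explicit instantiation of Bag<int>, so the member functions must have been generated here, in the client.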

Obviously it would be nice if the model of dynamic libraries that gets
standardised simply does the right thing with duplicate symbols.  I do
wonder, though, whether it will always be possible to "simply" do the
right thing.

> > OK.  Can we first agree, though, what *exactly* the particular problem
> > is that you are trying to solve?
>
> Larger problem: make it possible to use STL templates across modules
> portably and with correct behaviour
>
> We will only concentrate on the correct behaviour. Incorrect behaviour
> is caused by the data replication and code replication causes
> maintenance problems:

OK.  Let's stick with the incorrect behaviour caused by data
replication.  Incorrect behaviour is much more tangible than
maintenance problems.

> Our problem: Make it possible to avoid code/data replication across
> modules.
>
> > you simply state that the duplicate symbols "create a maintenance
> > problem".  I don't see how this is the case.  You state in your
> > introduction that one aim of dynamic libraries is to allow the
> > implementations of libraries to change without requiring other
> > libraries to be rebuilt.  Assuming the public headers for the library
> > (and headers included by them) remain unchanged, then I can't see why
> > the presence of #include <set> (for example) should change what needs
> > to be rebuilt.
>
> If implementation is given, then it is possibly replicated. Assume only
> code (no data) is replicated across plug-ins. Everything works right. If
> the template that generated the code is changed, then the plug-ins would
> have to be rebuilt. That is the maintenance problem. You'd want to place
> the code in a core-library and let the plug-ins reference it. To do this
> you would need either N1448 or nsm.

If the templated code that implements std::set has been changed, then
presumably you have upgraded your standard library, in which case you
would expect to have to recompile *everything* (unless the vendor has
made a specific guarantee to the contrary).

>  Lets forget the templates for a moment. Assume
> you are creating the core-library which your plug-ins reference to do
> things. You choose a finite set of normal classes ("tools") which the
> plug-ins are to use. Now bring in templates. set<float> is just a normal
> class, a tool selected to be used in a plugin.

And it's an implementation detail of the plugin.  I don't want
knowledge that the plugin uses a std::set<float> internally to be
exposed any more than it needs to be.  In your example on page 6 (quite
right -- not 7 as I said), you explicitly instantiate std::set<float>
in application.cpp -- i.e. in the main executable.  This is what I mean
by unnecessary coupling between the modules.  Under N1448, the solution
would be to put an "extern template" declaration in the plugin header
and an explicit instantiation in the plugin -- *not* in the main
application as you state on page 7.

> To quote page 8: "The set
> of possible template instances is infinite inside a module, but only
> finite between modules.".

I don't understand this comment.  For many templates, including all the
STL containers, they have an infinite number of possible instantiations
(think std::set<int>, std::set<int*>, std::set<int**>,
std::set<int***>, ...).  And in any piece of code, whether a function,
class,  translation unit, or separately compiled module, only a finite
number of these are actually used.  How does this observation help?

> > Perhaps it would help if you could post an example where the current
> > standard behaviour (i.e. doing #include <set> where appropriate)
> > actually causes a problem?
>
> plugin.h
> --------
>
> #include <set>
> void myFunction(const std::set<int>& a);
>
> plugin.cpp (in module 1)
> ----------
>
> #include "plugin.h"
> #include <set>
>
> void myFunction(const std::set<int>& a)
> {
> // Do things with a...
> }
>
> application.cpp (in module 2)
> ---------------
>
> #include "plugin.h"
>
> int main()
> {
> std::set<int> a;
> // Do things with a...
> myFunction(a);
> // Do things with a...
> return 0;
> }
>
> If the set contains class variables and the dynamic linker does not
> remove the replicated set<int> functions/class variables in module
> 1, then this example very probably causes a crash or at least incorrect
> behaviour.

Agreed.  When I used to program C++ on Windows I certainly
encountered that.  (Of course, one could argue that on a system where
the linker does not handle removing the duplicate symbols, the
implementation of std::set should be careful not to cause this problem.
 Perhaps recent versions of MSVC do?  But let's ignore that for now.)
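
For readers who have not hit this failure, the sketch below shows the kind of "class variable" that can trigger it. Tree and its nil sentinel are invented for illustration; no real std::set implementation is being quoted. Within a single program there is one Tree<int>::nil, so the check passes. The trouble starts when two modules each carry their own copy and a container built in one is inspected in the other.

```cpp
#include <cassert>

// Hypothetical container that marks "end" with a static sentinel node.
template <class T>
struct Tree {
    struct Node { Node* next; T value; };

    static Node nil;    // one copy per program, or one per module
    Node* head = &nil;  // an empty tree points at the sentinel

    // Emptiness is decided by comparing against the sentinel's address.
    bool empty() const { return head == &nil; }
};

template <class T>
typename Tree<T>::Node Tree<T>::nil = { nullptr, T() };

// Stand-in for code compiled into "the other module". If that module had
// its own Tree<int>::nil at a different address, empty() would return false
// for a tree that was default-constructed elsewhere.
inline bool looks_empty_from_plugin(const Tree<int>& t) {
    return t.empty();
}
```

A single translation unit cannot reproduce the duplicated symbol, so this only shows the mechanism: the address comparison in empty() is what breaks once there are two sentinels.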

I would suggest solving it (per N1448) by adding

  extern template class std::set<int>;

to plugin.h and

  template class std::set<int>;

to plugin.cpp.  How would you see this being solved in "nsm"?
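
Laid out in full, the arrangement looks roughly like this. It is a single-file sketch with comments marking where each piece would live, and count_elements is a made-up stand-in for myFunction. The syntax follows N1448, which was later adopted into C++11. Whether the Standard permits explicitly instantiating std::set<int> at all is a separate question.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// ---- plugin.h --------------------------------------------------------
// Explicit instantiation declaration: asks the compiler not to implicitly
// instantiate the (non-inline) members of std::set<int> in any translation
// unit that sees it.
extern template class std::set<int>;

std::size_t count_elements(const std::set<int>& a);  // hypothetical name

// ---- plugin.cpp ------------------------------------------------------
// Explicit instantiation definition: the one place where the members of
// std::set<int> are emitted.
template class std::set<int>;

std::size_t count_elements(const std::set<int>& a) {
    return a.size();
}

// ---- application.cpp -------------------------------------------------
// Uses std::set<int> through plugin.h only.
inline std::size_t application_demo() {
    std::set<int> a;
    a.insert(1);
    a.insert(2);
    a.insert(2);  // duplicate, ignored
    return count_elements(a);
}
```

In a real build the three sections would be separate files in two modules, with the plugin exporting the instantiated symbols.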

> > > a) Redundancy
> > >
> > > N1448: Template code must be revealed, if instantiated or not
> > > nsm: Template code must be revealed, if instantiated
> >
> > No.  Under N1448, the template function implementations *must* be
> > available if implicit instantiation is required, and may (though need
> > not) be available if implicit instantiation is not required.
>
> If under N1448 you do not have the implementation, then that is nsm and
> the "extern" has no meaning. "extern" has only meaning when the
> implementation is present.

Sure.  (Well, actually, the whole "extern template" declaration has no
meaning, not just the "extern", but I'm sure that's what you meant.)
But how does that contradict what I said?  If you wish to use implicit
instantiation the implementation (obviously) must be present; if you
want to prevent implicit instantiation, it does not matter whether the
implementation is present.  This is very important as it is quite
likely that you will want a mixture of implicit and explicit-only
instantiation in the same translation unit.

Suppose, in your example code, that application.cpp and plugin.cpp
both (entirely independently) happen to use std::set<float>.  That
shouldn't cause a problem.  In the Windows DLL model, the two modules
each have their own copy of the symbols associated with std::set<float>
and everything simply works.  In the Linux DSO model, all the symbols
are commonized and, again, everything simply works.  The problem is
restricted to std::set<int>, which is passed across the interface
between the two modules.

Under N1448, the "extern template" declaration for std::set<int>
handles this correctly; under "nsm", as I understand it, you would be
forced to move to explicit instantiation of std::set<float> too.  If
that is correct, I simply don't see it as an acceptable solution.  I
absolutely do not want to have to migrate to explicit instantiation of
all std::set types simply because one instantiation of std::set requires
it.  As a solution, it simply does not scale.
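
The mixture can be sketched as follows: std::set<int> is restricted to explicit instantiation because it crosses the module boundary, while std::set<float> is left to ordinary implicit instantiation. The function names are invented for the example, and everything is shown in one file for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// std::set<int> crosses the module interface, so it is pinned down:
// declared here (as plugin.h would) and defined once (as plugin.cpp would).
extern template class std::set<int>;
template class std::set<int>;

// Hypothetical interface function taking the shared type.
std::size_t interface_size(const std::set<int>& shared) {
    return shared.size();
}

// std::set<float> is a private detail of whoever uses it. No declaration
// is needed anywhere; it is implicitly instantiated on use, as usual.
std::size_t private_detail() {
    std::set<float> scratch;
    scratch.insert(1.5f);
    scratch.insert(2.5f);
    return scratch.size();
}
```

Only the one specialization that is part of the interface needs any special treatment; every other use of std::set is untouched.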

> >   // plugin.cc
> >   #include "third-party-header.h"
> >   #include "plugin.h"
> >   std::set<int> foo() { return std::set<int>(); }
> >
> > What happens if "third-party-header.h" is modified to #include <set>?
> > Your solution is just too brittle.
>
> Well, you have just found a problem in nsm:) I have no answer to this.

Frankly, I think it's a pretty fundamental problem.  It requires "nsm"
to be used pervasively if at all.  This means that low level libraries
either must use "nsm", in which case they force "nsm" on to client
code; or they don't use "nsm", in which case you mustn't either.  The
effect is to fragment C++ into two incompatible dialects.  I just can't
see it happening.

> > Sure; decoupling source code is good up to a point.  But sticking to
> > the current example, what is gained by removing the implementation
> > details of std::set from the plug-in's header?
>
> If we were talking about a general class, the decoupling would be the
> gain.

Decoupling isn't a gain per se.  The gain is what might result from the
decoupling -- perhaps faster build times, or smaller executable sizes.
But these have to be balanced against the potential losses -- slower
code, or larger executable sizes.  (Depending on context, explicit
instantiation may cause larger or smaller executables.  Certainly in the
case when an instantiation is used in only one translation unit, I
can't see how it can possibly have a beneficial effect.)

> Now that you have trashed the nsm idea in itself, how about a hybrid of
> N1448 and nsm?
>
> - N1448 in current STL has the disadvantage of code revealing even when
> it is not needed

This is perhaps true.  I wonder whether a better solution would be to
allow "extern template" to be applied to an incomplete type, though?  I
can't see any technical reason why this could not be made to work, and
it would allow far greater decoupling.  Add to this an <stlfwd> header
that forward declares the STL container types (something that could be
useful anyway as you are not permitted to forward declare them
yourself), and this would give you far greater decoupling, yes?
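
The standard library already offers one precedent for this kind of header: <iosfwd> forward declares the stream types so that a header can mention std::ostream without pulling in its definition. The sketch below uses it as an analogy only. <stlfwd> does not exist, and Widget, describe and describe_to_string are made-up names.

```cpp
#include <cassert>
#include <iosfwd>   // forward declarations only; std::ostream need not be complete

// ---- widget.h --------------------------------------------------------
struct Widget { int id; };

// A reference to a possibly incomplete type is enough for a declaration.
void describe(std::ostream& out, const Widget& w);

// ---- widget.cpp ------------------------------------------------------
#include <ostream>  // the full definition is needed only where it is used
#include <sstream>
#include <string>

void describe(std::ostream& out, const Widget& w) {
    out << "Widget#" << w.id;
}

// Small helper so the behaviour can be checked.
std::string describe_to_string(const Widget& w) {
    std::ostringstream out;
    describe(out, w);
    return out.str();
}
```

A hypothetical <stlfwd> would play the same role for the containers: clients of widget.h would see only the names, not the implementation.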

--
Richard Smith

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: "Richard Smith" <richard@ex-parrot.com>
Date: Fri, 17 Feb 2006 09:40:53 CST
David Abrahams wrote:
> richard@ex-parrot.com writes:
>
> > (I can't actually find any language in the Standard suggesting it is
> > ever legal to explicitly instantiate an STL container; however this is
> > something that is done in real-world code and I can't imagine a vendor
> > disallowing any form of explicit instantiation of an STL template.)
>
> Oh, that's easy.  Explicitly instantiate basic_string<char> and you're
> likely to get a mess of link errors because the vendor already does
> that in the standard library.

Sorry, I meant that I couldn't see anything that allowed you to
explicitly instantiate in *any* circumstance. Clearly when the type
does not depend on a type under the user's control, you could get
problems.  But am I, for example, allowed to do the following:

  template class std::vector<my_ns::my_type>;

It's the sort of thing that many people do, but I can't see anything in
the Standard that explicitly permits it.
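
For concreteness, the practice in question looks like this in a complete translation unit. my_ns::my_type is the placeholder name used above, given a trivial body here purely so the example compiles, and fill_and_count is likewise invented. Whether the Standard sanctions the instantiation is exactly the open question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace my_ns {
// Placeholder type; its single member is an assumption made for the example.
struct my_type {
    int value;
};
}  // namespace my_ns

// Explicit instantiation definition: emits the non-template members of
// std::vector<my_ns::my_type> in this translation unit.
template class std::vector<my_ns::my_type>;

inline std::size_t fill_and_count() {
    std::vector<my_ns::my_type> v;
    v.push_back(my_ns::my_type{1});
    v.push_back(my_ns::my_type{2});
    return v.size();
}
```

Because the element type is under the user's control, this does not collide with anything the vendor instantiates in the library itself.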

--
Richard Smith
