Topic: C++, Linkage and stabs
Author: clamage@Eng.Sun.COM (Steve Clamage)
Date: 1995/11/28
In article 2474@sophia.inria.fr, Theodore Papadopoulo <papadop@sophia.inria.fr> writes:
>... I understand very well too that an object's
>implementation may vary from one compiler to another so that the
>objects (.o) produced by different compilers should be incompatible. But
>why choose to express this incompatibility through different
>mangling schemes... Wouldn't it have been much easier to say that
>each (.o) file introduces an undefined symbol containing an
>identification of the compiler and of the version of the mangling scheme
>(something like __G++_MANGLING_2.3 or __ARM_MANGLING_1.4)? The
>definition would be provided in the compilation unit that provides main.
>To me, this would have two benefits:
>
>+ Easier to understand the linker messages when mixing objects coming
>from two different compilers.
>+ Provided that the stab mechanism gives a good description of the
>object layout, there would be better orthogonality between debugger
>and compiler development. This raises the question whether there has
>been any effort made for standardizing the information contained in the
>stabs specifically for C++. To be clear, I do not want a formalization
>of the encoding (this would be great but is out of the scope here) but
>of the minimum information that should be present for a debugger to be
>used transparently (i.e. with no more a priori knowledge of the compiler
>than it needs for a typical C compiler).
What you are asking is completely beyond the scope of the C++ standard.
The standard does not address name mangling at all, because it is not
part of the language, nor is it required. A "smart" object format or
linker would not use name mangling at all. Implementations that
use name mangling typically provide automatic demangling; if your
implementation does not, complain to the vendor.
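As an illustration of what vendor-supplied demangling can look like, here is a minimal sketch. It assumes a GCC- or Clang-style toolchain that ships <cxxabi.h> and its abi::__cxa_demangle function; other implementations expose demangling differently or not at all, so the code falls back to returning the raw name. The names Widget and readable_name are hypothetical and exist only for this example.

```cpp
#include <cstdlib>
#include <string>
#include <typeinfo>
#if defined(__GNUG__)
#include <cxxabi.h>   // vendor extension, not part of the C++ standard
#endif

// Hypothetical type used only for this illustration.
struct Widget {};

// Returns a readable form of an implementation-specific type name.
// Where <cxxabi.h> is unavailable, the input is returned unchanged.
std::string readable_name(const char* raw) {
#if defined(__GNUG__)
    int status = 0;
    char* text = abi::__cxa_demangle(raw, nullptr, nullptr, &status);
    if (status == 0 && text != nullptr) {
        std::string result(text);
        std::free(text);   // the returned buffer was allocated with malloc
        return result;
    }
#endif
    return raw;
}
```

A call such as readable_name(typeid(Widget).name()) would then yield whatever spelling the implementation chooses for the type, which is exactly the point: the encoding and its decoder belong to the vendor, not to the language.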
The standard does not try to address object file format or contents. There
is no requirement that an identifiable compiler, linker, debugger, or
object file exist, for one thing. An interactive interpreter could be a
valid implementation, for example.
Next, requiring some unique identifier for each compiler (or whatever)
also requires some name repository and an organization to manage the
names and any conflicts that may arise. (For example, Industrial
Boolean Mechanisms might conflict with another company having the
same initials.)
Finally, I don't understand what you mean by a debugger that requires
"no more a priori knowledge of the compiler than it needs for a typical
C compiler".
If a platform has an ABI, compilers and debuggers would be presumed to
comply with the ABI. No other information in the object file would be
required. The ABI would specify everything.
If a platform has no ABI, or at least none which is adhered to, no
simple description will be adequate. Consider Intel x86 platforms,
many of which have no ABI:
    type int has 16 or 32 bits
    pointers have 16 or 32 bits, or both
    integers and pointers are unaligned, have alignment 2, or alignment 4
    enums vary in size or are fixed at 2 or 4 bytes
    floating-point numbers are IEEE or some special format
    caller or callee pops the stack
    caller or callee copies a struct passed by value
    how are function arguments passed, and in what order?
Plus other variations too numerous to list, and we are still talking only
about C, not C++. I don't see how to cover all the combinations by placing
one or several identifiers in the object file. A standard that attempted
to provide a list would undoubtedly miss some combination used by some compiler.
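Some of the properties in that list can be queried from inside a program, which makes the variation easy to see. The following is only a sketch: AbiFacts, Color, and probe_abi are hypothetical names introduced for this example, and the calling-convention items (who pops the stack, who copies a struct, argument order) are deliberately absent because portable code cannot observe them.

```cpp
#include <climits>
#include <cstddef>
#include <limits>

// Hypothetical enum, used only to probe the size the compiler gives it.
enum Color { Red, Green, Blue };

// A few of the properties that differ between implementations.
struct AbiFacts {
    std::size_t int_bits;
    std::size_t pointer_bits;
    std::size_t int_alignment;
    std::size_t pointer_alignment;
    std::size_t enum_bytes;
    bool float_is_ieee;
};

// Collects the values chosen by the compiler building this file.
AbiFacts probe_abi() {
    AbiFacts facts;
    facts.int_bits = sizeof(int) * CHAR_BIT;
    facts.pointer_bits = sizeof(void*) * CHAR_BIT;
    facts.int_alignment = alignof(int);
    facts.pointer_alignment = alignof(void*);
    facts.enum_bytes = sizeof(Color);
    facts.float_is_ieee = std::numeric_limits<float>::is_iec559;
    return facts;
}
```

Two compilers for the same processor can return different values here, and every field that differs is a way for their object files to be silently incompatible.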
---
Steve Clamage, stephen.clamage@eng.sun.com
---
[ comp.std.c++ is moderated. Submission address: std-c++@ncar.ucar.edu.
Contact address: std-c++-request@ncar.ucar.edu. The moderation policy
is summarized in http://dogbert.lbl.gov/~matt/std-c++/policy.html. ]
Author: Theodore Papadopoulo <papadop@sophia.inria.fr>
Date: 1995/11/28
As a user of (at least) two different C++ compilers, here are a couple
of questions for which I would be pleased to have answers. My first
concern here is not directly the C++ language but rather the effects of
the standard's choices on the programming environment and more
specifically on the debugger.
First, it seems that I have not understood all of the rationale behind
the mangling scheme... I understand very well why names have to be encoded
(although I do not understand at all why the return type is not a part
of the encoding scheme). I understand very well too that an object's
implementation may vary from one compiler to another so that the
objects (.o) produced by different compilers should be incompatible. But
why choose to express this incompatibility through different
mangling schemes... Wouldn't it have been much easier to say that
each (.o) file introduces an undefined symbol containing an
identification of the compiler and of the version of the mangling scheme
(something like __G++_MANGLING_2.3 or __ARM_MANGLING_1.4)? The
definition would be provided in the compilation unit that provides main.
To me, this would have two benefits:
+ Easier to understand the linker messages when mixing objects coming
from two different compilers.
+ Provided that the stab mechanism gives a good description of the
object layout, there would be better orthogonality between debugger
and compiler development. This raises the question whether there has
been any effort made for standardizing the information contained in the
stabs specifically for C++. To be clear, I do not want a formalization
of the encoding (this would be great but is out of the scope here) but
of the minimum information that should be present for a debugger to be
used transparently (i.e. with no more a priori knowledge of the compiler
than it needs for a typical C compiler).
This would (in the long term at least, and hopefully) allow easy use
of vendors' debuggers with g++ (they often offer some goodies that are
not provided with gdb) and of gdb with any vendor's compiler (it provides a
uniform interface for debugging across the many different platforms and
that's great). Of course, gdb and g++ can be replaced by any other products
(they are cited because they are free!!).
I'm aware of the fact that these problems predate C++, but I
believe that they are emphasized a lot with the current definition of
C++.
At this time, I suppose that nothing like this can be incorporated in
the draft standard, but I would be extremely interested in seeing your
comments.
--
Theodore Papadopoulo (papadop@sophia.inria.fr)
Projet Robotvis, INRIA, France
---
Author: fjh@munta.cs.mu.OZ.AU (Fergus Henderson)
Date: 1995/11/28
Theodore Papadopoulo <papadop@sophia.inria.fr> writes:
>First, it seems that I have not understood all of the rationale behind
>the mangling scheme... I understand very well why names have to be encoded
>(although I do not understand at all why the return type is not a part
>of the encoding scheme). I understand very well too that an object's
>implementation may vary from one compiler to another so that the
>objects (.o) produced by different compilers should be incompatible. But
>why choose to express this incompatibility through different
>mangling schemes... Wouldn't it have been much easier to say that
>each (.o) file introduces an undefined symbol containing an
>identification of the compiler and of the version of the mangling scheme
>(something like __G++_MANGLING_2.3 or __ARM_MANGLING_1.4)?
One problem I see with this is that it would impose a small space penalty,
since each object file would have to contain space for this symbol.
One word per object file may not sound like much, but there are some
cases where this could be a problem. Sometimes libraries are carefully
divided (either manually or with some tool) so that each object file
contains only one variable or function, so as to minimize the granularity
of the linking process, and to avoid the cascading library effect where
you use one function from an object file, and that object file contains
another function (which you don't use) that needs another object file,
which contains functions that need other object files, etc.
If each object file contains only one variable or function, then
adding one word per object file may be a significant overhead.
With linker support, this can be avoided -- but then again with full
linker support, we could have type-safe linkage without name mangling!
I don't know of any portable way of forcing a link error if a symbol
is not defined without using up space in the object file.
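To make the cost concrete, here is a rough sketch of how such a marker could be expressed in source form. Everything named here is hypothetical: hypothetical_mangling_marker_v1 stands in for something like the proposed __G++_MANGLING_2.3, and the whole arrangement is collapsed into one file for illustration, whereas in the proposal the declaration and reference would appear in every object file and the definition only in the one providing main.

```cpp
#include <cstddef>

// Hypothetical marker; a real scheme would encode compiler and version
// in the name. C linkage keeps the marker itself from being mangled.
extern "C" char hypothetical_mangling_marker_v1;

// Each object file would carry a reference like this one. Storing the
// address is what forces the linker to resolve the symbol, and it is
// also what takes up space: one pointer per object file.
const char* marker_reference = &hypothetical_mangling_marker_v1;

// Only the object file that provides main would hold this definition.
// Linking against objects that expect a different marker would then
// fail with an undefined-symbol error naming that marker.
char hypothetical_mangling_marker_v1 = 0;

// The per-object-file overhead described above, in bytes.
const std::size_t marker_cost_in_bytes = sizeof(marker_reference);
```

In a real multi-file build each object file would need its own distinctly named or file-local copy of the reference, and an optimizer is free to discard a file-local one that is never read, which is part of why this is hard to do portably.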
The other problem I see is that it might make changes to the name
mangling scheme more painful. Currently, if a C++ vendor realizes that
for some obscure case they need to change the name mangling (perhaps
because they need to support some new C++ feature), all they need to do
is fix that obscure case and issue a new release. Customers who didn't
use that obscure case don't need to recompile. But if your scheme were
adopted, any change to the name mangling would require all customers to
recompile everything.
--
Fergus Henderson WWW: http://www.cs.mu.oz.au/~fjh
fjh@cs.mu.oz.au PGP: finger fjh@128.250.37.3
I will have little or no net access from Nov 30 until Dec 25,
so please email me a copy of any follow-ups.
---
Author: tony@online.tmx.com.au (Tony Cook)
Date: 1995/11/29
Herb Sutter (herbs@interlog.com) wrote:
: In article <30BAE46C.2474@sophia.inria.fr>,
: Theodore Papadopoulo <papadop@sophia.inria.fr> wrote:
: >First, it seems that I have not understood all of the rationale behind
: >the mangling scheme... I understand very well why names have to be encoded
: >(although I do not understand at all why the return type is not a part
: >of the encoding scheme).
: Because the return type is not part of a function's signature; for example,
: you can't overload based on return type:
: int f(int i);
: long f(int i); // error: redefinition, won't overload
: x = f(15); // which f()?
: So there's no point in mangling in the return type, since the mangled name
: will already be unique without it. (Note, however, that const _is_ part of
: the signature.)
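The rule quoted above can be sketched with a pair of overload sets. The names describe and Counter are hypothetical and chosen only for this example; the commented-out declaration marks the case a compiler would reject.

```cpp
// Overloads that differ in parameter type have distinct signatures,
// so both may coexist and both get distinct mangled names.
int describe(int)    { return 1; }
int describe(double) { return 2; }

// long describe(int);   // rejected: differs from the first overload
//                       // only in its return type

struct Counter {
    // const on a member function is part of its signature, so these
    // two members overload each other.
    int which()       { return 1; }
    int which() const { return 2; }
};
```

Calling describe(15) selects the int overload and describe(1.5) the double one; on a Counter object, which() picks the non-const member, and on a const Counter it picks the const member.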
There is a point to mangling in the return type: return type-safety.
This would prevent errors of the form:
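// A.CPP
#include <iostream>
extern int f();   // declared here as returning int
int main() { std::cout << f() << std::endl; return 0; }
// B.CPP
#include <cmath>  // M_PI is a common extension, not standard C++
float f() { return M_PI; }   // defined as returning float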
(which I think is currently undefined)
I know of at least one compiler that mangles variable names too,
which would prevent the common beginner's error:
// C.CPP
#include <iostream>
extern int *x;
int main() { std::cout << *x << std::endl; return 0; }
// D.CPP
int x[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
Mangling in return types (or variable types) isn't necessary to
conform to the DWP, but there is still a good reason to use it.
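To see why the pointer/array mismatch goes wrong, the following sketch simulates it inside one file without invoking the undefined behaviour itself. It assumes the usual outcome where the linker binds the name to the array's storage; x_storage, bits_read_as_pointer, and real_array_address are hypothetical names for this illustration only.

```cpp
#include <cstdint>
#include <cstring>

// Stand-in for the array defined in D.CPP.
int x_storage[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

static_assert(sizeof(int*) <= sizeof(x_storage), "array must cover a pointer");
static_assert(sizeof(int*) <= sizeof(std::uintptr_t), "pointer must fit");

// What C.CPP would load if it treated the symbol as `int *x`: the first
// sizeof(int*) bytes of the array contents, taken to be an address.
std::uintptr_t bits_read_as_pointer() {
    std::uintptr_t bits = 0;
    std::memcpy(&bits, x_storage, sizeof(int*));
    return bits;
}

// The address the programmer actually wanted to dereference.
std::uintptr_t real_array_address() {
    return reinterpret_cast<std::uintptr_t>(&x_storage[0]);
}
```

The two values differ: the first is built from the elements 1, 2, ... rather than from any address, so dereferencing it in the real program would read from an arbitrary location. Encoding the variable's type in its linker name would turn this into a link-time error instead.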
--
Tony Cook - tony@online.tmx.com.au
100237.3425@compuserve.com
---