Thread

Topic: underscores in names reserved. Why?

Author: kuyper@wizard.net (James Kuyper)
Date: Wed, 4 Feb 2004 18:49:05 +0000 (UTC)

barmar@alum.mit.edu (Barry Margolin) wrote in message news:<barmar-99FC55.17552603022004@netnews.comcast.net>...
> In article <slrnc207bv.1og.do-not-spam-benh@tin.bwsint.com>,
>  do-not-spam-benh@bwsint.com (Ben Hutchings) wrote:
>
> > The additional prohibition in C++ of embedded double-underscores is, I
> > think, due to the use of double-underscore as a separator between a
> > function's name and the representation of its parameter types in
> > mangled names in CFront and other implementations that followed the
> > description of mangling in the ARM.
>
> But since names are being mangled, couldn't the double underscore simply
> be mangled into something else (triple underscore, perhaps), so that
> there's no conflict?

Under that scheme, function__and(_argtype) and function_(and__argtype)
both get mangled to function___and___argtype. A name mangling scheme
can't guarantee unique results for each pair of inputs if required to
work on arbitrary inputs. It must impose restrictions on the
user-defined identifiers that don't apply to the mangled identifiers.
That's precisely what the current standard does.
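
To make the collision concrete, here is a minimal sketch. It assumes the
proposed scheme works like this: every "__" inside a component becomes
"___", and the function name is joined to its argument type with "__".
The helper names triple and naive_mangle are made up for illustration;
no real compiler is claimed to mangle this way.

```cpp
#include <cstddef>
#include <string>

// Replace each "__" in s with "___", scanning left to right.
std::string triple(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ) {
        if (s.compare(i, 2, "__") == 0) {
            out += "___";
            i += 2;
        } else {
            out += s[i];
            ++i;
        }
    }
    return out;
}

// Hypothetical scheme: name and argument type joined by "__".
std::string naive_mangle(const std::string& name, const std::string& arg) {
    return triple(name) + "__" + triple(arg);
}
```

Calling naive_mangle("function__and", "_argtype") and
naive_mangle("function_", "and__argtype") yields the same string, so the
linker could not tell the two functions apart.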

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: qrczak@knm.org.pl ("Marcin 'Qrczak' Kowalczyk")
Date: Wed, 4 Feb 2004 20:11:56 +0000 (UTC)

On Wed, 04 Feb 2004 18:49:05 +0000, James Kuyper wrote:

>> But since names are being mangled, couldn't the double underscore simply
>> be mangled into something else (triple underscore, perhaps), so that
>> there's no conflict?
>
> Under that scheme, function__and(_argtype) and function_(and__argtype)
>  both get mangled to function___and___argtype. A name mangling scheme
> can't guarantee unique results for each pair of inputs if required to
> work on arbitrary inputs.

Of course it can. For example:
- change _ in names to _u
- use _ followed by other characters to encode things like argument types
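
A minimal sketch of such an escaping scheme, assuming "_s" as a
hypothetical code that separates the function name from its argument
type. The names escape, mangle and demangle are illustrative only, and
this is not any compiler's actual scheme.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Every '_' written by the user becomes "_u".
std::string escape(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '_') out += "_u";
        else out += c;
    }
    return out;
}

// "_s" (an assumed code) separates the name from the argument type.
std::string mangle(const std::string& name, const std::string& arg) {
    return escape(name) + "_s" + escape(arg);
}

// Inverse of mangle: every '_' in the input is followed by a code letter.
// Simplification: any code other than 'u' is treated as the separator.
std::pair<std::string, std::string> demangle(const std::string& m) {
    std::pair<std::string, std::string> parts;
    std::string* cur = &parts.first;
    for (std::size_t i = 0; i < m.size(); ++i) {
        if (m[i] != '_') {
            *cur += m[i];
            continue;
        }
        ++i;  // step to the code letter after '_'
        if (i < m.size() && m[i] == 'u') *cur += '_';
        else cur = &parts.second;
    }
    return parts;
}
```

Since a bare "_" never survives in the output, the mangled string can be
decoded unambiguously, and the two names from the quoted example no
longer collide. It does nothing about clashes with unmangled extern "C"
names.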

--
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

Author: cpdaniel_remove_this_and_nospam@mvps.org.nospam ("Carl Daniel")
Date: Thu, 5 Feb 2004 02:19:06 +0000 (UTC)

Barry Margolin wrote:
> In article <slrnc207bv.1og.do-not-spam-benh@tin.bwsint.com>,
>  do-not-spam-benh@bwsint.com (Ben Hutchings) wrote:
>
>> The additional prohibition in C++ of embedded double-underscores is,
>> I think, due to the use of double-underscore as a separator between a
>> function's name and the representation of its parameter types in
>> mangled names in CFront and other implementations that followed the
>> description of mangling in the ARM.
>
> But since names are being mangled, couldn't the double underscore
> simply be mangled into something else (triple underscore, perhaps),
> so that there's no conflict?

How would you then handle

extern "C" void foo__bar();

?

-cd

Author: duggar@mit.edu (Keith H Duggar)
Date: Thu, 5 Feb 2004 02:19:44 +0000 (UTC)

> were no such things as compiler generators (like lex and yacc), they
> designed the syntax to be easy to parse.  One choice they made was to
> recognize token types by their first character:
>
> digit = number
> alpha = identifier
> punctuation = operator

Oh yeah, thanks for the reminder. I remember now the Fortran conventions,
such as variables whose names begin with "i" being integers.

> Although modern compiler technology doesn't require this crutch, it has
> apparently never been considered a significant enough inconvenience to
> prompt the language designers to change it.

I suppose so. Though it seems like such an easy change.

> A few languages bucked that trend.  For instance, Lisp allows symbols to
> start with any character that isn't a delimiter (I'm simplifying it a
> bit), and supports escaping to allow any character to be anywhere in a
> symbol name.

True. Lisp does have some nice points. That is one of them.

> Numeric literals are allowed to have letters on the end of them or in
> the middle, and it would be dangerous to say that identifiers can begin
> with a digit just so long as they can't be parsed as a numeric literal.
> It would also prevent extensions to the format of numeric literals such
> as the introduction of hexadecimal floating point in C99.

Well, Ben has a good point there. Identifiers such as:

0xBEEF.FACE

could run into trouble. But there are so many ways to handle that.
Especially considering there was already an escape code syntax, why not
just:

\xBEEF.FACE

Why have two different syntaxes for essentially the same purpose?
Escape syntax worked for strings; it would work for literals.

Author: jm@bourguet.org (Jean-Marc Bourguet)
Date: Thu, 5 Feb 2004 18:30:11 +0000 (UTC)

kuyper@wizard.net (James Kuyper) writes:

>  A name mangling scheme can't guarantee unique results for each pair
> of inputs if required to work on arbitrary inputs.

It's quite easy: put what has to be mangled in a canonical textual
representation and encode this representation in hexadecimal.  Now you
can obviously think about more space-efficient ways, and about the issue
of things that have no easy canonical textual representation because
they need a typedef to be defined.
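
A minimal sketch of the hexadecimal idea, assuming the canonical text is
already available as a string such as "f(int)". The function name
hex_mangle and the leading "m" (added so the result does not begin with
a digit) are assumptions made for illustration.

```cpp
#include <string>

// Encode each byte of the canonical text as two lowercase hex digits.
std::string hex_mangle(const std::string& canonical) {
    static const char digits[] = "0123456789abcdef";
    std::string out = "m";  // assumed prefix: keeps the symbol from starting with a digit
    for (unsigned char c : canonical) {
        out += digits[c >> 4];
        out += digits[c & 0x0f];
    }
    return out;
}
```

Distinct canonical texts always give distinct symbols, because the
encoding is reversible byte by byte; the price is symbols roughly twice
as long as their input.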

What is not possible is to provide a mangling scheme which does not
clash with extern "C" names if these names are not mangled themselves
(there were OSes where C names were mangled by adding an underscore at
the start of the name) and there isn't any implementation-specific way
to use names not allowed by C.

Yours,

--
Jean-Marc

Author: david.thompson1@worldnet.att.net (Dave Thompson)
Date: Mon, 9 Feb 2004 04:36:41 +0000 (UTC)

On Tue, 3 Feb 2004 22:30:24 +0000 (UTC), barmar@alum.mit.edu (Barry
Margolin) wrote:

> In article <b47de02.0402012055.4fae736c@posting.google.com>,
>  duggar@mit.edu (Keith H Duggar) wrote:
> > <snip> why an identifier can't begin with a
> > number such as 2PI or 4PIo3.
>
> That's an arbitrary restriction that most languages have inherited from
> the early days of Algol, Fortran, and PL/I.  Since the compilers were
> typically written in assembly language in the 50's and 60's, and there
> were no such things as compiler generators (like lex and yacc), they
> designed the syntax to be easy to parse.  One choice they made was to
> recognize token types by their first character:
>
> digit = number
> alpha = identifier
> punctuation = operator
>
Except classic Fortran *doesn't* break lexemes at whitespace, so there
is no reliably easy "first character" to look at. Even today lexing
"fixed-form" Fortran is hard for the usual tools. And COBOL, which I
believe predates everything but Fortran, does allow identifiers
beginning with a digit, and containing (but not beginning or ending
with) a hyphen, which is also negation and subtraction.

I would speculate (without evidence) that Fortran required a letter
because the names of variables in science and engineering, its target
uses, invariably(!) are or at least start with Roman or Greek letters,
and of course computer equipment of the day didn't have Greek letters
or even lowercase Roman. And everybody else just did the same because
that was (apparently) working fine and there was no real reason to
change, except adding real lowercase (as C did) because it's kewl.

> A few languages bucked that trend.  For instance, Lisp allows symbols to
> start with any character that isn't a delimiter (I'm simplifying it a
> bit), and supports escaping to allow any character to be anywhere in a
> symbol name.

And FORTH, somewhat later but I believe before C, allows all graphic characters;
the *only* delimiter is whitespace -- normally; there are a few
exceptions, and you can add your own parsing extensions/overrides
which can of course do anything (practically) computable.

- David.Thompson1 at worldnet.att.net

Author: duggar@mit.edu (Keith H Duggar)
Date: Sun, 1 Feb 2004 11:48:12 +0000 (UTC)

The standard reserves certain underscore name patterns. Why is this
necessary? Consider the following from Rolf Magnus which will not
compile in g++ 3.3.2 :

int _Z4testv;

void test() {};

int main()
{
     test();
}

/tmp/cckaLQYy.s: Assembler messages:
/tmp/cckaLQYy.s:13: Error: symbol `_Z4testv' is already defined

This is because of the mangling of the test function, which is allowed
by the standard. Well, why? Name mangling is used to avoid name conflicts
(overloading, etc.), so why should it be allowed to create conflicts?
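
For reference, the clashing symbol can be reproduced by hand. This is a
minimal sketch, assuming the Itanium C++ ABI convention that g++ 3.x
follows, and covering only a function in the global namespace that takes
no parameters: the prefix _Z, the length-prefixed name, then v for an
empty parameter list. mangle_nullary is a hypothetical helper, not part
of any toolchain.

```cpp
#include <string>

// Covers only: global-namespace function, no parameters.
// The real ABI handles namespaces, classes, templates, and much more.
std::string mangle_nullary(const std::string& name) {
    return "_Z" + std::to_string(name.size()) + name + "v";
}
```

mangle_nullary("test") gives _Z4testv, exactly the name the variable
above was declared with.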

If g++ or any other compiler simply mangled ALL names including
variable names in a consistent manner wouldn't that eliminate the need
for reserved name patterns completely? Thus freeing the entire
namespace and allowing users to create names as they wish using any of
the valid characters.

By the way, I have a special request. Please do not use #define macros
as counter examples or explanations for why reserved names are
necessary. I consider macros quite lame and I'm not really interested
in how macros can destroy almost any idea or code that one can think
of.

Author: m@remove.this.part.rtij.nl (Martijn Lievaart)
Date: Sun, 1 Feb 2004 19:02:49 +0000 (UTC)

On Sun, 01 Feb 2004 11:48:12 +0000, Keith H Duggar wrote:

> If g++ or any other compiler simply mangled ALL names including
> variable names in a consistent manner wouldn't that eliminate the need
> for reserved name patterns completely? Thus freeing the entire
> namespace and allowing users to create names as they wish using any of
> the valid characters.

The standard headers might introduce names that are meant for internal use.
These could be hidden in namespace std, but a using namespace std would
bring back the problem.

This is solvable by implementors, for instance like this:

--- in some standard header

// allow $'s in var names
#pragma system
extern int i$;
#pragma no-system

But the gain I see is marginal compared to the actually quite simple rules
we have now.

HTH,
M4

Author: kuyper@wizard.net (James Kuyper)
Date: Sun, 1 Feb 2004 19:03:40 +0000 (UTC)

duggar@mit.edu (Keith H Duggar) wrote in message news:<b47de02.0401311906.309cac94@posting.google.com>...
> The standard reserves certain underscore name patterns. Why is this
> necessary? Consider the following from Rolf Magnus which will not

For most of the underscore patterns where the underscore is the first
character, it's because the implementors need to have names
that they can use freely, without interfering with user code.

> compile in g++ 3.3.2 :
>
> int _Z4testv;
>
> void test() {};
>
> int main()
> {
>      test();
> }
>
> /tmp/cckaLQYy.s: Assembler messages:
> /tmp/cckaLQYy.s:13: Error: symbol `_Z4testv' is already defined
>
> Because of the mangling of the test function which is allowed by the
> std. Well, why? Name mangling is used to avoid name conflicts
> (overloading, etc) so why should it be allowed to create conflicts?

Because name mangling is generally done by taking the names defined by
user code, and merging them together with characters separating the
distinct parts (namespace, class, function name, argument type list).
As a simplistic case, consider Myclass::add(int) and
Myclass::add(double). These member functions could be implemented by
code that the linker sees as ordinary functions with names like
Myclass__add__int and Myclass__add__double. Now, if you were able to
freely insert underscores, what would happen if you linked such code
to a module which happened to define an ordinary function named
Myclass__add__int? There are many ways this problem could have been
dealt with. Specifying that names with double underscores in them are
reserved is a simple approach that works, and it was the one chosen
by the committee.
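
The simplistic scheme above can be sketched in a few lines. This is an
illustration only, assuming the parts are joined with "__" and nothing
else is done; join_mangle is a made-up name, not how any particular
compiler works.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Join the parts of a qualified name and its argument types with "__".
std::string join_mangle(const std::vector<std::string>& parts) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) out += "__";
        out += parts[i];
    }
    return out;
}
```

join_mangle({"Myclass", "add", "int"}) produces Myclass__add__int, which
is also what a user would get by declaring an ordinary function with
that spelling, were such names not reserved.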

> If g++ or any other compiler simply mangled ALL names including
> variable names in a consistent manner wouldn't that eliminate the need
> for reserved name patterns completely? Thus freeing the entire

No matter what algorithm you use, if it's legal for a user-provided
name to match the mangled name of something else, you're heading for
trouble. There are many ways of dealing with this trouble, some more
complicated than others, but it must be dealt with.

> By the way, I have a special request. Please do not use #define macros
> as counter examples or explanations for why reserved names are
> necessary. I consider macros quite lame and I'm not really interested
> in how macros can destroy almost any idea or code that one can think
> of.

You might not be interested, but the people who are responsible for
the standard are. Until the day that C++ can shed all of its need for
backwards compatibility, both with C and with current versions of C++,
it will need to handle the issues raised by macros in a reasonable
fashion. When that day comes, it probably won't be C++ anymore,
regardless of what name it goes by.

Author: jackklein@spamcop.net (Jack Klein)
Date: Sun, 1 Feb 2004 23:09:00 +0000 (UTC)

On Sun, 1 Feb 2004 11:48:12 +0000 (UTC), duggar@mit.edu (Keith H
Duggar) wrote in comp.std.c++:

> The standard reserves certain underscore name patterns. Why is this
> necessary? Consider the following from Rolf Magnus which will not
> compile in g++ 3.3.2 :

The very first reason is that from day one, the C++ language has gone
to a great deal of effort to maintain linkage compatibility with C
code, and in fact inherits many of these reserved identifier patterns
from C.  The only really new one C++ added is the prohibition against
consecutive underscores anywhere within an identifier, which is only
reserved in C at the beginning of the identifier.  I have heard, but
do not know for a fact, that this is because the original cfront C++
preprocessor generated symbols using that pattern.

> int _Z4testv;
>
> void test() {};
>
> int main()
> {
>      test();
> }
>
> /tmp/cckaLQYy.s: Assembler messages:
> /tmp/cckaLQYy.s:13: Error: symbol `_Z4testv' is already defined
>
> Because of the mangling of the test function which is allowed by the
> std. Well, why? Name mangling is used to avoid name conflicts
> (overloading, etc) so why should it be allowed to create conflicts?

The C++ standard neither defines nor requires "name mangling".  True,
a C++ compiler must have some method of specifying different functions
that have the same name in the source code, but in fact your own
example shows one of the reasons quite clearly.

Very often the tool chain that makes up the complete C++
implementation includes tools that are not primarily, or at all,
C++ oriented.  GNU's as, in particular, was designed to assemble the
output of gcc before g++ existed.  It is the job of the C++ compiler
to fit itself into the existing tool chain if the implementors do not
want to have to replace all the tools in the chain.

> If g++ or any other compiler simply mangled ALL names including
> variable names in a consistent manner wouldn't that eliminate the need
> for reserved name patterns completely? Thus freeing the entire
> namespace and allowing users to create names as they wish using any of
> the valid characters.

Ah, but what happens if your code started with:

extern "C" {
    int _Z4testv;
}

?

And then you would need to special case the compiler to not modify
names for system functions on platforms where system call linkage is
done by name.

> By the way, I have a special request. Please do not use #define macros
> as counter examples or explanations for why reserved names are
> necessary. I consider macros quite lame and I'm not really interested
> in how macros can destroy almost any idea or code that one can think
> of.

That's an unusual request, since the C++ standard requires a certain
amount of macro use.

Unless you can provide a concrete example of where this provision of
the standard actually harms the language or reduces the ability to
write correct C++ programs, don't expect to see it change.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html

Author: francis@robinton.demon.co.uk (Francis Glassborow)
Date: Mon, 2 Feb 2004 00:39:48 +0000 (UTC)

In article <b47de02.0401311906.309cac94@posting.google.com>, Keith H
Duggar <duggar@mit.edu> writes
>If g++ or any other compiler simply mangled ALL names including
>variable names in a consistent manner wouldn't that eliminate the need
>for reserved name patterns completely?


And exactly how do you propose that implementors do that? I can think of
no name mangling rules that would avoid clashes between implementation
defined names (used internally by implementations of the Standard
Library) and user declared names other than by a scheme that was
originally introduced by C. C++ also needs a mechanism to handle
overloading as well as type-safe linkage. The simplest mechanism is the
one we currently have and it is easy to teach. Never use a double
underscore + certain other requirements that are effectively those that
we would have to abide by for C compatibility.

Though stronger than strictly necessary I apply a simple rule of 'Never
start an identifier with an underscore'.  It is the others such as
identifiers starting with 'str' that are more often ignored.
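
The underscore rules under discussion can be written down as a small
checker. This is a sketch, assuming the C++ wording that names
containing a double underscore, or beginning with an underscore followed
by an uppercase letter, are reserved for any use, and that other names
beginning with an underscore are reserved in the global namespace. The
function names are illustrative, and the library-specific reservations
inherited from C (such as the 'str' prefix) are not modelled.

```cpp
#include <cctype>
#include <string>

// Reserved for any use: contains "__", or starts with '_' + uppercase letter.
bool reserved_everywhere(const std::string& id) {
    if (id.find("__") != std::string::npos) return true;
    return id.size() >= 2 && id[0] == '_' &&
           std::isupper(static_cast<unsigned char>(id[1])) != 0;
}

// Additionally reserved in the global namespace: any leading underscore.
bool reserved_at_global_scope(const std::string& id) {
    return reserved_everywhere(id) || (!id.empty() && id[0] == '_');
}
```

Under these assumptions _Z4testv and foo__bar are reserved everywhere,
while _x is reserved only at global scope, which is why the simpler
'never start with an underscore' rule is safe but stricter than needed.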

--
Francis Glassborow      ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit

Author: kuyper@wizard.net (James Kuyper)
Date: Mon, 2 Feb 2004 17:45:17 +0000 (UTC)

m@remove.this.part.rtij.nl (Martijn Lievaart) wrote in message news:<pan.2004.02.01.16.58.18.802930@remove.this.part.rtij.nl>...
> On Sun, 01 Feb 2004 11:48:12 +0000, Keith H Duggar wrote:
>
> > If g++ or any other compiler simply mangled ALL names including
> > variable names in a consistent manner wouldn't that eliminate the need
> > for reserved name patterns completely? Thus freeing the entire
> > namespace and allowing users to create names as they wish using any of
> > the valid characters.
>
> The standard headers might introduce names that are meant for internal use.
> These could be hidden in namespace std, but a using namespace std would
> bring back the problem.
>
> This is solvable by implementors, for instance like this:
>
> --- in some standard header
>
> // allow $'s in var names
> #pragma system
> extern int i$;
> #pragma no-system
>
> But the gain I see is marginal compared to the actually quite simple rules
> we have now.

'$' isn't part of the C character set. That's because it's not
available in all ASCII variants.

No matter which character or string of characters you use, it has
basically the same effect. You're taking a subset of the names that
can be given to the linker, and saying that users must never use one
of the names in that subset.

Author: m@remove.this.part.rtij.nl (Martijn Lievaart)
Date: Mon, 2 Feb 2004 21:03:51 +0000 (UTC)

On Mon, 02 Feb 2004 17:45:17 +0000, James Kuyper wrote:

> m@remove.this.part.rtij.nl (Martijn Lievaart) wrote in message
> news:<pan.2004.02.01.16.58.18.802930@remove.this.part.rtij.nl>...
>> On Sun, 01 Feb 2004 11:48:12 +0000, Keith H Duggar wrote:
>>
>> > If g++ or any other compiler simply mangled ALL names including
>> > variable names in a consistent manner wouldn't that eliminate the
>> > need for reserved name patterns completely? Thus freeing the entire
>> > namespace and allowing users to create names as they wish using any
>> > of the valid characters.
>>
>> The standard headers might introduce names that are meant for internal
>> use. These could be hidden in namespace std, but a using namespace std
>> would bring back the problem.
>>
>> This is solvable by implementors, for instance like this:
>>
>> --- in some standard header
>>
>> // allow $'s in var names
>> #pragma system
>> extern int i$;
>> #pragma no-system
>>
>> But the gain I see is marginal compared to the actually quite simple
>> rules we have now.
>
> '$' isn't part of the C character set. That's because it's not available
> in all ASCII variants.

Ah, but an implementor knows the target, so there is no problem there.

> No matter which character or string of characters you use, it has
> basically the same effect. You're taking a subset of the names that can
> be given to the linker, and saying that users must never use one of the
> names in that subset.

But the linker is part of the implementation. So the implementor can
create the linker such that it can accept more characters than the C++
standard allows for C++ names, so the results of name-mangling and symbols
created as an implementation artifact don't have to conflict with names
the user can create. They use these 'extra' characters, I just used '$' as
an example.

(IIRC, older MS/Intel linkers allowed '$' in variable names, that was why
I chose that, but I may be wrong there).

Of course, in practice the standard allows C compatible linkers, that is
why we have the rules as they are now. Although the problem posed is
solvable by implementors, imho there is not enough gain to consider it.

M4

Author: llewelly.at@xmission.dot.com (llewelly)
Date: Tue, 3 Feb 2004 01:17:11 +0000 (UTC)

m@remove.this.part.rtij.nl (Martijn Lievaart) writes:

> On Mon, 02 Feb 2004 17:45:17 +0000, James Kuyper wrote:
>
> > m@remove.this.part.rtij.nl (Martijn Lievaart) wrote in message
> > news:<pan.2004.02.01.16.58.18.802930@remove.this.part.rtij.nl>...
> >> On Sun, 01 Feb 2004 11:48:12 +0000, Keith H Duggar wrote:
> >>
> >> > If g++ or any other compiler simply mangled ALL names including
> >> > variable names in a consistent manner wouldn't that eliminate the
> >> > need for reserved name patterns completely? Thus freeing the entire
> >> > namespace and allowing users to create names as they wish using any
> >> > of the valid characters.
> >>
> >> The standard headers might introduce names that are meant for internal
> >> use. These could be hidden in namespace std, but a using namespace std
> >> would bring back the problem.
> >>
> >> This is solvable by implementors, for instance like this:
> >>
> >> --- in some standard header
> >>
> >> // allow $'s in var names
> >> #pragma system
> >> extern int i$;
> >> #pragma no-system
> >>
> >> But the gain I see is marginal compared to the actually quite simple
> >> rules we have now.
> >
> > '$' isn't part of the C character set. That's because it's not available
> > in all ASCII variants.
>
> Ah, but an implementor knows the target, so there is no problem there.
>
> > No matter which character or string of characters you use, it has
> > basically the same effect. You're taking a subset of the names that can
> > be given to the linker, and saying that users must never use one of the
> > names in that subset.
>
> But the linker is part of the implementation.

Present and past C++ implementations are full of examples which
    relied on existing assemblers and linkers - Cfront, EDG, KAI, and
    gcc (on some platforms) all come to mind. Each of these avoided
    re-implementing assembly and linking for many platforms, greatly
    increasing their user base and that of C++ as a
    whole. Traditional (at least on unix) separation of the compiler,
    linker, and assembler made this approach convenient, or
    necessary, depending on viewpoint and circumstance.

> So the implementor can
> create the linker such that it can accept more characters than the C++
> standard allows for C++ names, so the results of name-mangling and symbols
> created as an implementation artifact don't have to conflict with names
> the user can create. They use these 'extra' characters, I just used '$' as
> an example.
>
> (IIRC, older MS/Intel linkers allowed '$' in variable names, that was why
> I chose that, but I may be wrong there).

I don't know about M$/Intel linkers, but plenty of older unix linkers
    and compilers allowed, and still allow, $ in identifiers;
    traditional C allowed it on many platforms, it was ISO C90 that
    outlawed it, on the grounds that there were platforms where the $
    character was unavailable, or unsuitable.

> Of course, in practice the standard allows C compatible linkers, that is
> why we have the rules as they are now. Although the problem posed is
> solvable by implementors, imho there is not enough gain to consider it.
[snip]

Author: m@remove.this.part.rtij.nl (Martijn Lievaart)
Date: Tue, 3 Feb 2004 18:28:12 +0000 (UTC)

On Tue, 03 Feb 2004 01:17:11 +0000, llewelly wrote:

>> But the linker is part of the implementation.
>
> Present and past C++ implementations are full of examples which
>     relied on existing assemblers and linkers - Cfront, EDG, KAI, and
>     gcc (on some platforms) all come to mind. Each of these avoided
>     re-implementing assembly and linking for many platforms, greatly
>     increasing their user base and that of C++ as a
>     whole. Traditional (at least on unix) separation of the compiler,
>     linker, and assembler made this approach convenient, or
>     necessary, depending on viewpoint and circumstance.

That is another way of stating my second point: the gain isn't worth it,
since it would outlaw implementations like this. Still, the first point
was that it can be done.

Together this gives: It can be done, but at great expense. The gains
aren't worth that.

(I have used the technique I described in interpreters, it makes dealing
with these issues trivial.)

OTOH, if I understand correctly linkers have to have some (C++) smarts
anyhow to allow exported templates, so modifying the compilers/linkers to
allow the above would not be such a great pain as it seems at first. It
still isn't worth the gain imho.

M4

Author: kuyper@wizard.net (James Kuyper)
Date: Tue, 3 Feb 2004 18:28:13 +0000 (UTC)

m@remove.this.part.rtij.nl (Martijn Lievaart) wrote in message news:<pan.2004.02.02.20.45.51.466226@remove.this.part.rtij.nl>...
> On Mon, 02 Feb 2004 17:45:17 +0000, James Kuyper wrote:
>
> > m@remove.this.part.rtij.nl (Martijn Lievaart) wrote in message
> > news:<pan.2004.02.01.16.58.18.802930@remove.this.part.rtij.nl>...
> >> On Sun, 01 Feb 2004 11:48:12 +0000, Keith H Duggar wrote:
..
> > No matter which character or string of characters you use, it has
> > basically the same effect. You're taking a subset of the names that can
> > be given to the linker, and saying that users must never use one of the
> > names in that subset.
>
> But the linker is part of the implementation. So the implementor can

Yes, but as a practical matter it may not be part of the C++ compiler.
It might be a separate linker that is written by a vendor who has no
interest in modifying the linker to make things easier for C++
compilers. I'll grant you, the "export" keyword is (almost?)
impossible to implement correctly without a C++-aware linker; but that
is part of the reason why "export" is not yet widely supported.

> create the linker such that it can accept more characters than the C++
> standard allows for C++ names, so the results of name-mangling and symbols
> created as an implementation artifact don't have to conflict with names
> the user can create. They use these 'extra' characters, I just used '$' as
> an example.

At the time the standard was written, some of the most widely used C++
compilers "compiled" C++ into C code, and then passed that C code on
to whatever C compiler was available. That's still a common approach
today. Many features of C++ reflect this history. In particular,
whatever identifiers such a C++ compiler uses must fit portable
standard C naming conventions. Therefore, if the C++ compiler is to be
free to invent identifiers not provided by user code, then those
identifiers must be legal C identifiers that are not legal C++
identifiers.

You might think that all such a C++ compiler would need is the set of
identifiers which are already reserved in C, but those names might be
already in use by the host implementation of C. Even if they weren't,
membership in that set is determined mainly by the initial letters of the
name. Without use of an internal separator that is illegal for use in
user code, it's hard to come up with a reasonable name mangling scheme
that distinguishes Myclass::member() from Myclassm::ember() (except by
moving the problem to some other part of the identifier). The rule
about double underscores was heavily influenced by a desire to allow
implementations of that kind. I think that's entirely appropriate.
You might consider it to be a matter of unnecessarily clinging to the
past. There are other ways that it could have been done, but I know of
none that are radically better than that one. Do you?
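
The Myclass::member() versus Myclassm::ember() problem can be sketched in a few lines of C++. This is only an illustration under stated assumptions: mangle_naive and mangle_with_separator are hypothetical helpers, not the scheme used by Cfront or any real compiler. It shows that plain concatenation collides, that a "__" separator distinguishes the two names, and that the separator only helps as long as user identifiers cannot contain it.

```cpp
#include <cassert>
#include <string>

// Hypothetical scheme A: concatenate class and member names directly.
std::string mangle_naive(const std::string& cls, const std::string& member) {
    return cls + member;
}

// Hypothetical scheme B: join the parts with "__", a sequence that
// user-defined identifiers are not allowed to contain.
std::string mangle_with_separator(const std::string& cls,
                                  const std::string& member) {
    return cls + "__" + member;
}

void check_mangling() {
    // Scheme A cannot tell the two members apart.
    assert(mangle_naive("Myclass", "member") ==
           mangle_naive("Myclassm", "ember"));

    // Scheme B keeps them distinct.
    assert(mangle_with_separator("Myclass", "member") !=
           mangle_with_separator("Myclassm", "ember"));

    // If user names could contain "__", scheme B would collide again.
    assert(mangle_with_separator("A__B", "c") ==
           mangle_with_separator("A", "B__c"));
}
```

Calling check_mangling() from a test program exercises all three cases; the last assertion is the reason the separator has to be off-limits to user code.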


Author: pjp@dinkumware.com ("P.J. Plauger")
Date: Tue, 3 Feb 2004 21:24:32 +0000 (UTC)

"Martijn Lievaart" <m@remove.this.part.rtij.nl> wrote in message
news:pan.2004.02.02.20.45.51.466226@remove.this.part.rtij.nl...

> > '$' isn't part of the C character set. That's because it's not available
> > in all ASCII variants.
>
> Ah, but an implementor knows the target, so there is no problem there.

There's a problem if the implementor knows that a particular target
supports no such extension.

> > No matter which character or string of characters you use, it has
> > basically the same effect. You're taking a subset of the names that can
> > be given to the linker, and saying that users must never use one of the
> > names in that subset.
>
> But the linker is part of the implementation. So the implementor can
> create the linker such that it can accept more characters than the C++
> standard allows for C++ names, so the results of name-mangling and symbols
> created as an implementation artifact don't have to conflict with names
> the user can create. They use these 'extra' characters, I just used '$' as
> an example.

You're working on the quaint assumption that the "implementor" is some
monolith who controls all aspects of an implementation. The world is now
full of third-party vendors who have little such control. Even relative
monoliths like the major software/hardware vendors often have little
control over linkers that may be shared across multiple languages.

> (IIRC, older MS/Intel linkers allowed '$' in variable names, that was why
> I chose that, but I may be wrong there).
>
> Of course, in practice the standard allows C compatible linkers, that is
> why we have the rules as they are now.

The problem is seldom caused by C. It's other languages, or internal
political divisions, that limit most options.

>                                       Although the problem posed is
> solvable by implementors, imho there is not enough gain to consider it.

I'll agree with the conclusion, if not all the premises.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com


Author: duggar@mit.edu (Keith H Duggar)
Date: Tue, 3 Feb 2004 21:26:05 +0000 (UTC)

> That's an unusual request, since the C++ standard requires a certain
> amount of macro use.

It does? How interesting. Can you point me to an example of where the
standard requires macro use? Thanks in advance.

> Unless you can provide a concrete example of where this provision of
> the standard actually harms the language or reduces the ability to
> write correct C++ programs, don't expect to see it change.

I didn't mean to imply that it was harmful or reduced the ability to
write correct code. I was really just curious why the name patterns
needed to be reserved. It just seems that if there is a choice between
fixing a problem on the implementation side rather than introducing
special constraints in the language then it would be better to avoid
complicating the language. I thought this since implementations change
more easily and more often than language standards.

It's just a matter of curiosity is all. In particular I often wonder
about naming constraints such as why an identifier can't begin with a
number such as 2PI or 4PIo3.

Do you happen to know any good sources for explanations of the various
naming restrictions in the standard?


Author: barmar@alum.mit.edu (Barry Margolin)
Date: Tue, 3 Feb 2004 22:30:24 +0000 (UTC)

In article <b47de02.0402012055.4fae736c@posting.google.com>,
 duggar@mit.edu (Keith H Duggar) wrote:

> It's just a matter of curiosity is all. In particular I often wonder
> about naming constraints such as why an identifier can't begin with a
> number such as 2PI or 4PIo3.

That's an arbitrary restriction that most languages have inherited from
the early days of Algol, Fortran, and PL/I.  Since the compilers were
typically written in assembly language in the 50's and 60's, and there
were no such things as compiler generators (like lex and yacc), they
designed the syntax to be easy to parse.  One choice they made was to
recognize token types by their first character:

digit = number
alpha = identifier
punctuation = operator

Although modern compiler technology doesn't require this crutch, it has
apparently never been considered a significant enough inconvenience to
prompt the language designers to change it.
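
The first-character rule above can be sketched as a tiny classifier. This is a minimal illustration, not code from any historical compiler: TokenKind and classify_by_first_char are hypothetical names, and treating '_' as the start of an identifier is an assumption borrowed from C-like languages.

```cpp
#include <cassert>
#include <cctype>

enum class TokenKind { Number, Identifier, Operator, Unknown };

// Decide the token type by looking only at its first character.
TokenKind classify_by_first_char(char c) {
    const unsigned char uc = static_cast<unsigned char>(c);
    if (std::isdigit(uc)) return TokenKind::Number;
    // Assumption: '_' starts an identifier, as in C and C++.
    if (std::isalpha(uc) || c == '_') return TokenKind::Identifier;
    if (std::ispunct(uc)) return TokenKind::Operator;
    return TokenKind::Unknown;
}

void check_classifier() {
    // "2PI" starts with a digit, so this rule commits to "number" at once.
    assert(classify_by_first_char('2') == TokenKind::Number);
    assert(classify_by_first_char('P') == TokenKind::Identifier);
    assert(classify_by_first_char('+') == TokenKind::Operator);
}
```

Under this rule a lexer never has to back up: once it has seen the '2' of 2PI it is already committed to reading a number.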

A few languages bucked that trend.  For instance, Lisp allows symbols to
start with any character that isn't a delimiter (I'm simplifying it a
bit), and supports escaping to allow any character to be anywhere in a
symbol name.

--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***


Author: do-not-spam-benh@bwsint.com (Ben Hutchings)
Date: Tue, 3 Feb 2004 22:35:04 +0000 (UTC)

Keith H Duggar wrote:
>> That's an unusual request, since the C++ standard requires a certain
>> amount of macro use.
>
> It does? How interesting. Can you point me to an example of where the
> standard requires macro use? Thanks in advance.
>
>> Unless you can provide a concrete example of where this provision of
>> the standard actually harms the language or reduces the ability to
>> write correct C++ programs, don't expect to see it change.
>
> I didn't mean to imply that it was harmful or reduced the ability to
> write correct code. I was really just curious why the name patterns
> needed to be reserved. It just seems that if there is a choice between
> fixing a problem on the implementation side rather than introducing
> special constraints in the language then it would be better to avoid
> complicating the language. I thought this since implementations change
> more easily and more often than language standards.
>
> It's just a matter of curiosity is all. In particular I often wonder
> about naming constraints such as why an identifier can't begin with a
> number such as 2PI or 4PIo3.

Numeric literals are allowed to have letters on the end of them or in
the middle, and it would be dangerous to say that identifiers can begin
with a digit just so long as they can't be parsed as a numeric literal.
It would also prevent extensions to the format of numeric literals such
as the introduction of hexadecimal floating point in C99.
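
A few ordinary literals show how many letters already appear inside numbers. This is a small sketch; check_literals_with_letters is a hypothetical helper name, and the C99 hexadecimal floating-point form (for example 0x1p3) is left out of the code because it only became standard C++ later.

```cpp
#include <cassert>

void check_literals_with_letters() {
    assert(0x1F == 31);      // 'x' prefix and hex digit 'F' in the middle
    assert(1e3 == 1000.0);   // exponent marker 'e' in the middle
    assert(10UL == 10u);     // suffix letters 'U' and 'L' on the end
    assert(2.5f == 2.5);     // suffix 'f'; 2.5 is exactly representable
}
```

If identifiers could begin with a digit, spellings such as 1e3 or 10UL would be valid under both rules, and every new literal suffix would risk changing the meaning of existing names.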

> Do you happen to know any good sources for explanations of the various
> naming restrictions in the standard?

Many of the naming restrictions are inherited from C so look at the
rationale for C89 (if you can get it).

The additional prohibition in C++ of embedded double-underscores is, I
think, due to the use of double-underscore as a separator between a
function's name and the representation of its parameter types in
mangled names in CFront and other implementations that followed the
description of mangling in the ARM.


Author: barmar@alum.mit.edu (Barry Margolin)
Date: Wed, 4 Feb 2004 00:48:17 +0000 (UTC)

In article <slrnc207bv.1og.do-not-spam-benh@tin.bwsint.com>,
 do-not-spam-benh@bwsint.com (Ben Hutchings) wrote:

> The additional prohibition in C++ of embedded double-underscores is, I
> think, due to the use of double-underscore as a separator between a
> function's name and the representation of its parameter types in
> mangled names in CFront and other implementations that followed the
> description of mangling in the ARM.

But since names are being mangled, couldn't the double underscore simply
be mangled into something else (triple underscore, perhaps), so that
there's no conflict?

--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***


Author: jackklein@spamcop.net (Jack Klein)
Date: Wed, 4 Feb 2004 15:29:47 +0000 (UTC)

On Tue, 3 Feb 2004 21:26:05 +0000 (UTC), duggar@mit.edu (Keith H
Duggar) wrote in comp.std.c++:

> > That's an unusual request, since the C++ standard requires a certain
> > amount of macro use.
>
> It does? How interesting. Can you point me to an example of where the
> standard requires macro use? Thanks in advance.

Section 16.8 predefined macro names, including __LINE__, __FILE__,
__DATE__, and so on.

17.4.1.2 requires that all standard C library functions defined by the
C standard as actually being macros must also be macros in C++.

There are a few others.
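
As a small illustration of both kinds of macro, the sketch below uses the predefined __FILE__ and __LINE__ together with assert, which the C standard defines as a macro. The names source_location and HERE are hypothetical, introduced only for this example.

```cpp
#include <cassert>
#include <string>

// Formats a "file:line" string from explicit arguments.
std::string source_location(const char* file, int line) {
    return std::string(file) + ":" + std::to_string(line);
}

// __FILE__ and __LINE__ are expanded at the point where HERE() is used,
// which is why this has to be a macro rather than a function.
#define HERE() source_location(__FILE__, __LINE__)

void check_predefined_macros() {
    assert(__LINE__ > 0);        // assert itself is a macro from <cassert>
    assert(!HERE().empty());
}
```

A function could not report the caller's line number this way, since __LINE__ inside a function body always refers to the line within that function.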

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html


Author: do-not-spam-benh@bwsint.com (Ben Hutchings)
Date: Wed, 4 Feb 2004 17:21:16 +0000 (UTC)

Barry Margolin wrote:
> In article <slrnc207bv.1og.do-not-spam-benh@tin.bwsint.com>,
>  do-not-spam-benh@bwsint.com (Ben Hutchings) wrote:
>
>> The additional prohibition in C++ of embedded double-underscores is, I
>> think, due to the use of double-underscore as a separator between a
>> function's name and the representation of its parameter types in
>> mangled names in CFront and other implementations that followed the
>> description of mangling in the ARM.
>
> But since names are being mangled, couldn't the double underscore simply
> be mangled into something else (triple underscore, perhaps), so that
> there's no conflict?

Yes, it could.  However, this restriction always existed in pre-standard
portable C++ and I imagine that the standard committee saw little benefit
and significant costs in removing the restriction.  People using third-
party binary-only libraries for C++ get very upset when their compiler's
ABI changes and they can't use a new compiler without also getting all
new libraries.
