Thread

Topic: New range-based for and the creation of another kind of "magic" function

Author: Sean Hunt <rideau3@gmail.com>
Date: Fri, 31 Jul 2009 15:11:42 CST

Disclaimer: I have not yet seen N2930.

With concepts removed from C++1x, range-based for has been downgraded
to an ADL-based solution. While I generally like the idea, my primary
concern is that it creates a new kind of "magic" function - that is,
one that has special meaning to the language.

Currently, C++ has a dearth of "magic" functions. These allow the C++
language to assign special properties to them that other functions do
not have. There are four basic types of magic functions in the
language: ::main(), constructors, destructors, and operator functions
(While there are at least four different types of functions that are
defined with the operator keyword, they can be grouped together for
the purposes of this discussion). Every one of these functions is
specified using a special syntax - there is no ability to mistake it
for a "normal" function with no significance to the language.

I will use the term "magic function" in the remainder of this post to
refer to any function with special language-defined semantics that
differ from ordinary functions, and "magic name" to refer to any name
which, if a function has it, may result in the function being magic.

The new range proposal creates a for loop with syntax "for (i : vec)"
which will, for a vec of a user-defined type, cause an implicit
lookup of begin(vec) and end(vec), but using only argument-dependent
lookup - that is, only functions in the same namespace as vec will be
considered. This makes begin and end into magic names - the first two
such names in C++, other than ::main().
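The lookup described above can be sketched in code. This is only an
illustration: the namespace geometry, the type triple, and the two sum
functions are hypothetical names, and the sketch follows the rules as
they were eventually standardized in C++11 (which also consider member
begin()/end() before falling back to ADL), not necessarily the draft
under discussion here.

```cpp
namespace geometry {
    // A user-defined type with no begin()/end() members.
    struct triple { int values[3]; };

    // Free functions in the same namespace as the type; these are the
    // functions the range-based for loop finds through ADL.
    inline int* begin(triple& t) { return t.values; }
    inline int* end(triple& t) { return t.values + 3; }
}

// The loop never names begin or end, yet it calls both.
inline int sum_with_range_for(geometry::triple& t) {
    int total = 0;
    for (int v : t) total += v;
    return total;
}

// Roughly what the loop above does, written out by hand. Unlike the
// loop, these calls use ordinary unqualified lookup as well as ADL.
inline int sum_by_hand(geometry::triple& t) {
    int total = 0;
    for (int* it = begin(t), *last = end(t); it != last; ++it) total += *it;
    return total;
}
```

Both functions visit the same three elements; the difference is only in
how begin and end are looked up.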

There are a number of issues with magic names in and of themselves,
and why they should not be adopted. Suppose a future version of the
standard decided to allow UDTs to overload the switch() construct
(this is a plausible future extension; EWG has it marked as "open to
resubmit in the future"). The mechanism chosen is simple. The result
of the /condition/ expression being evaluated, call it switch_cond, is
evaluated against each case's /constant-expression/, call it
case_expr, in order, using the ADL-only expression match(switch_cond,
case_expr). This could change the behavior of existing classes with a
match() function as well as an implicit conversion to an integral type
(even if no actual change results, it will still make it slower
barring a very good optimizer). It also means that future coders
wishing to use the function match() need to make sure they do not
accidentally allow use in a switch statement when they do not intend.
If many magic names are added to C++, developers may introduce subtle
bugs when they don't realize a name is magic, or otherwise have to
work around magic names. Range-based for may force designers to
maneuver around begin() and end() as well, though it won't change the
meaning of any currently-valid programs. ::main() is far less of a
risk not only due to being a feature since the language's inception,
but also due to applying only to functions named main() in the global
namespace - developers need not avoid the name main elsewhere in their
programs.
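To make the hypothetical concrete, here is a sketch of a class that
would change meaning under such a rule. Everything in it is assumed for
illustration: the namespace shapes, the type token, the parity-based
match(), and the function classify_hypothetical, which emulates by hand
the lowering described above since no such language feature exists.

```cpp
namespace shapes {
    struct token {
        int id;
        // Implicit conversion that lets a token drive a switch today.
        operator int() const { return id; }
    };

    // An ordinary helper written with no thought of switch statements:
    // two ids "match" when they have the same parity.
    inline bool match(const token& t, int pattern) {
        return (t.id % 2) == (pattern % 2);
    }
}

// Current behaviour: the condition is converted to int and compared
// for equality against each case label.
inline int classify_today(shapes::token t) {
    switch (t) {
        case 1: return 1;
        case 3: return 3;
        default: return 0;
    }
}

// Hand-written emulation of the hypothetical rule: each case is tried
// in order through an unqualified call to match(), found here by ADL.
inline int classify_hypothetical(shapes::token t) {
    if (match(t, 1)) return 1;
    if (match(t, 3)) return 3;
    return 0;
}
```

For a token with id 3, classify_today selects the case labelled 3,
while the emulated rule stops at the first case because 3 and 1 share a
parity.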

As proposed, ranges have other issues. Chief among them is the
inability to consistently retrieve the iterators used by range-based
for - you can't call begin(object) and end(object) as illustrated in
the following example:

namespace foo {
   struct bar {};
   int* begin(bar);
}

int* begin(foo::bar);

In this case, range-based for applied to an object of type bar will
find only foo::begin(), and work. Attempting to perform the same
lookup otherwise is impossible and will result in ambiguity errors
because ::begin() is also found.
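A fuller version of this example, as a sketch only: the function
bodies, the end() overloads, and the two sum functions are additions
for illustration, and the behaviour shown is that of range-based for as
standardized in C++11. The global begin() deliberately skips the first
element so that the two lookups are distinguishable.

```cpp
namespace foo {
    struct bar { int data[3]; };
    inline int* begin(bar& b) { return b.data; }
    inline int* end(bar& b) { return b.data + 3; }
}

// Unrelated global overloads for the same type. The global namespace
// is not an associated namespace of foo::bar, so ADL ignores these.
inline int* begin(foo::bar& b) { return b.data + 1; }
inline int* end(foo::bar& b) { return b.data + 3; }

// Range-based for uses ADL only, so it sees foo::begin and foo::end.
inline int sum_via_range_for(foo::bar& b) {
    int total = 0;
    for (int x : b) total += x;
    return total;
}

// User code cannot spell the same lookup. An unqualified begin(b) at
// this point would find both ::begin and foo::begin and be rejected as
// ambiguous, so the call has to be qualified explicitly.
inline int sum_via_explicit_calls(foo::bar& b) {
    int total = 0;
    for (int* p = foo::begin(b); p != foo::end(b); ++p) total += *p;
    return total;
}
```

Qualifying the call works here only because the author of the loop
already knows which namespace the type lives in; generic code does not
have that knowledge.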

There is also at least one major issue of understandability. Suppose
a person unfamiliar with C++'s range-based for loop sees the following
declaration in a header file:

namespace some_library_namespace {
   int* begin (some_library_type);
   int* end (some_library_type);
}

Quickly grepping the source shows no cases where either function is
called. This code does not explain itself very well, and it violates a
general rule of coding which is not to add unused functions to a library.
This is not a very desirable trait of a new language extension. It's
also rather unintuitive that begin() and end() must be free functions
- there is discussion elsewhere on this list to use the kludgy
solution of a template in std, with special lookup to find those
templates.

Every one of these problems can be averted by using an already-
existing mechanism to implement begin() and end(): operator functions.
In particular, observe that if begin() and end() were made operators,
they could function much like unary operator functions do - they can
be free or members and can be declared non-intrusively to facilitate
use of library types. Because of a special new name, no existing code
will gain unintended new meanings and developers will not have to
worry about using them. The use of the operator keyword will make it
self-explanatory to someone unfamiliar with this nuance of C++ - even
if completely unfamiliar with operators and not using syntax
highlighting, the use of two identifiers will make the function stand
out. The use of a special function will allow generic code to retrieve
the begin() and end() iterators itself (this may require SFINAE to
distinguish between free and member operators, but that just means
someone will put two functions into Boost that people can use to their
heart's content).
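The SFINAE dispatch mentioned in the parenthetical can be sketched in
today's language using ordinary names rather than the proposed operator
syntax. The names detail::get_begin, range_begin, and the types in
namespace lib are all hypothetical. Note that the fallback below uses
ordinary unqualified lookup plus ADL, so it is not an exact
reproduction of an ADL-only rule.

```cpp
namespace lib {
    // A type that exposes begin() as a member.
    struct with_member {
        int data[2];
        int* begin() { return data; }
    };

    // A type that exposes begin() as a free function instead.
    struct with_free { int data[2]; };
    inline int* begin(with_free& w) { return w.data; }
}

namespace detail {
    // Preferred overload: viable only when x.begin() is well-formed.
    // The int parameter makes it the better match for the argument 0.
    template <typename T>
    auto get_begin(T& x, int) -> decltype(x.begin()) { return x.begin(); }

    // Fallback: viable only when an unqualified begin(x) is well-formed.
    template <typename T>
    auto get_begin(T& x, long) -> decltype(begin(x)) { return begin(x); }
}

// Single entry point for generic code; SFINAE removes whichever
// overload does not apply to T.
template <typename T>
auto range_begin(T& x) -> decltype(detail::get_begin(x, 0)) {
    return detail::get_begin(x, 0);
}
```

A matching range_end would follow the same pattern.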

There is one caveat to an operator implementation, however. "operator
begin" and "operator end" might clash with the existing conversion
functions. The big risk is that "begin operator begin ()" would not
at all do what was intended, and would be completely legal if begin
was a type. An implementation could reasonably be expected to provide
a warning in this situation - "begin" is not a common class name and,
when used, would rarely be an iterator. Likewise for "end operator end
()". If it's decided that the potential for confusion is fatal to the
syntax, I suggest two alternative syntaxes: "operator : begin" and
"operator for begin" (even "operator for : begin", I guess, though
that seems verbose). The first draws parallels to the for(i : vec)
syntax; the latter is more explicit but reads funny. In either case,
the use of an existing token is unambiguous after the operator
keyword.
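The clash with conversion functions can be seen in the current
language. In the sketch below, the type named begin and the struct
holder are hypothetical. Today's grammar does not accept a declared
return type on a conversion function, so the sketch uses the plain
"operator begin()" form, which already has a meaning whenever begin
names a type.

```cpp
// A type that happens to be called begin.
struct begin { int value; };

struct holder {
    int v;
    // Already valid C++: a conversion function from holder to the
    // type begin. It has nothing to do with iteration.
    operator begin() const { return begin{v}; }
};
```

With these declarations, initializing a begin object from a holder
silently invokes the conversion function.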

I'd appreciate feedback, both on my suggestion and my reasoning.

Sean Hunt

--
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@netlab.cs.rpi.edu]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]

Author: Francis Glassborow <francis.glassborow@btinternet.com>
Date: Mon, 3 Aug 2009 12:16:35 CST

Sean Hunt wrote:
> Disclaimer: I have not yet seen N2930.
>
> With concepts removed from C++1x, range-based for has been downgraded
> to an ADL-based solution. While I generally like the idea, my primary
> concern is that it creates a new kind of "magic" function - that is,
> one that has special meaning to the language.

Actually concepts have been removed from C++0x and are quite likely to
be included in C++1x when we get there. I am not nit-picking. The name
we use to refer to a project matters and in this case it is that the
release of the C++ Standard currently being worked on is code named
C++0x. Changing that mid stream because of delayed delivery will just
cause confusion.

>
> Currently, C++ has a dearth of "magic" functions. These allow the C++
> language to assign special properties to them that other functions do
> not have. There are four basic types of magic functions in the
> language: ::main(), constructors, destructors, and operator functions
> (While there are at least four different types of functions that are
> defined with the operator keyword, they can be grouped together for
> the purposes of this discussion). Every one of these functions is
> specified using a special syntax - there is no ability to mistake it
> for a "normal" function with no significance to the language.

While I agree with you that ::main() is special (in that it cannot be
called and falling off the end of it is not an error) and ctors are
special (no return type) and dtors are somehow special because of the
way they chain together how are operator functions special? OK they have
two token names but, IMO, not much else is special about them.

>
> I will use the term "magic function" in the remainder of this post to
> refer to any function with special language-defined semantics that
> differ from ordinary functions, and "magic name" to refer to any name
> which, if a function has it, may result in the function being magic.

OK, but I have always understood 'magic' to refer to an implementation
that cannot be written in C++ and so needs some under the cover action
by the implementation.

>
> The new range proposal creates a for loop with syntax "for (i : vec)"
> which will, for a vec of a user-defined type, cause an implicit
> lookup of begin(vec) and end(vec), but using only argument-dependent
> lookup - that is, only functions in the same namespace as vec will be
> considered. This makes begin and end into magic names - the first two
> such names in C++, other than ::main().
>

Your argument seems to be very convoluted. I wonder if you can rewrite it
focusing on the major issues.

Author: Sean Hunt <rideau3@gmail.com>
Date: Mon, 3 Aug 2009 19:14:58 CST

On Aug 3, 12:16 pm, Francis Glassborow
<francis.glassbo...@btinternet.com> wrote:
> > Currently, C++ has a dearth of "magic" functions. These allow the C++
> > language to assign special properties to them that other functions do
> > not have. There are four basic types of magic functions in the
> > language: ::main(), constructors, destructors, and operator functions
> > (While there are at least four different types of functions that are
> > defined with the operator keyword, they can be grouped together for
> > the purposes of this discussion). Every one of these functions is
> > specified using a special syntax - there is no ability to mistake it
> > for a "normal" function with no significance to the language.
>
> While I agree with you that ::main() is special (in that it cannot be
> called and falling off the end of it is not an error) and ctors are
> special (no return type) and dtors are somehow special because of the
> way they chain together how are operator functions special? OK they have
> two token names but, IMO, not much else is special about them.

The compiler will perform special lookup on operator expressions that
can look up operator functions, and they are restricted in the number
of arguments they can have. There are some other special cases, too
(conversion operators have no distinct return type). For the record,
the four types of operators to which I was referring were "real"
operators (those that overload built-in symbols), new/delete, conversion
functions, and user-defined literal suffixes. I guess copy assignment
operators could also be considered unique as they get implicit
definition by the compiler.

> > I will use the term "magic function" in the remainder of this post to
> > refer to any function with special language-defined semantics that
> > differ from ordinary functions, and "magic name" to refer to any name
> > which, if a function has it, may result in the function being magic.
>
> OK, but I have always understood 'magic' to refer to an implementation
> that cannot be written in C++ and so needs some under the cover action
> by the implementation.

I consider the term magic to also apply to something that is treated
specially by the compiler - in this case, 'begin' and 'end' functions.

> > The new range proposal creates a for loop with syntax "for (i : vec)"
> > which will, for a vec of a user-defined type, cause an implicit
> > lookup of begin(vec) and end(vec), but using only argument-dependent
> > lookup - that is, only functions in the same namespace as vec will be
> > considered. This makes begin and end into magic names - the first two
> > such names in C++, other than ::main().
>
> Your argument seems to be very convoluted. I wonder if you can rewrite it
> focusing on the major issues.

Basically, my issues are these:
 - We are adding function names that must be treated specially to
avoid unwanted behavior.
 - We are adding an element of the language that cannot be reproduced
in the language due to its odd lookup rules. This means you cannot, for
instance, perform SFINAE on begin and end with 100% certainty, like
you now can with every other function.
 - We are setting the precedent to possibly add more magic names, and
these might not be backwards-compatible.
 - We are removing clarity from the language.
 - We are precluding a member implementation which, while not
necessarily bad, may be confusing.

I think that every one of these can be solved by adopting a new
function id, such as "operator begin" - several suggestions were at
the bottom of the original post.

Sean Hunt