Topic: Are references to not-quite-objects legal?


Author: alfps@start.no (Alf P. Steinbach)
Date: Sun, 16 Oct 2005 18:33:27 GMT
Raw View
[Accidentally posted to comp.lang.c++ first, sorry for the multi-posting]

One main usage of references is for arguments, to replace e.g. void foo(int*)
with void foo(int&).

However, our Holy Standard claims that a reference must be initialized with an
_object_, before going on to show examples of initializing with lvalues.

And there are some lvalues -- I cannot find any restrictions on dereferencing
that would forbid this  -- that aren't very real objects.  They're more like
hypothetical objects.  So, is the program below a correct C++ program?

  #include <cstddef>

  void doStuff( int& firstElem, std::size_t size )
  {
      // whatever
      for( std::size_t i = 0;  i < size;  ++i ) { (&firstElem)[i]; }
  }

  int main()
  {
      int grumblegrumble[666];
      doStuff( *(new int[0]), 0 );
      doStuff( grumblegrumble[666], 0 );
      //doStuff( destroyedButNotDeallocated, 0 );
  }

The reason I ask is that if it is a correct program, then in a small piece I'm
writing I'll use some special term to denote pointers that do point to fully
real objects, and corresponding references.  I thought "RealGood" would be
nice for that purpose, with Un-RealGood, "URG", denoting not RealGood, where
RealGood implies valid.  But if the actual arguments in this program are not
allowed by the standard, then the not-quite-objects do not, presumably, even
exist as "objects", and then it would perhaps be incorrect even to dereference
the URG pointers shown here (although they would be comparable entities)?

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: eldiener@earthlink.net (Edward Diener)
Date: Mon, 17 Oct 2005 05:00:14 GMT
Raw View
Alf P. Steinbach wrote:
> [Accidentally posted to comp.lang.c++ first, sorry for the multi-posting]
>
> One main usage of references is for arguments, to replace e.g. void foo(int*)
> with void foo(int&).
>
> However, our Holy Standard claims that a reference must be initialized with an
> _object_, before going on to show examples of initializing with lvalues.
>
> And there are some lvalues -- I cannot find any restrictions on dereferencing
> that would forbid this  -- that aren't very real objects.  They're more like
> hypothetical objects.

No such thing.

>  So, is the program below a correct C++ program?
>
>   #include <cstddef>
>
>   void doStuff( int& firstElem, std::size_t size )
>   {
>       // whatever
>       for( std::size_t i = 0;  i < size;  ++i ) { (&firstElem)[i]; }
>   }
>
>   int main()
>   {
>       int grumblegrumble[666];
>       doStuff( *(new int[0]), 0 );
>       doStuff( grumblegrumble[666], 0 );
>       //doStuff( destroyedButNotDeallocated, 0 );
>   }

It's a "correct" C++ program in the sense that it will compile and link.
However, I would not bet on it working "correctly" at run-time.  You are
attempting to access objects outside the bounds of your arrays in your
doStuff calls.  C++ does no compile-time checking of array bounds, so you
compile/link fine but you do not run fine.






Author: alfps@start.no (Alf P. Steinbach)
Date: Mon, 17 Oct 2005 05:43:46 GMT
Raw View
* Edward Diener:
> Alf P. Steinbach wrote:
> > [Accidentally posted to comp.lang.c++ first, sorry for the multi-posting]
> >
> > One main usage of references is for arguments, to replace e.g. void foo(int*)
> > with void foo(int&).
> >
> > However, our Holy Standard claims that a reference must be initialized with an
> > _object_, before going on to show examples of initializing with lvalues.
> >
> > And there are some lvalues -- I cannot find any restrictions on dereferencing
> > that would forbid this  -- that aren't very real objects.  They're more like
> > hypothetical objects.
>
> No such thing.

I'm sorry, but that doesn't help me: are you saying the standard doesn't
define the term "hypothetical object", like, it doesn't define "doStuff"?  I
know.  Perhaps there is something to learn here, not just about words. ;-)


> >  So, is the program below a correct C++ program?
> >
> >   #include <cstddef>
> >
> >   void doStuff( int& firstElem, std::size_t size )
> >   {
> >       // whatever
> >       for( std::size_t i = 0;  i < size;  ++i ) { (&firstElem)[i]; }
> >   }
> >
> >   int main()
> >   {
> >       int grumblegrumble[666];
> >       doStuff( *(new int[0]), 0 );
> >       doStuff( grumblegrumble[666], 0 );
> >       //doStuff( destroyedButNotDeallocated, 0 );
> >   }
>
> It's a "correct" C++ program in the sense that it will compile and link.
> However I would not bet on it working "correctly" at run-time. You are
> attempting to access objects outside the bounds of your arrays in your
> doStuff calls.

That's the _question_.  And I'm sorry, again, but a simple assertion as answer
does not help.  Do you perhaps have a reference (or two) to the standard,
and/or some reasoning to back up the assertion?


> C++ does no compile time checking of array bounds, so you
> compile/link fine but you do not run fine.

Actually it runs fine on all compilers it has been tested with, and I
personally see no reason why it shouldn't (when using a standard-conforming
compiler), but the question is whether it's formally UB, directly disallowed,
or neither.







Author: dave@boost-consulting.com (David Abrahams)
Date: Tue, 18 Oct 2005 03:11:15 GMT
Raw View
alfps@start.no (Alf P. Steinbach) writes:

> * Edward Diener:
>> Alf P. Steinbach wrote:
>> > [Accidentally posted to comp.lang.c++ first, sorry for the multi-posting]
>> >
>> > One main usage of references is for arguments, to replace e.g. void foo(int*)
>> > with void foo(int&).
>> >
>> > However, our Holy Standard claims that a reference must be initialized with an
>> > _object_, before going on to show examples of initializing with lvalues.

Examples are non-normative.  You shouldn't draw any hard conclusions
based on what you (don't) happen to see in them.

>> > And there are some lvalues -- I cannot find any restrictions on dereferencing
>> > that would forbid this  -- that aren't very real objects.  They're more like
>> > hypothetical objects.
>>
>> No such thing.
>
> I'm sorry, but that doesn't help me: are you saying the standard
> doesn't define the term "hypothetical object", like, it doesn't
> define "doStuff"?  I know.  Perhaps there is something to learn
> here, not just about words. ;-)

Well your use of the term "hypothetical object" just confuses matters
(for you as well as for those trying to answer you), so one thing to
learn might be to avoid making up new terminology, especially without
defining it. ;-)

>> >  So, is the program below a correct C++ program?
>> >
>> >   #include <cstddef>
>> >
>> >   void doStuff( int& firstElem, std::size_t size )
>> >   {
>> >       // whatever
>> >       for( std::size_t i = 0;  i < size;  ++i ) { (&firstElem)[i]; }
>> >   }
>> >
>> >   int main()
>> >   {
>> >       int grumblegrumble[666];
>> >       doStuff( *(new int[0]), 0 );
>> >       doStuff( grumblegrumble[666], 0 );
>> >       //doStuff( destroyedButNotDeallocated, 0 );
>> >   }

After looking at your code, this is a simpler problem than your use of
language makes it seem to be (though not entirely trivial).  We can
handle *(new int[0]) with:

  3.7.3.1/2

  The results of dereferencing a pointer returned as a request for
  zero size are undefined.

for grumblegrumble[666], the following is almost helpful

  8 Declarators
  8.3.2 References
  ...
  4 There shall be no references to references, no arrays of
  references, and no pointers to references. The declaration of a
  reference shall contain an initializer (8.5.3) except when the
  declaration contains an explicit extern specifier (7.1.1), is a
  class member (9.2) declaration within a class declaration, or is the
  declaration of a parameter or a return type (8.3.5); see 3.1.
  A reference shall be initialized to refer to a valid object or
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  function.

There is no valid object at (new int[0]) or at grumblegrumble + 666 --
the definition of "object" in 1.8 makes that clear:

  1.8 The C++ object model

  1 The constructs in a C++ program create, destroy, refer to, access,
  and manipulate objects. An object is a region of storage.
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Unfortunately, it's hard to find any standard language connecting
expressions of the form ``*p'' with references (as opposed to
lvalues).  However this establishes that *p is expected to refer to an
object:

5.3.1 Unary operators

  1 The unary * operator performs indirection: the expression to which
  it is applied shall be a pointer to an object type, or a pointer to
  a function type and the result is an lvalue referring to the object
  or function to which the expression points.

And the following passage shows that subscripting the end of an array
and dereferencing it are equivalent.

  5.2.1 Subscripting

  1 A postfix expression followed by an expression in square brackets
  is a postfix expression. One of the expressions shall have the type
  ``pointer to T'' and the other shall have enumeration or integral
  type. The result is an lvalue of type ``T.'' The type ``T'' shall be
  a completely-defined object type. 56) The expression E1[E2] is
  identical (by definition) to *((E1)+(E2)).

Therefore, dereferencing the first expression produces undefined
behavior, and indexing grumblegrumble with 666 does the same.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com






Author: johnchx2@yahoo.com
Date: Mon, 17 Oct 2005 22:11:52 CST
Raw View
Alf P. Steinbach wrote:

> However, our Holy Standard claims that a reference must be initialized with an
> _object_, before going on to show examples of initializing with lvalues.

That's correct.  (8.3.2/4)

> And there are some lvalues -- I cannot find any restrictions on dereferencing
> that would forbid this  -- that aren't very real objects.  They're more like
> hypothetical objects.

The term the committee seems to be converging on is "empty lvalue".
See:

  http://www2.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232

> So, is the program below a correct C++ program?
>
>   #include <cstddef>
>
>   void doStuff( int& firstElem, std::size_t size )
>   {
>       // whatever
>       for( std::size_t i = 0;  i < size;  ++i ) { (&firstElem)[i]; }
>   }
>
>   int main()
>   {
>       int grumblegrumble[666];
>       doStuff( *(new int[0]), 0 );
>       doStuff( grumblegrumble[666], 0 );
>       //doStuff( destroyedButNotDeallocated, 0 );
>   }
>

Nope: a reference shall be initialized to refer to a valid object or
function.  The fact that you can form an lvalue which doesn't designate
a valid object or function doesn't change this.

The committee is working on a proposal to clarify -- and loosen -- this
a bit:

  http://www2.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#453

The loosening will allow initializing a reference to designate an
uninitialized area of storage of proper size and alignment (not a
"valid object" in my book).  However, initializing a reference with *i,
where i is a pointer to the one-past-the-end element of an array, will
remain undefined behavior.






Author: alfps@start.no (Alf P. Steinbach)
Date: Wed, 19 Oct 2005 05:19:14 GMT
Raw View
* David Abrahams:
>
> [clear & detailed explanation, snipped]

Thank you, David.

As always, I disagree with some of what you write, this time about the need
for "inventing" new terminology, in the context that was mentioned, about
hypothetical objects -- I disagree that describing a red bird as a "red bird"
is to invent new terminology, and I think spades should be called spades. ;-)

Since you answered everything I explicitly asked for I thought of providing a
summary, as was usual on Usenet in the old days, but then johnchx2's reply
made me realize there will possibly be more to this thread, perhaps far more.

Namely, after checking out the discussion that johnchx2 pointed to I'm now
very happy that it turned out the standard's rules are currently such as you
pointed out, where valid references are really valid, and I'm very concerned
about the proposed resolution of issue #232, to allow dereferencing of
nullpointers  --  for that's the point at which compilers currently have the
opportunity to put in checking code that terminates the program.

In addition to that immediate practical matter the proposed resolution means
that one isn't guaranteed that an empty lvalue as reference argument means a
bug somewhere else, i.e. one cannot any longer make that strong and useful
assumption, and so the proposed resolution not only removes the
checking-for-nullpointer support as a correctness-preserving automatic program
transformation, it also removes a very useful design-level constraint.  A much
better resolution would IMO be to remove the current wording that allows
dereferencing of nullpointers in typeid expressions, in §5.2.8/2.

Cheers,
(and again, thanks)

- Alf







Author: alfps@start.no (Alf P. Steinbach)
Date: Wed, 19 Oct 2005 05:20:06 GMT
Raw View
* David Abrahams:
>
> Therefore, dereferencing the first expression produces undefined
> behavior, and indexing grumblegrumble with 666 [one past the end]
> does the same.

I didn't see this when I responded half an hour ago or so, but:

  &a[666]            -- Undefined Behavior
  a + 666            -- OK.

That seems to be an unintentional effect of the standard's wording, breaking
just about all code in existence!

Of course this could be fixed by adding in even more exceptional clauses in
the standard, just like in a class with no well-defined class invariant the
code is peppered with validity checks.

I suggest instead generalizing the whole thing, by the simple expedient of
additional indirection, which is the Universal Answer in computer science.

I.e., a new term like syntactical lvalue should IMHO be introduced.  Then a[i]
produces a syntactical lvalue, a much safer kind of entity.  A syntactical
lvalue converts implicitly to lvalue any place an lvalue is required, that
conversion being valid if and only if the resulting lvalue would refer to an
object, a real object, that is ;-).  And & can be defined to not require an
actual lvalue but just a syntactical lvalue.

Relating this to the proposed empty lvalue concept (issue #232): a syntactical
lvalue can be regarded as the union of currently valid lvalue and proposed
empty lvalue.  But used only such that the #232 proposed empty lvalue would
never actually exist in a correct program.  I.e., instead of unsafety, safety.

Cheers,

- Alf







Author: alfps@start.no (Alf P. Steinbach)
Date: Wed, 19 Oct 2005 05:20:21 GMT
Raw View
* johnchx2@yahoo.com:
>
>   http://www2.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232
>   http://www2.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#453

Thank you,

- Alf







Author: dave@boost-consulting.com (David Abrahams)
Date: Thu, 20 Oct 2005 02:17:15 GMT
Raw View
alfps@start.no (Alf P. Steinbach) writes:

> * David Abrahams:
>>
>> Therefore, dereferencing the first expression produces undefined
>> behavior, and indexing grumblegrumble with 666 [one past the end]
>> does the same.
>
> I didn't see this when I responded half an hour ago or so, but:
>
>   &a[666]            -- Undefined Behavior
>   a + 666            -- OK.
>
> That seems to be an unintentional effect of the standard's wording, breaking
> just about all code in existence!

I can't comment on whether it's intentional, but it's a fairly
well-known constraint that many programmers have learned to live
with.  It's also annoying, which is part of the reason for CWG 232.







Author: dave@boost-consulting.com (David Abrahams)
Date: Thu, 20 Oct 2005 02:16:53 GMT
Raw View
alfps@start.no (Alf P. Steinbach) writes:

> * David Abrahams:
>>
>> [clear & detailed explanation, snipped]
>
> Thank you, David.
>
> As always, I disagree with some of what you write, this time about
> the need for "inventing" new terminology, in the context that was
> mentioned, about hypothetical objects -- I disagree that describing
> a red bird as a "red bird" is to invent new terminology, and I think
> spades should be called spades. ;-)

Me too.  The standard is very clear about what is and what is not an
object.  There's no room for fuzzy "hypothetical" objects.

> Namely, after checking out the discussion that johnchx2 pointed to
> I'm now very happy that it turned out the standard's rules are
> currently such as you pointed out, where valid references are really
> valid, and I'm very concerned about the proposed resolution of issue
> #232, to allow dereferencing of nullpointers -- for that's the point
> at which compilers currently have the opportunity to put in checking
> code that terminates the program.
>
> In addition to that immediate practical matter the proposed
> resolution means that one isn't guaranteed that an empty lvalue as
> reference argument means a bug somewhere else, i.e. one cannot any
> longer make that strong and useful assumption, and so the proposed
> resolution not only removes the checking-for-nullpointer support as
> a correctness-preserving automatic program transformation, it also
> removes a very useful design-level constraint.  A much better
> resolution would IMO be to remove the current wording that allows
> dereferencing of nullpointers in typeid expressions, in §5.2.8/2.

I think there's a lot more to it than that.  In fact, your motivation
to disallow that case seems to conflict with your desire to write

   &x[666]


Do you think it should be okay to dereference one-past-the-end
pointers that don't refer to real objects, but not okay to dereference
null pointers (that also don't refer to real objects)?  If so, why is
that best?







Author: alfps@start.no (Alf P. Steinbach)
Date: Thu, 20 Oct 2005 04:41:44 GMT
Raw View
* David Abrahams:
> alfps@start.no (Alf P. Steinbach) writes:
>
> > * David Abrahams:
> >>
> >> Therefore, dereferencing the first expression produces undefined
> >> behavior, and indexing grumblegrumble with 666 [one past the end]
> >> does the same.
> >
> > I didn't see this when I responded half an hour ago or so, but:
> >
> >   &a[666]            -- Undefined Behavior
> >   a + 666            -- OK.
> >
> > That seems to be an unintentional effect of the standard's wording, breaking
> > just about all code in existence!
>
> I can't comment on whether it's intentional, but it's a fairly
> well-known constraint that many programmers have learned to live
> with.  It's also annoying, which is part of the reason for CWG 232.

Yes, allowing dereferencing of nullpointer and other empty lvalue pointers
would get rid of this incongruence.  But  --  to allow dereferencing of
nullpointers to solve this is like getting rid of an itch by ordering a
nuclear strike on one's own position.  After that, no itch, to be sure!

Instead of applying such brute force, as I wrote in the message the above is a
response to, the simple expedient of a bit more indirection in the wording,
e.g. via a new term for syntactical lvalue, would solve this.

I sincerely hope the committee will reconsider the envisioned nuclear strike,
and belay that order. ;-)







Author: alfps@start.no (Alf P. Steinbach)
Date: Thu, 20 Oct 2005 14:51:21 GMT
Raw View
* David Abrahams -> Alf P. Steinbach:
>
> Do you think it should be okay to dereference one-past-the-end
> pointers that don't refer to real objects,

At the syntax level, yes.

That means, not binding the result to a reference.

Detail: we wouldn't want to allow &a[666].someMember either, because that
would not allow such an address to be trapped by the processor, necessitating
an area of valid address range equal to the size of an array element, which
can be arbitrarily large, instead of just one byte.  So essentially it means
that the only thing possible to do with a[666] should be to apply the address
operator, as is common to do.  And as I wrote earlier, that can probably be
accomplished cleanly by introducing a syntactical lvalue concept, instead of a
collection of special cases and exceptions.


> but not okay to dereference null pointers

Yes.


> (that also don't refer to real objects)?

No, I don't think it's a Good Idea to think in terms of nullpointers referring
to #232 "empty lvalue"s, hypothetical objects.

To me it's much more clear to say that a nullpointer does _not_ refer to an
object or lvalue, at all.

Isn't it?


> If so, why [in your opinion] is that best?

"that" being: okay to dereference one-past-the-end pointers at the syntax
level, but not okay to dereference nullpointers.

For standardization purposes, because that's the existing practice. ;-)

But as to why we would want dereferencing to be formally aligned with existing
practice, _apart_ from the general principle of codifying existing
practice: because &a[666] is meaningful and clear, and supports using indexing
syntax in a loop instead of pointer operations, is therefore commonly used,
and there is no practical reason to want it disallowed.  Whereas *((T*)0)
isn't meaningful, isn't clear, is not used at all, and there is a very good
reason to want it to continue to be disallowed.  Namely, as I wrote, that
allowing it would remove a currently valid, correctness-preserving program
transformation that helps in debugging, checking for nullpointer dereferencing
with program termination.  Also, the compiler can detect some such cases at
compile time.  Allowing *((T*)0) would throw all that overboard.







Author: usenet-nospam@nmhq.net (Niklas Matthies)
Date: Fri, 21 Oct 2005 05:11:34 GMT
Raw View
On 2005-10-20 14:51, Alf P. Steinbach wrote:
> * David Abrahams -> Alf P. Steinbach:
>>
>> Do you think it should be okay to dereference one-past-the-end
>> pointers that don't refer to real objects,
>
> At the syntax level, yes.
>
> That means, not binding the result to a reference.
>
> Detail: we wouldn't want to allow &a[666].someMember either, because
> that would not allow such an address to be trapped by the processor,
> necessitating an area of valid address range equal to the size of an
> array element, which can be arbitrarily large, instead of just one
> byte.  So essentially it means that the only thing possible to do
> with a[666] should be to apply the address operator, as is common to
> do.  And as I wrote earlier, that can probably be accomplished
> cleanly by introducing a syntactical lvalue concept, instead of a
> collection of special cases and exceptions.

C99 does something like that. The wording is:

   The unary & operator returns the address of its operand. [...]
   If the operand is the result of a unary * operator, neither that
   operator nor the & operator is evaluated and the result is as if
   both were omitted, except that the constraints on the operators
   still apply and the result is not an lvalue. Similarly, if the
   operand is the result of a [] operator, neither the & operator nor
   the unary * that is implied by the [] is evaluated and the result
   is as if the & operator were removed and the [] operator were
   changed to a + operator. [...]

-- Niklas Matthies






Author: ben-public-nospam@decadentplace.org.uk (Ben Hutchings)
Date: Fri, 21 Oct 2005 05:12:17 GMT
Raw View
Alf P. Steinbach <alfps@start.no> wrote:
> * David Abrahams -> Alf P. Steinbach:
>>
>> Do you think it should be okay to dereference one-past-the-end
>> pointers that don't refer to real objects,
<snip>
>> but not okay to dereference null pointers
<snip>
>> (that also don't refer to real objects)?
>
> No, I don't think it's a Good Idea to think in terms of nullpointers referring
> to #232 "empty lvalue"s, hypothetical objects.
>
> To me it's much more clear to say that a nullpointer does _not_ refer to an
> object or lvalue, at all.
>
> Isn't it?
<snip>

The lvalue-ness or rvalue-ness of an expression is a static
syntactical property, not a dynamic one.  It is not possible for the
result of dereferencing a pointer to be either an lvalue or not
depending on the value of the pointer at run-time.  That surely is why
there is a distinction between lvalues of object type and objects.

Ben.

--
Ben Hutchings
Never put off till tomorrow what you can avoid all together.






Author: johnchx2@yahoo.com
Date: 21 Oct 2005 05:20:10 GMT
Raw View
Alf P. Steinbach wrote:
> To me it's much more clear to say that a nullpointer does _not_ refer to an
> object or lvalue, at all.

Well, hold on a minute.  Nothing "refers to" an lvalue.  lvalues are
expressions, and *they* refer to objects or to functions (or, if 232 is
adopted, to nothing at all).

If i has type T*, then the expression *i is an lvalue.  (What else
would it be...an rvalue?)  This does not depend upon the value of i.

What may depend on the value of i is whether evaluating the expression
*i gives rise to undefined behavior.  If i is singular, then evaluating
*i does result in undefined behavior (since *i could entail an
lvalue-to-rvalue conversion applied to i).  That's settled law.  What's
up for grabs is whether evaluating *i leads to undefined behavior where
i is non-singular, but doesn't point to a live object of type T.

The reason that that can be debated is that evaluating *i, by itself,
is basically a no-op.  It doesn't actually do anything.  (I'm assuming
that the subexpression i has been evaluated successfully, i.e., that i
is non-singular.)

Now, the language already has lvalues that can't be written through
("this" is a non-modifiable lvalue) and that can't be safely read (i,
where i denotes a singular value of type T*, or any lvalue with
non-void type denoting raw storage).  So if you're assuming that having
an lvalue entitles you to read or write through it, you're already on
the wrong track.

Adding the notion of an "empty lvalue" really doesn't change the
assumptions you can make about lvalues in general.

Moreover, no-one is proposing to allow references to be bound to empty
lvalues, so there's no need to worry about null references.  They're as
undefined as ever.

> Namely, as I wrote, that allowing it would remove a currently
> valid, correctness-preserving program transformation that helps
> in debugging, checking for nullpointer dereferencing
> with program termination.  Also, the compiler can detect some
> such cases at compile time.  Allowing *((T*)0) would throw
> all that overboard.

I'm not sure it does.

Keep in mind that the notion of "dereferencing a pointer," by itself,
is purely a source-code level artifact.  The two expression statements:

  *i;
   i;

will likely generate the same machine code (if any).

Where it becomes interesting is when the lvalue obtained by evaluating
*i is used for something else...either reading or writing.  And there's
nothing that prohibits the compiler from inserting the null-check at
THAT point.  Which, to the working programmer, is essentially
indistinguishable.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: alfps@start.no (Alf P. Steinbach)
Date: Fri, 21 Oct 2005 14:46:42 GMT
Raw View
* johnchx2@yahoo.com:
> * Alf P. Steinbach:
>
> > To me it's much more clear to say that a nullpointer does _not_ refer to an
> > object or lvalue, at all.
>
> Well, hold on a minute.  Nothing "refers to" an lvalue.

If you like,

    "a nullpointer does _not_ refer to an (object) or (what an lvalue refers
    to)".

But I think requiring such completeness is wordplay.

Let's not describe the something that someone might think a nullpointer refers
or "points" to, as an empty lvalue, as is done in the #232 discussion, and as
you note above is meaningless.

Let's just plainly say that a nullpointer does not refer to an object or
(empty) lvalue, at all.

That's much more clear, isn't it?


>  lvalues are expressions, and *they* refer to objects or to functions

Yes, we're in agreement on the substance there, and what I indicated is that I
think the term and concept of "empty lvalue" really muddles the waters.


> (or, if 232 is adopted, to nothing at all).

But as I read what you write, we're not in agreement here.

The nullpointer address is special in that it can be a single well-known
address that can be checked for, and needs to be distinguished from addresses
that are less insubstantial and therefore less easy to check for.

If the standardization committee loses sight of the practical reality, the
reason that there is a standard, in favor of ingenious but ultimately empty
formalism, then the results will IMHO be sub-optimal.


[snip]
> Moreover, no-one is proposing to allow references to be bound to empty
> lvalues, so there's no need to worry about null references.

Again, I think that loses sight of practical reality in favor of formalism.

In practical programming one _must_ worry about invalid references, and the
question is how far the standard goes in actively supporting their avoidance.


> They're as undefined as ever.

No, they're a bit less undefined.  Still undefined, but somewhat less.  Just
like a girl who doesn't use contraceptives is a little less not-yet-pregnant.
;-)


> > Namely, as I wrote, that allowing it would remove a currently
> > valid, correctness-preserving program transformation that helps
> > in debugging, checking for nullpointer dereferencing
> > with program termination.  Also, the compiler can detect some
> > such cases at compile time.  Allowing *((T*)0) would throw
> > all that overboard.
>
> I'm not sure it does.
>
> Keep in mind that the notion of "dereferencing a pointer," by itself,
> is purely a source-code level artifact.  The two expression statements:
>
>   *i;
>    i;
>
> will likely generate the same machine code (if any).

Not when the compiler emits dereferencing checks, which currently is a
valid, correctness-preserving program transformation.


> Where it becomes interesting is when the lvalue obtained by evaluating
> *i is used for something else...either reading or writing.  And there's
> nothing that prohibits the compiler from inserting the null-check at
> THAT point.

Oh, but there is.

Consider, for example:

   extern void sometimesChokeOnZero( int* );
   int main()
   {
      int* p = 0;
      sometimesChokeOnZero( &*p );
   }

If dereferencing a nullpointer becomes valid, the standard then
prohibits the compiler from emitting conforming code that detects this error
in main.

For another example, one which currently is not UB,

   struct T { virtual ~T() {} };

   void foo( T* p )
   {
       typeid( *p );
   }

   bool theHiredHelpsFunctionSucceeded()
   {
       try{ foo( 0 ); } catch( ... ) {}
       return true;
   }

   int main()
   {
      if( theHiredHelpsFunctionSucceeded() )
      {
          // Something, perhaps something critical.
      }
   }

Not all errors can be detected.

The standard should support error detection to the degree it's possible.


>  Which, to the working programmer, is essentially indistinguishable.

No, not that, either: a delayed "lazy" check yields much less information, and
so is much less helpful.

Anyway, what's _gained_ by allowing *((T*)0)?

Nothing, except the ability to keep what is currently an inconsistency, namely
the typeid spec.

Cheers,

- Alf

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?






Author: alfps@start.no (Alf P. Steinbach)
Date: Fri, 21 Oct 2005 14:46:50 GMT
Raw View
* Ben Hutchings:
> * Alf P. Steinbach:
 >
> > No, I don't think it's a Good Idea to think in terms of nullpointers referring
> > to #232 "empty lvalue"s, hypothetical objects.
> >
> > To me it's much more clear to say that a nullpointer does _not_ refer to an
> > object or lvalue, at all.
> >
> > Isn't it?
>
> The lvalue-ness or rvalue-ness of an expression is a static
> syntactical property, not a dynamic one.  It is not possible for the
> result of dereferencing a pointer to be either an lvalue or not
> depending on the value of the pointer at run-time.  That surely is why
> there is a distinction between lvalues of object type and objects.

Yes, that's one reason why saying that a nullpointer refers to an "empty
lvalue" is, IMO, not meaningful  --  and misleading.

In other words, it seems we're in agreement on that.







Author: johnchx2@yahoo.com
Date: Sat, 22 Oct 2005 08:49:55 CST
Raw View
Alf P. Steinbach wrote:
> * johnchx2@yahoo.com:
> > * Alf P. Steinbach:
> >
> > > To me it's much more clear to say that a nullpointer does _not_ refer to an
> > > object or lvalue, at all.
> >
> > Well, hold on a minute.  Nothing "refers to" an lvalue.
>
> If you like,
>
>     "a nullpointer does _not_ refer to an (object) or (what an lvalue refers
>     to)".
>
> But I think requiring such completeness is wordplay.
>
> Let's not describe the something that someone might think a nullpointer refers
> or "points" to, as an empty lvalue, as is done in the #232 discussion, and as
> you note above is meaningless.

No, you seem to have missed my point.  Nobody wants to describe what a
null pointer *points to* as any kind of lvalue (empty, full, half-full,
or what have you).  An lvalue is an expression.  In particular, the
expression "*i" is an lvalue if i has pointer type, regardless of
whether i points to anything.

No wordplay.  Just trying to use the terms correctly.

> Let's just plainly say that a nullpointer does not refer to an object or
> (empty) lvalue, at all.
>
> That's much more clear, isn't it?

Since the idea of a pointer referring to an lvalue is nonsensical, I
can't say that I find it clear.

You could just say that a null pointer does not point to an object, but
I'm not sure that adds anything.

> The nullpointer address is special in that it can be a single well-known
> address that can be checked for, and needs to be distinguished from addresses
> that are less insubstantial and therefore less easy to check for.

I'm not sure I understand what "less insubstantial" means.  But I'm
guessing that you mean that the null pointer value is known not to
point to an object, while other bit patterns, interpreted as pointers,
might.

And it can be checked for.  However, not all uses of null pointers are
errors.  Null pointers are legal for a reason.

> [snip]
> > Moreover, no-one is proposing to allow references to be bound to empty
> > lvalues, so there's no need to worry about null references.
>
> Again, I think that loses sight of practical reality in favor of formalism.
>
> In practical programming one _must_ worry about invalid references, and the
> question is how far the standard goes in actively supporting their avoidance.

Just to be clear: I didn't mean that you didn't have to worry about
forming null references...you must avoid doing that just as you must
avoid any other code that exhibits undefined behavior.  I meant that
users of references (e.g. functions which take reference parameters)
are just as entitled to assume that those references are non-null as
they are today.

> > Where it becomes interesting is when the lvalue obtained by evaluating
> > *i is used for something else...either reading or writing.  And there's
> > nothing that prohibits the compiler from inserting the null-check at
> > THAT point.
>
> Oh, but there is.
>
> Consider, for example:
>
>    extern void sometimesChokeOnZero( int* );
>    int main()
>    {
>       int* p = 0;
>       sometimesChokeOnZero( &*p );
>    }
>
> If dereferencing a nullpointer becomes valid, the standard then
> prohibits the compiler from emitting conforming code that detects this error
> in main.

That's because there's no error in main().  Passing a null pointer to a
function that takes a pointer is, in the general case, not an error.
Nor does it seem like a good idea to treat *p as some sort of shorthand
for assert(p).

>
> For another example, one which currently is not UB,
>
>    struct T { virtual ~T() {} };
>
>    void foo( T* p )
>    {
>        typeid( *p );
>    }
>
>    bool theHiredHelpsFunctionSucceeded()
>    {
>        try{ foo( 0 ); } catch( ... ) {}
>        return true;
>    }
>
>    int main()
>    {
>       if( theHiredHelpsFunctionSucceeded() )
>       {
>           // Something, perhaps something critical.
>       }
>    }
>
> Not all errors can be detected.

Maybe I'm missing something: as I read the example, it does exhibit
undefined behavior (binding a reference to an lvalue which doesn't
refer to a valid object), and the error can be detected -- where the
error actually occurs, in foo().  This is where the error should be
detected, because it is foo() that is written incorrectly (taking a
pointer and assuming that it is non-null).

>
> The standard should support error detection to the degree it's possible.
>

Yes, but, as Stroustrup says, the compiler is not psychic.  I don't
want my compiler rewriting my program to dump core at points where it
guesses that I may have taken an action which, someday, could turn out
to have been an error.

> No, not that, either: a delayed "lazy" check yields much less information, and
> so is much less helpful.


Doesn't your debugger let you look at the call stack?

>
> Anyway, what's _gained_ by allowing *((T*)0)?
>

I'll have to leave answering that to the authors of the proposal.  ;-)






Author: alfps@start.no (Alf P. Steinbach)
Date: Sat, 22 Oct 2005 13:37:37 CST
Raw View
* johnchx2@yahoo.com:
> Alf P. Steinbach wrote:
> > * johnchx2@yahoo.com:
> > > * Alf P. Steinbach:
> > >
> > > > To me it's much more clear to say that a nullpointer does _not_ refer to an
> > > > object or lvalue, at all.
> > >
> > > Well, hold on a minute.  Nothing "refers to" an lvalue.
> >
> > If you like,
> >
> >     "a nullpointer does _not_ refer to an (object) or (what an lvalue refers
> >     to)".
> >
> > But I think requiring such completeness is wordplay.
> >
> > Let's not describe the something that someone might think a nullpointer refers
> > or "points" to, as an empty lvalue, as is done in the #232 discussion, and as
> > you note above is meaningless.
>
> No, you seem to have missed my point.  Nobody wants to describe what a
> null pointer *points to* as any kind of lvalue (empty, full, half-full,
> or what have you).  An lvalue is an expression.  In particular, the
> expression "*i" is an lvalue if i has pointer type, regardless of
> whether i points to anything.

Well then, exactly what's empty in an empty lvalue expression?


> No wordplay.  Just trying to use the terms correctly.

Me too. :-o


[snip]
> >
> > Consider, for example:
> >
> >    extern void sometimesChokeOnZero( int* );
> >    int main()
> >    {
> >       int* p = 0;
> >       sometimesChokeOnZero( &*p );
> >    }
> >
> > If dereferencing a nullpointer becomes valid, the standard then
> > prohibits the compiler from emitting conforming code that detects this error
> > in main.
>
> That's because there's no error in main().

There is: dereferencing p, even just for immediately applying &, is Undefined
Behavior.


> Passing a null pointer to a
> function that takes a pointer is, in the general case, not an error.

?


> Nor does it seem like a good idea to treat *p as some sort of shorthand
> for assert(p).

Nobody suggested that, but rather that it isn't a good idea to remove the
possibility of automatically checking such expressions.  They do crop up here
and there.  For example, in template code you might have an iterator or a raw
pointer, you don't know which, and write &*i to get at the raw pointer.  Now
with the suggested no-UB for that operation for raw pointers, that code
snippet is guaranteed to work as long as it's tested with only raw pointers,
some of which are (contrary to expectation) nullpointers.  Give it an iterator
where the final result is 0 (again, contrary to expectations), and it might
blow up, and that could easily have been uncovered with the raw pointer
testing if the compiler was allowed to add in a little check there.

In short, with *((T*)0) as UB and SeriousNoNo we know what to avoid.

With *((T*)0) well-defined people will start _using_ it, and that must be the
intention, for why else allow it?


> > For another example, one which currently is not UB,
> >
> >    struct T { virtual ~T() {} };
> >
> >    void foo( T* p )
> >    {
> >        typeid( *p );
> >    }
> >
> >    bool theHiredHelpsFunctionSucceeded()
> >    {
> >        try{ foo( 0 ); } catch( ... ) {}
> >        return true;
> >    }
> >
> >    int main()
> >    {
> >       if( theHiredHelpsFunctionSucceeded() )
> >       {
> >           // Something, perhaps something critical.
> >       }
> >    }
> >
> > Not all errors can be detected.
>
> Maybe I'm missing something: as I read the example, it does exhibit
> undefined behavior (binding a reference to an lvalue which doesn't
> refer to a valid object),

Nope, para 5.2.8/2 (inconsistently, for sure!) explicitly allows dereferencing
a nullpointer in this context, guaranteeing a std::bad_typeid exception  --
which of course our hired help's function swallows.


> and the error can be detected -- where the
> error actually occurs, in foo().

Nope, foo is correct as can be per the current standard.


> This is where the error should be
> detected, because it is foo() that is written incorrectly (taking a
> pointer and assuming that it is non-null).

Nope.


> > The standard should support error detection to the degree it's possible.
> >
>
> Yes, but, as Stroustrup says, the compiler is not psychic.  I don't
> want my compiler rewriting my program to dump core at points where it
> guesses that I may have taken an action which, someday, could turn out
> to have been an error.

Well, I want it to invoke Just In Time debugging when the code goes UB... ;-)


> > No, not that, either: a delayed "lazy" check yields much less information, and
> > so is much less helpful.
>
> Doesn't your debugger let you look at the call stack?
>
> > Anyway, what's _gained_ by allowing *((T*)0)?
> >
>
> I'll have to leave answering that to the authors of the proposal.  ;-)

I won't speculate, either, more than I did above...







Author: johnchx2@yahoo.com
Date: Mon, 24 Oct 2005 00:46:11 CST
Raw View
Alf P. Steinbach wrote:
>
> Well then, exactly what's empty in an empty lvalue expression?
>

Perhaps I don't understand the question -- or the intent behind it.  If
232 is adopted, "empty lvalue" will be introduced into the standard as
a formal term meaning an lvalue which does not designate an object or
function.

The question, "What's empty in an empty lvalue?" sounds more
metaphysical than technical.  Informally, I don't suppose it hurts to
say "the lvalue is empty," following the ordinary rules of English
construction (i.e. in a phrase which has the form adjective-noun, the
adjective modifies the noun, thus an "empty lvalue" is an lvalue which
is empty).  But I wouldn't lay too much stress on the deeper
implications of this kind of construction.  "Empty lvalue" is (or will
be) a technical term, nothing more.


> > That's because there's no error in main().
>
> There is: dereferencing p, even just for immediately applying &, is Undefined
> Behavior.
>

Not if 232 were adopted.

I thought that the point of your example was that making dereferencing
the null pointer well-defined would prevent the compiler from catching
*other* errors, which remain errors whether or not 232 is adopted.  The
point of my comment was that your example didn't appear to actually
include any *other* errors.

Surely you're not suggesting that dereferencing the null pointer remain
UB just so that more code can be labelled erroneous.  ;-)


> > Passing a null pointer to a
> > function that takes a pointer is, in the general case, not an error.
>
> ?
>

I'm not sure what the question mark means.  The compiler is not
entitled to assume that passing a null pointer to a function is an
error.  Are you disagreeing?  (I mentioned this because that was the
only possible "other error" I noticed in the example...but, again, I
may be missing something.)


> For example, in template code you might have an iterator or a raw
> pointer, you don't know which, and write &*i to get at the raw pointer.  Now
> with the suggested no-UB for that operation for raw pointers, that code
> snippet is guaranteed to work as long as it's tested with only raw pointers,
> some of which are (contrary to expectation) nullpointers.  Give it an iterator
> where the final result is 0 (again, contrary to expectations), and it might
> blow up, and that could easily have been uncovered with the raw pointer
> testing if the compiler was allowed to add in a little check there.


I don't follow this at all, I think mainly because I don't understand
what you mean by "the final result is 0."  Maybe a simple code example
would clarify.


>
> In short, with *((T*)0) as UB and SeriousNoNo we know what to avoid.
>

But why should we avoid it?  What's intrinsically wrong with evaluating
*((T*)0)?  Such an expression has useable properties (a static type, a
size, and an address).  What's wrong with using those properties?

It sounds like you are saying that this should be undefined behavior
because its occurrence might indicate some other error elsewhere in the
code.  Am I understanding correctly?


> > Maybe I'm missing something: as I read the example, it does exhibit
> > undefined behavior (binding a reference to an lvalue which doesn't
> > refer to a valid object),
>
> Nope, para 5.2.8/2 (inconsistently, for sure!) explicitly allows dereferencing
> a nullpointer in this context,


Ahh, right...I forgot about that.  In which case, I don't see what the
undetected error is supposed to be.  Put another way, I don't see why I
shouldn't be allowed to write code with exactly the semantics shown in
your example.  I don't see anything there that the compiler should be
entitled to assume is an error.






Author: alfps@start.no (Alf P. Steinbach)
Date: Mon, 24 Oct 2005 15:24:09 GMT
Raw View
* johnchx2@yahoo.com:
> * Alf P. Steinbach:
> >
>
> The question, "What's empty in an empty lvalue?" sounds more
> metaphysical than technical.

I agree, it's a meaningless and misleading term. ;-)


> > > That's because there's no error in main().
> >
> > There is: dereferencing p, even just for immediately applying &, is Undefined
> > Behavior.
>
> Not if 232 were adopted.
>
> I thought that the point of your example was that making dereferencing
> the null pointer well-defined would prevent the compiler from catching
> *other* errors, which remain errors whether or not 232 is adopted.  The
> point of my comment was that your example didn't appear to actually
> include any *other* errors.
>
> Surely you're not suggesting that dereferencing the null pointer remain
> UB just so that more code can be labelled erroneous.  ;-)

More code does _not_ become erroneous by keeping the current rules.

Changing the rules means that code that is currently formally incorrect, e.g.
that example, can't be detected as such automatically.

That means that code that's based on flawed thinking and/or on incorrect
assumptions (non-nullness) can then no longer, in all cases, be detected as
such automatically.

Changing the rules also makes currently correct code (at the coding/design
level) incorrect, as shown by the example below.

So, instead of more code becoming erroneous by keeping the rules, as you
suggest above, the actual effect is that existing code becomes erroneous by
changing the rules, which in turn may mean that more new code will become
erroneous; changing the rules also restricts what tools can do for us, which
again may mean more erroneous new code; and, to use rhetoric, I'll not even
mention the conceptual clutter of the changed rules.


> > > Passing a null pointer to a
> > > function that takes a pointer is, in the general case, not an error.
> >
> > ?
>
> I'm not sure what the question mark means.

It means: what do you mean, please clarify?

It's like those long explanations that lvalues are expressions.

Yes, they're true in and of themselves, literally and seemingly trivially, but
surely they (and the above) must mean something, express a point?


> The compiler is not
> entitled to assume that passing a null pointer to a function is an
> error.  Are you disagreeing?

This is another example, a seemingly trivially true statement that for the
life of me I can't see the point of  --  what does it mean?


> (I mentioned this because that was the
> only possible "other error" I noticed in the example...but, again, I
> may be missing something.)

Yes, it seems that you're missing the correlation between (1) formal language
level errors or UB, and (2) in-practice coding and design level errors.

Dereferencing the pointer is (1), passing it to the function is (2), and
they're both caused by the same incorrect assumption of non-nullness: they're
strongly correlated.

Currently we can, in principle, at least, detect (1), for this case,
automatically.

That then detects the incorrect assumption or incorrect higher level usage
supplying this null pointer.

And that in turn can detect and prevent (2).

Which it won't, given resolution 232's added "*((T*)0) is OK".

So that part of 232 should be removed.


> > For example, in template code you might have an iterator or a raw
> > pointer, you don't know which, and write &*i to get at the raw pointer.  Now
> > with the suggested no-UB for that operation for raw pointers, that code
> > snippet is guaranteed to work as long as it's tested with only raw pointers,
> > some of which are (contrary to expectation) nullpointers.  Give it an iterator
> > where the final result is 0 (again, contrary to expectations), and it might
> > blow up, and that could easily have been uncovered with the raw pointer
> > testing if the compiler was allowed to add in a little check there.
>
> I don't follow this at all, I think mainly because I don't understand
> what you mean by "the final result is 0."  Maybe a simple code example
> would clarify.

Well, the simplest code example was probably (I think) the one I gave, short
and succinct, to-the-point, nothing extraneous, and that was apparently not
sufficiently clarifying.  But OK.  Here's some faulty code:

    #include    <cassert>       // assert

    template< typename Ptr >
    char const* charPointerFrom( Ptr const& p )
    {
        // This function erroneously assumes p is guaranteed non-null.
        // Its specification requires it to handle nulls.
        // #232 would make this code correct at the language level, thereby
        // breaking the SafePtr code shown below, and also removing the
        // possibility of detecting the error by program transformation.
        return &*p;     // For simple smart-pointer, yields a raw pointer.
    }

    // This would be a template class with smart-pointer machinery, all
    // omitted to concentrate on operator*().
    class SafePtr
    {
    private:
        char const*     myPointer;
    public:
        SafePtr( char const* p ): myPointer( p ) {}

        operator bool() const { return !!myPointer; }
        char const& operator*() const
        {
            // This assert is a practical way to detect wayward nulls.
            // It doesn't break code that conforms to the C++98 standard,
            // i.e. its specification is to allow all dereferencing that the
            // standard allows for raw pointers, that one should be able to
            // use a SafePtr whereever a raw pointer was previously used.
            // The proposed null-pointer non-UB in #232 breaks this code
            // by invalidating its assumptions.
            assert( myPointer != 0 );
            return *myPointer;
        }
    };

    SafePtr aSafePtr( int x ) { return (x == 1? "OK!" : 0); }

    int main( int argc, char ** )
    {
        charPointerFrom( aSafePtr( argc ) );
    }


> > In short, with *((T*)0) as UB and SeriousNoNo we know what to avoid.
> >
>
> But why should we avoid it?  What's intrinsically wrong with evaluating
> *((T*)0)?  Such an expression has useable properties (a static type, a
> size, and an address).  What's wrong with using those properties?

Nothing wrong with using the _properties_: you can do that without
dereferencing.

That's what traits classes and '==', '!' etc. are for.

Allowing *((T*)0), on the other hand, allows wayward nullpointers to
propagate, undetected, and that's UnGood.


> It sounds like you are saying that this should be undefined behavior
> because its occurrence might indicate some other error elsewhere in the
> code.  Am I understanding correctly?

Yes, sort of... ;-)

But it's stronger than that.

In the example above I purposefully included a case where the proposed #232
resolution breaks existing (and I believe not at all uncommon) code, by
invalidating its assumptions.


> > > Maybe I'm missing something: as I read the example, it does exhibit
> > > undefined behavior (binding a reference to an lvalue which doesn't
> > > refer to a valid object),
> >
> > Nope, para 5.2.8/2 (inconsistently, for sure!) explicitly allows dereferencing
> > a nullpointer in this context,
>
> Ahh, right...I forgot about that.  In which case, I don't see what the
> undetected error is supposed to be.  Put another way, I don't see why I
> shouldn't be allowed to write code with exactly the semantics shown in
> your example.  I don't see anything there that the compiler should be
> entitled to assume is an error.

Well, that example was a bit more intricate than the first extremely short
one, which I didn't manage to make clarifying enough.

I think your own "forgot about that" is testimony that the code, as written by
SomeOne, was probably not intentionally correct at the language level (as of
C++98): that was just how things arbitrarily turned out.

The essence is that you have a code/design-level error (the use of typeid(*0))
that's currently not UB, i.e. it's currently OK _at the language level_, due
to an inconsistency in the standard.  That code/design-level error would still
have been detected (it's guaranteed to produce an exception), were it not for
our hired help's help in swallowing the exception silently, which means it's
an undetected code/design-level error that is still free to produce incorrect
results in other usage contexts.  If 232 more reasonably resolved the
inconsistency in the standard by simply removing the null-pointer-
dereferencing support for typeid, then this code/design-level error could be
detected automatically, even with the hired help's code in between.  But with
the proposed resolution of 232 as it stands, the code will just be a bomb
waiting to go off when someone else uses foo().

Cheers,

- Alf







Author: johnchx2@yahoo.com
Date: Mon, 24 Oct 2005 21:26:34 CST
Raw View
Alf P. Steinbach wrote:
> * johnchx2@yahoo.com:
> > * Alf P. Steinbach:
> > >
> >
> > The question, "What's empty in an empty lvalue?" sounds more
> > metaphysical than technical.
>
> I agree, it's a meaningless and misleading term. ;-)
>

Cute.  But, as I'm sure you understood, my point was that your
*question* is meaningless.  The *term* is perfectly well defined and --
IMHO -- not especially hard to understand.


> More code does _not_ become erroneous by keeping the current rules.
>

Well, by "more," I simply meant that the code exhibiting UB if
dereferencing a null pointer is well defined is a proper subset of the
code exhibiting UB if dereferencing a null pointer is undefined.  My
point being that, if I understand you, making dereferencing a null
pointer UB isn't an end in itself, but rather helps the compiler detect
other errors.

Precisely what the "current rules" are is, of course, somewhat
ambiguous.


> That means that code that's based on flawed thinking and/or on incorrect
> assumptions (non-nullness) can then no longer, in all cases, be detected as
> such automatically.


I think that's the core of our disagreement: you seem to assume that
dereferencing a pointer reveals an assumption that the pointer is
non-null, while in fact it may just as well reveal an assumption that
dereferencing a null pointer -- by itself -- is not an error.  This
assumption, or at least the assumption that this was the intent of the
standard, appears to be shared by at least two members of the standards
committee (Bill Gibbons and Tom Plum), and they don't seem to have been
shouted down.


> Yes, it seems that you're missing the correlation between (1) formal language
> level errors or UB, and (2) in-practice coding and design level errors.
>
> Dereferencing the pointer is (1), passing it to the function is (2), and
> they're both caused by the same incorrect assumption of non-nullness: they're
> strongly correlated.


Ah, let me try to be clearer: I claim that passing a null pointer to a
function is not an error on either level.  (A function which *receives*
a pointer parameter and assumes that it is non-null *does* have an
error of your second type.)

So, to summarize: if user code dereferences a pointer but never
attempts to read or write through it, nor binds the result to a
reference, and subsequently passes the pointer to a function, I see no
grounds to believe that the author of the code implicitly assumed that
the pointer was non-null.

Now, of course, it's perfectly possible that the programmer *did*
assume that the pointer was non-null.  But I'd prefer that the compiler
not try to read minds.


>     // This would be a template class with smart-pointer machinery, all
>     // omitted to concentrate on operator*().
>     class SafePtr
>     {
>     private:
>         char const*     myPointer;
>     public:
>         SafePtr( char const* p ): myPointer( p ) {}
>
>         operator bool() const { return !!myPointer; }
>         char const& operator*() const
>         {
>             // This assert is a practical way to detect wayward nulls.
>             // It doesn't break code that conforms to the C++98 standard,
>             // i.e. its specification is to allow all dereferencing that the
>             // standard allows for raw pointers, so that one should be able to
>             // use a SafePtr wherever a raw pointer was previously used.
>             // The proposed null-pointer non-UB in #232 breaks this code
>             // by invalidating its assumptions.
>             assert( myPointer != 0 );
>             return *myPointer;
>         }
>     };
>


If it were a template class, and it were instantiated with a
polymorphic type, then it would not meet the specification.  In
particular, it would not work as expected in a typeid expression.

But that's beside the point.  The real problem with the above is the
rather impishly clever underspecification of its semantics.  The
usefulness of such a class in real code would be that it provides a
guaranteed behavior (such as assert() or throwing an exception) when
the attempt is made to dereference a null pointer.  I'm skeptical
that such underspecification is really all that widespread.  And if
its semantics were specified in a more natural and useful fashion
("operator*() asserts if the managed pointer is null"), then changing
the standard (if it really is a change) wouldn't "break" either it or
any code that uses it.


> Allowing *((T*)0), on the other hand, allows wayward nullpointers to
> propagate, undetected, and that's UnGood.


My theme again: to catch "error X," test for "error X," not
"condition Y which, in itself, is harmless, but which might indicate an
intention to commit error X at some point down the road."

*((T*)0) doesn't propagate anything.  If you want to prevent null
pointers from propagating, catch them where and when they propagate.


> > > > Maybe I'm missing something: as I read the example, it does exhibit
> > > > undefined behavior (binding a reference to an lvalue which doesn't
> > > > refer to a valid object),
> > >
> > > Nope, para 5.2.8/2 (inconsistently, for sure!) explicitly allows dereferencing
> > > a nullpointer in this context,
> >
> > Ahh, right...I forgot about that.  In which case, I don't see what the
> > undetected error is supposed to be.  Put another way, I don't see why I
> > shouldn't be allowed to write code with exactly the semantics shown in
> > your example.  I don't see anything there that the compiler should be
> > entitled to assume is an error.
>
> Well, that example was a bit more intricate than the first extremely short
> one, which I didn't manage to make clarifying enough.
>
> I think your own "forgot about that" is testimony that the code, as written by
> SomeOne, was probably not intentionally correct at the language level (as of
> C++98): that was just how things arbitrarily turned out.
>
> The essence is that you have a code/design-level error (the use of typeid(*0))
> that's currently not UB, i.e. it's currently OK _at the language level_, due
> to an inconsistency in the standard.  That code/design-level error would still
> have been detected (it's guaranteed to produce an exception), were it not for
> our hired help's help in swallowing the exception silently, which means it's
> an undetected code/design-level error that is still free to produce incorrect
> results in other usage contexts.


Again, I have to ask: what is the alleged code/design error?  Is the
code, as written, guaranteed to be not what the author intended?
Remember, the compiler doesn't get to read the spec.  Evaluating typeid
and ignoring any errors could very well be just what this function is
supposed to do.  Why should it be illegal?

It's one thing to say, "This code looks suspicious, let's emit a
warning."  It's another to say, "This is an error and I'm going to
refuse to compile it."  And yet another to say, "This is UB, so I'm
going to shoot a puppy, crash the rocket, re-format your hard drive,
and whatever else comes to mind."  You're arguing that this should be a
puppy-shooting offense, when it's not self-evident that it's even wrong.






Author: kuyper@wizard.net
Date: Mon, 24 Oct 2005 22:26:48 CST
Raw View
Alf P. Steinbach wrote:
> * johnchx2@yahoo.com:
> > * Alf P. Steinbach:
.
> > Surely you're not suggesting that dereferencing the null pointer remain
> > UB just so that more code can be labelled erroneous.  ;-)
>
> More code does _not_ become erroneous by keeping the current rules.

I think he meant something else. I think he was comparing adopting the
proposed change, with adopting the proposed change except that
dereferencing null pointers would remain erroneous.

> Changing the rules means that code that is currently formally incorrect, e.g.
> that example, can't be detected as such automatically.

Which isn't a problem, if the only reason it's currently incorrect is
because the standard says so.

> That means that code that's based on flawed thinking and/or based on incorrect
> assumptions (non-nullness), can then no longer be detected as such
> automatically, in all cases.

It's not clear to me that any of those issues apply. Any attempt to
either read or write through an empty lvalue would continue to be
undefined behavior, and as such can still be automatically checked in
the fashion you imply. If all you're doing with the empty lvalue is
taking its address, how does that indicate flawed thinking/incorrect
assumptions? If the standard is changed to make this syntax legal, how
is non-nullness of p an assumption implied by the use of &*p? Even for
existing code, where that assumption would be implied, how does the
fact that &*p no longer has undefined behavior cause a problem?

> > > > Passing a null pointer to a
> > > > function that takes a pointer is, in the general case, not an error.
> > >
> > > ?
> >
> > I'm not sure what the question mark means.
>
> It means: what do you mean, please clarify?

Yes, but it's not clear what it is that needs clarification. You gave an
example of code which, according to you contains an "error in main"
that cannot be detected if the resolution to DR 232 is approved. In
that code, foo() dereferences a null pointer, which is not an error,
assuming that DR 232 is approved. This dereferenced null pointer is
then passed to typeid(). Since T has a virtual destructor, typeid() is
required to actually read the contents of the argument passed to it, in
order to determine its type. It's perfectly feasible, under DR 232, to
detect this problem at the point of the read, rather than in the
dereferencing of 'p', as would be the case under the current standard.

Therefore, he was trying to guess which error in your example was the
one that you thought would be rendered undetectable. He made a guess
that you were suggesting that passing a null pointer as a function
argument was this error. If it wasn't the error you were referring to,
the best answer would have been to identify the error that had been
rendered undetectable. Responding with "?" doesn't do that.

> > (I mentioned this because that was the
> > only possible "other error" I noticed in the example...but, again, I
> > may be missing something.)
>
> Yes, it seems that you're missing the correlation between (1) formal language
> level errors or UB, and (2) in-practice coding and design level errors.
>
> Dereferencing the pointer is (1), passing it to the function is (2), and
> they're both caused by the same incorrect assumption of non-nullness: they're
> strongly correlated.

Why do you think that passing a pointer to a function implies an
assumption of non-nullness? I pass null pointers to functions all the
time, as a matter of deliberate and (IMO) reasonable design. In
general, I use it as a flag; a non-null pointer means "perform certain
actions on the object pointed at by this pointer". A null pointer means
"skip those actions". C++ provides a lot of other ways to achieve the
same effect, but I don't see anything wrong with this particular way of
doing it. If a null pointer isn't a valid value for a given argument,
you should consider changing that argument to a reference.

> Well, the simplest code example was probably (I think) the one I gave, short
> and succinct, to-the-point, nothing extraneous, and that was apparently not
> sufficiently clarifying.  But OK.  Here's some faulty code:
>
>     #include    <cassert>       // assert
>
>     template< typename Ptr >
>     char const* charPointerFrom( Ptr const& p )
>     {
>         // This function erroneously assumes p is guaranteed non-null.
>         // Its specification requires it to handle nulls.
>         // #232 would make this code correct at the language level, thereby
>         // breaking the SafePtr code shown below, and also removing the
>         // possibility of detecting the error by program transformation.
>         return &*p;     // For simple smart-pointer, yields a raw pointer.
>     }
>
>     // This would be a template class with smart-pointer machinery, all
>     // omitted to concentrate on operator*().
>     class SafePtr
>     {
>     private:
>         char const*     myPointer;
>     public:
>         SafePtr( char const* p ): myPointer( p ) {}
>
>         operator bool() const { return !!myPointer; }
>         char const& operator*() const
>         {
>             // This assert is a practical way to detect wayward nulls.
>             // It doesn't break code that conforms to the C++98 standard,
>             // i.e. its specification is to allow all dereferencing that the
>         // standard allows for raw pointers, so that one should be able to
>         // use a SafePtr wherever a raw pointer was previously used.
>             // The proposed null-pointer non-UB in #232 breaks this code
>             // by invalidating its assumptions.
>             assert( myPointer != 0 );
>             return *myPointer;
>         }
>     };
>
>     SafePtr aSafePtr( int x ) { return (x == 1? "OK!" : 0); }
>
>     int main( int argc, char ** )
>     {
>         charPointerFrom( aSafePtr( argc ) );
>     }

I may be missing something here. Under the current standard, if
argc != 1, then aSafePtr returns a SafePtr whose myPointer is
initialized with 0, making it a null pointer. When operator*() is called, it checks
whether myPointer is null, and finds that it is, and the assert()
triggers. This is true under both the current standard and the proposed
change.

Are you worried about the case where NDEBUG is #defined? In that case,
the current standard makes the behavior of "*myPointer" undefined. The
proposed change would allow "*myPointer", and the "&*p" expression in
charPointerFrom() would be required to have a null pointer as its
value. The behavior of the code has gone from undefined to defined,
with no loss of functionality that I can see. Why do you see this as a
problem?

> > But why should we avoid it?  What's intrinsically wrong with evaluating
> > *((T*)0)?  Such an expression has useable properties (a static type, a
> > size, and an address).  What's wrong with using those properties?
>
> Nothing wrong with using the _properties_: you can do that without
> dereferencing.
>
> That's what traits classes and '==', '!' etc. are for.
>
> Allowing *((T*)0), on the other hand, allows wayward nullpointers to
> propagate, undetected, and that's UnGood.

Why should they be detected at that point, rather than at the point
where they actually cause problems? The only problem caused directly by
this code is the fact that the behavior is currently undefined, and
that would no longer be the case if the proposed resolution is
approved. Other code which might follow on after this code might cause
problems, but it's precisely that other code which should, ideally,
trigger warnings.

> > It sounds like you are saying that this should be undefined behavior
> > because its occurrence might indicate some other error elsewhere in the
> > code.  Am I understanding correctly?
>
> Yes, sort of... ;-)
>
> But it's stronger than that.
>
> In the example above I purposefully included a case where the proposed #232
> resolution breaks existing (and I believe not at all uncommon) code, by
> invalidating its assumptions.

I don't call changing undefined behavior into defined behavior
"breaking". Could you give an example of code that would, given certain
inputs, have well-defined behavior under the current standard, which
would have either different behavior or undefined behavior under the
proposed resolution to DR 232? That would be something I'd be willing
to call "breaking".






Author: alfps@start.no (Alf P. Steinbach)
Date: Tue, 25 Oct 2005 17:48:46 CST
Raw View
* kuyper@wizard.net:
> [...]
[Alf hit the "Send" button too soon!  And now he regrets.]

With regard to the follow-up I just sent, please ignore the static_cast
counter-example: a _micro-second_ after I clicked the "Send" button I of
course thought of a situation where that would change things, namely
conversion detection in template code.

The reference counter-example, though, stands, and one only needs one
counter-example.

So the question also stands: is the logic/requirement, on consideration,
reasonable and valid?






Author: alfps@start.no (Alf P. Steinbach)
Date: Tue, 25 Oct 2005 22:46:05 GMT
Raw View
* kuyper@wizard.net:
> * Alf P. Steinbach:
> >
> > In the example above I purposefully included a case where the proposed #232
> > resolution breaks existing (and I believe not at all uncommon) code, by
> > invalidating its assumptions.
>
> [snipped a lot because it's either off-topic, who meant what, or rehash]
>
> I don't call changing undefined behavior into defined behavior
> "breaking". Could you give an example of code that would, given certain
> inputs, have well-defined behavior under the current standard, which
> would have either different behavior or undefined behavior under the
> proposed resolution to DR 232? That would be something I'd be willing
> to call "breaking".

Is that, on consideration, reasonable, valid logic?

Let's apply that logic to the example I showed, where &*p fails miserably when
p is a smart-pointer encapsulating a null-pointer (as it must fail, no matter
how operator* is implemented by the smart-pointer, assert or not).

That example is key, because it means that the only expression the new rules
would make meaningful, &*p, doesn't work generically in templated code: _100%_
of what the proposed change allows is generally a show-stopper, a
source of confusion and bugs, and not an enabler for anything.

But with the logic above the solution to that, again, is obvious: let's allow
binding references to "empty lvalues" such as *p when p is 0, although of
course not actually using such a reference for anything but applying & to it.
Then if p is a smart-pointer nothing bad happens with &*p.  All OK!

After all,

   could you give an example of code that would, given certain inputs, have
   well-defined behavior under the current standard, which would have
   either different behavior or undefined behavior under this proposal?

That would be something I'd be willing to call "breaking".

But hey, why stop there?

It's downright awkward to have to write static_cast (or the C cast) when
downcasting.  Let's make downcasting an implicit conversion!  Yep!

After all,

   could you give an example of code that would, given certain inputs, have
   well-defined behavior under the current standard, which would have
   either different behavior or undefined behavior under this proposal?

That would be something I'd be willing to call "breaking".

And so on, although my example-generator shut down there.

Is your requirement, on consideration, reasonable, valid logic?






Author: kuyper@wizard.net
Date: Wed, 26 Oct 2005 10:59:54 CST
Raw View
First off, I want to apologize for failing to pay adequate attention to
one key point that was raised earlier in this thread. Section 5.2.8p2
makes the behavior of typeid(*p), where p is a pointer to a polymorphic
type with a null value, well-defined.

Alf P. Steinbach wrote:
> * kuyper@wizard.net:
> > * Alf P. Steinbach:
> > >
> > > In the example above I purposefully included a case where the proposed #232
> > > resolution breaks existing (and I believe not at all uncommon) code, by
> > > invalidating its assumptions.
> >
> > [snipped a lot because it's either off-topic, who meant what, or rehash]
> >
> > I don't call changing undefined behavior into defined behavior
> > "breaking". Could you give an example of code that would, given certain
> > inputs, have well-defined behavior under the current standard, which
> > would have either different behavior or undefined behavior under the
> > proposed resolution to DR 232? That would be something I'd be willing
> > to call "breaking".
>
> Is that, on consideration, reasonable, valid logic?

It's not a matter of valid or invalid logic, it's a matter of
reasonable or unreasonable definitions. If the one version of the
standard says that the behavior of a program is undefined, and a newer
version defines the behavior, then the new behavior allowed by the new
version was also allowed by the old version. How can you justify saying
that a proposed change would break code, if the proposed required
behavior is allowed behavior under the old standard?

In any event, after understanding how 5.2.8p2 applies in this
situation, I'm left with a different argument. The proposed resolution
for DR 232 does not involve repealing 5.2.8p2. Therefore, the example
code you gave has exactly the same well-defined behavior (the
typeid(*p) expression throws std::bad_typeid), regardless of whether
or not the proposed resolution of 232 is approved. How can you justify
saying that a proposed change would break code, if the required
behavior with the change is the same as the required behavior without
it?

> Let's apply that logic to the example I showed, where &*p fails miserably when
> p is a smart-pointer encapsulating a null-pointer (as it must fail, no matter
> how operator* is implemented by the smart-pointer, assert or not).

In particular, the assert() is triggered, whether or not the proposed
resolution to 232 is approved. Therefore, how is this a problem?

> That example is key, because it means that the only expression the new rules
> would make meaningful, &*p, doesn't work generically in templated code, that
> _100%_ of what the proposed change allows, is generally a show-stopper, a
> source of confusion and bugs, and not an enabler for anything.

The proposed change allows &*p to be, in effect, a synonym for p, just
as it is in C99. However, since C++ allows operator overloading, you
can't expect such identities to work when applied to cases where
user-defined operator overloads interfere with the identity. This is no
different from the fact that a++ means the same thing as (a = a + 1), an
identity that ceases to hold if applicable operator overloads for ++,
=, or + don't cooperate in a way that honors it.

The key question is: why is this valuable? Why would anyone write "&*p"
when they could have simply written "p"? I honestly don't know, and I
was hoping that someone actively proposing this change could suggest a
reason. I imagine it might be possible to come up with an example where
template code intended for a general case happens to be equivalent to
&*p in some special case, and they wanted to not have to treat that
case as special. However, I can't think of a good example like that
myself.

Reading the DR, the main argument seems to be that there's no good
reason for forbidding it; that seems good enough for me. In every case
where &*p would actually lead to a problem, the problem can be detected
somewhere other than at the *p itself, and that is where a
diagnostic should be triggered.

> But with the logic above the solution to that, again, is obvious: let's allow
> binding references to "empty lvalues" such as *p when p is 0, although of
> course not actually using such a reference for anything but applying & to it.
> Then if p is a smart-pointer nothing bad happens with &*p.  All OK!
>
> After all,
>
>    could you give an example of code that would, given certain inputs, have
>    well-defined behavior under the current standard, which would have
>    either different behavior or undefined behavior under this proposal?
>
> That would be something I'd be willing to call "breaking".
>
> But hey, why stop there?
>
> It's downright awkward to have to write static_cast (or the C cast) when
> downcasting.  Let's make downcasting an implicit conversion!  Yep!

There's clearly a value in requiring downcasting to be explicit. It
allows you to catch errors at compile time that could otherwise be
detected only by their effects. Making &*p legal simply requires error
detection to be moved to a more appropriate location; and in almost all
cases, the only error detection that could be done, with or without
that change, would be at runtime.

>    could you give an example of code that would, given certain inputs, have
>    well-defined behavior under the current standard, which would have
>    either different behavior or undefined behavior under this proposal?
>
> That would be something I'd be willing to call "breaking".

Well, as a matter of fact, I wouldn't describe such a change as having
broken code. I would describe it as making the language less safe;
which doesn't apply to &*p.
