Topic: Defect report: lvalue-to-rvalue conversion on one-past-the-end
Author: "Richard Smith" <richard@metafoo.co.uk>
Date: Tue, 8 Nov 2011 11:31:15 -0800 (PST)
Hi,
In C89 and C99, it is not legal to (effectively) perform an lvalue-to-rvalue
conversion on a one-past-the-end pointer. C99 TC3 6.5.6/8 says the following
about the result of adding a pointer and an integer:
"If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated."
C++ copies the entire paragraph as [expr.add]/5, except for that one sentence.
Further, C++11 [basic.compound]/3 says:
"If an object of type T is located at an address A, a pointer of type cv T*
whose value is the address A is said to point to that object, regardless of
how the value was obtained. [ Note: For instance, the address one past the end
of an array (5.7) would be considered to point to an unrelated object of the
array's element type that might be located at that address. There are further
restrictions on pointers to objects with dynamic storage duration; see
3.7.4.3. --end note ]"
This seems tantamount to saying that such pointers are legitimate to use, if
they happen to point to an object of the right type. Not only is this
problematic in theory, it's also problematic in practice -- existing compilers
for C++ optimize using the C rules, for instance for alias analysis. Consider
the following example:
#include <cstdint>
int main(int argc, char **argv) {
  long long a[1] = { 10 }, b = 20;
  if (a + 1 == &b)
    a[argc] = argc;   // a[1] == *(a+1), which points to b,
                      // regardless of how the value was obtained
  else if (&b + 1 == a)
    a[-argc] = argc;  // a[-1] == *(a-1), and a points one past
                      // the end of b, so this points to b
  else
    return 0;
  return b;
}
Remarkably, when run with argc == 1, this does not appear to have undefined
behaviour under the current wording; clearly it should! When built with many
optimizing compilers, this code "incorrectly" returns 20 rather than 1 when a
and b happen to be adjacent.
In addition, this causes trouble for the [expr.const]/2 restriction that only
the active member of a union can have an lvalue-to-rvalue conversion performed
on it. Consider:
struct A {}; struct B {};
union U {
  constexpr U(A, int n) : a(n) {}
  constexpr U(B, int n) : b(n) {}
  int a, b;
};
constexpr U arr[] = { { A(), 0 }, { B(), 0 } };
constexpr const int *p = &arr[0].a; // ok (arr is const, so pointer-to-const)
constexpr const int *q = p + 1;     // ok, one-past-the-end pointer
                                    // points to arr[1].a and arr[1].b
constexpr int r = *q; // does this refer to a non-active member of the union?
C++ should adopt the C rule, and disallow lvalue-to-rvalue conversions on
lvalues which refer to objects outside the subobject from which the lvalue is
derived.
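To make the proposed rule concrete, here is a minimal sketch of an access
helper that allows a one-past-the-end index to be named but refuses to
dereference it. The name store_in_bounds and its signature are hypothetical,
chosen purely for illustration; nothing like it appears in the standard.

```cpp
#include <cstddef>

// Hypothetical helper (not part of any standard library): stores `value` at
// base[index] only when the index designates an element of the n-element
// array that `base` points into. The one-past-the-end index (index == n),
// anything beyond it, and negative indices are all refused.
inline bool store_in_bounds(long long *base, std::size_t n,
                            std::ptrdiff_t index, long long value) {
  if (index < 0 || static_cast<std::size_t>(index) >= n)
    return false;       // would refer to an object outside the array
  base[index] = value;  // index is in [0, n): a real element of the array
  return true;
}
```

With long long a[1] = { 10 }, the call store_in_bounds(a, 1, 1, 1) returns
false and leaves a[0] untouched, whereas the unchecked a[argc] = argc in the
first example performs the store through the one-past-the-end pointer.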
[expr.add]/5 also only allows a one-past-the-end pointer to be formed by
adding 1 to a pointer. In particular:
int xs[2];
int *p = xs + 2; // undefined behaviour!
int *q = xs + 1 + 1; // ok!
int *r = q - 1 - 1; // ok!
int *s = q - 2; // undefined behaviour!
This oddity is inherited from C, but such uses are so overwhelmingly common
that they should be permitted.
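As an illustration of how routine the xs + 2 form is, here is a sketch of the
usual begin/end pointer idiom. The function name sum_pair is made up for this
example. The end pointer is only compared against, never dereferenced.

```cpp
#include <numeric>

// Hypothetical example function: sums a two-element array using the
// conventional [first, last) pointer pair.
inline int sum_pair(const int (&xs)[2]) {
  const int *first = xs;
  const int *last = xs + 2;  // end pointer formed by adding 2 in one step
  return std::accumulate(first, last, 0);  // reads xs[0] and xs[1] only
}
```

Under the strict reading described above, the initialization of last would
already be undefined, even though every standard algorithm call of the form
f(xs, xs + N) relies on it.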
--
Richard
[ comp.std.c++ is moderated. To submit articles, try posting with your ]
[ newsreader. If that fails, use mailto:std-cpp-submit@vandevoorde.com ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]
Author: "Daniel Krügler" <daniel.kruegler@googlemail.com>
Date: Fri, 11 Nov 2011 13:03:44 -0800 (PST)
Forwarded to CWG.
Greetings from Bremen,
Daniel Krügler
--