Topic: self-referential inserts in std::vector
Author: cybermax_69@yahoo.com (Piyo)
Date: Wed, 7 Feb 2007 06:41:42 GMT Raw View
I incited a major debate as to whether or not self-referential
inserts are allowed (or not) by the C++ standard. What do I mean
by self-referential insert? This is a simplistic description of
the issue:
(Code 0)
std::vector<int> v;
v.push_back(0);
v.push_back( v[0] ); // this is the self-referential insert.
I advocated that this is legal and should not crash. Others say,
that since the validity of the reference/iterator/pointer passed to
push_back, gets invalidated when a reallocation of memory occurs,
it is not safe to do the following (extending the simplistic code
to demonstrate it)
(Code 1)
std::vector<int> v;
v.push_back(0);
for( int i=0; i < 10000; ++i )
{
v.push_back( v[0] );
}
instead my opponents argue that the "correct" way (as suggested by
the C++ standards) is to insure that the reference/iterator/pointer
never gets invalidated via a temp copy:
(Code 2)
std::vector<int> v;
v.push_back(0);
for( int i=0; i < 10000; ++i )
{
int temp = v[0]; // make a temp copy prior to insertion
v.push_back( temp );
}
While I do agree that doing (Code 2) does guarantee that the lifetime
of the reference is not invalidated during the call, I still do not
believe that (Code 1) is necessarily incorrect. So I requested them to
furnish proof that the C++ standard does indeed prohibit (Code 1).
Here is the reply:
-----Begin Quote----------------------------------------------------
It is the ISO C++ Standard that you disagree with, not me:
Section 23.2.4.2.5 says (regarding std::vector<T> capacity):
"Notes: Reallocation invalidates all the references, pointers, and
iterators referring to the elements in the sequence. It is guaranteed
that no reallocations takes place during insertions that happen after a
call to reserve() until the time when an insertion would make the size
of the vector greater than the size specified in the most recent call to
reserve()."
Section 23.2.4.3.1 says (regarding std::vector<T>::insert):
"Notes: Causes reallocation if the new size is greater than the old
capacity. If no reallocation happens, all the iterators and references
before the insertion point remain valid. If an exception is thrown other
than by the copy constructor or assignment operator of T there are no
effects."
[There is no guarantee from the standard that the reference will not
become invalidated on reallocation.]
Though not normative, Scott Meyers addresses this very point in
"Effective STL", Item 5, page 27:
"Using iterative calls to insert in an explicit loop, it would probably
look more or less like this:
vector<int>::iterator insertLoc(v.begin());
for (int i = 0; i < numValues; ++i) {
insertLoc = v.insert(insertLoc,data[i]);
}
"Note how we have to be careful to save the return value of insert [!]
for the next loop iteration. If we didn't update insertLoc after each
insertion, we'd have two problems. First, all loop operations after the
first would yield undefined behavior, because each insert call would
invalidate insertLoc. [!]"
-----End Quote----------------------------------------------------
Now after reading this, all I can gather is a set of post-conditions
from the C++ standards that he quoted. Unfortunately, I myself do not
own a copy of the C++ standards so I also cannot confirm that there does
indeed exist a pre-condition that prohibits me from doing a
self-referential insert. Due to the lack of accessibility on my part, I
turned to the different implementations I could get my hands on:
Microsoft's VC++ 8.0 and GNU's C++ implementation of
std::vector::push_back.
Here is what I have found:
Microsoft does indeed create a temporary copy of the object to insert
prior to reallocating new memory (and thus deleting the old buffer).
GNU on the other hand, keeps the old buffer around until after the
insertion is complete before delete the old buffer. In both cases,
self-referential inserts are valid. This still does not satisfy
everyone. Why? It is ultimately, vendor/implementor dependent as to
whether or not self-referential inserts is handled properly or not.
So I throw my hands up and throw it to the community here to gather your
opinions on this matter. I hope that all of you can help me understand
why self-insertion is not allowed by the C++ standard.
Thank you all for your time with this matter.
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]
Author: "James Kanze" <james.kanze@gmail.com>
Date: Wed, 7 Feb 2007 09:59:18 CST Raw View
Piyo wrote:
> I incited a major debate as to whether or not self-referential
> inserts are allowed (or not) by the C++ standard. What do I mean
> by self-referential insert? This is a simplistic description of
> the issue:
> (Code 0)
> std::vector<int> v;
> v.push_back(0);
> v.push_back( v[0] ); // this is the self-referential insert.
> I advocated that this is legal and should not crash.
That's what the standard says.
> Others say,
> that since the validity of the reference/iterator/pointer passed to
> push_back, gets invalidated when a reallocation of memory occurs,
Which means that you cannot use the reference/iterator/pointer
after returning from push_back. *You* are not responsible for
what happens in push back. You met the pre-conditions, by
passing it a valid reference.
Note that in the case of insertion of a range, there is an
explicit pre-condition that the iterators not be iterators into
the container. So something like:
v.insert( v.end(), v.begin(), v.end() ) ;
to double the size of the vector is illegal. But it's illegal
because of an explicitly stated pre-condition, and not because
of speculation as to how the function is implemented.
> it is not safe to do the following (extending the simplistic code
> to demonstrate it)
> (Code 1)
> std::vector<int> v;
> v.push_back(0);
> for( int i=0; i < 10000; ++i )
> {
> v.push_back( v[0] );
> }
> instead my opponents argue that the "correct" way (as suggested by
> the C++ standards) is to insure that the reference/iterator/pointer
> never gets invalidated via a temp copy:
> (Code 2)
> std::vector<int> v;
> v.push_back(0);
> for( int i=0; i < 10000; ++i )
> {
> int temp = v[0]; // make a temp copy prior to insertion
> v.push_back( temp );
> }
> While I do agree that doing (Code 2) does guarantee that the lifetime
> of the reference is not invalidated during the call, I still do not
> believe that (Code 1) is necessarily incorrect. So I requested them to
> furnish proof that the C++ standard does indeed prohibit (Code 1).
> Here is the reply:
> -----Begin Quote----------------------------------------------------
> It is the ISO C++ Standard that you disagree with, not me:
> Section 23.2.4.2.5 says (regarding std::vector<T> capacity):
> "Notes: Reallocation invalidates all the references, pointers, and
> iterators referring to the elements in the sequence. It is guaranteed
> that no reallocations takes place during insertions that happen after a
> call to reserve() until the time when an insertion would make the size
> of the vector greater than the size specified in the most recent call to
> reserve()."
> Section 23.2.4.3.1 says (regarding std::vector<T>::insert):
> "Notes: Causes reallocation if the new size is greater than the old
> capacity. If no reallocation happens, all the iterators and references
> before the insertion point remain valid. If an exception is thrown other
> than by the copy constructor or assignment operator of T there are no
> effects."
> [There is no guarantee from the standard that the reference will not
> become invalidated on reallocation.]
Which, IMHO, is irrelevant. You passed a valid reference.
That's all that the standard requires. After that, it's up to
the implementation to make it work.
> Though not normative, Scott Meyers addresses this very point in
> "Effective STL", Item 5, page 27:
> "Using iterative calls to insert in an explicit loop, it would probably
> look more or less like this:
> vector<int>::iterator insertLoc(v.begin());
> for (int i = 0; i < numValues; ++i) {
> insertLoc = v.insert(insertLoc,data[i]);
> }
> "Note how we have to be careful to save the return value of insert [!]
> for the next loop iteration. If we didn't update insertLoc after each
> insertion, we'd have two problems. First, all loop operations after the
> first would yield undefined behavior, because each insert call would
> invalidate insertLoc. [!]"
Which is an entirely different issue. You (the user) use
insertLoc after the call. When it is invalidated. That is your
responsibility.
> -----End Quote----------------------------------------------------
> Now after reading this, all I can gather is a set of post-conditions
> from the C++ standards that he quoted.
You understood it perfectly.
> Unfortunately, I myself do not
> own a copy of the C++ standards so I also cannot confirm that there does
> indeed exist a pre-condition that prohibits me from doing a
> self-referential insert.
One exists (explicitly) for range insertion, i.e. the three
parameter insertion. It does not exist for push_back and the
single element insertions.
> Due to the lack of accessibility on my part, I
> turned to the different implementations I could get my hands on:
> Microsoft's VC++ 8.0 and GNU's C++ implementation of
> std::vector::push_back.
> Here is what I have found:
> Microsoft does indeed create a temporary copy of the object to insert
> prior to reallocating new memory (and thus deleting the old buffer).
> GNU on the other hand, keeps the old buffer around until after the
> insertion is complete before delete the old buffer. In both cases,
> self-referential inserts are valid. This still does not satisfy
> everyone. Why? It is ultimately, vendor/implementor dependent as to
> whether or not self-referential inserts is handled properly or not.
Well, they don't do it that way for the fun of it. The standard
requires that they make it work, so they choose strategies which
make it work.
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orient e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S mard, 78210 St.-Cyr-l' cole, France, +33 (0)1 30 23 00 34
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]
Author: "Richard Smith" <richard@ex-parrot.com>
Date: Wed, 7 Feb 2007 10:07:07 CST Raw View
On Feb 7, 6:41 am, cybermax...@yahoo.com (Piyo) wrote:
> I incited a major debate as to whether or not self-referential
> inserts are allowed (or not) by the C++ standard. What do I mean
> by self-referential insert? This is a simplistic description of
> the issue:
>
> (Code 0)
> std::vector<int> v;
> v.push_back(0);
> v.push_back( v[0] ); // this is the self-referential insert.
I think the standard does require it to be legal. It's also hard to
what an implementation could do to cause this to fail without also
invalidating the exception safety guarantees that the standard
imposes.
First of all, push_back is defined [23.2.4] to have the signature
void push_back(const T& x);
i.e. to take its argument by const reference. The semantics of this
are defined in term of the insert method [23.1.1/Table 68]:
a.push_back(x) [is equivalent to] a.insert(a.end(), x)
> -----Begin Quote----------------------------------------------------
> It is the ISO C++ Standard that you disagree with, not me:
>
> Section 23.2.4.2.5 says (regarding std::vector<T> capacity):
>
> "Notes: Reallocation invalidates all the references, pointers, and
> iterators referring to the elements in the sequence. It is guaranteed
> that no reallocations takes place during insertions that happen after a
> call to reserve() until the time when an insertion would make the size
> of the vector greater than the size specified in the most recent call to
> reserve()."
OK. So this gives an obvious way of solving the problem -- simply
call reserve() before calling push_back(). But the question remains
-- should the code, without a call to reserve, be legal?
> Section 23.2.4.3.1 says (regarding std::vector<T>::insert):
>
> "Notes: Causes reallocation if the new size is greater than the old
> capacity. If no reallocation happens, all the iterators and references
> before the insertion point remain valid. If an exception is thrown other
> than by the copy constructor or assignment operator of T there are no
> effects."
>
> [There is no guarantee from the standard that the reference will not
> become invalidated on reallocation.]
OK. Section 23.2.4.3/1 merely defines the vector-specific properties
of insert(); it is a generic requirement of the Sequence concept
[23.1.1] and various other requirements on it are imposed there:
[23.1/10/bullet 1] "If an exception is thrown by an insert() function
while inserting a single element, that function has
no effects."
[23.1.1/Table 67]: "a.insert(p,t) ... inserts a copy of t before p."
At the call to insert(), p is a valid iterator into a -- it is
a.begin(), and t is a valid reference, which happens to be a reference
into the vector. As the standard doesn't explicitly state (so far as
I can see) that the reference must remain valid even after a
reallocation, I don't think it can be required to.
However, let's think how push_back might actually be implemented (the
same logic applies to insert, but that is a more complicated case).
Assuming a reallocation is required you have to do the following:
1. allocate new storage
2. copy existing elements from existing storage to new storage
3. append new element to new storage
4. update vector's storage pointer(s) to point to new storage
5. destroy existing storage
Of these, any of 1, 2 and 3 might all throw exceptions -- 1 from the
allocator's allocate() function; 2 and 3 from the value_type's copy
constructor. 4 is simply updating a few pointers (three in a typical
implementation) and can't throw, and 5 involves calls to the
value_type's destructor and the allocator's deallocate function, both
of which the standard allows implementations to assume will not throw
[17.4.3.6/2/bullet 4 and 20.1.5/Table 32].
If 1, 2 or 3 throws an exception, the implementation must ensure that
push_back has no effects [23.1/10/bullet 1]. 4 can be reversed if
necessary, but 5 cannot as there is no guarantee that memory can be
reallocated or objects reconstructed without exceptions occurring.
And therefore 5 must occur after 1, 2 and 3 have all completed.
However, it is only step 5 that invalidates references into the
vector. (Depending on implementation, either 4 or 5 might invalidate
iterators, but that's not pertinent.) As it is step 3 that requires
the reference to be valid, and we've established that this must be
complete before step 5 can commence, it seems that purely from the
practicalities of implementation, the code will work even if it isn't
guaranteed to (which I believe it is).
Except life isn't that simple. An implementation might spot that it
is dealing with a built-in type with a trivial no-throw copy
constructor and might choose to make additional optimisations for that
case. For example, a good implementation would probably change the
repeated calls to the copy constructor in step 2 to a single memcpy.
In all probability, this logic would be abstracted in
std::uninitialized_copy (or similar), but it could be done by having
separate code paths through the push_back function. And if there's a
separate code path, there's no reason for 5 to come after 3 in this
path and your "self-referential insert" might break for an int. But
I'd be surprised if any such implementations exist.
> Though not normative, Scott Meyers addresses this very point in
> "Effective STL", Item 5, page 27:
>
> "Using iterative calls to insert in an explicit loop, it would probably
> look more or less like this:
>
> vector<int>::iterator insertLoc(v.begin());
> for (int i = 0; i < numValues; ++i) {
> insertLoc = v.insert(insertLoc,data[i]);
>
> }
This is a quite different issue as data[i] is not a reference into v.
And also bear in mind that a complicated example was deliberately
chosen here, as this is part of the "Prefer range member functions to
their single-element counterparts" item -- i.e. it is an example of
what not to do.
--
Richard Smith
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]
Author: markus.moll@esat.kuleuven.be (Markus Moll)
Date: Wed, 7 Feb 2007 16:11:10 GMT Raw View
Hi
Piyo wrote:
> I incited a major debate as to whether or not self-referential
> inserts are allowed (or not) by the C++ standard. What do I mean
> by self-referential insert? This is a simplistic description of
> the issue:
>=20
> (Code 0)
> std::vector<int> v;
> v.push_back(0);
> v.push_back( v[0] ); // this is the self-referential insert.
>=20
> I advocated that this is legal and should not crash.=20
[...]
> instead my opponents argue that the "correct" way (as suggested by
> the C++ standards) is to insure that the reference/iterator/pointer
> never gets invalidated via a temp copy:
IMO, they are wrong. Reasoning is found below.
> -----Begin Quote----------------------------------------------------
> It is the ISO C++ Standard that you disagree with, not me:
Here, I disagree.
> Section 23.2.4.2.5 says (regarding std::vector<T> capacity):
> "Notes: Reallocation invalidates [...]
17.3.1.1(2) Paragraphs labelled =91=91Note(s):=92=92 or =91=91Example(s):=
=92=92 are=20
informative, other paragraphs are normative.
> Section 23.2.4.3.1 says (regarding std::vector<T>::insert):
> "Notes: Causes reallocation [...]
Same here.
> [There is no guarantee from the standard that the reference will not
> become invalidated on reallocation.]
Indeed, but there is the guarantee that v.push_back(x) inserts a copy of=20
x to the end of v. (23.1.1)
The implementation has to take care to not use invalid references.
> Though not normative, Scott Meyers addresses this very point in
> "Effective STL", Item 5, page 27:
No, this is in fact another (real) problem.
hth
Markus
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]
Author: howard.hinnant@gmail.com (Howard Hinnant)
Date: Wed, 7 Feb 2007 16:12:56 GMT Raw View
In article <iLdyh.74232$qO4.972@newssvr13.news.prodigy.net>,
cybermax_69@yahoo.com (Piyo) wrote:
> So I throw my hands up and throw it to the community here to gather your
> opinions on this matter. I hope that all of you can help me understand
> why self-insertion is not allowed by the C++ standard.
For push_back and insert taking a const T&, self-referential inserts are
allowed. Implementations must do whatever it takes to make it work.
Rationale: The standard does not forbid it.
For the insert overloads taking a range to insert:
template <class InputIterator>
void
insert(iterator position, InputIterator first, InputIterator last);
self-referential inserts are not allowed.
Rationale: The standard forbids it in 23.1.1 [sequence.reqmts] in the
table of "Sequence requirements":
a.insert (p,i,j)
pre: i and j are not iterators into a.
A LWG defect report was opened on this issue. The latest official
response to it was to leave it in Open status:
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#526
However just a few weeks ago an ad-hoc meeting of the LWG discussed this
issue again. The conclusions are unofficially listed here:
http://home.twcny.rr.com/hinnant/cpp_extensions/issues_preview/lwg-active
.html#526
(see "Rationale")
These conclusions are due to be officially voted on this coming April.
> Unfortunately, I myself do not
> own a copy of the C++ standards
A paper copy of the C++03 standard is available here:
http://search.barnesandnoble.com/booksearch/isbninquiry.asp?ean=978047084
6742&z=y
($75 US)
An electronic (PDF) version is available here:
http://webstore.ansi.org/ansidocstore/product.asp?sku=INCITS%2FISO%2FIEC+
14882%2D2003&source=google&adgroup=c++&keyword=c%2B%2B%20standards&gclid=
CK-IjcvGnIoCFRyVFQodbFaonQ
($30 US) Yikes. For years the price was $18, looks like it has gone up.
The latest draft for the C++0X standard (currently unofficial) is
available here:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2135.pdf
It is free. Expect it to be slightly obsoleted about twice a year for
awhile.
-Howard
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]
Author: yechezkel@emailaccount.com (Yechezkel Mett)
Date: Thu, 8 Feb 2007 14:48:08 GMT Raw View
Howard Hinnant wrote:
> For push_back and insert taking a const T&, self-referential inserts are
> allowed. Implementations must do whatever it takes to make it work.
>
> Rationale: The standard does not forbid it.
>
> For the insert overloads taking a range to insert:
>
> template <class InputIterator>
> void
> insert(iterator position, InputIterator first, InputIterator last);
>
> self-referential inserts are not allowed.
>
> Rationale: The standard forbids it in 23.1.1 [sequence.reqmts] in the
> table of "Sequence requirements":
>
> a.insert (p,i,j)
> pre: i and j are not iterators into a.
>
> A LWG defect report was opened on this issue. The latest official
> response to it was to leave it in Open status:
>
> http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#526
>
> However just a few weeks ago an ad-hoc meeting of the LWG discussed this
> issue again. The conclusions are unofficially listed here:
>
> http://home.twcny.rr.com/hinnant/cpp_extensions/issues_preview/lwg-active
> ..html#526
>
> (see "Rationale")
After reading reading it, it's not clear to me how the problem with the
reverse_iterator(last + 1) is dealt with. Does it means that the
standard says that std::copy copies the elements in order, and if they
are overwritten in the process that's what's supposed to happen? If so
why is the explicit requirement "result shall not be in the range
[first,last)" needed?
Yechezkel Mett
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]
Author: cybermax_69@yahoo.com (Piyo)
Date: Thu, 8 Feb 2007 16:09:10 GMT Raw View
What I originally thought was a petulant squabble between
me and my peers apparently is something that even the
Standards Committee is working on! I would like to thank
you all for you time and input into this matter. I am sure
that acquiring that standards document is a very good
investment for my future with the C++ language.
Thank you all once again!!
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]
Author: howard.hinnant@gmail.com (Howard Hinnant)
Date: Thu, 8 Feb 2007 17:34:46 GMT Raw View
In article <45cb1140$0$97229$892e7fe2@authen.yellow.readfreenews.net>,
yechezkel@emailaccount.com (Yechezkel Mett) wrote:
> > http://home.twcny.rr.com/hinnant/cpp_extensions/issues_preview/lwg-active
> > ..html#526
> >
> > (see "Rationale")
>
> After reading reading it, it's not clear to me how the problem with the
> reverse_iterator(last + 1) is dealt with. Does it means that the
> standard says that std::copy copies the elements in order, and if they
> are overwritten in the process that's what's supposed to happen? If so
> why is the explicit requirement "result shall not be in the range
> [first,last)" needed?
Fair question. The sentiment in the group was that we did not feel we
could improve upon the description of the algorithm without serious risk
of making matters worse. For example we want to be careful to continue
allow overlapping ranges such as:
copy(first, last, first-1);
I suspect that if someone comes up with proposed wording which clearly
improves the situation, without getting overly complicated nor
disallowing the subset of overlapping ranges we specifically want to
allow, such wording would be seriously entertained by the LWG.
Suggestions welcome. :-)
-Howard
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]