Topic: Questions about vector's interface


Author: brangdon@cix.co.uk (Dave Harris)
Date: Mon, 22 Jan 2007 21:34:30 GMT
Raw View
greghe@pacbell.net (Greg Herlihy) wrote (abridged):
> To recapitulate the arguments (and counter arguments) made thus far:

You seemed to have misunderstood the argument I posted in the article you
are commenting to, namely that resize() should use the same convention as
insert() for reasons of consistency. This is independent of whether
resize() invokes insert(), or whether references are faster than copies,
or whether a reference parameter might be invalidated. It's purely a
matter of consistency and "least surprise". Why should insert() and
resize() be different?


> Er, this discussion is going in circles.

No, it is three straight lines. The first line is about invalidating
arguments and (in my view) Howard's article nails it. The second line is
about possible performance gains and is still on-going. It probably needs
some actual measurements to settle it, or else to look at real code. The
third line is the one I mention above, about logical consistency.

-- Dave Harris, Nottingham, UK.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: howard.hinnant@gmail.com (Howard Hinnant)
Date: Thu, 25 Jan 2007 21:34:26 GMT
Raw View
In article <1169418504.979412.172490@m58g2000cwm.googlegroups.com>,
 "Peter Dimov" <pdimov@gmail.com> wrote:

> Howard Hinnant wrote:
>
> [...]
>
> > "Appreciable" is a subjective term.  So I'll just count copies...
> >
> > I wrote this little test harness to help, and tested it on gcc 4.0.1:
> >
> > template <class T>
> > struct vec
> > {
> >     void resize(unsigned, T t) {}
> > };
> >
> > struct A
> > {
> >     A() {}
> >     A(const A&) {std::cout << "A(const A&)\n";}
> > };
> >
> > Once inside resize, I believe we can expect the N copies of A, no matter
> > whether it is passed by value or by const&.  So we're just counting
> > copies to get the parameter in.
>
> Interestingly, this is no longer true if we have move support. The
> implementation of resize can move from the by-value argument.

If n <= size(), the t can not be moved from, it will be ignored.

If n > size(), t could be moved from *once* if passed by value.  One
would still need n-size()-1 copies of the value of t before the move.
The extra complication in the logic may adversely impact code size or
performance, though likely only slightly.  Here's pseudo code in the
spirit of the current standard:

void resize(size_type sz, T t)
{
   if (sz > size())
   {
       insert(end(), sz - size() - 1, t);
       push_back(std::move(t));  // move last copy
   }
   else if (sz < size())
       erase(begin() + sz, end());
}

In general when writing the rvalue reference recommendations for chapter
23 I added move support to only those signatures which were inserting a
single t.  E.g.:

void push_back(const T& t);
void push_back(T&& t);

iterator insert(iterator position, const T& t);
iterator insert(iterator position, T&& t);

And did not add move support to those signatures inserting multiple
copies of t:

void insert(iterator position, size_type n, const T& t);

This has the side effect that the former can be used to insert move-only
types whereas the latter can not.

-Howard

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: ark@acm.org ("Andrew Koenig")
Date: Sat, 20 Jan 2007 17:02:11 GMT
Raw View
"Greg Herlihy" <greghe@pacbell.net> wrote in message
news:1168579771.142459.237070@a75g2000cwd.googlegroups.com...

> The question in this case is very simple: whether an implementation of
> vector::resize() declared like this:

>    vector::resize(size_type sz, const T& c = T());

> would be at least as fast as - if not faster - than vector's current
> resize() method:

>    vector::resize(size_type sz, T c = T());

> for all eligible types, T, and on all machine architectures for which a
> C++ compiler exists.

I don't think the issue is quite this simple.

If resize is defined as

    vector::resize(size_type sz, const T& c = T());

then a question arises about the semantics of

    v.resize(v.size() + 42, v[0]);

The point is that resize can reallocate the contents of the vector being
resized, which invalidates any references to elements of the original
vector.  So in order to understand the semantics of the call above, it
becomes necessary to know whether the value of the element being used as the
initializer is fetched before or after the reallocation.

If resize is defined as

    vector::resize(size_type sz, T c = T());

then this question does not arise, because the specification requires the
initializer to be copied before reallocation begins.

Either way, the implementation is free to optimize away unnecessary copy
operations, so long as the effect of that optimization is invisible to the
user.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: brangdon@cix.co.uk (Dave Harris)
Date: Sun, 21 Jan 2007 15:19:46 GMT
Raw View
ark@acm.org ("Andrew Koenig") wrote (abridged):
> If resize is defined as
>
>     vector::resize(size_type sz, T c = T());
>
> then this question does not arise, because the specification requires
> the initializer to be copied before reallocation begins.

But the same issue arises with insert, and insert does take a reference.
Why should resize be different? Shouldn't they at least be consistent?

-- Dave Harris, Nottingham, UK.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: "Greg Herlihy" <greghe@pacbell.net>
Date: Sun, 21 Jan 2007 15:18:20 CST
Raw View
Dave Harris wrote:
> ark@acm.org ("Andrew Koenig") wrote (abridged):
> > If resize is defined as
> >
> >     vector::resize(size_type sz, T c = T());
> >
> > then this question does not arise, because the specification requires
> > the initializer to be copied before reallocation begins.
>
> But the same issue arises with insert, and insert does take a reference.
> Why should resize be different? Shouldn't they at least be consistent?

Er, this discussion is going in circles. So far on this thread: four
different people have expressed three differing opinions split between
two possible parameter passing conventions for one argument that is
passed to vector::resize().

To recapitulate the arguments (and counter arguments) made thus far:

    #1 (Pete/Andrew)
    Position: Pass by value
    Counter-argument to #3: Must avoid invalidating argument

    #2 (Howard)
    Position: Pass-by-reference (to fix performance bug)
    Counter-argument to #1: insert() may invalidate argument anyway

    #3 (Greg)
    Position: Leave resize() declaration as-is
    Counter-argument to #2: overall effects of change too uncertain &
    low level optimizations cannot be implemented with high level
    changes

Essentially the discussion has devolved into a rock-scissors-paper
stand-off. And I've never found a winning strategy for that game, so I
fear we have reached a logical impasse.

Greg

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: "Peter Dimov" <pdimov@gmail.com>
Date: Sun, 21 Jan 2007 23:06:48 CST
Raw View
Howard Hinnant wrote:

[...]

> "Appreciable" is a subjective term.  So I'll just count copies...
>
> I wrote this little test harness to help, and tested it on gcc 4.0.1:
>
> template <class T>
> struct vec
> {
>     void resize(unsigned, T t) {}
> };
>
> struct A
> {
>     A() {}
>     A(const A&) {std::cout << "A(const A&)\n";}
> };
>
> Once inside resize, I believe we can expect the N copies of A, no matter
> whether it is passed by value or by const&.  So we're just counting
> copies to get the parameter in.

Interestingly, this is no longer true if we have move support. The
implementation of resize can move from the by-value argument.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: brangdon@cix.co.uk (Dave Harris)
Date: Sun, 21 Jan 2007 23:18:28 CST
Raw View
greghe@pacbell.net (Greg Herlihy) wrote (abridged):
> Now imagine resizing a vector containing elements small enough to
> fit within a register (such as a vector<int> or vector<const char *>).
> In this scenario, passing an argument by-reference to resize() is less
> efficient - since the value must now be copied out of its register
> into a memory location while a pointer to that location is passed in
> its place.

Insert() takes a reference, so we are already incurring that cost in other
situations. We should at least be consistent.

Further, if resize() has the natural implementation in terms of insert(),
then we have the worst of both worlds. The value is both copied and
referenced. Similarly if the argument gets passed to
allocator::construct() - which may be a user-defined function that cannot
be omitted. It may eventually end up passed to a copy constructor, which
again takes its argument by reference. If the reference has to be taken,
it might as well be taken early, because copying references never costs
more than copying objects.


> Moreover, while the benefit of this change is to eliminate one copy
> operation per resize operation, its cost is to make inserting each
> object into the vector take longer (by replacing a register-to-memory
> location copy with a copy between two memory locations).

If the value is something that can be placed in a register, the compiler
(a) can do that (b) can do it in constant time. So instead of N
memory-to-memory copies we get 1 memory-to-register copy plus N
register-to-memory copies.


> So the bargain seems to be: in exchange for making one set of
> resizing operations faster by a constant amount of time, it is
> necessary to slow down another set of resize operations by a
> variable amount, an amount of time proportional to the however
> long resizing currently takes.

No, the slow down is by a constant amount, not a variable amount.

And further, the constant will be small and bounded, just as the time to
spill a register to memory is small and bounded. Typically it will be one
or two machine instructions. Where-as the cost of copying a value is
unbounded because it may depend on the user-defined copy constructor,
which can be arbitrarily slow.

In addition, it is surely easier to optimise away unnecessary referencing
or dereferencing, because those will be compiler-defined operations. The
compiler knows they have no visible side-effects. Where-as copying is
often a user-defined operation. The definition may not even be visible to
the compiler. It is harder to optimise away.


> Because if resizing a vector would be slower in some cases - that is -
> if the answer turns out to be "no" - then there is no performance bug
> (or at least this change is not its fix). Instead, there is a tradeoff.
> And making tradeoffs is a design decision, not a defect to be fixed.

A bad trade-off is a defect to be fixed. And that is what I believe we
have here. To me it seems clear-cut. We have a clear and massive slow-down
for objects with slow user-defined copy constructors. For types like int
we /might/ have an improvement, but it will be small, bounded, and good
optimisers can probably avoid it altogether.

We also have the inconsistency of resize() being out of step with
insert(), allocators and copy constructors, which cannot be a good thing.

-- Dave Harris, Nottingham, UK.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: "Greg Herlihy" <greghe@pacbell.net>
Date: Fri, 12 Jan 2007 11:33:00 CST
Raw View
"Stephen Howe" wrote:
> > Ironically enough, I considered the resize(size,T)
> > issue too controversial to actually suggest fixing it in this paper.
>
> But is it considered at all?
> It seems terrible that for C++0x - performance bugs are left in the
> standard.
> There should be no unneccesary copy constructors & corresponding destructors
> invoked.

The question in this case is very simple: whether an implementation of
vector::resize() declared like this:

    vector::resize(size_type sz, const T& c = T());

would be at least as fast as - if not faster - than vector's current
resize() method:

    vector::resize(size_type sz, T c = T());

for all eligible types, T, and on all machine architectures for which a
C++ compiler exists.

If the answer to this question is "yes" then a performance defect (of
some magnitude) would exist with the current declaration of
vector::resize(). But given the wide variety of potential types for T
and architectures to consider, even knowing the effect of such a change
for certain would be a challenge. After all, it is not enough to find
some vector resizing operations which benefit from the change, it is
necessary to show somehow that resizing a vector would never be slower.
Because if resizing a vector would be slower in some cases - that is -
if the answer turns out to be "no" - then there is no performance bug
(or at least this change is not its fix). Instead, there is a tradeoff.
And making tradeoffs is a design decision, not a defect to be fixed.

So is it conceivable that changing an argument passed by-value to one
passed by-reference would actually making resizing a vector slower?
Let's consider a C++ program compiled for a machine achitecture in
which arguments in function calls are passed in registers (such as the
PowerPC). Now imagine resizing a vector containing elements small
enough to fit within a register (such as a vector<int> or vector<const
char *>). In this scenario, passing an argument by-reference to
resize() is less efficient - since the value must now be copied out of
its register into a memory location while a pointer to that location is
passed in its place. In other words, the "work" that the C program
appears to be performing does not always correspond to the work that
the compiled binary performs - in fact, in this case, it is the
opposite. At the C++ language level: passing by value appears to
require a copy operation - at the machine level: no copy is made. At
the C level: passing the argument by reference does not make a copy, at
the machine level: the argument is copied from a register into a memory
location.

So the essential premise - that machine level optimizations can be
achieved from changes being made at the source code level - is
mistaken. The only sure optimizations that can be implemented at the
source code level are those that eliminate logical operations - not
those that make changes but leave the amount of "work" performed the
same. There is no work saved at the language level, the parameter is
still passed to resize() in both cases, so no "optimization" at the
language level has been made with this change, so no performance
improvement in the compiled program is assured.

Moreover, while the benefit of this change is to eliminate one copy
operation per resize operation, its cost is to make inserting each
object into the vector take longer (by replacing a register-to-memory
location copy with a copy between two memory locations). So the bargain
seems to be: in exchange for making one set of resizing operations
faster by a constant amount of time, it is necessary to slow down
another set of resize operations by a variable amount, an amount of
time proportional to the however long resizing currently takes. So
resizing a vector by a large amount may either be a little bit faster -
or a lot slower - than it was before. Even if the former cases
outnumber the latter, it is still an unattractive deal, since no matter
how much a program stands to gain with this change, it only takes a few
operations to nullify those gains.

One could argue that a compiler's optimizer is likely to minimize the
cost of this change, maybe even completely get rid of  it. But this is
the same, optimizing compiler deemed unable to make the optimization
for which this change was necessary in the first place. And it seems
more likely that an optimizer would be able to eliminate the copy of
resize()'s value parameter on its own (especially if resize() is
inlined) than it would be able to undo the effects of pushing a value
out of a register. But one thing is certain, these types of issues
cannot be optimized at the language level - the application
environments affected by these types of changes are too diverse for
these kinds of decisions to be made here. Instead the Standard should
grant the necessary latitude to implementions in order to make the
desired optimizations on a case-by-case basis, and then leave it up to
them to do just that.

Greg

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: Gennaro Prota <gennaro_prota@yahoo.com>
Date: Fri, 5 Jan 2007 10:56:26 CST
Raw View
I first posted this on December 22, 2006. But I'm quite obstinate, so
here's my belated reply :-)

On Thu, 21 Dec 2006 20:48:51 GMT, Howard Hinnant wrote:

>> vector<int> v;
>> // populate v;
>>
>> v.resize(v[0]);
>>
>> If resize takes its argument by reference, the reference to v[0] becomes
>> invalid during the resize. Passing the argument by values means that
>> resize is always dealing with a copy, so there's no validity problem.
>
>This was the reason, but it is in error.

That's a typo for v.resize( s, v[ 0 ] ), isn't it?

>Taking t by value is a plain
>and simple performance bug in the standard.
>
>There are 3 cases to consider:
>
>1.  Size is shrinking.
>2.  Size is growing, but to less than or equal to capacity().
>3.  Size is growing beyond capacity().
>
>In case 1, t is never referenced.  So the fact that the call may
>invalidate it is irrelevant.
>
>In case 2, t can not be invalidated.
>
>In case 3, t can be invalidated (as the old buffer is destroyed), but it
>is trivial to delay deleting the old buffer until the new buffer is
>fully populated, including the new copies of t.

In addition to what you point out, the choice is quite against the
don't pay for what you don't use principle: if one wants to pass a
value from v itself then why not writing

  T t( v[ i ] );
  v.resize( v, t );

Or am I missing something?

___
X-Replace-Address
     \|||/                Gennaro Prota   -   For hire
     (o o)           https://sourceforge.net/projects/breeze/
--ooO-(_)-Ooo-----   (to mail: name _ surname / yaho ! com)

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: howard.hinnant@gmail.com (Howard Hinnant)
Date: Sat, 6 Jan 2007 17:11:46 GMT
Raw View
In article <trhsp2975sgt991bgk6rlilh6oeajvoqmk@4ax.com>,
 Gennaro Prota <gennaro_prota@yahoo.com> wrote:

> >> vector<int> v;
> >> // populate v;
> >>
> >> v.resize(v[0]);
> >>
> That's a typo for v.resize( s, v[ 0 ] ), isn't it?

Yes.

> >Taking t by value is a plain
> >and simple performance bug in the standard.
> >
> >There are 3 cases to consider:
> >
> >1.  Size is shrinking.
> >2.  Size is growing, but to less than or equal to capacity().
> >3.  Size is growing beyond capacity().
> >
> >In case 1, t is never referenced.  So the fact that the call may
> >invalidate it is irrelevant.
> >
> >In case 2, t can not be invalidated.
> >
> >In case 3, t can be invalidated (as the old buffer is destroyed), but it
> >is trivial to delay deleting the old buffer until the new buffer is
> >fully populated, including the new copies of t.
>
> In addition to what you point out, the choice is quite against the
> don't pay for what you don't use principle: if one wants to pass a
> value from v itself then why not writing
>
>   T t( v[ i ] );
>   v.resize( v, t );

I don't recommend making a copy to avoid self referencing in this case.
The container can, and should handle it for you.  The container already
handles it for you for insert, so there's no reason it can't handle it
for you for resize as well (which is a simpler case than insert).

-Howard

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: sjhoweATdialDOTpipexDOTcom@giganews.com ("Stephen Howe")
Date: Thu, 28 Dec 2006 00:12:21 GMT
Raw View
> Ironically enough, I considered the resize(size,T)
> issue too controversial to actually suggest fixing it in this paper.

But is it considered at all?
It seems terrible that for C++0x - performance bugs are left in the
standard.
There should be no unneccesary copy constructors & corresponding destructors
invoked.

Stephen Howe


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: pete@versatilecoding.com (Pete Becker)
Date: Thu, 28 Dec 2006 08:00:08 GMT
Raw View
Stephen Howe wrote:
>> Ironically enough, I considered the resize(size,T)
>> issue too controversial to actually suggest fixing it in this paper.
>
> But is it considered at all?
> It seems terrible that for C++0x - performance bugs are left in the
> standard.
> There should be no unneccesary copy constructors & corresponding destructors
> invoked.
>

Have you seen an application in which this copy was shown to be a
bottleneck?

--

 -- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: howard.hinnant@gmail.com (Howard Hinnant)
Date: Thu, 28 Dec 2006 08:00:38 GMT
Raw View
In article <coidnZrP95hpew_YRVnytwA@pipex.net>,
 sjhoweATdialDOTpipexDOTcom@giganews.com ("Stephen Howe") wrote:

> > Ironically enough, I considered the resize(size,T)
> > issue too controversial to actually suggest fixing it in this paper.
>
> But is it considered at all?
> It seems terrible that for C++0x - performance bugs are left in the
> standard.
> There should be no unneccesary copy constructors & corresponding destructors
> invoked.

Write a paper.  Make a defect report.  Do something that the committee
can not ignore (we can ignore posts to comp.std.c++ unless they are a DR
or request to submit a paper).  If somebody has already submitted a DR
or paper on the subject, do not assume that it is taken care of.
Duplicates count; they are not wasted effort at all.  Just because I
know it is a defect, don't assume I have the political power to get it
changed.  *You* (collectively speaking) have the power to get it
changed.  Send me a DR I can't ignore. :-)

To submit a defect report:

Post to comp.std.c++ with the words "Defect Report:" in the subject
line.  Defect reports need not be lengthy.  But they are more effective
if they clearly propose wording changes to the latest current working
draft (currently N2135).  A brief motivating statement is helpful as
well.  If you know you are submitting a duplicate, it might be helpful
to include motivation which other dr's have not yet stated.  Otherwise
just re-asserting the motivation is also helpful.

Your vote counts.

-Howard

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: pete@versatilecoding.com (Pete Becker)
Date: Thu, 21 Dec 2006 16:53:24 GMT
Raw View
Scott Meyers wrote:
> I recently noticed a couple of things about vector's interface I don't
> understand.
>

> 2.  vector::resize has this signature:
>
>       void resize(size_type, T = T());
>
>     Why is the second argument passed by value instead of by
>     ref-to-const?  The analogous constructor uses ref-to-const:
>
>       explicit vector(size_type, const T& = T(), [...] );
>
> Thanks for any clarification of these design decisions.
>

vector<int> v;
// populate v;

v.resize(v[0]);

If resize takes its argument by reference, the reference to v[0] becomes
invalid during the resize. Passing the argument by values means that
resize is always dealing with a copy, so there's no validity problem.

--

 -- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: howard.hinnant@gmail.com (Howard Hinnant)
Date: Thu, 21 Dec 2006 20:48:51 GMT
Raw View
In article <12okbusm53ndm31@corp.supernews.com>,
 usenet@aristeia.com (Scott Meyers) wrote:

> I recently noticed a couple of things about vector's interface I don't
> understand.
>
> 1.  vector::at has no exception specification.  Why not?

Because the only thing that would add is code bloat.  The purpose of an
exception specification is to create a run time check which checks for
exceptions which don't meet the spec, and call unexpected() when that
happens.  There is no way for that to happen within at().  Compilers may
or may not be able to detect that, and optimize the check away.

In article <TrudnbTYYadRDhfYnZ2dnUVZ_ruknZ2d@giganews.com>,
 pete@versatilecoding.com (Pete Becker) wrote:

> Scott Meyers wrote:
> > I recently noticed a couple of things about vector's interface I don't
> > understand.
> >
>
> > 2.  vector::resize has this signature:
> >
> >       void resize(size_type, T = T());
> >
> >     Why is the second argument passed by value instead of by
> >     ref-to-const?  The analogous constructor uses ref-to-const:
> >
> >       explicit vector(size_type, const T& = T(), [...] );
> >
> > Thanks for any clarification of these design decisions.
> >
>
> vector<int> v;
> // populate v;
>
> v.resize(v[0]);
>
> If resize takes its argument by reference, the reference to v[0] becomes
> invalid during the resize. Passing the argument by values means that
> resize is always dealing with a copy, so there's no validity problem.

This was the reason, but it is in error.  Taking t by value is a plain
and simple performance bug in the standard.

There are 3 cases to consider:

1.  Size is shrinking.
2.  Size is growing, but to less than or equal to capacity().
3.  Size is growing beyond capacity().

In case 1, t is never referenced.  So the fact that the call may
invalidate it is irrelevant.

In case 2, t can not be invalidated.

In case 3, t can be invalidated (as the old buffer is destroyed), but it
is trivial to delay deleting the old buffer until the new buffer is
fully populated, including the new copies of t.  I would even imagine
this is how most implementations behave anyway.

For entertainment value, note that resize is specified in terms of:

if (sz > size())
    insert(end(), sz-size(), c);
  else if (sz < size())
    erase(begin()+sz, end());
  else
    ;                           //  do nothing

And if you look at the spec for insert(pos, size, value):

void insert(iterator position, size_type n, const T& x);

(note the const T&), and their is no precondition that x not be a
reference into *this.  Therefore implementations already have to watch
out for this issue within insert.  insert (unlike resize) is capable of
invalidating the reference out from under itself when x is reference
into *this occurring at or beyond the point of insertion (not possible
when inserting at end()).

So the only purpose of passing T by value in resize() is to keep those
bored little electrons in your computer a little busier.  For what its
worth, I took the CodeWarrior implementation outlaw on this issue a long
time ago (and documented that fact in the header synopsis of each
affected container).  No complaints were ever lodged.

This issue is mentioned in one of the rvalue reference papers:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1858.html

(search for resize)  Ironically enough, I considered the resize(size,T)
issue too controversial to actually suggest fixing it in this paper.
<sigh>

-Howard

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: "Greg Herlihy" <greghe@pacbell.net>
Date: Fri, 22 Dec 2006 17:37:37 CST
Raw View
Howard Hinnant wrote:
> pete@versatilecoding.com (Pete Becker) wrote:
> > > vector<int> v;
> > // populate v;
> >
> > v.resize(v[0]);
> >
> > If resize takes its argument by reference, the reference to v[0] becomes
> > invalid during the resize. Passing the argument by values means that
> > resize is always dealing with a copy, so there's no validity problem.
>
> This was the reason, but it is in error.  Taking t by value is a plain
> and simple performance bug in the standard.

Why would we expect vector::resize() to be appreciably more efficient
if it accepted the initializer value from a const reference - instead
of from (as is currently specified) a value parameter? As far as I can
tell the requirements of the two most time-consuming operations that
resize() potentially has to perform (allocate memory and initialize
each newly allocated element) would be unaffected by such a change. And
although I don't know much about compiler technology, I would have
guessed that initializing newly-allocated objects with an rvalue has
the potential to be much more efficent (at least when the initializer
value is known at compile-time) than initializing the same same objects
with a value copied from an lvalue parameter - a value that can only be
ascertained at runtime.

> For entertainment value, note that resize is specified in terms of:
>
> if (sz > size())
>     insert(end(), sz-size(), c);
>   else if (sz < size())
>     erase(begin()+sz, end());
>   else
>     ;                           //  do nothing
>
> And if you look at the spec for insert(pos, size, value):
>
> void insert(iterator position, size_type n, const T& x);
>
> (note the const T&), and their is no precondition that x not be a
> reference into *this.  Therefore implementations already have to watch
> out for this issue within insert.  insert (unlike resize) is capable of
> invalidating the reference out from under itself when x is reference
> into *this occurring at or beyond the point of insertion (not possible
> when inserting at end()).

I'm not sure I understand why only those iterators beyond the point of
insertion may be invalidated by an insert() operation. If the vector
needs to be resized in order to accomodate the objects being inserted
into the vector, then calling insert() must have the same potential
side effects as a call to resize(), namely, that the call may
invalidate all of the vector's iterators no matter their position in
the vector.

Greg

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: usenet@aristeia.com (Scott Meyers)
Date: Fri, 22 Dec 2006 23:53:20 GMT
Raw View
Howard Hinnant wrote:
> Because the only thing that would add is code bloat.  The purpose of an
> exception specification is to create a run time check which checks for
> exceptions which don't meet the spec, and call unexpected() when that
> happens.

This is an interesting perspective on exception specifications (ESes)
that I have not heard before.  It suggests that the role of an ES is not
to document to callers what the exception-related interface to the
function is, but instead to enforce that interface (which is presumably
documented elsewhere).  Interestingly, this is consistent with the
notion that a function's type does not include its ES (except when it
does :-}).

Are you speaking primarily for yourself in defining the role of an ES
this way, or would you say that you believe you reflect consensus on the
committee regarding the role of ESes as being more enforcement than
documentation?  (I realize that you can't speak for the committee, hence
my question about whether you *believe* that this sentiment is probably
shared by most members of the committee.  Still an unfair question, I
know, but I'm doing my best to ask whether you think this view of ESs is
"commonly accepted.")

> So the only purpose of passing T by value in resize() is to keep those
> bored little electrons in your computer a little busier.

Nice analysis, thanks :-)

Scott

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: kuyper@wizard.net
Date: Fri, 22 Dec 2006 21:52:42 CST
Raw View
Greg Herlihy wrote:
.
> I'm not sure I understand why only those iterators beyond the point of
> insertion may be invalidated by an insert() operation. If the vector
> needs to be resized in order to accomodate the objects being inserted
> into the vector, then calling insert() must have the same potential
> side effects as a call to resize(), namely, that the call may
> invalidate all of the vector's iterators no matter their position in
> the vector.

Section 23.2.4.3p1 says: "Notes: Causes reallocation if the new size is
greater than the old capacity. If no reallocation happens, all of the
iterators and references before the insertion point remain valid. ..."

This garantee is only useful when the user has made sure that capacity
is adequate to ensure that reallocation is not allowed. Of course, it's
perfectly feasible for the user to ensure that.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: howard.hinnant@gmail.com (Howard Hinnant)
Date: Sat, 23 Dec 2006 18:59:57 GMT
Raw View
In article <12omrng6bv9bm41@corp.supernews.com>,
 usenet@aristeia.com (Scott Meyers) wrote:

> Howard Hinnant wrote:
> > Because the only thing that would add is code bloat.  The purpose of an
> > exception specification is to create a run time check which checks for
> > exceptions which don't meet the spec, and call unexpected() when that
> > happens.
>
> This is an interesting perspective on exception specifications (ESes)
> that I have not heard before.  It suggests that the role of an ES is not
> to document to callers what the exception-related interface to the
> function is, but instead to enforce that interface (which is presumably
> documented elsewhere).  Interestingly, this is consistent with the
> notion that a function's type does not include its ES (except when it
> does :-}).
>
> Are you speaking primarily for yourself in defining the role of an ES
> this way

Yes, in that I'm representing only myself.  However I'm attempting to
speak from the perspective of the compiler/language.  An ES has an
effect, and that effect is to generate code to enforce a run time check
that the ES is respected, else unexpected() is called.  That run time
check may or may not be optimized back out during a later phase of
compilation.

Yes, there is a documentation aspect to ES, in the same way that every
part of the language documents the programmer's intent (as opposed to
say machine language).  E.g. the declaration of a function documents its
purpose, arguments and return types.

If you want the documentation aspect of ES, but not the associated run
time check, better to simply use a comment.  If you want the run time
check, then ES is your tool.  If you want a compile time check of the
exception specification, write a paper. ;-)  The only compile time
checking I would personally be in favor of is a no-throw spec.

-Howard

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: howard.hinnant@gmail.com (Howard Hinnant)
Date: Sun, 24 Dec 2006 17:45:31 GMT
Raw View
In article <1166757304.037836.30250@73g2000cwn.googlegroups.com>,
 "Greg Herlihy" <greghe@pacbell.net> wrote:

> > This was the reason, but it is in error.  Taking t by value is a plain
> > and simple performance bug in the standard.
>
> Why would we expect vector::resize() to be appreciably more efficient
> if it accepted the initializer value from a const reference - instead
> of from (as is currently specified) a value parameter? As far as I can
> tell the requirements of the two most time-consuming operations that
> resize() potentially has to perform (allocate memory and initialize
> each newly allocated element) would be unaffected by such a change. And
> although I don't know much about compiler technology, I would have
> guessed that initializing newly-allocated objects with an rvalue has
> the potential to be much more efficent (at least when the initializer
> value is known at compile-time) than initializing the same same objects
> with a value copied from an lvalue parameter - a value that can only be
> ascertained at runtime.

"Appreciable" is a subjective term.  So I'll just count copies...

I wrote this little test harness to help, and tested it on gcc 4.0.1:

template <class T>
struct vec
{
    void resize(unsigned, T t) {}
};

struct A
{
    A() {}
    A(const A&) {std::cout << "A(const A&)\n";}
};

Once inside resize, I believe we can expect the N copies of A, no matter
whether it is passed by value or by const&.  So we're just counting
copies to get the parameter in.  We want to do this for both lvalue and
rvalue arguments, and modeling both pass by value and pass by const T&.
For example, here's the code for passing an lvalue:

int main()
{
   vec<A> v;
   A a;
   v.resize(0, a);
}

Here are my empirical results:

             T        const T&

lvalue       1           0
rvalue       0           0

As a sanity check, I ran the same test on CodeWarrior, and got the same
results.

It appears to me that const T& (in general) is the more efficient
approach.  As T can have an arbitrarily expensive copy constructor, and
the number of T's copied into the vector internally can be arbitrarily
small (including 0), then (imho) it is prudent to pass by const T& for
the case of resize.  A copy saved is a copy earned. ;-)

> I'm not sure I understand why only those iterators beyond the point of
> insertion may be invalidated by an insert() operation. If the vector
> needs to be resized in order to accomodate the objects being inserted
> into the vector, then calling insert() must have the same potential
> side effects as a call to resize(), namely, that the call may
> invalidate all of the vector's iterators no matter their position in
> the vector.

Yes, sorry, my language got sloppy.  In the case of insert(pos, n, t),
there are 3 cases to consider:

1.  n == 0, there is no possibility for invalidation of t.
2.  size() + n <= capacity().
3.  size() + n > capacity().

In the case of 2, if t references an element within the same vector
prior to position "pos", then t will not be invalidated.  If it
references an element at or beyond "pos", then it will be invalidated.
Since t is pass by const&, a properly implemented vector::insert must
take this into account.  Some implementations will make an internal copy
"just in case".  My preferred implementation technique is to simply
detect this situation, and chase the moved element with a pointer.  This
simple "check & chase" is potentially far more efficient than a copy.
Note that if "pos" == end(), (as it would if being called from
resize()), then there is no possibility for this mode of invalidation.

In the case of 3, we have the identical situation that I spoke about for
resize when size() + n > capacity().  t could be invalidated as the old
buffer is destroyed.  But it is both easy and typical to completely
construct the new buffer, including all new elements, before destroying
the old buffer, thus avoiding an untimely invalidation.  Indeed, with
insert it is required because of the pass by const T&.

Therefore if resize() is simply implemented in terms of insert, then you
could pass T by const T&, and invalidation would not be possible
(because insert isn't permitted to allow it).

-Howard

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: "Lance Diduck" <lancediduck@nyc.rr.com>
Date: Mon, 25 Dec 2006 12:37:40 CST
Raw View
Howard Hinnant wrote:
> Yes, in that I'm representing only myself.  However I'm attempting to
> speak from the perspective of the compiler/language.  An ES has an
> effect, and that effect is to generate code to enforce a run time check
> that the ES is respected, else unexpected() is called.  That run time
> check may or may not be optimized back out during a later phase of
> compilation.
>
> Yes, there is a documentation aspect to ES, in the same way that every
> part of the language documents the programmer's intent (as opposed to
> say machine language).  E.g. the declaration of a function documents its
> purpose, arguments and return types.
>
> If you want the documentation aspect of ES, but not the associated run
> time check, better to simply use a comment.  If you want the run time
> check, then ES is your tool.  If you want a compile time check of the
> exception specification, write a paper. ;-)  The only compile time
> checking I would personally be in favor of is a no-throw spec.
Speaking from the perspective of an applications designer, ESes create
way more problems than they solve.
First:
A common idiom for an app writer is placing catch blocks like so:
//in a library header
namespace SomeLibrary{
struct Err1;
void use()throw(Err1);
}
//user writes
try{
  SomeLibrary::use();
}catch(SomeLibrary::Err1 const&){/*...*/}
}catch(std::exception const&){/*...*/}
}catch(...){/*...*/}
So an exception specification anywhere in SomeLibrary is useless -- the
user is already very likely to build the "firewall" where it belongs,
which is after the function exits, not before.

Second:
Many newcomers to C++ approach it from Java. Java ES has a somewhat
different semantic than C++, and at least in Java you can get compiler
support to make sure the ES is correct. Getting them to unlearn this
semantic, and then tell them they have to play with global handlers
and/or do full coverage regression tests (Oh yeah that is going to
happen) to make sure that you got the exception return paths perfect on
pain of death....

Third:
Every non-trivial library is going to inherit their exceptions from
std::exception. So we typically have this case
namespace SomeLibrary{
struct Err1:std::logic_error{
        Err1(std::string const&_a):std::logic_error(_a){}
}
void use()
throw (Err1)//Thinking in Java
{
     std::vector <int> j(20);//Init somehow
     // buried deep in bowels of the library
     // using Configuraton 15, j is resized to 5
     if(j.at(19)==1)//Oops!!! we didn't convert to an Err1
           throw Err1("Too bad");
}
}//namespace

You can just imagine what the user of SomeLibrary thinks now--
everytime he calls the library in a particular configuration, his app
just crashes. Either a) we tell the user that he should have
set_unexpected() to make the library work, or b) we maintain complete
coverage regression tests hoping that we catch these errors, which add
significantly to the cost of the library.
And for what? The user is already catching std::exception just by habit
anyway. He would have just reported a bug rather than now preparing
litigation.
The natural progression is of course
namespace SomeLibrary{
void use()thow(Err1,std::exception)
//documents nothing a user isn't already doing, but at least works!!
}
and then the ES comes out altogether like a bad appendix.


So whether there is a documentation effect, a performance hit, some
code bloat or what not, C++ ES has the notable effect of making a
program less fault-tolerant and more error-prone. Which is exactly the
opposite effect that you intended. Perhaps there is a little value in
throw() as a optimizing hint to the compiler, however in practice
virtually all functions that use nothrow are inlined anyway.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: usenet@aristeia.com (Scott Meyers)
Date: Thu, 21 Dec 2006 07:03:20 GMT
Raw View
I recently noticed a couple of things about vector's interface I don't
understand.

1.  vector::at has no exception specification.  Why not?

2.  vector::resize has this signature:

       void resize(size_type, T = T());

     Why is the second argument passed by value instead of by
     ref-to-const?  The analogous constructor uses ref-to-const:

       explicit vector(size_type, const T& = T(), [...] );

Thanks for any clarification of these design decisions.

Scott

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]