Topic: STL list<>.size()


Author: Christopher Eltschka <celtschk@web.de>
Date: Mon, 30 Jul 2001 20:43:35 GMT
Raw View
Gawain Bolton <gbolton@wanadoo.fr> writes:

> Christopher Eltschka wrote:
>
> > Howard Hinnant <hinnant@antispam.metrowerks.com> writes:
> >
> > > In article <dil3d7wixpm.fsf@isolde.research.att.com>, Matthew Austern
> > > <austern@research.att.com> wrote:
> > >
> > > | No.  Some implementations compute the size each time,
> > > | others maintain a member variable that contains the
> > > | size.
> > >
> > > The standardeze that allows this is under 23.1/5:  size() has a
> > > non-normative note:
> > >
> > > | Notes: Those entries marked ``(Note A)'' should have constant complexity.
> > >
> > > Not only is this statement non-normative but "should" doesn't mean
> > > "must".
> > >
> > > | > The STL with gcc 3.0 does this, is there a reason it ?
> > > | > Sure a list with a number_of_members is preferable ?
> > > |
> > > | Depends on what you're trying to optimize.  Maintaining
> > > | a member variable that contains the size makes size()
> > > | faster, but it increases the size of a list<> object
> > > | and it makes other operations (such as some forms of
> > > | list<>::splice) slower.  It's a tradeoff either way.
> > > |
> > > | The argument that finally convinced me it was better
> > > | not to have such a member variable is that a user who
> > > | wants to keep track of a list's size can just do it
> > > | by hand; there's nothing the list can do internally
> > > | that a user can't do just as well.  If a list<>
> > > | implementation does have that extra member variable,
> > > | however, there's no way for a user who doesn't need
> > > | that extra overhead to avoid it.
> > >
> > > Just for diversity, I hold the opposite view of Matt. :-)
> > >
> > > The only operation I know of that an O(1) size makes slower is /one/
> > > form of splice.  Overall, there are 5 variations on splice implemented
> > > by 3 overloaded methods.  Here is their complexity assuming an O(1)
> > > size:
> > >
> > > splice         |       from self    |    from other
> > > ---------------+--------------------+------------------
> > > 1 element      |     O(1)           |    O(1)
> > > some elements  |     O(1)           |    O(some)
> > > all elements   |  not valid         |    O(1)
> > >
> > > All list::methods besides splice(some elements, from other) can update
> > > a size member variable in constant time; or in some methods that have a
> > > linear complexity of K*size(), updating size() may take k*size() time
> > > where k is much much smaller than K.  That is, updating size may be
> > > linear, but is in the noise level with respect to the work that the
> > > method has to do anyway.
> > >
> > > Furthermore, there are several list methods that can take advantage of
> > > an O(1) size to /increase/ their performance:
> > >
> > > *  Obviously with an O(1) size, size() itself is faster.
> > >
> > > *  When a list::method must iterate to an indexed element in the list,
> > > an O(1) size can be used to quickly find out whether it would be
> > > quicker to iterate forwards from begin(), or backwards from end().
> > > This technique can be used to speed up *operator=*, *resize*, and all
> > > forms of *assign* except the variation taking input iterators (if the
> > > iterators are not just input, then assign(it, it) can be sped up with
> > > this technique too).
> > >
> > > *  The operators == and != are more efficient with an O(1) size since
> > > you can check for equal size()'s before you start the element by
> > > element compare.
> > >
> > > *  For implementing a traditional merge sort (member sort method of
> > > list), it is easier (faster?) to cut the list in half when you have an
> > > O(1) size.
> > >
> > > Yes, an O(1) size is going to cost an extra word of memory.  But the
> > > Metrowerks std::list has an O(1) size, and the overhead is only 3
> > > words.  Most list implementations have an overhead greater than this,
> > > even if they have an O(N) size.  One word overhead on the stack plus
> > > padded(2*sizeof(void*)+sizeof(T)) bytes of overhead on the heap is
> > > common.  And I've seen bigger than that.
> > >
> > > One could get the overhead of list down to 2 words with an O(N) size.
> > > But because of the adverse performance impact on several list methods,
> > > while benefitting just one method, I believe that the O(N) size is not
> > > a good deal from a technical standpoint.
> > >
> > > Additionally the standard "encourages" implementors to have an O(1)
> > > size:
> > >
> > > *  23.1/5:  size() /should/ be O(1)
> > > *  23.2.2.4/14: Complexity (of splice(some)): Constant time if &x ==
> > > this; otherwise, linear time
> > > *  Overall design that eliminates "inefficient" operations such as
> > > push_front on vector, or operator[] on list.
> > >
> > > Combine the performance analysis, with the "encouragement" of the
> > > standard, and I believe that size() should be O(1).
> >
> > Let me add a third possibility:
> >
> > Store the size, but allow it to be marked invalid, maybe through the
> > value size_type(-1). Now all O(1) operation update the size if it is
> > valid, all O(n) operations update the size in any case, and the
> > "offending splice" invalidates the size. If the size has been
> > invalidated (this can happen only by calling the "offending splice"),
> > size() is O(N), otherwise it's O(1). Users who care about size()
> > always being O(1) can just call size() directly after each offending
> > splice. The pair offending splice+size would be O(N), and all "normal"
> > calls to size() would be O(1). Users who care about fast splice would
> > not do that call, and therefore get O(1) splice in every case.
> >
>
> Yes but to mark the size as "invalid" you may need to another member variable
> and developpers out there seem to be very frugal when it comes to having a
> member variable for the container size...

No, see above. "Unknown" has the value size_type(-1).

>
> Unless, of course, you designate a value of 0 to indicate the size may not be
> known.  Calculating the size again if it really is zero would be negligible.

That's of course another option. But with size_type(-1), this check
isn't even necessary.

>
> What disappoints me most with this is that the C++ Standard Library is not
> intuitive.  Why should the size be available in constant time for some
> containers and not for others?  The size() function is far too important not to
> be optimized.

Because of the logic of the container?

What is the value of splice() if it is not O(1)? After all, you get
the same behaviour with ordinary copy/destroy.

>
> Saying that users can keep track of size not the issue.  The problem is, it's
> not obvious that this may need to be done.

OK, there's a point in not definintg size() for std::list at all (just
as forward iterators don't define operator+ due to bad
efficiency). Then one would have to use distance(l.begin(), l.end()),
which is obviously slow.

>
> I thought the philosophy of the standard library containers was to provide
> functions which were efficient.  The idea being you are forced to do the
> inefficient things in your code and so these are exposed rather then being
> hidden in a standard library function.

The problem with list:size is that there are two functions which
cannot be made efficient at the same time. One of them is size, and
the other is a certain case of splice. You can't have your cake and
eat it, too.

However, the "almost always O(1)" would IMHO be a sufficient solution.

>
> For this reason, either:
>
> 1) The performance of size() should be worded "shall be O(1) (constant time)"
> for all containers except perhaps after calling "offending" splice() function
> (see above) in which case it may be O(n) (linear time).
>
> or
>
> 2) The size() function should be removed from std::list's interface and users
> forced to calculate the size by iterating through the list.
>
> It truly saddens me that so much time is wasted with details like this.  It's
> amazing everyone can't agree on this!

Now that we have a standard, and we have a size member, 2 is not
really an option.

Indeed, I have the impression that the set of containers is not as
well designed as the set of iterators (possibly because the notion of
them being merely examples).  For example, we have iterator_traits and
iterator_category, but we don't have a similar mechanism for
containers. We have the fast member/slow general function pairs for
iterators, but generally not for containers (and in the cases where we
have such members - like sort -, there is no algorithm which can use
them when appropriate, and use the general algorithm otherwise).

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]





Author: Howard Hinnant <hinnant@antispam.metrowerks.com>
Date: Tue, 31 Jul 2001 11:13:48 GMT
Raw View
In article <t3d76itqw6.fsf@watts.itp.tuwien.ac.at>, Christopher
Eltschka <celtschk@web.de> wrote:

| What is the value of splice() if it is not O(1)? After all, you get
| the same behaviour with ordinary copy/destroy.

I continually see the adverse performance effects of O(1) size
exaggerated for splice.  I believe this is yet another case.

When you have O(1) size, splice(iterator position, list& x, iterator
first, iterator last) is distinctly not the same behavior (or
performance) as a copy/destroy algorithm.  splice still /moves/ nodes.
It *does* *not* create new nodes and destroy old ones (assuming equal
allocators here) which /would/ be much more expensive.  The only extra
thing this variation of splice must do is:

      size_type delta = std::distance(first, last);

That is, it must count the number of increments it takes to get from
first to last.  That's it!  The constant on this O(some) operation is
/very/ small compared to copying and destroying list elements.

I am somewhat amazed at the interest this one line of code generates.
:-)

--
Howard Hinnant
Metrowerks

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]





Author: "P.J. Plauger" <pjp@dinkumware.com>
Date: Tue, 31 Jul 2001 11:14:12 GMT
Raw View
"Christopher Eltschka" <celtschk@web.de> wrote in message news:t3d76itqw6.fsf@watts.itp.tuwien.ac.at...

> What is the value of splice() if it is not O(1)? After all, you get
> the same behaviour with ordinary copy/destroy.

Time complexity isn't the whole story. Copy/destroy requires just that,
one copy construction and one destruction per element moved. Not to
mention a probable node allocation and freeing, plus a lot of link
restitching per element along the way. Counting nodes by chaining along
links, OTOH, is rather inexpensive.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com



---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]





Author: Gawain Bolton <gbolton@wanadoo.fr>
Date: Sat, 28 Jul 2001 19:32:33 GMT
Raw View
Christopher Eltschka wrote:

> Howard Hinnant <hinnant@antispam.metrowerks.com> writes:
>
> > In article <dil3d7wixpm.fsf@isolde.research.att.com>, Matthew Austern
> > <austern@research.att.com> wrote:
> >
> > | No.  Some implementations compute the size each time,
> > | others maintain a member variable that contains the
> > | size.
> >
> > The standardeze that allows this is under 23.1/5:  size() has a
> > non-normative note:
> >
> > | Notes: Those entries marked ``(Note A)'' should have constant complexity.
> >
> > Not only is this statement non-normative but "should" doesn't mean
> > "must".
> >
> > | > The STL with gcc 3.0 does this, is there a reason it ?
> > | > Sure a list with a number_of_members is preferable ?
> > |
> > | Depends on what you're trying to optimize.  Maintaining
> > | a member variable that contains the size makes size()
> > | faster, but it increases the size of a list<> object
> > | and it makes other operations (such as some forms of
> > | list<>::splice) slower.  It's a tradeoff either way.
> > |
> > | The argument that finally convinced me it was better
> > | not to have such a member variable is that a user who
> > | wants to keep track of a list's size can just do it
> > | by hand; there's nothing the list can do internally
> > | that a user can't do just as well.  If a list<>
> > | implementation does have that extra member variable,
> > | however, there's no way for a user who doesn't need
> > | that extra overhead to avoid it.
> >
> > Just for diversity, I hold the opposite view of Matt. :-)
> >
> > The only operation I know of that an O(1) size makes slower is /one/
> > form of splice.  Overall, there are 5 variations on splice implemented
> > by 3 overloaded methods.  Here is their complexity assuming an O(1)
> > size:
> >
> > splice         |       from self    |    from other
> > ---------------+--------------------+------------------
> > 1 element      |     O(1)           |    O(1)
> > some elements  |     O(1)           |    O(some)
> > all elements   |  not valid         |    O(1)
> >
> > All list::methods besides splice(some elements, from other) can update
> > a size member variable in constant time; or in some methods that have a
> > linear complexity of K*size(), updating size() may take k*size() time
> > where k is much much smaller than K.  That is, updating size may be
> > linear, but is in the noise level with respect to the work that the
> > method has to do anyway.
> >
> > Furthermore, there are several list methods that can take advantage of
> > an O(1) size to /increase/ their performance:
> >
> > *  Obviously with an O(1) size, size() itself is faster.
> >
> > *  When a list::method must iterate to an indexed element in the list,
> > an O(1) size can be used to quickly find out whether it would be
> > quicker to iterate forwards from begin(), or backwards from end().
> > This technique can be used to speed up *operator=*, *resize*, and all
> > forms of *assign* except the variation taking input iterators (if the
> > iterators are not just input, then assign(it, it) can be sped up with
> > this technique too).
> >
> > *  The operators == and != are more efficient with an O(1) size since
> > you can check for equal size()'s before you start the element by
> > element compare.
> >
> > *  For implementing a traditional merge sort (member sort method of
> > list), it is easier (faster?) to cut the list in half when you have an
> > O(1) size.
> >
> > Yes, an O(1) size is going to cost an extra word of memory.  But the
> > Metrowerks std::list has an O(1) size, and the overhead is only 3
> > words.  Most list implementations have an overhead greater than this,
> > even if they have an O(N) size.  One word overhead on the stack plus
> > padded(2*sizeof(void*)+sizeof(T)) bytes of overhead on the heap is
> > common.  And I've seen bigger than that.
> >
> > One could get the overhead of list down to 2 words with an O(N) size.
> > But because of the adverse performance impact on several list methods,
> > while benefitting just one method, I believe that the O(N) size is not
> > a good deal from a technical standpoint.
> >
> > Additionally the standard "encourages" implementors to have an O(1)
> > size:
> >
> > *  23.1/5:  size() /should/ be O(1)
> > *  23.2.2.4/14: Complexity (of splice(some)): Constant time if &x ==
> > this; otherwise, linear time
> > *  Overall design that eliminates "inefficient" operations such as
> > push_front on vector, or operator[] on list.
> >
> > Combine the performance analysis, with the "encouragement" of the
> > standard, and I believe that size() should be O(1).
>
> Let me add a third possibility:
>
> Store the size, but allow it to be marked invalid, maybe through the
> value size_type(-1). Now all O(1) operation update the size if it is
> valid, all O(n) operations update the size in any case, and the
> "offending splice" invalidates the size. If the size has been
> invalidated (this can happen only by calling the "offending splice"),
> size() is O(N), otherwise it's O(1). Users who care about size()
> always being O(1) can just call size() directly after each offending
> splice. The pair offending splice+size would be O(N), and all "normal"
> calls to size() would be O(1). Users who care about fast splice would
> not do that call, and therefore get O(1) splice in every case.
>

Yes but to mark the size as "invalid" you may need to another member variable
and developpers out there seem to be very frugal when it comes to having a
member variable for the container size...

Unless, of course, you designate a value of 0 to indicate the size may not be
known.  Calculating the size again if it really is zero would be negligible.

What disappoints me most with this is that the C++ Standard Library is not
intuitive.  Why should the size be available in constant time for some
containers and not for others?  The size() function is far too important not to
be optimized.

Saying that users can keep track of size not the issue.  The problem is, it's
not obvious that this may need to be done.

I thought the philosophy of the standard library containers was to provide
functions which were efficient.  The idea being you are forced to do the
inefficient things in your code and so these are exposed rather then being
hidden in a standard library function.

For this reason, either:

1) The performance of size() should be worded "shall be O(1) (constant time)"
for all containers except perhaps after calling "offending" splice() function
(see above) in which case it may be O(n) (linear time).

or

2) The size() function should be removed from std::list's interface and users
forced to calculate the size by iterating through the list.

It truly saddens me that so much time is wasted with details like this.  It's
amazing everyone can't agree on this!


Gawain

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]





Author: jdavison@speakeasy.org (John Michael Davison)
Date: Sun, 29 Jul 2001 11:41:12 GMT
Raw View
I wrote:

> "...[STL implementations] certainly have the right to [calculate the size()
> is a list<> via [a theta(n) call].  If they are to conform to ISO/IEC
> 14882:1998, size() must be theta(1).

I was wrong: Howard Hinnant, Matvei Brodski, and Matthew Austern are correct.
The standard does indeed use the word "should," not "shall" or "must."

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]





Author: Christopher Eltschka <celtschk@web.de>
Date: Wed, 25 Jul 2001 21:05:56 GMT
Raw View
Howard Hinnant <hinnant@antispam.metrowerks.com> writes:

> In article <dil3d7wixpm.fsf@isolde.research.att.com>, Matthew Austern
> <austern@research.att.com> wrote:
>
> | No.  Some implementations compute the size each time,
> | others maintain a member variable that contains the
> | size.
>
> The standardeze that allows this is under 23.1/5:  size() has a
> non-normative note:
>
> | Notes: Those entries marked ``(Note A)'' should have constant complexity.
>
> Not only is this statement non-normative but "should" doesn't mean
> "must".
>
> | > The STL with gcc 3.0 does this, is there a reason it ?
> | > Sure a list with a number_of_members is preferable ?
> |
> | Depends on what you're trying to optimize.  Maintaining
> | a member variable that contains the size makes size()
> | faster, but it increases the size of a list<> object
> | and it makes other operations (such as some forms of
> | list<>::splice) slower.  It's a tradeoff either way.
> |
> | The argument that finally convinced me it was better
> | not to have such a member variable is that a user who
> | wants to keep track of a list's size can just do it
> | by hand; there's nothing the list can do internally
> | that a user can't do just as well.  If a list<>
> | implementation does have that extra member variable,
> | however, there's no way for a user who doesn't need
> | that extra overhead to avoid it.
>
> Just for diversity, I hold the opposite view of Matt. :-)
>
> The only operation I know of that an O(1) size makes slower is /one/
> form of splice.  Overall, there are 5 variations on splice implemented
> by 3 overloaded methods.  Here is their complexity assuming an O(1)
> size:
>
> splice         |       from self    |    from other
> ---------------+--------------------+------------------
> 1 element      |     O(1)           |    O(1)
> some elements  |     O(1)           |    O(some)
> all elements   |  not valid         |    O(1)
>
> All list::methods besides splice(some elements, from other) can update
> a size member variable in constant time; or in some methods that have a
> linear complexity of K*size(), updating size() may take k*size() time
> where k is much much smaller than K.  That is, updating size may be
> linear, but is in the noise level with respect to the work that the
> method has to do anyway.
>
> Furthermore, there are several list methods that can take advantage of
> an O(1) size to /increase/ their performance:
>
> *  Obviously with an O(1) size, size() itself is faster.
>
> *  When a list::method must iterate to an indexed element in the list,
> an O(1) size can be used to quickly find out whether it would be
> quicker to iterate forwards from begin(), or backwards from end().
> This technique can be used to speed up *operator=*, *resize*, and all
> forms of *assign* except the variation taking input iterators (if the
> iterators are not just input, then assign(it, it) can be sped up with
> this technique too).
>
> *  The operators == and != are more efficient with an O(1) size since
> you can check for equal size()'s before you start the element by
> element compare.
>
> *  For implementing a traditional merge sort (member sort method of
> list), it is easier (faster?) to cut the list in half when you have an
> O(1) size.
>
> Yes, an O(1) size is going to cost an extra word of memory.  But the
> Metrowerks std::list has an O(1) size, and the overhead is only 3
> words.  Most list implementations have an overhead greater than this,
> even if they have an O(N) size.  One word overhead on the stack plus
> padded(2*sizeof(void*)+sizeof(T)) bytes of overhead on the heap is
> common.  And I've seen bigger than that.
>
> One could get the overhead of list down to 2 words with an O(N) size.
> But because of the adverse performance impact on several list methods,
> while benefitting just one method, I believe that the O(N) size is not
> a good deal from a technical standpoint.
>
> Additionally the standard "encourages" implementors to have an O(1)
> size:
>
> *  23.1/5:  size() /should/ be O(1)
> *  23.2.2.4/14: Complexity (of splice(some)): Constant time if &x ==
> this; otherwise, linear time
> *  Overall design that eliminates "inefficient" operations such as
> push_front on vector, or operator[] on list.
>
> Combine the performance analysis, with the "encouragement" of the
> standard, and I believe that size() should be O(1).

Let me add a third possibility:

Store the size, but allow it to be marked invalid, maybe through the
value size_type(-1). Now all O(1) operation update the size if it is
valid, all O(n) operations update the size in any case, and the
"offending splice" invalidates the size. If the size has been
invalidated (this can happen only by calling the "offending splice"),
size() is O(N), otherwise it's O(1). Users who care about size()
always being O(1) can just call size() directly after each offending
splice. The pair offending splice+size would be O(N), and all "normal"
calls to size() would be O(1). Users who care about fast splice would
not do that call, and therefore get O(1) splice in every case.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]





Author: Matthew Austern <austern@research.att.com>
Date: Mon, 16 Jul 2001 19:25:13 GMT
Raw View
Daniel Frey <daniel.frey@aixigo.de> writes:

> Matthew Austern wrote:
> >
> > "Fray Bentos" <fraybentos@bigfoot.com> writes:
> >
> > > Do all STL implementations calculate the size() is a list<> via :
> > >
> > > distance() call ...
> >
> > No.  Some implementations compute the size each time,
> > others maintain a member variable that contains the
> > size.
>
> IIRC the standard says that size() is an O(1)-function, right?

Not quite.  It says that size() "should" be O(1), not "shall".
The official meaning of "should" is exactly the same as the
meaning in ordinary English: the standard says that it's a
good idea for it to be O(1), but it does not say that an
implementation where size() is O(N) is nonconforming.

This may seem like a delicate distinction, but it was quite
deliberate.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]





Author: "Fray Bentos" <fraybentos@phreaker.net>
Date: Tue, 17 Jul 2001 20:31:34 GMT
Raw View
Thanks to all of you that replied.
I had my suspicions that this would be the case.

Using the same sample code as b4

int func (int i)
{
  mylist.push_back(i);
  return mylist.size()-1;
}

for(int i=0; i<1000; i++)
   func(i);

it is actually faster to use a vector<> and suffer the reallocation
evertime !

vector<>.size() therefore O(const) whereas list<>.size() has O(n).

One more question OT, is this likely to be standardised ?

Later

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]





Author: jdavison@speakeasy.org (John Michael Davison)
Date: Tue, 17 Jul 2001 22:38:45 GMT
Raw View
Fray Bentos ("fraybentos@bigfoot.com") writes:

>Do all STL implementations calculate the size() is a list<> via [a theta(n)
>call]?

        No, but they certainly have the right to.  If they are to conform to
ISO/IEC 14882:1998, size() must be theta(1).

        From URL "http://www.stlport.org/resources/StepanovUSA.html", Alexander
Stepanov writes:

>size() used to be linear time in case of STL lists. It was the right decision
>since if a user wants to keep a count it is easy to do, but usually you do not
>need it and it makes splice linear time (it used to be constant).  But the
>standard committee insisted that I change it to constant time. I had no
>choice. We will have to change this requirement eventually.

        As you can see from URL "http://www.sgi.com/tech/stl/stl_list.h",
Stepanov et al have taken the matter into their own hands.

        ISO/IEC 14882:1998 (Section 23.1, Table 65, Note A) says that a
Container's size() method complexity must be constant time.  ISO/IEC 14882:1998
(Section 23.2.2.4, line 14) says that the complexity of splice is linear time.

        The STL permits a wider range of implementations than ISO/IEC
14882:1998.  From "http://www.sgi.com/tech/stl/List.html":

          "[list<T>::size()] Returns the size of the list. Note: you
          should not assume that this function is constant time. It
          is permitted to be O(N), where N is the number of elements
          in the list. If you wish to test whether a list is empty,
          you should write L.empty() rather than L.size() == 0.

        The documentation for the STL-specific "slist" template says the same
thing.

        I know nothing about the current goings-on in the ISO committee, so I
can't offer a meaningful opinion on exactly how this is going to end up.  The
best I can say is that Stepanov has his reasons, articulated in the above
interview.  If you want your code to have similar performance characteristics
across various C++ environments, you must provide for the possibility that your
list class will have a slow size() operation _or_ a slow splice() operation.
The workaround may involve writing your own custom template class.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]





Author: Matvei Brodski <mbrodski@bear.nospam.com>
Date: Wed, 18 Jul 2001 16:52:06 GMT
Raw View
John Michael Davison wrote:

> Fray Bentos ("fraybentos@bigfoot.com") writes:
>
> >Do all STL implementations calculate the size() is a list<> via [a theta(n)
> >call]?
>
>         No, but they certainly have the right to.  If they are to conform to
> ISO/IEC 14882:1998, size() must be theta(1).

[snip]

>         ISO/IEC 14882:1998 (Section 23.1, Table 65, Note A) says that a
> Container's size() method complexity must be constant time.

Excuse me, it does not say that. Quote: "entries marked '(Note A)'
*should* have constant complexity". This is rather different from
"must be constant time".

The word "should" is used rather infrequenctly in the Standard to assume
it is to be understood as "must". Consider 15.2[3]: "... destructors should
generally catch exceptions and not let them propagate out..."


Matvei.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]





Author: "Fray Bentos" <fraybentos@bigfoot.com>
Date: Mon, 16 Jul 2001 17:03:11 GMT
Raw View
Do all STL implementations calculate the size() is a list<> via :

distance() call ...

int s=0;
while(iterator i=begin(); i!=end(); i++,)
 s++;

-----

The STL with gcc 3.0 does this, is there a reason it ?
Sure a list with a number_of_members is preferable ?

I knwo Im probably missing somthing here but it really bies for code
such as :

int func (int i)
{
  mylist.push_back(i);
  return mylist.size()-1;
}

for(int i=0; i<1000; i++)
   func(i);


O(n)

Thanks

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]





Author: Matthew Austern <austern@research.att.com>
Date: Mon, 16 Jul 2001 17:38:14 GMT
Raw View
"Fray Bentos" <fraybentos@bigfoot.com> writes:

> Do all STL implementations calculate the size() is a list<> via :
>
> distance() call ...

No.  Some implementations compute the size each time,
others maintain a member variable that contains the
size.

> The STL with gcc 3.0 does this, is there a reason it ?
> Sure a list with a number_of_members is preferable ?

Depends on what you're trying to optimize.  Maintaining
a member variable that contains the size makes size()
faster, but it increases the size of a list<> object
and it makes other operations (such as some forms of
list<>::splice) slower.  It's a tradeoff either way.

The argument that finally convinced me it was better
not to have such a member variable is that a user who
wants to keep track of a list's size can just do it
by hand; there's nothing the list can do internally
that a user can't do just as well.  If a list<>
implementation does have that extra member variable,
however, there's no way for a user who doesn't need
that extra overhead to avoid it.


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]





Author: Daniel Frey <daniel.frey@aixigo.de>
Date: Mon, 16 Jul 2001 18:42:46 GMT
Raw View
Matthew Austern wrote:
>=20
> "Fray Bentos" <fraybentos@bigfoot.com> writes:
>=20
> > Do all STL implementations calculate the size() is a list<> via :
> >
> > distance() call ...
>=20
> No.  Some implementations compute the size each time,
> others maintain a member variable that contains the
> size.

IIRC the standard says that size() is an O(1)-function, right?

Regards, Daniel

--
Daniel Frey

aixigo AG - financial training, research and technology
Schlo=DF-Rahe-Stra=DFe 15, 52072 Aachen, Germany
fon: +49 (0)241 936737-42, fax: +49 (0)241 936737-99
eMail: daniel.frey@aixigo.de, web: http://www.aixigo.de

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]





Author: Howard Hinnant <hinnant@antispam.metrowerks.com>
Date: Mon, 16 Jul 2001 19:10:13 GMT
Raw View
In article <dil3d7wixpm.fsf@isolde.research.att.com>, Matthew Austern
<austern@research.att.com> wrote:

| No.  Some implementations compute the size each time,
| others maintain a member variable that contains the
| size.

The standardeze that allows this is under 23.1/5:  size() has a
non-normative note:

| Notes: Those entries marked ``(Note A)'' should have constant complexity.

Not only is this statement non-normative but "should" doesn't mean
"must".

| > The STL with gcc 3.0 does this, is there a reason it ?
| > Sure a list with a number_of_members is preferable ?
|
| Depends on what you're trying to optimize.  Maintaining
| a member variable that contains the size makes size()
| faster, but it increases the size of a list<> object
| and it makes other operations (such as some forms of
| list<>::splice) slower.  It's a tradeoff either way.
|
| The argument that finally convinced me it was better
| not to have such a member variable is that a user who
| wants to keep track of a list's size can just do it
| by hand; there's nothing the list can do internally
| that a user can't do just as well.  If a list<>
| implementation does have that extra member variable,
| however, there's no way for a user who doesn't need
| that extra overhead to avoid it.

Just for diversity, I hold the opposite view of Matt. :-)

The only operation I know of that an O(1) size makes slower is /one/
form of splice.  Overall, there are 5 variations on splice implemented
by 3 overloaded methods.  Here is their complexity assuming an O(1)
size:

splice         |       from self    |    from other
---------------+--------------------+------------------
1 element      |     O(1)           |    O(1)
some elements  |     O(1)           |    O(some)
all elements   |  not valid         |    O(1)

All list::methods besides splice(some elements, from other) can update
a size member variable in constant time; or in some methods that have a
linear complexity of K*size(), updating size() may take k*size() time
where k is much much smaller than K.  That is, updating size may be
linear, but is in the noise level with respect to the work that the
method has to do anyway.

Furthermore, there are several list methods that can take advantage of
an O(1) size to /increase/ their performance:

*  Obviously with an O(1) size, size() itself is faster.

*  When a list::method must iterate to an indexed element in the list,
an O(1) size can be used to quickly find out whether it would be
quicker to iterate forwards from begin(), or backwards from end().
This technique can be used to speed up *operator=*, *resize*, and all
forms of *assign* except the variation taking input iterators (if the
iterators are not just input, then assign(it, it) can be sped up with
this technique too).

*  The operators == and != are more efficient with an O(1) size since
you can check for equal size()'s before you start the element by
element compare.

*  For implementing a traditional merge sort (member sort method of
list), it is easier (faster?) to cut the list in half when you have an
O(1) size.

Yes, an O(1) size is going to cost an extra word of memory.  But the
Metrowerks std::list has an O(1) size, and the overhead is only 3
words.  Most list implementations have an overhead greater than this,
even if they have an O(N) size.  One word overhead on the stack plus
padded(2*sizeof(void*)+sizeof(T)) bytes of overhead on the heap is
common.  And I've seen bigger than that.

One could get the overhead of list down to 2 words with an O(N) size.
But because of the adverse performance impact on several list methods,
while benefitting just one method, I believe that the O(N) size is not
a good deal from a technical standpoint.

Additionally the standard "encourages" implementors to have an O(1)
size:

*  23.1/5:  size() /should/ be O(1)
*  23.2.2.4/14: Complexity (of splice(some)): Constant time if &x ==
this; otherwise, linear time
*  Overall design that eliminates "inefficient" operations such as
push_front on vector, or operator[] on list.

Combine the performance analysis, with the "encouragement" of the
standard, and I believe that size() should be O(1).

--
Howard Hinnant
Metrowerks

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]