Topic: list::size complexity (again)


Author: jpotter@falcon.lhup.edu (John Potter)
Date: Fri, 3 Jan 2003 16:44:38 +0000 (UTC)
Raw View
On Thu, 2 Jan 2003 20:30:08 +0000 (UTC), hinnant@metrowerks.com (Howard
Hinnant) wrote:

> Another possibility would be to introduce a new signature designed to
> reclaim a guaranteed O(1) splice-some-from-other:

> void splice(iterator position, list& x, size_type some,
>             iterator first, iterator last);

> Pre:  some == distance(first, last)

> When computing first and last, it is not uncommon that the client code
> will have the opportunity to compute "some" at no added cost

Of course, the ctors for all sequences should also have this to be
consistent.  Likely the insert operations also.

   Seq (size_type some, Iiter first, Iiter past);
   void insert (iterator pos, size_type some, Iiter first, Iiter past);

Yes, I know that they are O(some) anyway, but would be an improvement.

Having used the O(1) splice members and never used the O(N) member, I
wonder if the member is only there because it could be?  Any algorithm
where it makes sense?  Looking for education here.

John

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: hinnant@metrowerks.com (Howard Hinnant)
Date: Fri, 3 Jan 2003 18:48:14 +0000 (UTC)
Raw View
In article <3e1595c7.948765@news.earthlink.net>, John Potter
<jpotter@falcon.lhup.edu> wrote:

| On Thu, 2 Jan 2003 20:30:08 +0000 (UTC), hinnant@metrowerks.com (Howard
| Hinnant) wrote:
|
| > Another possibility would be to introduce a new signature designed to
| > reclaim a guaranteed O(1) splice-some-from-other:
|
| > void splice(iterator position, list& x, size_type some,
| >             iterator first, iterator last);
|
| > Pre:  some == distance(first, last)
|
| > When computing first and last, it is not uncommon that the client code
| > will have the opportunity to compute "some" at no added cost
|
| Of course, the ctors for all sequences should also have this to be
| consistent.  Likely the insert operations also.
|
|    Seq (size_type some, Iiter first, Iiter past);
|    void insert (iterator pos, size_type some, Iiter first, Iiter past);
|
| Yes, I know that they are O(some) anyway, but would be an improvement.

I can't tell if you forgot the similey face or not.  Oh, well.  It was
just a suggestion.

| Having used the O(1) splice members and never used the O(N) member, I
| wonder if the member is only there because it could be?  Any algorithm
| where it makes sense?  Looking for education here.

I also mainly use the O(1) variants.  But any algorithm that performs a
"divide and conquer" strategy over the list might need this.  An
obvious candidate would be sort, but list already provides that one
anyway.  But imagine that you need to take a list and perform some
operation on it.  And the easiest way you can think of is to perform
the operation recursively on the first half and the last half of the
list, and then somehow splice the results back together.  A partition
algorithm maybe?  Here is what it might look like:

template <class T, class A>
void
op(std::list<T, A>& c)
{
    if (c.empty())
        return;
    using std::list;
    typedef typename list<T,A>::size_type size_type;
    // next 5 lines sub-optimal if size() is O(N)
    size_type size = c.size();
    size_type lower_size = size / 2;
    size_type upper_size = size - lower_size;
    typename list<T,A>::iterator i = c.begin();
    std::advance(i, lower_size);
    //
    list<T, A> upper_half(c.get_allocator());
    upper_half.splice(upper_half.end(), c, /*proposed*/upper_size, i,
c.end());
    op(c);
    op(upper_half);
    c.merge(upper_half, some_predicate);
}

--
Howard Hinnant
Metrowerks

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: dietmar_kuehl@yahoo.com (Dietmar Kuehl)
Date: Thu, 2 Jan 2003 19:21:51 +0000 (UTC)
Raw View
hinnant@metrowerks.com (Howard Hinnant) wrote:
[rather thorough and fair analysis removed]

Like so often it comes down to choosing what the average user will use if we
have just one "list" class. What you presented looks to me like a strong
argument for making 'size()', if present, a constant time operation. On the
other hand, if splicing lists is needed heavily in an application, this
operation should be constant time - in all cases, too.

Obviously, the "correct" approach is to have [at least] two "list" classes:
one with a constant time 'size()' operation, one with a constant time
'splice()' operation. Whether these are just two different instantiations of
some class template (using some kind of policy based design), completely
separate classes provided by the standard library, classes shipped by some
third party, or whatever is another matter, though. Fortunately, the STL
relieves us from writing algorithms one these classes and all what is needed
is to write the mere data structure: writing a list tailored to the specific
needs of an application should be a pretty simple task.

We should always remember that the container classes in the standard C++
library are a reasonable selection of containers to start with (although adding
a hash based associative container is probably a reasonable addition). However,
it is pretty easy to add new containers with new properties to fit the need of
an application. The standard containers are just quite useful examples for
containers. Actually, I like the idea of using simple container wrappers which
are initially implemented in terms of the standard containers and replaced by
better suited ones if profiling shows that the current choice produces a
bottleneck (I learned about this from James Kanze). However, I have to admit
that I don't life this idea: I'm normally just using a container type I think
is appropriate...
--
<mailto:dietmar_kuehl@yahoo.com> <http://www.dietmar-kuehl.de/>
Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: hinnant@metrowerks.com (Howard Hinnant)
Date: Thu, 2 Jan 2003 20:30:08 +0000 (UTC)
Raw View
In article <5b15f8fd.0301020708.b2f54b9@posting.google.com>, Dietmar
Kuehl <dietmar_kuehl@yahoo.com> wrote:

| Obviously, the "correct" approach is to have [at least] two "list" classes:
| one with a constant time 'size()' operation, one with a constant time
| 'splice()' operation. Whether these are just two different instantiations of
| some class template (using some kind of policy based design), completely
| separate classes provided by the standard library, classes shipped by some
| third party, or whatever is another matter, though.

Yeah.  I'd like to restate your paragraph above from a slightly
different viewpoint:

      A list with an O(1) size, and a list with an O(N) size
      are two different types of containers.  We shouldn't
      pretend otherwise.

Another possibility would be to introduce a new signature designed to
reclaim a guaranteed O(1) splice-some-from-other:

void splice(iterator position, list& x, size_type some,
            iterator first, iterator last);

Pre:  some == distance(first, last)

When computing first and last, it is not uncommon that the client code
will have the opportunity to compute "some" at no added cost (for
example to cut a list in half by splicing the latter half out).  It is
a shame that the client must throw this information away only to have
it (possibly) recomputed by the list::splice implementation.

The precondition introduces an opportunity for a runtime error.  But it
is an error that could easily be guarded against in a debug build.

--
Howard Hinnant
Metrowerks

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: hinnant@metrowerks.com (Howard Hinnant)
Date: Thu, 2 Jan 2003 06:46:43 +0000 (UTC)
Raw View
In article <3e0de8e4.98554109@news.earthlink.net>, John Potter
<jpotter@falcon.lhup.edu> wrote:

| On Fri, 27 Dec 2002 21:18:50 +0000 (UTC), hinnant@metrowerks.com (Howard
| Hinnant) wrote:
|
| > 23.1/5 has this to say about the complexity of size():
|
| > | should have constant complexity
|
| > Here, "should" does not mean "must".  It means "may or may not".  In
| > practice, all of the std::containers have a constant complexity size,
| > except for std::list.  On this container some implementations have a
| > linear complexity size (implemented as if std::distance(c.begin(),
| > c.end())), while other implementations (including Metrowerks) have a
| > constant complexity size.
|
| Here the time tradeoff is simple.  We get O(1) for either size() or
| void list<T>::splice (iter i, list<T>& d, iter f, iter p)
|
| Since I have found no use for either of them, I would need to decide in
| total ignorance of usage.  If I make size constant time, there is no way
| for the user to make splice constant time.  If I make splice constant
| time, the user can keep track of size as well as the implementation.  It
| seems a no brainer; however, Metrowerks and others made the other
| choice.  I guess that it was based on the average user not using splice
| and using size when it was not needed.  Having seen many search bound
| applications use list, I guess that does make sense for the average
| user.

I think it is a no brainer too. ;-)

Just kidding, there are good arguments on both sides.  What you have
stated are the usual arguments.  But the usual arguments are not the
complete story.

In addition to what you have said, I would like to add:

There are 5 variations of list::splice:  You can splice one, some, or
all elements from one list to another, or from within a single list
from one place to another.  The combination of "all" and "from self" is
not valid.  One of the five remaining combinations is O(some) if size
is O(1):

   list::splice complexity with O(1) size
+------+-----------------+-----------------+
|      |     from self   |     from other  |
+------+-----------------+-----------------+
| one  |      O(1)       |      O(1)       |
+------+-----------------+-----------------+
| some |      O(1)       |     O(some)     |
+------+-----------------+-----------------+
| all  |    not valid    |      O(1)       |
+------+-----------------+-----------------+

So saying:

| splice is not constant time if size is

is an oft quoted statement that is an exaggeration.  And in the case of
splicing some from other, the constant on O(some) is pretty small.  All
it has to do is something like:

    delta_size = distance(first, last);

Everything else done while splicing "some from other" is constant time.

That is the full story as far as splice is concerned.  But it is not
yet the full story of list::size.  Having an O(1) size can also
positively effect some of the other list functions:

resize:

This function is more efficient if it has an O(1) size to work with.
The function must compute or retrieve the list's size before doing any
work.  Then based on the current size and the new size, the list will
either do an erase or an insert.  In general this does not change the
complexity of resize.  But in some specific cases, the performance is
very much improved with an O(1) size:

                          O(1) size         O(N) size
                          ---------         ---------
c.resize(c.size());         O(1)            O(size())
c.resize(c.size()-1);       O(1)            O(size())
c.resize(c.size()+1);       O(1)            O(size())

In general, if the new size and the old size are similar, then O(1)
size can give resize a significant boost.

operator==
operator!= :

These functions can significantly benefit from an O(1) size.  Before
doing the element by element check, you simply check the sizes first.
Again, this doesn't change the complexity of these operations, but it
can be a significant performance boost.

operator=
assign:

Given an O(1) size, assignment from another list or from anything but
true input iterators can be given the strong safety exception guarantee
if T's copy constructor and assignment are no throw (deque, list and
vector all have some member functions whose exception safety guarantee
depends on this condition).  This increase in exception safety (from
basic) comes at no additional cost in cpu or memory (if size is O(1)).

None of the above benefits can be achieved by having the user keep
track of the size himself.

The final benefit of an O(1) size is harder to measure:  A consistent,
safe, and intuitive interface.

Most people expect size to be O(1) for any given container (else why
would Scott Meyers feel he needed to devote an issue to warning about
list::size).  The standard itself says size /should/ /be/ O(1).
Standardeze aside, this sounds like size *should* *be* O(1)!

If list::size is intended to be O(N), then imho it should not be part
of list's interface.  Such a size provides no benefit since client code
can just as easily write distance(c.begin(), c.end()).  Such a size can
easily fool the unwary into writing very inefficinet code:

    for (int i = 0; i < c.size(); ++i)
         // do something

Indeed the primary reason list does not have an operator[](size_type)
is so that clients are not tricked into writing inefficient code.
Implementing an operator[](size_type) for list is an easy task, it just
isn't a good idea.  Same goes for an O(N) size, imho.  I would prefer
to get a compile time error, rather than an O(N) operation if I write:

   c.size();

--
Howard Hinnant
Metrowerks

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]