Thread

Topic: Lib Issue 69 - contiguous std:vector - Why ?

Author: Thomas Matelich <tmatelich@zetec.com>
Date: 2000/01/26 Raw View

Ken Hagan wrote:

> If all parts of the system are written in shiny new ISO C++, then
> your arguments are correct,but there's a lot of software out there
> whose interfaces are defined in terms of "pointer to C-style array".
> If the std::vector was not contiguous, then we would have to
>     write a non-standard vector which is contiguous,
> or
>     go back to using C-style arrays.
>
> Since neither is very attractive, the standard needs to provide
> a contiguous O(1) container. Historically, people have expected
> the std::vector to fulfil this role. (Naturally, you are free to
> define other O(1) containers.)
>
>

I've been looking around and have been unable to find efficiency
requirements on valarray, esp. subscripting.  Could someone enlighten me?

thanks

--
Thomas O Matelich
Senior Software Designer
Zetec, Inc.
sosedada@usa.net
tmatelich@zetec.com


---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: "Pavel Mayer" <pavel@datango.de>
Date: 2000/01/27 Raw View

Ken Hagan <K.Hagan@thermoteknix.co.uk> wrote:
>
> If all parts of the system are written in shiny new ISO C++, then
> your arguments are correct,but there's a lot of software out there
> whose interfaces are defined in terms of "pointer to C-style array".
> If the std::vector was not contiguous, then we would have to
>     write a non-standard vector which is contiguous,

Thank you for confirming my suspicion that "pointer to C-style array"
compatibility was the main reason.

However, one point I tried to make was that just the requirement of a
contiguous vector alone IMHO does not guarantee the desired safety for
"pointer to C-style array" usage, and that other requirements have to be
fullfilled to make "pointer to C-style array" of std::vector usage safe.
("Solution 2" in my original posting)

Or does the standard or the proposed clarification already say that all
elements of a vector must be initialized immediately and must be accessible
via any value_type* in the range &V[0]..&V[V.size()-1]) obtained by
outside-of-the-container pointer arithmetics instead of
operator&(vector::operator[]) ?

Wouldn't the example I gave be a conforming implementation of std::vector
and fulfill even the "contiguous allocation" requirement, but be hazardous
to use with "pointer to C-style array" interfaces ?
--
Pavel Mayer (pavel@datango.de)
 "Without chaos, nothing is created. Without order, nothing can exist."

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: "Pavel Mayer" <pavel@datango.de>
Date: 2000/01/14 Raw View

INTRO:
--------
In a recent discussion on c.l.c.m I became aware of Library Issue 69, adding
the requirement "&V[n] == &V[0] + n" to be fullfilled by implementations of
std::vector. Everything I found on that topic was a brief notice on
http://www.comeaucomputing.com/iso/libr8.html and a December 99 posting by
someone in this group, but no replies. This together with the results of the
ongoing discussion on c.l.c.m,  left me with the feeling that the current
resolution of Library Issue 69 is bad, but I am willing to accept any
reasonable explanation why my concerns are unfounded.

Maybe that I am concerned about something that is of little practical
consequences, and I am a little bit surprised myself how much energy I have
now devoted to this tiny aspect of the library. OTOH std::vector is probably
the most heavily used class in the standard library (at least by myself),
and my desire of "getting it right" is very deep at this point.

BRIEF:
--------
I can not fully understand the rationale behind the "contiguous std::vector"
requirement. One obvious reason seems to be that the expression "&V[n]" can
be used to obtain a "T*" into a buffer that can be used in bulk transfer
operations like the following:

std::vector<char> V(9);
std::cin.read(&V[0],9);

Now, in a case where "std::vector<T>::iterator" is not implemented as T*,
there will be a good reason for that. Whatever the reason is, it will be
probably out-levered with a "&V[0]+n" usage, so it does not seem to be a
legitimate reason for requiring contiguous allocation. The same applies for
non-modifying access via const T*.

Assuming that the LWG consists of people with much more knowledge and
experience with C++ than me I would really like to know from them what led
to the "clearly yes" on the question "Must elements of a vector be
contiguous ?".

My best belief regarding the answer to this question is : Why, the hell ? It
does not help, and even may do harm because it provides false security,
while unnecessarily ruling out implementations promising better performance
in certain cases.

I do not feel very comfortable with an opinion beeing  the exact opposite of
the consensus of the LWG, and the probability is high that I am wrong, but I
can not accept the explanation "practical matters" as convincing. Please
help to shed some light on this issue.

MORE DETAILS:
-------------------

Using a a std::vector, especially of POD types seems to be a pretty safe and
convenient way to get an exception-safe buffer with easy-to-understand copy
semantics and a lot of other neat properties.

However, portability problems arise when "std::vector<T>::iterator" is not
implemented as T* and V.begin() is used to obtain a pointer to the buffer
like in this example:

std::vector<char> V(9);
std::cin.read(V.begin(),9); // does not compile if iterator is not char*,
otherwise OK

OTOH, the expression &V[n] can only be used to obtain a valid T* just for
the nth element; nothing can be deduced for the adress of any other element.

Now, the requirement "&V[n] == &V[0] + n" seems to solve that issue; if the
adress of one element is known, the adress of every other element can be
computed without any further expressions involving the std::vector or its
iterator. It seems that the pointer now can be passed somewhere and used
there as a valid pointer into a C-style array, acting like a T[] type:

std::vector<char> V(9);
char* buffer = &V[0];
std::cin.read(buffer,9);

However, given a range checking implementation of vector, we have now
out-levered this range checking, without beeing warned by the compiler.

And even worse, there might be a (hypothetical) implementation of
std::vector that chooses not to construct the elements when itself is
constructed, but to defer it until the first time the element is needed,
like the following sketch:

template <class T,..>
lazy_iterator<T> {
...
    Reference operator*() const {
        return (*container)[i];
    }
    Pointer operator->() const {
         return &(*container)[i];
    }
...
private:
    lazy_vector<T,..>* container;
    int element_number;
};

template <class T,..>
class lazy_vector<T,..> {
    ...
    typedef lazy_iterator<T,..> iterator;
    ...
    explicit lazy_vector(int n) {
        data = mem.allocate(n * sizeof(T));
    }
    ...
    T& operator[](int i) {
        if (!element_constructed[i]) {
            new (data+i) T();
            element_constructed[i] = true;
        }
        return *(data+i);
    }
    ...
private:
     std::vector<bool> element_constructed;
     T* data;
     allocator_type mem;
    ...
};

Wouldn't such an implementation of std::vector conform to the standard, or
is it already ruled out by some requirement I am not aware of ?
And wouldn't it be deadly to access an element of this vector with any form
of &V[0]+n instead of  &V[n] or through the iterator ?

Regarding the pratical issues: Even if it might not be wise to use such a
lazy_vector implementation on for std::vector<char>, it IMO might be useful
to have it allowed as a specialization on a type that takes long time to
construct, for example remote objects. (Maybe in such a case the proxy would
be a better place to implement this deferred construction behaviour, but
sometimes people like to have a choice)

Even if this example might not be a perfectly valid one, I can imagine other
examples where relying on a "&V[n] == &V[0] + n" property of std::vector
does not solve a problem, but easily introduces new trouble.

The only legitimate usage I can imagine is to use the "continuity property"
is to use it to use it as an end marker in a comparison, but I dont know why
someone might want to do that:

std::vector<T> V(10);
T* end = &V[0] + 5;
for (int i = 0; &V[i]  != end; ++i ) do_something();

CONCLUSION:
------------------
With my current state of knowledge, I can not see any benefit in the
requirement of
"&V[n] == &V[0] + n" beeing true for all elements of std::vector. Therefore
I regard it as a requirement that unnecessarily rules out implementations
that might have better performance characteristics, without actually giving
any real benefit in exchange for this restriction.

Otoh, I would like to be able to use std::vector as a substitute for C-type
arrays in as many occasions as possible, especially serving as convenient
buffer for bulk transfer interfaces that copy many elements of one type in
one call, and I would like to rely on the fact that std::vector is an
efficient and lightweight all-purpose container.

POSSIBLE SOLUTIONS:
----------------------------

Solution 1)
Lib Issue 69 is resolved by anwering the question "Must elements of a vector
be contiguous ?" with "No".

Consequences of Solution 1: &V[0]+n definitely can not be used in a safe
manner on on std::vector, which is the case with all other containers.
Instead, if V.begin() can be used to obtain a pointer in a type-safe, but
not portable manner. (if vector<T>::iterator is T*, we can be sure there are
no surprises in stock, if not the compiler doesnt let it through)

Solution 2) (maybe this all is already given, but I am not aware of it):
Something is put into the standard that makes sure that a) &V[0]+n (for all
n where 0<n<V.size()) at any time is the adress of an already initialized
element and  b.) the container implementation can not rely on that its
elements are only accessed through its member functions and through
iterators, meaning that a function returning a reference or iterator to some
element has been called prior access to this element

Solution 2a): Solution 2) is only required for vector of POD types
Solution 2b): Solution 2) is only required for vector of builtin types

Consequences of Solution 2:
std::vector<T> is considered more like a T[] and less like the other
containers; meaning that std::vector can not be easily exchanged for another
container type (e.g. deque), even if the interface of the other container
supports all necessary functions.

Consequences of Solution 2a):
Somehow seems to restricts the contiguous allocation requirement to the
useful cases, but makes all more complicated and does not increase safety of
use.

Consequences of Solution 2b):
Makes it much harder to argue against it.

Solution 3)
A function like in basic_string is added to the std::vector<T> interface:

const T* data() const;

Consequences of Solution 3:
Implementers have to extend the interface and implementation of
std::vector<>, all the documents and books have to be updated, and some
people will be confused, but typesafe and portable read access to vector
will be granted.

Solution 4)
Two functions like in basic_string are added to the std::vector<T>
interface:
T* data();
const T* data() const;

Consequences of Solution 4:
Same as Solution 3 + write access + handing out a T* into the container
might be considered as evil, but this wont be more evil than using &V[0] for
this purpose.

FINAL REMARK
--------------------
Confronted with a choice between these four solutions none of them really
satisfies me, but beeing forced to choose now I would say: As a standard
library implementer,  Solution 1 would get my vote. As a casual user, I
would prefer Solution 2a or 3. Beeing neither, number 4 would gets my vote
for now, living with Solution 1 for the next years.

Maybe there is an easy and short answer that satisfies my curiosity,
something I have not been aware of because of my incomplete knowledge of the
standard and only partial understanding of deeper coherences, but maybe I
have also shed some light onto something that might otherwise went along
unnoticed.

Thanks for reading such a long posting
--
Pavel Mayer (pavel@datango.de)
 "Without chaos, nothing is created. Without order, nothing can exist."




[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: "Ken Hagan" <K.Hagan@thermoteknix.co.uk>
Date: 2000/01/17 Raw View

If all parts of the system are written in shiny new ISO C++, then
your arguments are correct,but there's a lot of software out there
whose interfaces are defined in terms of "pointer to C-style array".
If the std::vector was not contiguous, then we would have to
    write a non-standard vector which is contiguous,
or
    go back to using C-style arrays.

Since neither is very attractive, the standard needs to provide
a contiguous O(1) container. Historically, people have expected
the std::vector to fulfil this role. (Naturally, you are free to
define other O(1) containers.)




[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]