Topic: std::distance


Author: Tom <spamfood@nothing.com>
Date: Fri, 19 Dec 2008 15:38:38 CST
Raw View
Hello,

Regarding the present wording of the std::distance function some issues came up.


To start with, here's the present wording (with the same issues
applying to the present draft version):

24.3.4 Iterator operations [lib.iterator.operations]

template<class InputIterator>
typename iterator_traits<InputIterator>::difference_type
distance(InputIterator first, InputIterator last)

4 Effects: Returns the number of increments or decrements needed to
get from first to last.
5 Requires: last must be reachable from first.


And here are my two questions:
Firstly, is it possible (valid code) that, for random access
iterators, first > last ?
Secondly, if the answer to the first question is true, what is the
sign of the return value if first > last ?


Here are my points why I raise this question. I will first quote all
the relevant sections of the standard (2003 version - there are again
no relevant changes in the present draft), and then explain the
oddities.


24.1 Iterator requirements [lib.iterator.requirements] /
1: "Iterators are a generalization of pointers..."
2: "Since iterators are an abstraction of pointers, their semantics is
a generalization of most of the semantics of pointers in C++..."
6: "An iterator j is called reachable from an iterator i if and only
if there is a finite sequence of applications of the expression ++i
that makes i == j."

Table 76:
Expression: b - a
return type: Distance
operational semantics: (a<b) ? distance(a,b) : -distance(b,a)
precondition: there exists a value n of Distance such that a + n == b.
b == a + (b - a)

24.3.4 Iterator operations [lib.iterator.operations] /
1: "Since only random access iterators provide + and - operators, the
library provides two function templates advance and distance. These
function templates use + and - for random access iterators (and are,
therefore, constant time for them)..."


If you read this the following inconsistencies should be noted:

24.3.4/4 says "the number of increments or decrements needed to get
from first to last".
Pay attention to the phrases 'the number' and 'decrements'. These
imply two things: Firstly, the 'the number' implies that the return
value has to be positive - after all, the number of operations is an
integer ratio scale variable with a natural zero bound. Of greater
importance is the 'decrements', which I suppose implies that under
some circumstances, last can be reached from first by applying
operator-- (or an equivalent semantic).

However  24.3.4/5 says that "last must be reachable from first", and
24.1/6 says "An iterator j is called reachable from an iterator i if
and only if there is a finite sequence of applications of the
expression ++i that makes i == j". This excludes operator-- and, thus
for std::distance the condition first <= last must hold, in which case
of course decrements are not possible (with the exception of the quite
uninteresting case of zero decrements...).

On the other hand 24.1 / 1-2 mention the similarity of iterators to
pointers, and 24.3.4 / 1 says for random access iterators, operator-
is used - which is certainly valid if first > last (and results in a
negative value). The sign of the return value of std::distance on the
other hand plays a role in conjunction with table 76. Now there are
two issues here. Firstly, as operator- semantic of random iterators is
defined via the use of std::distance, it follows that the whole
operator- semantic remains undefined as long as std::distance is not
clearly defined (and in my opinion, presently it is NOT). However,
let's assume we know what operator- shall do because we know what
pointers do. Then the second issue arises: If we assume a and b are
plain pointers, than the result of b - a should be negative (or zero)
of course if (a < b) == false, which then evaluates to -distance(b,a)
- this in turn now implies that std::distance must return a positive
value under all circumstances, which is of course at odds with general
pointer arithmetic behavior - which breaks the concept of iterators
being a generalization of pointers (at least for std::distance). After
all it would be very strange to expect that if b and a are pointers,
and a > b, that std::distance(a, b) returns a positive value, but b -
a returns a negative value (or in other words, if you have pointers as
iterators, std::distance behaves inferiorly to plain odd pointer
arithmetic as the sign information is lost. I think you get the
idea...).

So I conclude something is rotten here.


Now we move away from the Standard and take a look at some practical issues.

Regarding implementations, I tested the GNU MINGW compiler (but forgot
the version - but it was definitively a modern release) and Microsoft
Visual Studio 2005 Express Edition. Both implementations return a
negative value for std::distance if random access iterators are passed
as arguments, and first > last. If (!) it is valid that first can be >
last (that is, 24.1/6 is relaxed), than this behavior is at odds with
the verbal description of std::distance and table 76 (because of the
sign), but it is in synch with 24.1 /1-2 and 24.3.4/1 and 'common
sense / expectations' as derived from plain odd pointers.


Regarding commonly used references, results were also equivocal:

'The C++ Standard Library. A tutorial and reference' by Josuttis
(1999, 7th printing August 2001), says (7.3.2) that:
a) both iterators have to refer to elements of the same container
b) if the iterators are not random access iterators, pos2 must be
reachable from pos1; that is, it must have the same position or a
later position
c) ... For random access iterators, it simply returns pos2-pos1.

This is pretty clear, allowing first > last for random access
iterators, and returning a negative result if pos1 > pos2.


'The C++ Programming Language' by Stroustrup (2000, Special Edition.
3rd printing May 2000), states on p. 551 a general implementation of
std::distance which utilizes operator++. On p. 554 however a
specialization for random access iterators using operator- is
provided.
So Stroustrup is not completely clear as the semantic of the
non-specialized version does not fully match the semantic of the
specialized version, I would say the same conclusion as for Josuttis
applies.

On the other hand, the help file shipped with MS Visual Studion 2005
C++ Express Edition says:
a) determines the number of increments between the positions addressed
by two iterators
b) return-value: The number of times that _First must be incremented
until it equal _Last.

So this means first must be <= last, and therefore the result can of
course only be positive.

Finally, the SGI STL description says:
a) "Finds the distance between first and last, i.e. the number of
times that first must be incremented until it is equal to last" - same
conclusion as for MS Visual Studio.



What shall we conclude?
Here is my conclusion: I think it was intended that for random access
iterators first > last is possible, and under such circumstances
std::distance shall return a negative result, as we know it from
pointer arithmetic.
This however means that 24.1 / 6 must be relaxed, it means that table
76 needs to be fixed (although that table is gone anyway in the new
draft), and that std::distance needs some more words about the sign of
the return value.

Final question:
Given the oddities in the present wording of the standard, and
provided the equivocal descriptions of std::distance in very commonly
used references used by beginners and experts, is the present wording
worth a fix (DR) ?

Thanks,
Thomas


--
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@netlab.cs.rpi.edu]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: daniel.kruegler@googlemail.com
Date: Sat, 20 Dec 2008 09:25:59 CST
Raw View
On 19 Dez., 22:38, Tom <spamf...@nothing.com> wrote:
> Regarding the present wording of the std::distance function some issues came up.
>
> To start with, here's the present wording (with the same issues
> applying to the present draft version):
>
> 24.3.4 Iterator operations [lib.iterator.operations]
>
> template<class InputIterator>
> typename iterator_traits<InputIterator>::difference_type
> distance(InputIterator first, InputIterator last)
>
> 4 Effects: Returns the number of increments or decrements needed to
> get from first to last.
> 5 Requires: last must be reachable from first.
>
> And here are my two questions:
> Firstly, is it possible (valid code) that, for random access
> iterators, first > last ?
> Secondly, if the answer to the first question is true, what is the
> sign of the return value if first > last ?
>
> Here are my points why I raise this question. I will first quote all
> the relevant sections of the standard (2003 version - there are again
> no relevant changes in the present draft), and then explain the
> oddities.
>
> 24.1 Iterator requirements [lib.iterator.requirements] /
> 1: "Iterators are a generalization of pointers..."
> 2: "Since iterators are an abstraction of pointers, their semantics is
> a generalization of most of the semantics of pointers in C++..."
> 6: "An iterator j is called reachable from an iterator i if and only
> if there is a finite sequence of applications of the expression ++i
> that makes i == j."
>
> Table 76:
> Expression: b - a
> return type: Distance
> operational semantics: (a<b) ? distance(a,b) : -distance(b,a)
> precondition: there exists a value n of Distance such that a + n == b.
> b == a + (b - a)
>
> 24.3.4 Iterator operations [lib.iterator.operations] /
> 1: "Since only random access iterators provide + and - operators, the
> library provides two function templates advance and distance. These
> function templates use + and - for random access iterators (and are,
> therefore, constant time for them)..."
>
> If you read this the following inconsistencies should be noted:
>
> 24.3.4/4 says "the number of increments or decrements needed to get
> from first to last".
> Pay attention to the phrases 'the number' and 'decrements'. These
> imply two things: Firstly, the 'the number' implies that the return
> value has to be positive - after all, the number of operations is an
> integer ratio scale variable with a natural zero bound. Of greater
> importance is the 'decrements', which I suppose implies that under
> some circumstances, last can be reached from first by applying
> operator-- (or an equivalent semantic).
>
> However  24.3.4/5 says that "last must be reachable from first", and
> 24.1/6 says "An iterator j is called reachable from an iterator i if
> and only if there is a finite sequence of applications of the
> expression ++i that makes i == j". This excludes operator-- and, thus
> for std::distance the condition first <= last must hold, in which case
> of course decrements are not possible (with the exception of the quite
> uninteresting case of zero decrements...).

To make a long story short: If you have read the NAD issue
resolution

http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-closed.html#204

which I have pointed to in comp.lang.c++.mod thread

http://preview.tinyurl.com/4jjcsd

then you will see that the standard position is rather clear
*except* for the irritating reference to "decrements" in  24.3.4/4.

Please see also my final comment below.

> On the other hand 24.1 / 1-2 mention the similarity of iterators to
> pointers, and 24.3.4 / 1 says for random access iterators, operator-
> is used - which is certainly valid if first > last (and results in a
> negative value). The sign of the return value of std::distance on the
> other hand plays a role in conjunction with table 76. Now there are
> two issues here. Firstly, as operator- semantic of random iterators is
> defined via the use of std::distance, it follows that the whole
> operator- semantic remains undefined as long as std::distance is not
> clearly defined (and in my opinion, presently it is NOT). However,
> let's assume we know what operator- shall do because we know what
> pointers do. Then the second issue arises: If we assume a and b are
> plain pointers, than the result of b - a should be negative (or zero)
> of course if (a < b) == false, which then evaluates to -distance(b,a)
> - this in turn now implies that std::distance must return a positive
> value under all circumstances, which is of course at odds with general
> pointer arithmetic behavior - which breaks the concept of iterators
> being a generalization of pointers (at least for std::distance). After
> all it would be very strange to expect that if b and a are pointers,
> and a > b, that std::distance(a, b) returns a positive value, but b -
> a returns a negative value (or in other words, if you have pointers as
> iterators, std::distance behaves inferiorly to plain odd pointer
> arithmetic as the sign information is lost. I think you get the
> idea...).
>
> So I conclude something is rotten here.
>
> Now we move away from the Standard and take a look at some practical issues.
>
> Regarding implementations, I tested the GNU MINGW compiler (but forgot
> the version - but it was definitively a modern release) and Microsoft
> Visual Studio 2005 Express Edition. Both implementations return a
> negative value for std::distance if random access iterators are passed
> as arguments, and first > last. If (!) it is valid that first can be >
> last (that is, 24.1/6 is relaxed), than this behavior is at odds with
> the verbal description of std::distance and table 76 (because of the
> sign), but it is in synch with 24.1 /1-2 and 24.3.4/1 and 'common
> sense / expectations' as derived from plain odd pointers.
>
> Regarding commonly used references, results were also equivocal:
>
> 'The C++ Standard Library. A tutorial and reference' by Josuttis
> (1999, 7th printing August 2001), says (7.3.2) that:
> a) both iterators have to refer to elements of the same container
> b) if the iterators are not random access iterators, pos2 must be
> reachable from pos1; that is, it must have the same position or a
> later position
> c) ... For random access iterators, it simply returns pos2-pos1.
>
> This is pretty clear, allowing first > last for random access
> iterators, and returning a negative result if pos1 > pos2.
>
> 'The C++ Programming Language' by Stroustrup (2000, Special Edition.
> 3rd printing May 2000), states on p. 551 a general implementation of
> std::distance which utilizes operator++. On p. 554 however a
> specialization for random access iterators using operator- is
> provided.
> So Stroustrup is not completely clear as the semantic of the
> non-specialized version does not fully match the semantic of the
> specialized version, I would say the same conclusion as for Josuttis
> applies.
>
> On the other hand, the help file shipped with MS Visual Studion 2005
> C++ Express Edition says:
> a) determines the number of increments between the positions addressed
> by two iterators
> b) return-value: The number of times that _First must be incremented
> until it equal _Last.
>
> So this means first must be <= last, and therefore the result can of
> course only be positive.
>
> Finally, the SGI STL description says:
> a) "Finds the distance between first and last, i.e. the number of
> times that first must be incremented until it is equal to last" - same
> conclusion as for MS Visual Studio.
>
> What shall we conclude?
> Here is my conclusion: I think it was intended that for random access
> iterators first > last is possible, and under such circumstances
> std::distance shall return a negative result, as we know it from
> pointer arithmetic.
> This however means that 24.1 / 6 must be relaxed, it means that table
> 76 needs to be fixed (although that table is gone anyway in the new
> draft), and that std::distance needs some more words about the sign of
> the return value.
>
> Final question:
> Given the oddities in the present wording of the standard, and
> provided the equivocal descriptions of std::distance in very commonly
> used references used by beginners and experts, is the present wording
> worth a fix (DR) ?

This has already been initiated, see:

http://home.roadrunner.com/~hinnant/issue_review/lwg-active.html#940

Greetings from Bremen,

Daniel Kr   gler


--
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@netlab.cs.rpi.edu]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: Tom <spamfood@nothing.com>
Date: Sat, 20 Dec 2008 16:57:26 CST
Raw View
(Excessive quoting somewhat trimmed -- moderator)

daniel.kruegler@googlemail.com wrote:
>
> On 19 Dez., 22:38, Tom <spamf...@nothing.com> wrote:
>>
>> Regarding the present wording of the std::distance function some issues came up.
>>
>> To start with, here's the present wording (with the same issues
>> applying to the present draft version):
>>
>> 24.3.4 Iterator operations [lib.iterator.operations]
>>
>> template<class InputIterator>
>> typename iterator_traits<InputIterator>::difference_type
>> distance(InputIterator first, InputIterator last)
>>
>> 4 Effects: Returns the number of increments or decrements needed to
>> get from first to last.
>> 5 Requires: last must be reachable from first.
>>
>> And here are my two questions:
>> Firstly, is it possible (valid code) that, for random access
>> iterators, first > last ?
>> Secondly, if the answer to the first question is true, what is the
>> sign of the return value if first > last ?
>>
>> Here are my points why I raise this question. I will first quote all
>> the relevant sections of the standard (2003 version - there are again
>> no relevant changes in the present draft), and then explain the
>> oddities.
>>
>> 24.1 Iterator requirements [lib.iterator.requirements] /
>> 1: "Iterators are a generalization of pointers..."
>> 2: "Since iterators are an abstraction of pointers, their semantics is
>> a generalization of most of the semantics of pointers in C++..."
>> 6: "An iterator j is called reachable from an iterator i if and only
>> if there is a finite sequence of applications of the expression ++i
>> that makes i == j."
>>
>> Table 76:
>> Expression: b - a
>> return type: Distance
>> operational semantics: (a<b) ? distance(a,b) : -distance(b,a)
>> precondition: there exists a value n of Distance such that a + n == b.
>> b == a + (b - a)
>>
>> 24.3.4 Iterator operations [lib.iterator.operations] /
>> 1: "Since only random access iterators provide + and - operators, the
>> library provides two function templates advance and distance. These
>> function templates use + and - for random access iterators (and are,
>> therefore, constant time for them)..."
>>
>> If you read this the following inconsistencies should be noted:
>>
>> 24.3.4/4 says "the number of increments or decrements needed to get
>> from first to last".
>> Pay attention to the phrases 'the number' and 'decrements'. These
>> imply two things: Firstly, the 'the number' implies that the return
>> value has to be positive - after all, the number of operations is an
>> integer ratio scale variable with a natural zero bound. Of greater
>> importance is the 'decrements', which I suppose implies that under
>> some circumstances, last can be reached from first by applying
>> operator-- (or an equivalent semantic).
>>
>> However  24.3.4/5 says that "last must be reachable from first", and
>> 24.1/6 says "An iterator j is called reachable from an iterator i if
>> and only if there is a finite sequence of applications of the
>> expression ++i that makes i == j". This excludes operator-- and, thus
>> for std::distance the condition first <= last must hold, in which case
>> of course decrements are not possible (with the exception of the quite
>> uninteresting case of zero decrements...).
>
> To make a long story short: If you have read the NAD issue
> resolution
>
> http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-closed.html#204
>
> which I have pointed to in comp.lang.c++.mod thread
>
> http://preview.tinyurl.com/4jjcsd
>
> then you will see that the standard position is rather clear
> *except* for the irritating reference to "decrements" in  24.3.4/4.
>
> Please see also my final comment below.
>
>> On the other hand 24.1 / 1-2 mention the similarity of iterators to
>> pointers, and 24.3.4 / 1 says for random access iterators, operator-
>> is used - which is certainly valid if first > last (and results in a
>> negative value). The sign of the return value of std::distance on the
>> other hand plays a role in conjunction with table 76. Now there are
>> two issues here. Firstly, as operator- semantic of random iterators is
>> defined via the use of std::distance, it follows that the whole
>> operator- semantic remains undefined as long as std::distance is not
>> clearly defined (and in my opinion, presently it is NOT). However,
>> let's assume we know what operator- shall do because we know what
>> pointers do. Then the second issue arises: If we assume a and b are
>> plain pointers, than the result of b - a should be negative (or zero)
>> of course if (a < b) == false, which then evaluates to -distance(b,a)
>> - this in turn now implies that std::distance must return a positive
>> value under all circumstances, which is of course at odds with general
>> pointer arithmetic behavior - which breaks the concept of iterators
>> being a generalization of pointers (at least for std::distance). After
>> all it would be very strange to expect that if b and a are pointers,
>> and a > b, that std::distance(a, b) returns a positive value, but b -
>> a returns a negative value (or in other words, if you have pointers as
>> iterators, std::distance behaves inferiorly to plain odd pointer
>> arithmetic as the sign information is lost. I think you get the
>> idea...).
>>
>> So I conclude something is rotten here.
>>
[...]
>>
>> What shall we conclude?
>> Here is my conclusion: I think it was intended that for random access
>> iterators first > last is possible, and under such circumstances
>> std::distance shall return a negative result, as we know it from
>> pointer arithmetic.
>> This however means that 24.1 / 6 must be relaxed, it means that table
>> 76 needs to be fixed (although that table is gone anyway in the new
>> draft), and that std::distance needs some more words about the sign of
>> the return value.
>>
>> Final question:
>> Given the oddities in the present wording of the standard, and
>> provided the equivocal descriptions of std::distance in very commonly
>> used references used by beginners and experts, is the present wording
>> worth a fix (DR) ?
>
> This has already been initiated, see:
>
> http://home.roadrunner.com/~hinnant/issue_review/lwg-active.html#940

I have just taken a look at the proposed DR, but I think you missed
the main point. I didn't suggest fixing std::distance to cover only
increments, but to fix the other paragraphs to allow first > last in
std::distance (plus fix std::distance for the sign).
Let's assume that the semantic of operator- for random access
iterators is that of pointer arithmetic (which is very reasonable) -
then the proposed std::distance is an inflated version of operator-,
because it unnecessarily restricts the user to the case first <= last.
Do we REALLY want that ? Do we really want to end up having to avoid
std::distance if the iterators are random access, because operator- is
of higher quality, giving us also the sign information ?

However, I admit the proposed std::distance also has advantages. It
reliefs the programmer from worrying about special cases, that is all
iterator categories are treated equally. And for random access
iterators, the relative positions of first / last can be trivially
determined using operator< (which also has the neat advantage of
resulting in a compile-time error if the iterator is of some other
category). On the other hand all the people who gained their knowledge
from Jousutti's book now become trapped.

Given this, I think the best solution is that before ANY new wording
is suggested, the Committee first has to decide what the function is
actually supposed to do - shall it allow first > last for random
access iterators, or not ? If it allows that, what sign shall the
return value have in that case? I think you will agree with me that we
cannot make this decision (at least for my part I do not dare to move
myself into that position), so please change the DR to something
pointing out the present inconsistencies (which I think have been
stated in every detail already in the news groups), and then let the
Committee decide what std::distance is actually supposed to do,
including the required wording.

Thomas

--
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@netlab.cs.rpi.edu]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]