Topic: ? problem with standard getline ( istream &, string &, const char )


Author: panic@global.co.za (John Anderson)
Date: 1997/10/01
OK, Ok. So I skipped all the template bits. You don't seriously expect
all that to fit in a subject line, do you?

For starters, the 2 December 1996 version of the ANSI standard
document, in section 21.3.79.5 (I hope I got the numbering right) says


template<class charT, class traits, class Allocator>
basic_istream<charT,traits>& getline(basic_istream<charT,traits>& is,
basic_string<charT,traits,Allocator>& str, charT delim);

5 Effects: Begins by constructing a sentry object k as if by
basic_istream<charT,traits>::sentry k(is). If bool(k) is true, it
calls str.erase() and then extracts characters from is and appends
them to str as if by calling str.append(1, c) until any of the
following occurs:
  - end-of-file occurs on the input sequence (in which case, the getline
    function calls is.setstate(ios_base::eofbit)).
  - c == delim for the next available input character c (in which case,
    c is extracted but not appended) (27.4.5.3)
  - str.max_size() characters are stored (in which case, the function
    calls is.setstate(ios_base::failbit)) (27.4.5.3)

6 The conditions are tested in the order shown. In any case, after the
last character is extracted, the sentry object k is destroyed.

7 If the function extracts no characters, it calls
is.setstate(ios_base::failbit) which may throw ios_base::failure
(27.4.5.3).

8 Returns: is.
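Paragraph 7 is what separates an empty line from the end of the input: reading an empty line still extracts the delimiter, so it is not a failure. A quick check against a present-day library (assuming it follows the final standard, which keeps this rule; `getline_succeeds` is a made-up helper for illustration):

```cpp
#include <cassert>   // used when checking the results
#include <sstream>
#include <string>

// Hypothetical helper: run std::getline once over `input` and report
// whether the stream was still usable (failbit not set) afterwards.
bool getline_succeeds(const std::string& input, std::string& out) {
    std::istringstream in(input);
    return static_cast<bool>(std::getline(in, out));
}
```

getline_succeeds("\n", s) is true with s left empty, because the '\n' counts as an extracted character. getline_succeeds("", s) is false, since nothing at all was extracted. For "abc" with no trailing newline the call also succeeds: eofbit is set, but failbit is not.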

Take a closer look at the following, especially the part about
extracting but not appending

  - c == delim for the next available input character c (in which case,
    c is extracted but not appended) (27.4.5.3)

The problem is that an attempt to _extract_ another character may
block, depending on the device serving as the source for some
streambuf or descendant. And that can happen even though I've already
received all the data I'm interested in, _and_ discarded the delimiter.

I ran into this problem with a descendant of streambuf for a tcp/ip
socket. Say the remote machine sends a sequence of characters with a
\n on the end. And then waits for a response before sending anything
more.

The local machine reads in the characters using getline. When the \n
is read in, it's discarded. This is as it should be. Then, the next
character is extracted, in a manner dependent on the particular
implementation. But if an attempt is made to read the next character
from the network socket, the call will block waiting for more data
from the remote machine. But I've already received everything I need,
and I can't reply to the remote machine until the call to getline has
returned. Which it's not going to do until the remote machine has
received a reply and sent more data. The remote machine's not going to
receive a reply until the call to getline has returned... erm. Uh-oh.
The processor gets on with doing nothing very fast.

I copied the source for getline (MSVC 5.0) and made the following
modifications, marked by *:

template<class _E, class _Tr, class _A>
inline basic_istream<_E, _Tr>& __cdecl
getline
(
 basic_istream<_E, _Tr>& _I
 ,basic_string<_E, _Tr, _A>& _X
 , const _E _D
)
{
 typedef basic_istream<_E, _Tr> _Myis;
 ios_base::iostate _St = ios_base::goodbit;
 bool _Chg = false;
 _X.erase();
 const _Myis::sentry _Ok(_I, true);
 if (_Ok)
 {
  _TRY_IO_BEGIN
  _Tr::int_type _C = _I.rdbuf()->sgetc();
  for (; ; _C = _I.rdbuf()->snextc())
   if (_Tr::eq_int_type(_Tr::eof(), _C))
   {
    _St |= ios_base::eofbit;
    break;
   }
   else if (_Tr::eq(_C, _D))
   {
    _Chg = true;
*    // original code
*    // _I.rdbuf()->snextc();
*    // modification
*    _I.rdbuf()->sbumpc();
*    // end of modification
    break;
   }
   else if (_X.max_size() <= _X.size())
   {
    _St |= ios_base::failbit;
    break;
   }
   else
    _X += _Tr::to_char_type(_C), _Chg = true;
  _CATCH_IO_(_I);
 }
 if (!_Chg)
  _St |= ios_base::failbit;
 _I.setstate(_St);
 return (_I);
}

All I did was change snextc() to sbumpc(). This works OK. I'm not sure
what happens to gcount() though, or what other side-effects (if any)
there may be. Still investigating ;-)

Using sbumpc like this means:
- the delimiter character is discarded, as it should be according to
the standard
- by implication the input stream position is advanced
- BUT, no attempt is yet made to actually read from the sequence
controlled by the streambuf. That'll happen on the next call to sgetc
or snextc. Or one of the other extraction functions.

So the call doesn't block and the conversation continues happily.
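The effect of that one-line change can be reproduced without a network. The sketch below is hypothetical: `PendingDataBuf` stands in for the socket streambuf and, instead of blocking in underflow(), it just counts how often underflow() is reached. The loop has the same shape as the one in the MSVC code above.

```cpp
#include <cassert>   // used when checking the results
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical stand-in for a socket streambuf. It holds only the bytes the
// peer has sent so far. A real socket buffer would block inside underflow()
// waiting for more data; this one counts the attempt and reports end-of-file.
class PendingDataBuf : public std::streambuf {
public:
    explicit PendingDataBuf(std::string data) : data_(std::move(data)) {
        char* base = &data_[0];
        setg(base, base, base + data_.size());
    }
    int underflow_calls() const { return underflow_calls_; }

protected:
    int_type underflow() override {
        ++underflow_calls_;              // a blocking read would happen here
        return traits_type::eof();
    }

private:
    std::string data_;
    int underflow_calls_ = 0;
};

// Scans "hello\n" up to the delimiter, then disposes of the delimiter with
// either snextc() (the original code) or sbumpc() (the modification).
// Returns how often the buffer had to go back to the "socket" for more data.
int underflow_calls_after_line(bool consume_delim_with_sbumpc) {
    using traits = std::char_traits<char>;
    PendingDataBuf buf("hello\n");
    traits::int_type c = buf.sgetc();
    while (!traits::eq_int_type(c, traits::eof()) &&
           !traits::eq(traits::to_char_type(c), '\n'))
        c = buf.snextc();                // advance, then peek at the next char

    if (consume_delim_with_sbumpc)
        buf.sbumpc();                    // take the '\n' and stop there
    else
        buf.snextc();                    // take the '\n', then peek past it
    return buf.underflow_calls();
}
```

underflow_calls_after_line(false) returns 1: the original snextc() goes back to the "socket" looking for the character after the '\n', which is exactly where a real socket would hang. underflow_calls_after_line(true) returns 0.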

So whaddy'all think about that?

bye, have a nice...
John Anderson
Writer / Lecturer / OO Designer / Programmer and general all-around
useful person.

The best sig I ever saw was: "I'm working at getting my opinions to
agree with me."
------------------------------------------------
OO & UI Consultant, Writer, Lecturer

panic@global.co.za

Paradoxes make sense, not logic.
Good Things (tm) require a precise lack of control.
---
[ comp.std.c++ is moderated.  To submit articles: try just posting with      ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu         ]
[ FAQ:      http://reality.sgi.com/employees/austern_mti/std-c++/faq.html    ]
[ Policy:   http://reality.sgi.com/employees/austern_mti/std-c++/policy.html ]
[ Comments? mailto:std-c++-request@ncar.ucar.edu                             ]





Author: kanze@gabi-soft.fr (J. Kanze)
Date: 1997/10/03
panic@global.co.za (John Anderson) writes:

 |> For starters, the 2 December 1996 version of the ANSI standard
 |> document, in section 21.3.79.5 (I hope I got the numbering right) says
 |>
 |>
 |> template<class charT, class traits, class Allocator>
 |> basic_istream<charT,traits>& getline(basic_istream<charT,traits>& is,
 |> basic_string<charT,traits,Allocator>& str, charT delim);
 |>
 |> 5 Effects: Begins by constructing a sentry object k as if by
 |> basic_istream<charT,traits>::sentry k( is). If bool( k) is true, it
 |> calls str.erase() and then extracts characters from is and appends
 |> them to str as if by calling str.append(1, c) until any of the
 |> following occurs:
 |>   - end-of-file occurs on the input sequence (in which case, the getline
 |>     function calls is.setstate(ios_base::eofbit)).
 |>   - c == delim for the next available input character c (in which case,
 |>     c is extracted but not appended) (27.4.5.3)
 |>   - str.max_size() characters are stored (in which case, the function
 |>     calls is.setstate(ios_base::failbit)) (27.4.5.3)
 |>
 |> 6 The conditions are tested in the order shown. In any case, after the
 |> last character is extracted, the sentry object k is destroyed.
 |>
 |> 7 If the function extracts no characters, it calls
 |> is.setstate(ios_base::failbit) which may throw ios_base::failure
 |> (27.4.5.3).
 |>
 |> 8 Returns: is.
 |>
 |> Take a closer look at the following, especially the part about
 |> extracting but not appending
 |>
 |>   - c == delim for the next available input character c (in which case,
 |>     c is extracted but not appended) (27.4.5.3)
 |>
 |> The problem is that an attempt to _extract_ another character may
 |> block, depending on the device serving as the source for some
 |> streambuf or descendant. In spite of having received all the data I'm
 |> interested in, _and_ discarded the delimiter.

Reread the above.  Once delim has been seen, getline extracts it, but
nothing else.  Any other behavior is an error in the implementation.
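For comparison, here is a minimal sketch of the function as paragraphs 5 to 7 describe it, written only against the public streambuf interface. The name `getline_no_peek` is made up, and the exception handling that the library does with its _TRY_IO_BEGIN / _CATCH_IO_ macros is left out. The point is the delimiter branch: it calls sbumpc() and stops, so nothing after the delimiter is ever requested from the buffer.

```cpp
#include <cassert>   // used when checking the results
#include <istream>
#include <sstream>   // std::istringstream, for trying the sketch out
#include <string>

// Hypothetical getline that never asks the streambuf for the character
// after the delimiter. Exception handling (badbit on a throwing streambuf)
// is omitted for brevity.
template <class charT, class traits, class Allocator>
std::basic_istream<charT, traits>&
getline_no_peek(std::basic_istream<charT, traits>& is,
                std::basic_string<charT, traits, Allocator>& str,
                charT delim)
{
    std::ios_base::iostate state = std::ios_base::goodbit;
    bool extracted = false;
    // noskipws = true: leading whitespace is part of the line.
    typename std::basic_istream<charT, traits>::sentry ok(is, true);
    if (ok) {
        str.erase();
        std::basic_streambuf<charT, traits>* sb = is.rdbuf();
        for (;;) {
            typename traits::int_type c = sb->sgetc();   // look at current char
            if (traits::eq_int_type(c, traits::eof())) {
                state |= std::ios_base::eofbit;          // condition 1
                break;
            }
            if (traits::eq(traits::to_char_type(c), delim)) {
                sb->sbumpc();        // condition 2: extract it, look no further
                extracted = true;
                break;
            }
            if (str.size() == str.max_size()) {
                state |= std::ios_base::failbit;         // condition 3
                break;
            }
            str.push_back(traits::to_char_type(c));
            sb->sbumpc();            // extract the character just stored
            extracted = true;
        }
    }
    if (!extracted)
        state |= std::ios_base::failbit;                 // paragraph 7
    is.setstate(state);
    return is;
}
```

The conditions are tested in the order paragraph 6 requires. Reading further is only needed while the line is still open, so a buffer that has nothing after the '\n' is never asked for more.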

 |> I ran into this problem with a descendant of streambuf for a tcp/ip
 |> socket. Say the remote machine sends a sequence of characters with a
 |> \n on the end. And then waits for a response before sending anything
 |> more.
 |>
 |> The local machine reads in the characters using getline. When the \n
 |> is read in, it's discarded. This is as it should be. Then, the next
 |> character is extracted, in a manner dependent on the particular
 |> implementation. But if an attempt is made to read the next character
 |> from the network socket, the call will block waiting for more data
 |> from the remote machine. But I've already received everything I need,
 |> and I can't reply to the remote machine until the call to getline has
 |> returned. Which it's not going to do until the remote machine has
 |> received a reply and sent more data. The remote machine's not going to
 |> receive a reply until the call to getline has returned... erm. Uh-oh.
 |> The processor gets on with doing nothing very fast.
 |>
 |> I copied the source for getline (MSVC 5.0) and made the following
 |> modifications, marked by *:
 |>
 |> template<class _E, class _Tr, class _A>
 |> inline basic_istream<_E, _Tr>& __cdecl
 |> getline
 |> (
 |>  basic_istream<_E, _Tr>& _I
 |>  ,basic_string<_E, _Tr, _A>& _X
 |>  , const _E _D
 |> )
 |> {
 |>  typedef basic_istream<_E, _Tr> _Myis;
 |>  ios_base::iostate _St = ios_base::goodbit;
 |>  bool _Chg = false;
 |>  _X.erase();
 |>  const _Myis::sentry _Ok(_I, true);
 |>  if (_Ok)
 |>  {
 |>   _TRY_IO_BEGIN
 |>   _Tr::int_type _C = _I.rdbuf()->sgetc();
 |>   for (; ; _C = _I.rdbuf()->snextc())
 |>    if (_Tr::eq_int_type(_Tr::eof(), _C))
 |>    {
 |>     _St |= ios_base::eofbit;
 |>     break;
 |>    }
 |>    else if (_Tr::eq(_C, _D))
 |>    {
 |>     _Chg = true;
 |> *    // original code
 |> *    // _I.rdbuf()->snextc();
 |> *    // modification
 |> *    _I.rdbuf()->sbumpc();
 |> *    // end of modification

Moral: you have corrected an error in the MSVC implementation.

My interpretation of the standard is that the original implementation is
not conforming, although this is arguable, since it definitely doesn't
extract the character returned by the snextc.  It does, however, require
the character to be present; if this doesn't violate the letter of the
law, it certainly violates the spirit.

This is a subtle error, and I can easily understand both how it was
introduced in the first place, and how it slipped through testing.  I'm
sure that if you point it out to Microsoft, they will accept it as such,
correct it in the next release, and add a test for it in their
regression tests.

 |>     break;
 |>    }
 |>    else if (_X.max_size() <= _X.size())
 |>    {
 |>     _St |= ios_base::failbit;
 |>     break;
 |>    }
 |>    else
 |>     _X += _Tr::to_char_type(_C), _Chg = true;
 |>   _CATCH_IO_(_I);
 |>  }
 |>  if (!_Chg)
 |>   _St |= ios_base::failbit;
 |>  _I.setstate(_St);
 |>  return (_I);
 |> }
 |>
 |> All I did was change snextc() to sbumpc(). This works OK. I'm not sure
 |> what happens to gcount() though, or what other side-effects (if any)
 |> there may be. Still investigating ;-)

I didn't find where gcount is being modified in a quick scan of the
code, but I don't logically see how your change could affect it.  Gcount
is NOT maintained by the streambuf.
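To illustrate: gcount() is state kept by basic_istream and updated by the istream's own unformatted input member functions, so a call made directly on rdbuf() leaves it untouched. (In the final C++ standard, the free getline for strings is specified not to affect gcount() at all.) A small demonstration using std::istringstream; the helper name is hypothetical:

```cpp
#include <cassert>   // used when checking the results
#include <sstream>
#include <utility>

// Hypothetical helper: returns gcount() right after istream::getline, and
// again after pulling one more character straight from the streambuf.
std::pair<std::streamsize, std::streamsize> gcount_before_and_after_sbumpc() {
    std::istringstream in("abc\ndef");
    char line[16];
    in.getline(line, sizeof line);      // stores "abc", also extracts the '\n'
    std::streamsize after_getline = in.gcount();   // 4: the delimiter counts
    in.rdbuf()->sbumpc();               // removes 'd' behind the istream's back
    std::streamsize after_sbumpc = in.gcount();    // unchanged
    return {after_getline, after_sbumpc};
}
```

Both values come back as 4: the istream counted the delimiter it extracted, and knows nothing about the sbumpc() made directly on its buffer.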

 |> Using sbumpc like this means:
 |> - the delimiter character is discarded, as it should be according to
 |> the standard
 |> - by implication the input stream position is advanced
 |> - BUT, no attempt is yet made to actually read from the sequence
 |> controlled by the streambuf. That'll happen on the next call to sgetc
 |> or snextc. Or one of the other extraction functions.
 |>
 |> So the call doesn't block and the conversation continues happily.
 |>
 |> So whaddy'all think about that?

That you've found a very subtle error in one particular implementation.
I'm not really surprised, and I wouldn't be surprised if other
implementations had similar errors.

--
James Kanze    +33 (0)1 39 23 84 71    mailto: kanze@gabi-soft.fr
GABI Software, 22 rue Jacques-Lemercier, 78000 Versailles, France
        I'm looking for a job -- Je recherche du travail