Thread

Topic: while(someIOStream) "What does epressi

Author: clamage@Eng.Sun.COM (Steve Clamage)
Date: 1996/07/01

In article 96Jul1144412@gabi.gabi-soft.fr,  kanze@gabi-soft.fr
(J.  Kanze) writes:
>In article <4r1tvo$e1h@netlab.cs.rpi.edu> clamage@Eng.Sun.COM (Steve
>Clamage) writes:
>
>   [The initial article was in comp.lang.c++.moderated, but since my
>question is purely a standards one...]
>
>|> >2) Are these tests the same (both will indicate a read past EOF)?:
>|> >    someStream.fail()
>|> >    someStream.eof()
>
>|> No. The "fail" and "eof" tests are not equivalent. "eof" is true if
>|> EOF has been reached, where the definition of "reached" is a bit
>|> slippery. "fail" means the last attempted operation did not succeed,
>|> for whatever reason. The two functions can return the same or
>|> opposite states.
>
>|> Example: If you attempt to read an integer, but the next available
>|> input characters are "abc", the operation fails, but eof is false.
>|> But if the file contained just "123" with no trailing whitespace,
>|> the operation succeeds, but you have also reached eof. You should
>                                                              ^^^^^^
>|> find "fail" is false and "eof" is true.
>
>Should, or might?  It was my impression that whether eof was set or not
>in this case was not defined by the standard.  (When inputting integers,
>I cannot imagine an implementation in which it wasn't set, because of
>the look-ahead needed.  But I didn't think that the standard would
>require it.)

I'm not sure the standard requires it, but I think the various rules
about eof make it almost certain that eof would be set in this case.
That is, I think an implementation would have to go to a lot of
trouble not to set eof in this case, yet set it in other cases where
it clearly must be set.
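A minimal sketch of the two cases discussed above, assuming a C++11-or-later library and using a std::istringstream in place of a file (the helper `read_int_flags` and the struct `IntReadResult` are hypothetical names). Reading "abc" sets fail but not eof. Reading "123" with nothing after it succeeds and sets eof, because the extractor has to look one position past the last digit to learn that the number has ended.

```cpp
#include <sstream>
#include <string>

// Stream state observed after a single "stream >> int" extraction.
struct IntReadResult {
    bool fail;   // in.fail() after the extraction
    bool eof;    // in.eof() after the extraction
    int value;   // whatever ended up in the target
};

// Hypothetical helper: extract one int from `text` and report the flags.
IntReadResult read_int_flags(const std::string& text) {
    std::istringstream in(text);
    int n = -1;
    in >> n;
    return {in.fail(), in.eof(), n};
}
```

With a trailing space ("123 "), the extractor stops at the space instead of at end of input, so both flags stay false.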

>|> If you then attempt an
>|> additional input operation, it will fail (no more characters to read)
>|> and "eof" will still be true.
>
>|> You can test "eof" before trying an operation. If it returns "true",
>|> you have reached eof and no characters will be read. If "eof" is false
>|> before attempting an input operation, the operation might succeed and
>|> might fail; there might or might not actually be any characters left.
>|> I would not test "eof" just after an input operation, since whether it
>|> is true or false, the input might or might not have succeeded, as in
>|> the examples above.
>
>Question: what happens if the formatting fails?  Is the implementation
>still allowed to set EOF?

 [ example deleted ]

As I noted above, eof and fail are almost independent. They can both be
true, both be false, or have opposite states. If both are true after an
input, you have an input failure of some kind, and have also reached eof.
That is, you could clear the "fail" bit to attempt further input, but
the next attempt will not succeed, because you are at eof.
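The loop form in this thread's title tests the stream after each extraction instead of calling eof() first. The sketch below assumes a C++11-or-later library, where the stream converts to false through an explicit operator bool once fail() is true; `read_all_ints` and `LoopResult` are hypothetical names. When the input simply runs out, the loop ends with fail and eof both set, and clearing the state does not make another extraction succeed.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct LoopResult {
    std::vector<int> values;  // every int read before the loop stopped
    bool fail_and_eof;        // were failbit and eofbit both set at the end?
    bool retry_failed;        // did one more extraction after clear() fail?
};

// Hypothetical helper showing the while (stream >> x) idiom and the
// stream's state afterwards.
LoopResult read_all_ints(const std::string& text) {
    std::istringstream in(text);
    LoopResult r{{}, false, false};
    int x = 0;
    while (in >> x)               // tests false once an extraction fails
        r.values.push_back(x);
    r.fail_and_eof = in.fail() && in.eof();
    in.clear();                   // clears failbit and eofbit...
    r.retry_failed = !(in >> x);  // ...but nothing valid is left to read
    return r;
}
```

For "1 2 x" the loop stops on the "x" with fail set but eof clear, and a retry after clear() fails again on the same character.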

An input formatting or range error could be detected simultaneously
with detecting eof. Example: a file ends with " 123e" with floating-
point input. The input is malformed, but you don't discover that until
you have read every character of the file.
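A sketch of the " 123e" case, assuming a C++11-or-later library (`read_double_flags` is a hypothetical helper). The extractor keeps accumulating "123e" because each character could still be part of a number, reaches end of input, and then rejects the field, so fail and eof are both set. Whether "123e" is malformed is questioned later in this thread; libstdc++ treats it as a failure, but that result is worth checking on other implementations.

```cpp
#include <sstream>
#include <string>

struct DoubleReadResult {
    bool fail;  // in.fail() after the extraction
    bool eof;   // in.eof() after the extraction
};

// Hypothetical helper: extract one double and report the resulting flags.
DoubleReadResult read_double_flags(const std::string& text) {
    std::istringstream in(text);
    double d = 0.0;
    in >> d;
    return {in.fail(), in.eof()};
}
```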

---
Steve Clamage, stephen.clamage@eng.sun.com
---
[ comp.std.c++ is moderated.  To submit articles: try just posting with      ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu         ]
[ FAQ:      http://reality.sgi.com/employees/austern_mti/std-c++/faq.html    ]
[ Policy:   http://reality.sgi.com/employees/austern_mti/std-c++/policy.html ]
[ Comments? mailto:std-c++-request@ncar.ucar.edu                             ]

Author: kanze@lts.sel.alcatel.de (James Kanze US/ESC 60/3/141 #40763)
Date: 1996/07/03

In article <4r8rpf$dvd@engnews1.Eng.Sun.COM> clamage@Eng.Sun.COM (Steve
Clamage) writes:

|> As I noted above, eof and fail are almost independent. They can both be
|> true, both be false, or have opposite states. If both are true after an
|> input, you have an input failure of some kind, and have also reached eof.
|> That is, you could clear the "fail" bit to attempt further input, but
|> the next attempt will not succeed, because you are at eof.

|> An input formatting or range error could be detected simultaneously
|> with detecting eof. Example: a file ends with " 123e" with floating-
|> point input. The input is malformed, but you don't discover that until
|> you have read every character of the file.

This was more or less my impression.  But it means that there is no way
of reliably distinguishing between failure due to no input, and failure
due to malformed input.

BTW: is "123e" malformed for floating point input, or is it the same as
"123"?  I cannot find (after a very quick search) anything which
specifies one way or another: the text of istream simply says that
locale::num_get does the conversion, and the text of locale::num_get
doesn't seem to say much at all.

Along the same lines: given a string like "1000000000000000" (which
will cause overflow on my machine), is it guaranteed that all of the
digits will be extracted, or can the implementation stop as soon as it
detects the overflow?

Is there any way of determining whether an input failure was due to
malformed data, or to overflow?
--
James Kanze         Tel.: (+33) 88 14 49 00        email: kanze@gabi-soft.fr
GABI Software, Sarl., 8 rue des Francs-Bourgeois, F-67000 Strasbourg, France
Consulting, studies and development in object-oriented software --
                -- Looking for work in a French-speaking region

Author: clamage@Eng.Sun.COM (Steve Clamage)
Date: 1996/07/03

In article 96Jul2131503@slsvgqt.lts.sel.alcatel.de, kanze@lts.sel.alcatel.de (James Kanze US/ESC 60/3/141 #40763) writes:
>In article <4r8rpf$dvd@engnews1.Eng.Sun.COM> clamage@Eng.Sun.COM (Steve
>Clamage) writes:
>
>|> An input formatting or range error could be detected simultaneously
>|> with detecting eof. Example: a file ends with " 123e" with floating-
>|> point input. The input is malformed, but you don't discover that until
>|> you have read every character of the file.
>
>This was more or less my impression.  But it means that there is no way
>of reliably distinguishing between failure due to no input, and failure
>due to malformed input.

In most cases, it seems to me, if you get a failure and eof simultaneously,
you could treat it as eof. If you were expecting more input but it was
malformed and also at eof, you do know you didn't get enough input.

If you really want to distinguish between no input and failure, you can
skip whitespace (assuming you are skipping whitespace) then read one
more character. If that fails, you have no more input. If it succeeds,
put the character back and do your normal input operation. If it fails,
the input was malformed (possibly truncated).
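The procedure just described can be sketched with standard istream calls, assuming whitespace-skipping input and a C++11-or-later library; `read_checked` and `ReadStatus` are hypothetical names. Under the interpretation discussed in this thread, a file ending in " 123e" read as a double would come back as Malformed rather than as end of input.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the function out

enum class ReadStatus { Value, NoInput, Malformed };

// Skip whitespace, try to read one character, and only then attempt the
// real conversion, so "nothing left" and "bad data" can be told apart.
template <typename T>
ReadStatus read_checked(std::istream& is, T& out) {
    is >> std::ws;                                    // may set eofbit
    if (is.get() == std::istream::traits_type::eof())
        return ReadStatus::NoInput;                   // only whitespace was left
    is.unget();                                       // put the character back
    if (!(is >> out))
        return ReadStatus::Malformed;                 // data present, not convertible
    return ReadStatus::Value;
}
```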
>
>BTW: is "123e" malformed for floating point input, or is it the same as
>"123"?  I cannot find (after a very quick search) anything which
>specifies one way or another: the text of istream simply says that
>locale::num_get does the conversion, and the text of locale::num_get
>doesn't seem to say much at all.

The text for num_get says it follows the scanf rules for conversions.

>Along the same lines: given a string like "1000000000000000" (which
>will cause overflow on my machine), is it guaranteed that all of the
>digits will be extracted, or can the implementation stop as soon as it
>detects the overflow?

The scanf rules say that the longest sequence matching the conversion
type is extracted. One could argue that for the file ending in "123e"
the "123" is the longest matching sequence, but I don't think that was
the intended meaning. Scanf is not supposed to require more than one
character of lookahead or putback. Consider "123e+a". Up until the "a"
it looks like it will be a valid floating-point number. The longest valid
prefix is "123", which would require 3 characters of lookahead and putback.

For decimal integer input, there is no question. "123e" is a valid "123"
followed by an "e" which is left unread.
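A quick check of the integer case, assuming a C++11-or-later library (`int_then_next` is a hypothetical helper). The extraction yields 123 and stops at the "e", which stays in the stream where peek() can see it.

```cpp
#include <sstream>
#include <string>

struct IntThenNext {
    bool ok;     // did the int extraction succeed?
    int value;   // the int that was read
    char next;   // first character left in the stream ('\0' if none)
};

// Hypothetical helper: read an int, then look at what was left unread.
IntThenNext int_then_next(const std::string& text) {
    std::istringstream in(text);
    IntThenNext r{false, 0, '\0'};
    if (in >> r.value) {
        r.ok = true;
        int c = in.peek();  // inspect, but do not extract, the next char
        if (c != std::istringstream::traits_type::eof())
            r.next = static_cast<char>(c);
    }
    return r;
}
```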

For the huge integer, there is also no question. All the digits match
the type, so all must be extracted. The scanf rules say that if the
value cannot be represented in the object provided, the results are
undefined. That means the implementation could truncate, abort,
throw an exception, overflow storage, or most anything else. Floating-
point has more potential options for values that don't fit: infinity,
denormalized, or NaN for example.
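This thread predates the first C++ standard, and the integer overflow case was later specified differently from the undefined behaviour described here. In C++11 and later, an out-of-range integer field sets failbit and stores the largest (or smallest) representable value, and all of the digits are still consumed. A sketch checking this, assuming a C++11-or-later library (`read_huge` is a hypothetical helper):

```cpp
#include <climits>  // INT_MAX / INT_MIN, used when checking the results
#include <sstream>
#include <string>

struct OverflowResult {
    bool fail;       // did the oversized extraction set failbit?
    int value;       // what was stored in the target
    bool next_ok;    // could the following int be read afterwards?
    int next_value;  // that following int
};

// Hypothetical helper: read an int that may be too large, then read the
// next int to see whether every digit of the first one was consumed.
OverflowResult read_huge(const std::string& text) {
    std::istringstream in(text);
    OverflowResult r{false, 0, false, 0};
    in >> r.value;
    r.fail = in.fail();
    in.clear();  // recover from the failure and keep reading
    r.next_ok = static_cast<bool>(in >> r.next_value);
    return r;
}
```

If the implementation stopped early, the second read would pick up leftover digits instead of the 7.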

---
Steve Clamage, stephen.clamage@eng.sun.com
---
[ comp.std.c++ is moderated.  To submit articles: Try just posting with your
                newsreader.  If that fails, use mailto:std-c++@ncar.ucar.edu
  comp.std.c++ FAQ: http://reality.sgi.com/austern/std-c++/faq.html
  Moderation policy: http://reality.sgi.com/austern/std-c++/policy.html
  Comments? mailto:std-c++-request@ncar.ucar.edu
]

Author: kanze@lts.sel.alcatel.de (James Kanze US/ESC 60/3/141 #40763)
Date: 1996/07/04

In article <4rem1i$ca6@engnews1.Eng.Sun.COM> clamage@Eng.Sun.COM (Steve
Clamage) writes:

|> In article 96Jul2131503@slsvgqt.lts.sel.alcatel.de, kanze@lts.sel.alcatel.de (James Kanze US/ESC 60/3/141 #40763) writes:
|> >In article <4r8rpf$dvd@engnews1.Eng.Sun.COM> clamage@Eng.Sun.COM (Steve
|> >Clamage) writes:
|> >
|> >|> An input formatting or range error could be detected simultaneously
|> >|> with detecting eof. Example: a file ends with " 123e" with floating-
|> >|> point input. The input is malformed, but you don't discover that until
|> >|> you have read every character of the file.
|> >
|> >This was more or less my impression.  But it means that there is no way
|> >of reliably distinguishing between failure due to no input, and failure
|> >due to malformed input.

|> In most cases, it seems to me, if you get a failure and eof simultaneously,
|> you could treat it as eof. If you were expecting more input but it was
|> malformed and also at eof, you do know you didn't get enough input.

It depends.  If you are reading a file containing floating point
numbers, one per line, you may want to stop and not process the file if
you detect an error.

My experience was that you could not do robust input with scanf, and I
know of no one who used it much.  The general procedure in C was to
read a line using fgets, and parse it by hand (using functions like
strtod, and maybe even sscanf).
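The fgets-then-strtod approach can be sketched in C++ as follows. `parse_double_line`, `LineParse` and `LineStatus` are hypothetical names, and the line is assumed to hold exactly one number. Unlike a bare good/fail indication, strtod's end pointer reports where conversion stopped: for "123e" it stops after "123", so the "e" shows up as trailing text at offset 3.

```cpp
#include <cctype>
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <string>

enum class LineStatus { Ok, NoNumber, OutOfRange, TrailingText };

struct LineParse {
    LineStatus status;
    std::size_t stop_pos;  // offset where strtod stopped converting
    double value;          // strtod's result
};

// Hypothetical helper: the line must hold exactly one floating-point
// number, optionally surrounded by whitespace.
LineParse parse_double_line(const std::string& line) {
    const char* begin = line.c_str();
    char* end = nullptr;
    errno = 0;
    double value = std::strtod(begin, &end);
    std::size_t pos = static_cast<std::size_t>(end - begin);
    if (end == begin)
        return {LineStatus::NoNumber, pos, value};
    if (errno == ERANGE)  // overflow (or underflow) was reported
        return {LineStatus::OutOfRange, pos, value};
    // Allow trailing whitespace such as the '\n' that fgets keeps.
    const char* p = end;
    while (*p != '\0' && std::isspace(static_cast<unsigned char>(*p)))
        ++p;
    if (*p != '\0')
        return {LineStatus::TrailingText, pos, value};
    return {LineStatus::Ok, pos, value};
}
```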

In general, I find that input error detection in istream is far
superior to that of scanf.  Maybe I'm asking too much of it, however.

|> If you really want to distinguish between no input and failure, you can
|> skip whitespace (assuming you are skipping whitespace) then read one
|> more character. If that fails, you have no more input. If it succeeds,
|> put the character back and do your normal input operation. If it fails,
|> the input was malformed (possibly truncated).
|> >
|> >BTW: is "123e" malformed for floating point input, or is it the same as
|> >"123"?  I cannot find (after a very quick search) anything which
|> >specifies one way or another: the text of istream simply says that
|> >locale::num_get does the conversion, and the text of locale::num_get
|> >doesn't seem to say much at all.

|> The text for num_get says it follows the scanf rules for conversions.

Nothing like passing the buck:-).  (Actually, I think that it is more a
case of the principles of code reuse being applied to standards.  For
the man in the trenches, of course, commonality is definitely
preferable to the subtle differences that are likely to occur when two
groups wordsmith something independently.)

|> >Along the same lines: given a string like "1000000000000000" (which
|> >will cause overflow on my machine), is it guaranteed that all of the
|> >digits will be extracted, or can the implementation stop as soon as it
|> >detects the overflow?

|> The scanf rules say that the longest sequence matching the conversion
|> type is extracted. One could argue that for the file ending in "123e"
|> the "123" is the longest matching sequence, but I don't think that was
|> the intended meaning. Scanf is not supposed to require more than one
|> character of lookahead or putback. Consider "123e+a". Up until the "a"
|> it looks like it will be a valid floating-point number. The longest valid
|> prefix is "123", which would require 3 characters of lookahead and putback.

Assuming this interpretation (which seems perfectly reasonable to me):
does the standard (or will the standard) guarantee that all characters
will be extracted up to but *NOT* including the "a" in this example?
Or does it leave the implementation some freedom in this respect?

|> For decimal integer input, there is no question. "123e" is a valid "123"
|> followed by an "e" which is left unread.

|> For the huge integer, there is also no question. All the digits match
|> the type, so all must be extracted. The scanf rules say that if the
|> value cannot be represented in the object provided, the results are
|> undefined. That means the implementation could truncate, abort,
|> throw an exception, overflow storage, or most anything else. Floating-
|> point has more potential options for values that don't fit: infinity,
|> denormalized, or NaN for example.

Earlier drafts (including, I think, the last public draft) required
detection of overflow.  Has this been dropped?  (Earlier versions of
the draft also required failure in case of inexact conversions.  This
is, of course, unreasonable for floating point.  It would mean that the
string "1.2" would cause failure on any machine with binary floating
point, since it cannot be represented exactly.  Perhaps the detection
of overflow was deleted accidentally in correcting this.)
--
James Kanze         Tel.: (+33) 88 14 49 00        email: kanze@gabi-soft.fr
GABI Software, Sarl., 8 rue des Francs-Bourgeois, F-67000 Strasbourg, France
Consulting, studies and development in object-oriented software --
                -- Looking for work in a French-speaking region

Author: clamage@Eng.su.com (Steve Clamage)
Date: 1996/07/04

kanze@lts.sel.alcatel.de (James Kanze US/ESC 60/3/141 #40763) writes:

>My experience was that you could not do robust input with scanf, and I
>know of no one who used it much.  The general procedure in C was to
>read a line using fgets, and parse it by hand (using functions like
>strtod, and maybe even sscanf).

>In general, I find that input error detection in istream is far
>superior to that of scanf.  Maybe I'm asking too much of it, however.

Considering input for predefined types, istream error detection is
comparable to scanf. In fact, the current draft standard says input
uses the scanf rules. Neither one is very robust for reading
human-typed input, since all you get is a good/fail indication. If
there is an error, you don't get very much information about what
is wrong, or exactly where the error occurred.

Example:
 int x, y, z;
and the available input looks like
 123 4r6 789

If you write
 scanf("%d%d%d", &x, &y, &z);
scanf will return 2 to indicate two successful conversions. The value
of z will be unchanged, x will contain 123, and y will contain 4.

If you write
 cin >> x >> y >> z;
you will find out only that not all the conversions succeeded.

You can test each input individually:
 if( ! (cin >> x) ) {
     // error reading x
 }
 if( ! (cin >> y) ) {
     // error reading y
 }
and get the same information as with scanf.

With either istream or scanf the input is left with the 'r' unread.
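A sketch of those one-at-a-time checks, using a string stream in place of cin so the input is fixed (`read_three` and `ThreeInts` are hypothetical names). It reports how many conversions succeeded, much as scanf's return value does, and which character stopped the input. One difference from scanf, assuming a C++11-or-later library: a failed numeric extraction stores 0 in its target, so z is not left unchanged.

```cpp
#include <sstream>
#include <string>

struct ThreeInts {
    int converted = 0;  // how many of x, y, z were read successfully
    int x = -1, y = -1, z = -1;
    char next = '\0';   // first unread character afterwards ('\0' at end)
};

// Hypothetical helper: read x, y and z one at a time and stop at the
// first failed extraction.
ThreeInts read_three(const std::string& text) {
    std::istringstream in(text);
    ThreeInts r;
    int* targets[] = {&r.x, &r.y, &r.z};
    for (int* target : targets) {
        if (!(in >> *target))
            break;       // stop at the first failed conversion
        ++r.converted;
    }
    in.clear();          // reset the error state to inspect the rest
    int c = in.peek();
    if (c != std::istringstream::traits_type::eof())
        r.next = static_cast<char>(c);
    return r;
}
```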

Istreams have a definite type-safety advantage, of course. You can't
accidentally store a float value into a double variable or vice-versa:
 double d;
 scanf("%f", &d); /* error: float conversion stored in double */
This error is common in C, but can't happen with istreams.
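For comparison, a sketch of both styles (the function names are hypothetical). With the scanf family the format has to match the variable: "%lf" is the conversion for a double*, and passing a double* to "%f" is undefined behaviour. With istreams, the overload is chosen from the variable's type, so one expression works for float and double alike.

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// scanf style: the conversion specifier must be kept in sync by hand.
double parse_with_sscanf(const std::string& text) {
    double d = 0.0;
    if (std::sscanf(text.c_str(), "%lf", &d) != 1)  // %lf, not %f, for double
        d = 0.0;
    return d;
}

// istream style: operator>> is selected from T, so no specifier is needed.
template <typename T>
T parse_with_istream(const std::string& text) {
    std::istringstream in(text);
    T value{};
    in >> value;
    return value;
}
```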

For interactive input, you often want to read a line of text
at a time and parse it by hand so you can give helpful error
messages.

Other times, all you care about is whether a given line of input
is valid or not valid. In that case, code like this
 if( ! (cin >> x >> y >> z) ) {
     // error in the line
 }
is adequate.
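Where a line is read first and then parsed, the same checks can report which field went wrong. A minimal sketch, assuming each line should hold exactly three ints; `check_line` is a hypothetical name, and rejecting trailing non-whitespace text is a design choice of this sketch rather than something the post requires.

```cpp
#include <sstream>
#include <string>

// Hypothetical line checker: returns an empty string when the line holds
// exactly three ints, otherwise a short description of the first problem.
std::string check_line(const std::string& line) {
    std::istringstream fields(line);
    int values[3];
    const char* names[3] = {"x", "y", "z"};
    for (int i = 0; i < 3; ++i) {
        if (!(fields >> values[i]))
            return std::string("could not read ") + names[i];
    }
    fields >> std::ws;                     // trailing whitespace is fine...
    if (!fields.eof())
        return "unexpected text after z";  // ...anything else is not
    return "";
}
```

Each line would typically come from std::getline, with the returned message printed alongside a line number.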
--
Steve Clamage, stephen.clamage@eng.sun.com
