Topic: VS2005/2008; istream; "\r\n"


Author: Frank <fschaef@googlemail.com>
Date: Wed, 10 Dec 2008 14:32:43 CST
In the frame of my lexical analyzer project quex, I found a strange
behavior of the Microsoft implementation of std-stream lib. The
question is if this behavior is admissible, or Microsoft may need some advice.

If an input stream contains a DOS style '\r\n' (carriage
return, newline), i.e. 0x0D0A, then Microsoft's istream reads solely
a 0x0A in place of 0x0D0A, but increases the stream position by two.
To prove this, load the following program into VS 2008 (VS 2005 same
thing):

#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <cstdio>

int _tmain(int argc, _TCHAR* argv[])
{
 using namespace std;
 char         buffer[11] = { 0,0,0,0,0,0,0,0,0,0,0 };
 ifstream     dmp("test.txt");

 cout << dmp.tellg() << endl;
 dmp.read(buffer, 10);

 for(int i=0; i<11; ++i) printf("%02X.", (int)buffer[i]);
 cout << endl;
 cout << dmp.tellg() << endl;
 return 0;
}

Then, create a file with the following content
----------------------------------------
012345
6789
----------------------------------------
That is: The numbers 0 to 5 followed by a DOS style newline (0x0D,
0x0A) as any Windows text editor writes it, then the numbers from 6 to 9.

Run the program and you will get the following output:

0
30.31.32.33.34.35.0A.36.37.38.00.
11

You can see that the DOS style newline (0xD, 0xA) is replaced by
0A, and the stream position, instead of being '10' is internally
set to 11.

!! Thus: With the Microsoft implementation of istream, the number     !!
!! of bytes being read is **not equal to** the stream position        !!
!! increment during the read process.                                 !!

Now, is such a behavior allowed according to the standard?

--
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@netlab.cs.rpi.edu]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: James Kanze <james.kanze@gmail.com>
Date: Thu, 11 Dec 2008 10:41:42 CST
On Dec 10, 9:32 pm, Frank <fsch...@googlemail.com> wrote:
> In the frame of my lexical analyzer project quex, I found a
> strange behavior of the Microsoft implementation of std-stream
> lib. The question is if this behavior is admissible, or
> Microsoft may need some advice.

> If an input stream contains a DOS style '\r\n' (carriage
> return, newline), i.e. 0x0D0A, then Microsoft's istream reads
> solely a 0x0A in place of 0x0D0A, but increases the stream
> position by two.

I really doubt that.  You mean that you don't see the character
following the '\n' (after the translation)?  Or what?

> To prove this, load the following program into VS 2008 (VS 2005
> same thing):

> #include "stdafx.h"
> #include <iostream>
> #include <fstream>
> #include <cstdio>

> int _tmain(int argc, _TCHAR* argv[])

If you're talking about standard C++, you'd best avoid
non-standard extensions.  "stdafx.h", _tmain and _TCHAR aren't
C++, and Microsoft would be within its rights to change the
definition of istream when they're used.  (It doesn't, of
course.)

> {
>         using namespace std;
>         char         buffer[11] = { 0,0,0,0,0,0,0,0,0,0,0 };
>         ifstream     dmp("test.txt");

>         cout << dmp.tellg() << endl;

What's the above line supposed to do?  istream::tellg returns an
ios::pos_type, and there's no << defined for it.  You're getting
some sort of unspecified conversion.

>         dmp.read(buffer, 10);

>         for(int i=0; i<11; ++i) printf("%02X.", (int)buffer[i]);
>         cout << endl;
>         cout << dmp.tellg() << endl;
>         return 0;
> }

> Then, create a file with the following content
> ----------------------------------------
> 012345
> 6789
> ----------------------------------------
> That is: The numbers 0 to 5 followed by a DOS style newline
> (0x0D, 0x0A) as any Windows text editor does, then the numbers
> from 6 to 9.

> Run the program and you will get the following output:

> 0
> 30.31.32.33.34.35.0A.36.37.38.00.
> 11

So.  The standard doesn't even say that the code outputting the
first and last line will compile, much less what it will
display.

> You can see that the DOS style newline (0xD, 0xA) is replaced
> by 0A, and the stream position, instead of being '10' is
> internally set to 11.

> !! Thus: With the Microsoft implementation of istream, the number     !!
> !! of bytes being read is **not equal to** the stream position        !!
> !! increment during the read process.                                 !!

> Now, is such a behavior allowed according to the standard?

Obviously, since the position isn't even required to be an
integral type.

There are only a very few things you can legally do with the
position: you are guaranteed that you can convert an integer to
a position, and that if that integer is 0, it will correspond to
the start of the file.  You can add or subtract an integer to
the position, but it only has a meaning if the file is opened in
binary mode.  And any results you get from converting a
position to an integral type should be considered magic cookies:
all that's guaranteed with the results of tellg() is that you
can seekg() back to that position, and end up at the same place
in the file.

--
James Kanze (GABI Software)             email:james.kanze@gmail.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34







Author: google@dalvander.com
Date: Thu, 11 Dec 2008 10:41:16 CST
On Dec 10, 9:32 pm, Frank <fsch...@googlemail.com> wrote:
> !! Thus: With the Microsoft implementation of istream, the number     !!
> !! of bytes being read is **not equal to** the stream position        !!
> !! increment during the read process.                                 !!

This isn't specific to the Microsoft implementation, and I think your
conclusion is flawed: it reads 11 bytes and translates them into 10
characters.

> Now, is such a behavior allowed according to the standard?

Yes, it is allowed by the standard, as you open the file in text
mode. If you open the file in binary mode, you'll get an untranslated
view of the file.

ifstream     dmp("test.txt", ios_base::in | ios_base::binary);

Regards,
Anders Dalvander






Author: James Kuyper <jameskuyper@verizon.net>
Date: Thu, 11 Dec 2008 10:42:21 CST
Frank wrote:
> In the frame of my lexical analyzer project quex, I found a strange
> behavior of the Microsoft implementation of std-stream lib. The
> question is if this behavior is admissible, or Microsoft may need some advice.
>
> If an input stream contains a DOS style '\r\n' (carriage
> return, newline), i.e. 0x0D0A, then Microsoft's istream reads solely
> a 0x0A in place of 0x0D0A, but increases the stream position by two.

When you open a file in text mode (the default), istream is supposed to
convert whatever system-specific method is used to indicate end-of-line
into '\n'. The stream position identifies the place within the file
where the next character will be read. That position has increased by 2,
since the system you're using uses two characters to identify end-of-line.

If you don't like this behavior, open the stream in binary mode instead
of text mode. However, if you do that, you'll have to take care of the
'\r' yourself; and you'll have to handle the portability issues if you
port your code to systems where other methods are used to indicate
end-of-line.






Author: Yechezkel Mett <ymett.on.usenet@gmail.com>
Date: Thu, 11 Dec 2008 11:45:06 CST
On Dec 10, 10:32 pm, Frank <fsch...@googlemail.com> wrote:
> In the frame of my lexical analyzer project quex, I found a strange
> behavior of the Microsoft implementation of std-stream lib. The
> question is if this behavior is admissible, or Microsoft may need some advice.
>
> If an input stream contains a DOS style '\r\n' (carriage
> return, newline), i.e. 0x0D0A, then Microsoft's istream reads solely
> a 0x0A in place of 0x0D0A, but increases the stream position by two.
...
> Now, is such a behavior allowed according to the standard?

This is exactly what one would expect in text mode. If you want the
actual bytes use binary mode. If the difference of two rather than one
is what is bothering you, note that (if I understand correctly) the
value returned by tellg is only usable as an argument to seekg, so you
have no guarantees what subtracting two values will give.

Yechezkel Mett







Author: James Kanze <james.kanze@gmail.com>
Date: Fri, 12 Dec 2008 10:55:31 CST
On Dec 11, 5:42 pm, James Kuyper <jameskuy...@verizon.net> wrote:
> Frank wrote:
> > In the frame of my lexical analyzer project quex, I found a
> > strange behavior of the Microsoft implementation of
> > std-stream lib. The question is if this behavior is
> > admissible, or Microsoft may need some advice.

> > If an input stream contains a DOS style '\r\n' (carriage
> > return, newline), i.e. 0x0D0A, then Microsoft's istream
> > reads solely a 0x0A in place of 0x0D0A, but increases the
> > stream position by two.

> When you open a file in text mode (the default), istream is
> supposed to convert whatever system-specific method is used to
> indicate end-of-line into '\n'. The stream position identifies
> the place within the file where the next character will be
> read. That position has increased by 2, since the system
> you're using uses two characters to identify end-of-line.

The stream position is supposed to contain whatever information
the system might need to be able to reseek to that position, in
whatever format the implementation thinks is appropriate.  On a
system with fixed length records, it might contain the block
number and the offset into the block---with the offset in the
high order bits.  Trying to convert it into an integral type is
not really well defined.

> If you don't like this behavior, open the stream in binary
> mode instead of text mode.

That will work with Microsoft's implementation, but the standard
still doesn't guarantee anything concerning the results of
converting the position into an integral type.

--
James Kanze (GABI Software)             email:james.kanze@gmail.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

