Topic: iostream::getline() delimiter
Author: David R Tribble <david@tribble.com>
Date: 1999/08/30
"James Russell Kuyper Jr." wrote:
>
> Michael Dennis wrote:
> ....
>> Just for interest's sake for anyone who was following this thread...
>> I did a quick test and it appears that the VC++ low level text
>> routines translate binary CR/LF pair to \n but do NOT translate an
>> isolated CR to \n. This means that if you want to be able to handle
>> both UNIX and DOS text files from the same app you have to implement
>> this yourself.
>
> Oddly enough, Unix programs leave both character sequences unchanged
> :-)
> When you try to work with files created by multiple file systems,
> you're going to have problems. Keep in mind that text mode ftp
> usually does the appropriate conversion for you, and that many
> systems have utilities such as unix2dos and dos2unix. If they don't,
> it's trivial to write them yourself.
Where I work it's fairly common to parse text files on one system that
may have originated on a different system. I therefore go out of
my way to handle newlines of various flavors:
CR LF -> newline
CR -> newline
LF -> newline
It's rare to see isolated CR or LF characters in text files on
systems that employ CR/LF pairs (such as DOS), just as it's fairly
rare to see isolated CR characters on LF-newline systems (such as
Unix), so the strategy above works for almost all text cases.
(Many programs can be written to take CR or LF in any order and don't
care whether a CR is followed by an LF or not, but programs employing
parsers generally need to keep track of line numbers, so the three-way
mapping above works better for them.)
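
The mapping above can be sketched roughly as follows. This is only an illustration, not code from any particular product: the names read_line_any and count_lines are made up here, and the sketch assumes the stream delivers raw bytes (for example a file opened with std::ios::binary), so that the runtime has not already translated anything.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads one line from `in`, accepting CR LF, a lone CR,
// or a lone LF as the line terminator. The terminator is consumed but not
// stored in `line`. Returns false only when nothing at all could be read.
// Assumes `in` delivers untranslated bytes (e.g. opened in binary mode).
bool read_line_any(std::istream& in, std::string& line)
{
    line.clear();
    bool got_any = false;
    char c;
    while (in.get(c)) {
        got_any = true;
        if (c == '\n')
            return true;               // LF alone
        if (c == '\r') {
            if (in.peek() == '\n')     // CR LF pair: swallow the LF as well
                in.get();
            return true;               // CR alone, or CR LF
        }
        line += c;
    }
    return got_any;                    // final line with no terminator
}

// Hypothetical helper: counts lines the way a parser tracking line
// numbers would, with each of the three terminators counting once.
int count_lines(const std::string& text)
{
    std::istringstream in(text);
    std::string line;
    int n = 0;
    while (read_line_any(in, line))
        ++n;
    return n;
}
```

Because a CR LF pair is consumed as one terminator, a DOS file does not get its line count doubled, which is the point of treating the pair specially rather than accepting CR and LF in any order.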
-- David R. Tribble, david@tribble.com --
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html ]
Author: "Martijn Lievaart" <nobody@orion.nl>
Date: 1999/08/25
Michael Dennis wrote in message <37C19657.86B48991@kerneltech.com>...
>
>"P.J. Plauger" wrote:
>
>> Probably, but I'd test it before coming to depend heavily on it.
>
>Just for interest's sake for anyone who was following this thread... I did a quick
>test and it appears that the VC++ low level text routines translate binary CR/LF
>pair to \n but do NOT translate an isolated CR to \n. This means that if you want
>to be able to handle both UNIX and DOS text files from the same app you have to
>implement this yourself.
>
Uhhh, the Unix convention is to end a line with LF, not CR. You are
confusing it with the Mac ;^>. I never had any problem on DOS/Wintel with
Unix files (i.e. LF-terminated lines), but YMMV.
Martijn
--
Please post replies to this newsgroup. If you must reach me by email,
use <newsgroup-name> at greebo.orion in nl.
Senders of unsolicited bulk or commercial email will be prosecuted to
the maximal extent possible by law and any other means.
Author: "James Russell Kuyper Jr." <kuyper@wizard.net>
Date: 1999/08/24
Michael Dennis wrote:
....
> Just for interest's sake for anyone who was following this thread... I did a quick
> test and it appears that the VC++ low level text routines translate binary CR/LF
> pair to \n but do NOT translate an isolated CR to \n. This means that if you want
> to be able to handle both UNIX and DOS text files from the same app you have to
> implement this yourself.
Oddly enough, Unix programs leave both character sequences unchanged :-)
When you try to work with files created by multiple file systems, you're
going to have problems. Keep in mind that text mode ftp usually does the
appropriate conversion for you, and that many systems have utilities
such as unix2dos and dos2unix. If they don't, it's trivial to write them
yourself.
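
A minimal sketch of such converters, working on a whole buffer in memory. The function names dos_to_unix and unix_to_dos are invented for this example and are not the interface of the real unix2dos/dos2unix tools; on a DOS-style system the data would have to be read and written in binary mode, otherwise the runtime does its own translation first.

```cpp
#include <string>

// Hypothetical helper: converts DOS line endings (CR LF) to Unix (LF).
// A CR that is not followed by LF is copied through unchanged.
std::string dos_to_unix(const std::string& text)
{
    std::string out;
    out.reserve(text.size());
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n')
            continue;                  // drop the CR; the LF is copied next
        out += text[i];
    }
    return out;
}

// Hypothetical helper: converts Unix line endings (LF) to DOS (CR LF).
// An LF already preceded by CR is left alone, so CR is never doubled.
std::string unix_to_dos(const std::string& text)
{
    std::string out;
    out.reserve(text.size());
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '\n' && (i == 0 || text[i - 1] != '\r'))
            out += '\r';
        out += text[i];
    }
    return out;
}
```

Leaving an existing CR LF untouched in unix_to_dos means the conversion can be run twice on the same data without corrupting it.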
Author: Michael Dennis <miked@kerneltech.com>
Date: 1999/08/23
"P.J. Plauger" wrote:
> Probably, but I'd test it before coming to depend heavily on it.
Just for interest's sake for anyone who was following this thread... I did a quick
test and it appears that the VC++ low level text routines translate binary CR/LF
pair to \n but do NOT translate an isolated CR to \n. This means that if you want
to be able to handle both UNIX and DOS text files from the same app you have to
implement this yourself.
Mike
Author: "Paul Baxter" <nospam@paje.globalnet.co.uk>
Date: 1999/08/23
If you read/write to a text stream there is an automatic conversion to and
from the two forms.
If you read as a binary memory dump you'd have to do it yourself.
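
As a rough illustration of the binary case, here is a sketch that opens a file in binary mode and does the CR LF handling by hand. The function name read_lines_binary is made up for the example, and it uses the std::getline overload for std::string rather than the istream::getline() member; an isolated CR in the middle of a line is deliberately not treated as a terminator here.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads all lines of `path` in binary mode, so the
// runtime performs no newline translation, then strips one trailing CR
// from each line by hand.
std::vector<std::string> read_lines_binary(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {       // delimiter is the single char '\n'
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);   // the CR of a CR LF pair
        lines.push_back(line);
    }
    return lines;
}
```

The same function then gives identical results for LF-terminated and CR LF-terminated files, whichever platform it is compiled on.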
Paul
---
Author: Francis Glassborow <francis@robinton.demon.co.uk>
Date: 1999/08/23
In article <37BC74A1.2D4A42F8@kerneltech.com>, Michael Dennis
<miked@kerneltech.com> writes
>Correct me if I'm wrong but according to my VC++ documentation it says
>that istream::getline() takes a delimiter argument that is a single
>character that is defaulted to '\n'. Does this mean if I have a text
>file with DOS line termination (\r\n) it's going to leave '\r' in my
>string? Or if it automatically assumes that the line is terminated in
>DOS form then will it not terminate properly on Unix text files unless I
>specify \r as my delimiter? Or is it smart enough to figure this all
>out and I should just leave my delimiter as the default and it will
>automatically figure it out?
It should compile correctly for the target platform, just as it outputs
'\n' as appropriate for the target.
Francis Glassborow Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
---
Author: James Kuyper <kuyper@wizard.net>
Date: 1999/08/23
Michael Dennis wrote:
...
> Correct me if I'm wrong but according to my VC++ documentation it says
> that istream::getline() takes a delimiter argument that is a single
> character that is defaulted to '\n'. Does this mean if I have a text
> file with DOS line termination (\r\n) it's going to leave '\r' in my
> string? Or if it automatically assumes that the line is terminated in
> DOS form then will it not terminate properly on Unix text files unless I
> specify \r as my delimiter? Or is it smart enough to figure this all
Typical C/C++ implementations on DOS translate input "\r\n" into "\n", and do
the reverse on output, unless you opened the file in binary mode. The
same code will work under Unix with no problem, because the translation
is done by the I/O library underneath getline(), and not by your code. The
Unix version doesn't need to bother with the translation.
...
> I know the standard C function fgets() has the smarts to do it. How
> about istream::getline()?
No less smart.
---
Author: "P.J. Plauger" <pjp@plauger.com>
Date: 1999/08/23
Michael Dennis wrote in message <37BD6B25.5AE9B428@kerneltech.com>...
>Yes, you're right, now that I think about it. It's the fact that you open the
>file in text mode that ensures that \r\n is translated to \n on input and
>that \n is translated to \r\n on output. To me that makes perfect sense, but
>if on input you're mixing and matching Unix-terminated and DOS-terminated
>text files, is the standard required to handle that? Or is that up to the
>implementor?
It's up to the implementor/OS. Usually, the C runtime matches the convention
of the most widely used text editors, assemblers, compilers in how they
represent text in a file. If you contrive to create a file with a different line
termination convention, then all sorts of interesting things happen with the
above tools, not to mention the C runtime. My preference is to make the C
translations as robust as possible, but under VC++ I use the Microsoft
read/write code. I can't predict its behavior for various fruity cases.
>I'm pretty sure I saw in your code (Standard C Library) that you put in support
>for reading both UNIX-terminated and DOS-terminated text files. This is a
>great feature and saves a lot of worries from a text-parsing standpoint.
>Since I'm talking to the man who would know :-), how about the VC++
><iostream>: does it handle things the same way? Am I safe to use
>istream::getline() in text mode on a UNIX-terminated file under VC++?
Probably, but I'd test it before coming to depend heavily on it.
P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
---
Author: Michael Dennis <miked@kerneltech.com>
Date: 1999/08/19
I was just in the process of writing some code that parsed a text file
using the <iostream> standard C++ header. I don't typically use C++ for
parsing text files (I usually use the Standard C functions instead
because I find them easier) but decided I should brush up on my
<iostream> skills before I forgot how to do it.
Correct me if I'm wrong but according to my VC++ documentation it says
that istream::getline() takes a delimiter argument that is a single
character that is defaulted to '\n'. Does this mean if I have a text
file with DOS line termination (\r\n) it's going to leave '\r' in my
string? Or if it automatically assumes that the line is terminated in
DOS form then will it not terminate properly on Unix text files unless I
specify \r as my delimiter? Or is it smart enough to figure this all
out and I should just leave my delimiter as the default and it will
automatically figure it out?
I know the standard C function fgets() has the smarts to do it. How
about istream::getline()?
Mike
---
Author: "P.J. Plauger" <pjp@plauger.com>
Date: 1999/08/20
Michael Dennis wrote in message <37BC74A1.2D4A42F8@kerneltech.com>...
>Correct me if I'm wrong but according to my VC++ documentation it says
>that istream::getline() takes a delimiter argument that is a single
>character that is defaulted to '\n'. Does this mean if I have a text
>file with DOS line termination (\r\n) it's going to leave '\r' in my
>string? Or if it automatically assumes that the line is terminated in
>DOS form then will it not terminate properly on Unix text files unless I
>specify \r as my delimiter? Or is it smart enough to figure this all
>out and I should just leave my delimiter as the default and it will
>automatically figure it out?
>
>I know the standard C function fgets() has the smarts to do it. How
>about istream::getline()?
Both C and C++ I/O are obliged to map between newline-terminated
text lines within the program and whatever the OS finds most convenient
externally. Thus, fgets in C and/or istream::getline() in C++ don't see
\r\n terminations, or record counts, or what have you. They don't have
to be smart, they just have to depend on some other part of the I/O
system being smart.
P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
---
Author: Michael Dennis <miked@kerneltech.com>
Date: 1999/08/20
"P.J. Plauger" wrote:
> Both C and C++ I/O are obliged to map between newline-terminated
> text lines within the program and whatever the OS finds most convenient
> externally. Thus, fgets in C and/or istream::getline() in C++ don't see
> \r\n terminations, or record counts, or what have you. They don't have
> to be smart, they just have to depend on some other part of the I/O
> system being smart.
Yes, you're right, now that I think about it. It's the fact that you open the
file in text mode that ensures that \r\n is translated to \n on input and
that \n is translated to \r\n on output. To me that makes perfect sense, but
if on input you're mixing and matching Unix-terminated and DOS-terminated
text files, is the standard required to handle that? Or is that up to the
implementor?
I'm pretty sure I saw in your code (Standard C Library) that you put in support
for reading both UNIX-terminated and DOS-terminated text files. This is a
great feature and saves a lot of worries from a text-parsing standpoint.
Since I'm talking to the man who would know :-), how about the VC++
<iostream>: does it handle things the same way? Am I safe to use
istream::getline() in text mode on a UNIX-terminated file under VC++?
Mike