Topic: Streams: Creating a larger buffer for input.
Author: AllanW@my-dejanews.com
Date: 1998/09/10
In article <6t4pes$g75$1@nnrp1.dejanews.com>,
AllanW@my-dejanews.com wrote:
>
> [ moderator's note: This discussion has strayed into programming
> techniques unrelated to the C++ standard. Followups on the topic
> of how to write faster I/O will be rejected. You can take that
> subject to comp.lang.c++ or comp.lang.c++.moderated. -sdc ]
My point was that Lars Hammer had perhaps unrealistic expectations
of I/O throughput, and that his entire design should perhaps be
evaluated rather than attacking the C++ library I/O routines.
Thus, I *briefly* mentioned some of the alternatives available.
Re-reading what I wrote, I can see that I focused more strongly
on these alternatives than I meant to.
In any case, the iostream library is what it is. I doubt that his
implementation is particularly faulty; I think he's simply using
the wrong tool for the job. I don't believe that there is any need
for a standards change to address his problem.
--
AllanW@my-dejanews.com is a "Spam Magnet" -- never read.
Please reply in USENET only, sorry.
-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/rg_mkgrp.xp Create Your Own Free Member Forum
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html ]
Author: "Lars Hammer" <lch@elsamprojekt.dk>
Date: 1998/09/10
AllanW@my-dejanews.com wrote in message <6t72r2$omt$1@nnrp1.dejanews.com>...
>In article <6t4pes$g75$1@nnrp1.dejanews.com>,
> AllanW@my-dejanews.com wrote:
>>
>> [ moderator's note: This discussion has strayed into programming
>> techniques unrelated to the C++ standard. Followups on the topic
>> of how to write faster I/O will be rejected. You can take that
>> subject to comp.lang.c++ or comp.lang.c++.moderated. -sdc ]
I'm sorry that I have led the discussion away from what is considered a
question related to the C++ Standard. My impression of the iostream library
was that it was (compared to stdio):
1. Type safe.
2. Easily extensible.
3. Faster.
1. and 2. are certainly fulfilled.
The reason that the ostream operator << should be faster than printf() was
that printf needs to parse the format string. The same goes for the istream
operator >> and scanf.
>In any case, the iostream library is what it is. I doubt that his
>implementation is particularly faulty; I think he's simply using
>the wrong tool for the job. I don't believe that there is any need
>for a standards change to address his problem.
I do not consider the iostream implementation faulty, because it does the
job and does it correctly.
>My point was that Lars Hammer had perhaps unrealistic expectations
>of I/O throughput, and that his entire design should perhaps be
>evaluated rather than attacking the C++ library I/O routines.
The design is based on a customer wish. In my previous post, I described how
I changed my program to use basic operating system calls to read the files
into a buffer, and then use the buffer in an istringstream, using the
istream >> operators to extract from the buffer into my variables. Reading
the files is very fast, totalling less than half a second, but extracting
using >> took almost 5 seconds. The extraction is in-memory operations!!!
After dealing with the istreams, I just feel that I can conclude that the
>> operators are very slow.
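The claim above can be checked in isolation. The following is a minimal sketch, not code from this thread: it parses the same in-memory text once with operator>> and once with std::strtod, so the two can be timed side by side. The names parse_with_stream and parse_with_strtod are made up for illustration, and the input is assumed to be whitespace-separated floating-point numbers.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse whitespace-separated doubles with operator>>.
std::vector<double> parse_with_stream(const std::string& text) {
    std::istringstream in(text);
    std::vector<double> values;
    double v;
    while (in >> v) values.push_back(v);
    return values;
}

// Hypothetical helper: parse the same text with std::strtod, which skips
// leading whitespace and reports where the number ended.
std::vector<double> parse_with_strtod(const std::string& text) {
    std::vector<double> values;
    const char* p = text.c_str();
    char* end = nullptr;
    for (;;) {
        double v = std::strtod(p, &end);
        if (end == p) break;  // no conversion: end of input or a bad token
        values.push_back(v);
        p = end;
    }
    return values;
}
```

Timing each function on the real file contents would show how much of the 5 seconds is spent in the extractors on a given implementation. The ratio depends on the library and is not something this sketch can predict.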
Even though performance is not a standard C++ issue, it is a demand on any
decent application.
[ moderator's note: Performance of C++ implementations is a suitable
issue for this newsgroup, and good runtime performance is a stated
goal of C++ as a programming language. Programming techniques to
accomplish specific tasks do not ordinarily belong in this
newsgroup. See the FAQ for more details. -sdc ]
I will rewrite my application to use safe and fast standard C I/O. I hope
this will conclude the discussion.
Yours
Lars Hammer
lch@elsamprojekt.dk
www.elsamprojekt.dk
---
Author: sbnaran@fermi.ceg.uiuc.edu (Siemel Naran)
Date: 1998/09/10
On 10 Sep 1998 16:39:35 GMT, Lars Hammer <lch@elsamprojekt.dk> wrote:
>The reason that the ostream operator << should be faster than printf() was
>that printf needs to parse the format string. The same goes for the istream
>operator >> and scanf.
True. But I think many implementations use stdio inside. That is, they
construct the format string and then call sprintf!
>I will rewrite my application to use safe and fast standard C I/O. I hope
>this will conclude the discussion.
No, write your own specialized ostream. It will take a while to figure
this thing out, but it will be worth it. It should be faster than stdio,
and still be exception and type safe. For specific questions on this
topic, set the followup to comp.lang.c++ (I don't know how to do this).
--
----------------------------------
Siemel B. Naran (sbnaran@uiuc.edu)
----------------------------------
---
Author: "Lars Hammer" <lch@elsamprojekt.dk>
Date: 1998/09/08
Thanks for the explanation and response. I have fiddled around a bit more
and tried to locate the source of the performance problem. I'd like to be a
bit more detailed in my problem description: I'm reading predefined readable
ASCII files using ifstreams, and the values read are stored in a vector of
two-dimensional points (the 2-dimensional points are defined as a class). I
read approximately 40 files. The largest file is 40K and the average size is
12K. Reading the files and inserting the values into the vector (standard
library vector) takes about 5 seconds on a 133 MHz Pentium. The program was
compiled using Inprise C++Builder 3.0. There seems to be no change in
performance whether the program is compiled with or without debug info. I
consider the 5 seconds to be slow.
To speed up things without using extra buffering, I tried to read all of the
file into a buffer, and then create an istringstream so I could use the code
used for extraction of the values from the files again. See the following
code:
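// CODE SNIPPET BEGINS
#include <fstream>
#include <sstream>
#include <stdio.h>
#include <fcntl.h>     // open(), O_RDONLY, O_TEXT
#include <io.h>        // read(), close() (Borland/DOS-style header)
#include <sys/stat.h>
using namespace std;
...
struct stat stat_buf;
int handle;
// This code is very fast
stat(buf,&stat_buf);
buffer = new char [stat_buf.st_size+1];
handle = open(buf,O_RDONLY | O_TEXT);
// In text mode read() may return fewer bytes than st_size,
// so terminate the buffer at the count actually read.
int bytes_read = read(handle,buffer,stat_buf.st_size);
close(handle);
buffer[bytes_read > 0 ? bytes_read : 0] = '\0';
istringstream *ifs = new istringstream(buffer);
// This code is very slow.
*ifs >> ...;
*ifs >> ....;
...
// CODE SNIPPET ENDS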
This had no effect on the performance problem. Then I commented out all of
the vector insertion, and the performance was the same. My conclusion is
that the iostream extractors (>>) really are the performance problem. I'm
quite sure that I could rewrite the code in plain C and speed it up by a
factor of 5. I would like to hear if anybody can help me speed this up
while still using C++.
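For comparison, the whole-file read can also be written with the standard library alone. This is a minimal sketch under stated assumptions, not the code used in this post: the helper name slurp is hypothetical, and the file is opened in binary mode, so any carriage returns stay in the text (operator>> treats them as whitespace in the default locale).

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: read an entire file into a string using only the
// standard library. Returns an empty string if the file cannot be opened.
std::string slurp(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) return std::string();
    std::ostringstream contents;
    contents << in.rdbuf();  // copy the whole stream buffer in one call
    return contents.str();
}
```

The result can be handed to an istringstream so that the existing extraction code is reused unchanged. This only removes the operating-system calls; it is not expected to make the extractors themselves any faster.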
Thanks
Lars Hammer
lch@elsamprojekt.dk
www.elsamprojekt.dk
---
Author: AllanW@my-dejanews.com
Date: 1998/09/09
[ moderator's note: This discussion has strayed into programming
techniques unrelated to the C++ standard. Followups on the topic
of how to write faster I/O will be rejected. You can take that
subject to comp.lang.c++ or comp.lang.c++.moderated. -sdc ]
In article <6t3bfb$l42$1@dalen.get2net.dk>,
"Lars Hammer" <lch@elsamprojekt.dk> wrote:
> Thanks for the explanation and response. I have fiddled a bit more around
> and tried to locate the source of the performance problem. I'd like to be a
> bit more detailed in my problem description: I'm reading predefined readable
> ASCII-files using ifstream's and the read values are stored in a vector of
> two-dimensional points (the 2-dimensional points is defined as a class). I
> read approximately 40 files. The largest file is 40K and the average size is
> 12K. Reading the files and inserting the values into the vector (standard
> library vector) takes about 5 seconds on a 133 MHz Pentium. The program was
> compiled using Inprise C++Builder 3.0. There seems to be no change in
> performance whether the program is compiled with or without debug info. I
> consider the 5 seconds to be slow.
You're opening and closing 40 files, and reading 1/2 megabyte of data,
in 5 seconds on a Pentium 133! Depending on what type of disk drive
you use and what OS you use, this is probably already pretty fast. There
may well be a way to speed this up, but I'll be surprised if you do
better than 4 seconds.
Consider reorganizing your data. At the very least, try to combine it
into one file. Instead of maintaining it in ASCII, convert it to binary
and read it into memory directly. Or, is it possible to move the data
to a "Ram drive", AKA solid-state drive, one that is implemented in
pure RAM so it never needs to wait for a "seek" to complete?
If the input files must be ASCII, but they rarely change, then consider
a "conversion" utility. (You could also call this a "compile" or an
"initialize".) Whenever the ASCII files change, the utility would read
the ASCII files (taking five seconds to do so) and would write out a
binary file (which could take an additional 2-3 seconds). The main
program would never read the ASCII files directly, but would instead
read the binary files, and the performance problem would be solved.
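The conversion-utility idea can be sketched as follows. Everything here is illustrative: the Point layout, the file format (an element count followed by raw Point objects), and the function names are assumptions, and the binary file is only readable on the platform and compiler that wrote it.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

struct Point { double x; double y; };  // assumed layout of a 2-D point

// Read "x y" pairs from an ASCII file (the slow, occasional step).
std::vector<Point> read_ascii(const std::string& path) {
    std::ifstream in(path.c_str());
    std::vector<Point> pts;
    Point p;
    while (in >> p.x >> p.y) pts.push_back(p);
    return pts;
}

// Write an element count followed by the raw Point objects.
bool write_binary(const std::string& path, const std::vector<Point>& pts) {
    std::ofstream out(path.c_str(), std::ios::binary);
    const std::size_t n = pts.size();
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
    if (n != 0) {
        out.write(reinterpret_cast<const char*>(pts.data()),
                  static_cast<std::streamsize>(n * sizeof(Point)));
    }
    out.flush();
    return static_cast<bool>(out);
}

// Read the file written by write_binary; no text parsing is involved.
std::vector<Point> read_binary(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    std::size_t n = 0;
    if (!in.read(reinterpret_cast<char*>(&n), sizeof n)) return {};
    std::vector<Point> pts(n);
    if (n != 0 && !in.read(reinterpret_cast<char*>(pts.data()),
                           static_cast<std::streamsize>(n * sizeof(Point)))) {
        return {};
    }
    return pts;
}
```

The utility would call read_ascii and write_binary whenever the ASCII files change, and the main program would call only read_binary.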
> To speed up things without using extra buffering, I tried to read all of the
> file into a buffer, and then create an istringstream so I could use the code
> used for extraction of the values from the files again. See the following
> code:
[snip]
> This had no effect on the performance problem. Then I commented out all of
> the vector insertion, and the performance was the same. My conclusion is,
> that it is the iostream extractors (>>) that really are the performance
> problem.
Possibly. The more general conclusion is, I/O is slow for some reason.
This could be due to iostream, but it could also be some environmental
issue.
> I'm quite sure that I could rewrite the code in plain C and speed it
> up by a factor of 5. I would like to hear if anybody can help me speed
> this up while still using C++.
That would be a useful exercise. If you really can read that data, in
that form, in one second using C routines then you can either try to
do the C++ equivalent, or you can simply use your C routines to read
the data (and call the problem solved). OTOH, you may find that your
expectations have been unrealistic.
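For an exercise like this, a small timing helper makes the comparison concrete. The sketch below is an assumption about how one might measure, not something used in this thread; the name elapsed_ms is hypothetical.

```cpp
#include <chrono>

// Hypothetical helper: run 'work' once and return the elapsed wall-clock
// time in milliseconds, using a clock that never goes backwards.
template <typename Work>
double elapsed_ms(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping the C version and the iostream version of the reader in separate calls, and running each several times, separates the cost of parsing from the cost of the first disk access.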
--
AllanW@my-dejanews.com is a "Spam Magnet" -- never read.
Please reply in USENET only, sorry.
---
Author: sbnaran@localhost.localdomain (Siemel Naran)
Date: 1998/09/09
On 08 Sep 98 16:19:19 GMT, Lars Hammer <lch@elsamprojekt.dk> wrote:
>To speed up things without using extra buffering, I tried to read all of the
>file into a buffer, and then create an istringstream so I could use the code
>used for extraction of the values from the files again. See the following
>code:
It seems unlikely that istringstream will make things any faster.
Please show more of your code next time. I want to know what you mean
by vector insertion, and I'd like also to try the code on my computer.
Also, what is the "..." in the code below? What are you trying to
read -- int, double?
BTW, are you aware that you can say
istringstream ifs(buffer);
which eliminates the need to call "delete ifs". Also, objects
encourage more optimization than pointers/references, but I doubt
this is the problem here.
-----
> istringstream *ifs = new istringstream(buffer);
>
> // This code is very slow.
> *ifs >> ...;
> *ifs >> ....;
> ...
>// CODE SNIPPET ENDS
-----
Note that istream does formatted input. It consults the locale for
its formatting, sets iostream flags, etc.
For specialized input/output, write your own istream/ostream class.
Writing your own istream/ostream class is not that hard. The hard
part is writing the streambuf or filebuf that does the raw reading
and writing. For an operator<< of your ostream class, format the
number for output and then simply call
basic_streambuf<...>::sputn(const char *, streamsize)
to write the raw bytes to the stream.
I would try this, but some of your code above is missing.
If you are writing the data in binary format, you don't have to do
any formatting. But as Dietmar Kuehl pointed out, binary formats
are not portable. In fact, even though sizeof(double)==8 on
INTEL and SGI and HP, the meaning of the bits, the representation
of the real number, is different on all three architectures.
Only 0.0 seems to be the same universally -- it has all 8 bytes
equal to zero.
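The byte-level representation is easy to inspect on any one machine. The following is a small sketch with a hypothetical helper name, bytes_of, assuming 8-bit bytes; it prints the bytes in memory order, so the output for the same value can differ between platforms.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Return the object representation of a double as hex, lowest address first.
std::string bytes_of(double value) {
    unsigned char raw[sizeof(double)];
    std::memcpy(raw, &value, sizeof raw);  // well-defined way to view bytes
    static const char hex[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < sizeof raw; ++i) {
        out += hex[raw[i] >> 4];
        out += hex[raw[i] & 0x0f];
    }
    return out;
}
```

Comparing bytes_of(1.0) across the machines that must share a file shows directly whether a raw binary format can be exchanged between them.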
--
----------------------------------
Siemel B. Naran (sbnaran@uiuc.edu)
----------------------------------
---
Author: "Lars Hammer" <lch@elsamprojekt.dk>
Date: 1998/09/03
I want to make a larger input buffer for faster file input. My code is
somewhat like this:
...
ifstream *ifs = new ifstream("myfile");
*ifs >> count;
for (int i = 0; i < count; i++)
{
    *ifs >> a;
    *ifs >> b;
    *ifs >> c;
    ...
    *ifs >> z;
}
delete ifs;
My initial idea was to change the buffer size like this:
...
ifstream *ifs = new ifstream("myfile");
// This code added
char buffer[64*1024];
filebuf *FileBuf = ifs->rdbuf();
FileBuf->pubsetbuf(buffer,sizeof(buffer));
// Added code above
*ifs >> count;
for (int i = 0; i < count; i++)
{
    *ifs >> a;
    *ifs >> b;
    *ifs >> c;
    ...
    *ifs >> z;
}
delete ifs;
The added code seems to work, but when it comes to reading, I get an error
message for access violation.
Is there any way to increase the istream buffer?
Yours
Lars Hammer
lch@elsamprojekt.dk
www.elsamprojekt.dk
---
Author: James Kuyper <kuyper@wizard.net>
Date: 1998/09/03
Lars Hammer wrote:
> I want to make a larger input buffer for faster file input. ...
[ moderator's note: Excessive quoting deleted. -sdc ]
> ...
> ifstream *ifs = new ifstream("myfile");
> char buffer[64*1024];
> filebuf *FileBuf = ifs->rdbuf();
> FileBuf->pubsetbuf(buffer,sizeof(buffer));
> *ifs >> count;
> for (int i = 0; i < count; i++)
> {
>     *ifs >> a;
>     *ifs >> b;
>     *ifs >> c;
>     ...
>     *ifs >> z;
> }
> delete ifs;
> ... but when it comes to reading, I get an error
> message for access violation.
pubsetbuf() is defined as calling setbuf(). The effects of calling
setbuf() with non-zero arguments are implementation-defined. You'll need
to check your implementation's documentation to find out what's supposed
to happen, and complain to the implementor if it doesn't do what it says
should happen.
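Where an implementation does honor the request, it commonly expects it before the file is opened and before any I/O. The sketch below follows that ordering; it is an illustration under that assumption, not a guaranteed fix for the access violation, and the helper name sum_file is hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: sum the integers in a file, requesting a buffer of
// the given size. Whether the request is honored is implementation-defined.
long sum_file(const std::string& path, std::size_t buffer_size) {
    std::vector<char> buffer(buffer_size);  // must outlive the stream's use
    std::ifstream in;
    // Ask for the buffer before opening the file and before any I/O.
    in.rdbuf()->pubsetbuf(buffer.data(),
                          static_cast<std::streamsize>(buffer.size()));
    in.open(path.c_str());
    long total = 0;
    long value = 0;
    while (in >> value) total += value;
    return total;
}  // 'in' is destroyed before 'buffer' because it was declared later
```

Declaring the buffer before the stream keeps the storage alive for as long as the filebuf might touch it, which is worth ruling out whenever a user-supplied buffer is involved.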
---
Author: dietmar.kuehl@claas-solutions.de
Date: 1998/09/04
Hi,
Lars Hammer (lch@elsamprojekt.dk) wrote:
> I want to make a larger input buffer for faster file input.
Well, I would expect that you won't get faster input by replacing the
buffer... Here is why:
1. The behavior of 'pubsetbuf()' is mainly implementation defined. The
only required behavior is to turn off output buffering if this
function is called with the arguments '0, 0'. The effect of calling
'pubsetbuf()' after the first I/O was done is undefined.
2. The implementation knows damn well which buffer size is the most
efficient to be transferred from disk to memory. I doubt that it
would be reasonably faster when the buffer size is increased...
Actually I can imagine implementations where things become slower
if the buffer is increased! Here is how this might happen: The
implementation reads a buffer, transfers it to an auxiliary buffer
(using 'codecvt()' this is almost certainly what is happening
anyway), and calls a function for asynchronous read to fill the buffer
again. When the auxiliary buffer is drained, there is a fair chance
that the read buffer is already filled again. However, I haven't
tested this approach (yet).
3. Implementing 'basic_filebuf' is, well, damn complex. It can be done
relatively easily by processing only one character at a time but this
would slow down the standard extractors a lot (it is much more
efficient to read from a buffer than calling 'underflow()' or
'uflow()' for each character). Allowing the user to fiddle around
with buffers would make things even more complex. Thus,
I would expect that all implementations restrict the effect of
'pubsetbuf()' to what is required.
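The point about reading from a buffer rather than calling 'underflow()' per character can be illustrated with a very small stream buffer over memory. This is a sketch under stated assumptions, not a replacement for 'basic_filebuf': the class name MemoryBuf is hypothetical, the memory is owned by the caller, and only reading is supported.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>

// Hypothetical read-only stream buffer over memory the caller owns.
class MemoryBuf : public std::streambuf {
public:
    MemoryBuf(char* data, std::size_t size) {
        // Expose [data, data + size) as the get area. Extractors then read
        // straight from it; the base class underflow() reports end of file
        // once the area is exhausted.
        setg(data, data, data + size);
    }
};
```

An istream constructed on a MemoryBuf supports the usual extractors. Because the whole input is already in the get area, 'underflow()' is reached only once, at the end of the data.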
If the read performance is crucial for you (normally other factors are
the bottleneck), you might want to consider a different format than a
human readable: Binary formats are often much faster to process.
However, they also have a tendency to become platform dependent or to be
less efficient.
--
<mailto:dietmar.kuehl@claas-solutions.de>
homepage: <http://www.informatik.uni-konstanz.de/~kuehl>