Topic: The fastest way to count the number of lines in a text file


Author: fred@genesis.demon.co.uk (Lawrence Kirby)
Date: Wed, 9 Feb 1994 01:27:47 +0000
In article <2in80m$8g9@netaxs.com> peb@access.netaxs.com "Paul Begley" writes:

>Lawrence Kirby (fred@genesis.demon.co.uk) wrote:
>: In article <1994Jan28.022111.68@lynx.com> hiro@lynx.com "Hiro Sugawara"
> writes:
>: >So, allocating a buffer as big as the file itself and reading the entire
>: >file with a single read() call will be the fastest because the number
>: >of system calls is minimal.
>
>: Allocating very large buffers is often counter-productive. At best it kills
>: the efficiency of CPU caches and at worst it causes excessive system paging
>: with an enormous hit in efficiency. I did some tests on this a little while
>: back and concluded that on the particular system the optimum buffer size
>: was 10K. Anything above that tended to reduce efficiency.
>
>I can't believe this question has lingered.
>
>How about reading the file as a stream and counting the EOL
>characters?  There have to be hundreds of examples of this in various
>UNIX texts.  I am an engineer, not a comp sci major, but I have to
>believe this is an entry level problem..

<I've quoted the whole message since it appears to have taken about a week
to get through>

The problem is that the stream's buffer-management overhead becomes significant
when you are doing simple operations like this on the data. Given the emphasis
on speed it is much better to do your own buffering (assuming lower-level
calls such as open/read/close are available). You still have to pick a
sensible buffer size.
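
For concreteness, a minimal sketch of that sort of hand-rolled buffering,
assuming POSIX open/read/close (error handling pared down; the 10K buffer
is just the figure from my earlier tests and should be tuned per system):

#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define BUFSIZE (10 * 1024)     /* ~10K was the sweet spot in my tests */

int main(int argc, char **argv)
{
    char buf[BUFSIZE];
    long count = 0;
    ssize_t n;
    int fd;

    if (argc < 2 || (fd = open(argv[1], O_RDONLY)) == -1)
        return 1;

    /* read a chunk at a time and scan it in memory */
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        ssize_t i;
        for (i = 0; i < n; i++)
            if (buf[i] == '\n')
                count++;
    }
    close(fd);

    printf("%ld lines\n", count);
    return 0;
}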

-----------------------------------------
Lawrence Kirby | fred@genesis.demon.co.uk
Wilts, England | 70734.126@compuserve.com
-----------------------------------------




Author: nikki@trmphrst.demon.co.uk (Nikki Locke)
Date: Mon, 31 Jan 1994 20:34:20 +0000
In article <759852773snz@genesis.demon.co.uk> fred@genesis.demon.co.uk writes:
> In article <1994Jan28.022111.68@lynx.com> hiro@lynx.com "Hiro Sugawara" writes:
> >So, allocating a buffer as big as the file itself and reading the entire
> >file with a single read() call will be the fastest because the number
> >of system calls is minimal.
>
> Allocating very large buffers is often counter-productive. At best it kills
> the efficiency of CPU caches and at worst it causes excessive system paging
> with an enormous hit in efficiency. I did some tests on this a little while
> back and concluded that on the particular system the optimum buffer size
> was 10K. Anything above that tended to reduce efficiency.

The fastest way is, of course, to use a memory mapped file.
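
For example, a rough sketch on a system that provides mmap() (error handling
trimmed, and the empty-file case is skipped because a zero-length mapping
isn't allowed):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    struct stat st;
    char *p, *q, *end;
    long count = 0;
    int fd;

    if (argc < 2 || (fd = open(argv[1], O_RDONLY)) == -1)
        return 1;
    if (fstat(fd, &st) == -1)
        return 1;
    if (st.st_size == 0) {              /* nothing to map or count */
        printf("0\n");
        return 0;
    }

    /* map the whole file read-only and scan it in place */
    p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return 1;

    for (q = p, end = p + st.st_size; q < end; q++)
        if (*q == '\n')
            count++;

    munmap(p, st.st_size);
    close(fd);
    printf("%ld\n", count);
    return 0;
}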

--
Nikki Locke,Trumphurst Ltd.(PC & Unix consultancy) nikki@trmphrst.demon.co.uk
trmphrst.demon.co.uk is NOT affiliated with ANY other sites at demon.co.uk.




Author: peb@access.netaxs.com (Paul Begley)
Date: 2 Feb 1994 03:51:50 GMT
Lawrence Kirby (fred@genesis.demon.co.uk) wrote:
: In article <1994Jan28.022111.68@lynx.com> hiro@lynx.com "Hiro Sugawara" writes:
: >So, allocating a buffer as big as the file itself and reading the entire
: >file with a single read() call will be the fastest because the number
: >of system calls is minimal.

: Allocating very large buffers is often counter-productive. At best it kills
: the efficiency of CPU caches and at worst it causes excessive system paging
: with an enormous hit in efficiency. I did some tests on this a little while
: back and concluded that on the particular system the optimum buffer size
: was 10K. Anything above that tended to reduce efficiency.

I can't believe this question has lingered.

How about reading the file as a stream and counting the EOL
characters?  There have to be hundreds of examples of this in various
UNIX texts.  I am an engineer, not a comp sci major, but I have to
believe this is an entry level problem..

--
Paul Begley
peb@netaxs.com




Author: hiro@lynx.com (Hiro Sugawara)
Date: Fri, 28 Jan 1994 02:21:11 GMT
In article <CJvvCF.CAr@world.std.com> jmochel@world.std.com (Jim S Mochel) writes:
>Hiro Sugawara (hiro@lynx.com) wrote:
>: In article <CI7FC7.LLD@druid.com> darcy@druid.com (D'Arcy J.M. Cain) writes:
>: >>>PS: reading the lines one by one is too slow in my opinion
>: >
>: >>I would allocate a rather large buffer, say 32K, read in a chunk of the file
>: >>into the buffer at a time, and then do a quick sequential search through the
>: >>buffer.  Since the searching will be in fast RAM it shouldn't be too bad.
>
>If you choose to use the ANSI file I/O routines you can allocate a buffer
>so that your function calls are operating on in-core memory rather than
>file storage. It allows you to use standard calls on RAM for a speed
>that is significantly better than normal file I/O and less fuss than
>allocating the buffer yourself and using your own parsing routines.
>
>I remember reading that C++ streams are buffered in somewhat the same way
>but I am not usre if you have control over the size of the buffer.

I suggested allocating a big buffer, ideally as big as the whole file, in
the application and reading as many bytes as possible at once because the
original poster was seeking the *fastest* way. The ANSI file I/O routines
in general are not as fast as you might think. This is due to the
buffer size, which in most cases is just 512 bytes or so. So, if the file
size were 1MB, your stream I/O (getc()) would have to call read() 2048
times. In most implementations, all stream I/O read routines such as
fread() and fgets() internally use getc().

Since disk data bytes are read by sector and most disks have 512 byte
sectors, they are buffered by the same amount as stream I/O and you will
eventually read "in-core" data even without stream I/O routines unless
your read always starts at sector boundaries. So, what makes the
difference? The answer is the expense you have to pay for each system
call. System calls are generally very expensive (slow) because the
kernel executes a lot of instructions to deal with the trap and the
context switches back and forth between user and kernel spaces. This
is why stream I/O was invented. For example,

 for (i = 0; i < 512; i++)
  read(fd, p++, 1);

is much slower than

 for (i = 0; i < 512; i++)
  *p++ = getc(fp);

even though they buffer data by the same amount (512 bytes) and getc()
may use "double buffering" (once in the kernel and again in the user
space).

So, allocating a buffer as big as the file itself and reading the entire
file with a single read() call will be the fastest because the number
of system calls is minimal. This can be considered merely a simulation of
stream I/O with a bigger buffer size. Depending on the operating system,
you may be able to prevent the disk driver from using intermediate
(internal) sector buffers so the operation will be even faster.
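
As a rough sketch of what I mean (assuming the whole file fits in memory
and that a single read() returns it all; on systems where short reads can
happen you would need a loop, and error handling is minimal):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    struct stat st;
    char *buf;
    long count = 0, i;
    int fd;

    if (argc < 2 || (fd = open(argv[1], O_RDONLY)) == -1)
        return 1;
    if (fstat(fd, &st) == -1)
        return 1;

    /* one buffer as big as the file, one read() call */
    buf = malloc(st.st_size);
    if (buf == NULL || read(fd, buf, st.st_size) != st.st_size)
        return 1;

    for (i = 0; i < st.st_size; i++)
        if (buf[i] == '\n')
            count++;

    printf("%ld\n", count);
    free(buf);
    close(fd);
    return 0;
}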

I don't know much about C++'s stream I/O but I think the same logic
applies.

hiro@lynx.com




Author: fred@genesis.demon.co.uk (Lawrence Kirby)
Date: Sat, 29 Jan 1994 14:12:53 +0000
In article <1994Jan28.022111.68@lynx.com> hiro@lynx.com "Hiro Sugawara" writes:

  .
  .
>So, allocating a buffer as big as the file itself and reading the entire
>file with a single read() call will be the fastest because the number
>of system calls is minimal.

Allocating very large buffers is often counter-productive. At best it kills
the efficiency of CPU caches and at worst it causes excessive system paging
with an enormous hit in efficiency. I did some tests on this a little while
back and concluded that on the particular system the optimum buffer size
was 10K. Anything above that tended to reduce efficiency.

-----------------------------------------
Lawrence Kirby | fred@genesis.demon.co.uk
Wilts, England | 70734.126@compuserve.com
-----------------------------------------




Author: jmochel@world.std.com (Jim S Mochel)
Date: Wed, 19 Jan 1994 15:36:14 GMT
Hiro Sugawara (hiro@lynx.com) wrote:
: In article <CI7FC7.LLD@druid.com> darcy@druid.com (D'Arcy J.M. Cain) writes:
: >>>PS: reading the lines one by one is too slow in my opinion
: >
: >>I would allocate a rather large buffer, say 32K, read in a chunk of the file
: >>into the buffer at a time, and then do a quick sequential search through the
: >>buffer.  Since the searching will be in fast RAM it shouldn't be too bad.

If you choose to use the ANSI file I/O routines you can allocate a buffer
so that your function calls are operating on in-core memory rather than
file storage. It allows you to use standard calls on RAM for a speed
that is significantly better than normal file I/O and less fuss than
allocating the buffer yourself and using your own parsing routines.

I remember reading that C++ streams are buffered in somewhat the same way
but I am not usre if you have control over the size of the buffer.

Jim Mochel
jmochel@world.std.com






Author: rdobbins@marvin.ag.uidaho.edu (Robert Dobbins)
Date: 27 Dec 1993 23:18:04 GMT
>you might expect.
>
>I agree that finctions/macros such as getchar can be greatly improved upon
              ^^^^^^^^^  can U spell ok?

>in terms of speed. They're fast enough for most situations though.
>
>-----------------------------------------
>Lawrence Kirby | fred@genesis.demon.co.uk
>Wilts, England | 70734.126@compuserve.com

And the news headline is:

Lawrence Kirby Wilts England (with his breath!)

>-----------------------------------------


------------------------------------------------------------------
Robert Dobbins            Internet: rdobbins@marvin.ag.uidaho.edu
------------------------------------------------------------------
Any similarity between the opinions expressed herein and any other
opinions, either living or dead, is purely coincidental.





Author: pete@borland.com (Pete Becker)
Date: Mon, 3 Jan 1994 19:25:48 GMT
In article <2fd4h4$9vb@tcsi.tcs.com>,
Sriram Srinivasan <sriram@glock.tcs.com> wrote:
>If you are on a Unix or Windows NT system, mmap the file, and count the
>number of newline characters. This avoids the buffered reads. There's no way anything
>else is going to be faster.
>

 It doesn't avoid the buffered reads. It just hides them from you in
a different way than using stdio does. The file is still located on disk and
has to be read into RAM somehow. By passing responsibility for doing this to
the OS you assume that the OS implementors have done it more efficiently than
you could do it yourself. Whether this assumption is valid depends on the
exact details of what you're doing. In the case of counting newlines I suspect
that it's valid, but if performance is really critical it's easy enough to
write several versions using different strategies and compare them. Anything
else is just handwaving.
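
For example, a skeleton along these lines will do (two of the strategies
from this thread wrapped as interchangeable functions and timed with
clock(); it is only a sketch, with minimal error handling):

#include <stdio.h>
#include <time.h>

/* Strategy 1: plain getc() loop through stdio. */
static long count_getc(const char *path)
{
    FILE *fp = fopen(path, "r");
    long count = 0;
    int c;

    if (fp == NULL)
        return -1;
    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            count++;
    fclose(fp);
    return count;
}

/* Strategy 2: fread() into a largish buffer, scan it in memory. */
static long count_fread(const char *path)
{
    FILE *fp = fopen(path, "r");
    char buf[16 * 1024];
    long count = 0;
    size_t n, i;

    if (fp == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        for (i = 0; i < n; i++)
            if (buf[i] == '\n')
                count++;
    fclose(fp);
    return count;
}

static void bench(const char *name, long (*f)(const char *), const char *path)
{
    clock_t t0 = clock();
    long n = f(path);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("%-8s %ld lines, %.3f s\n", name, n, secs);
}

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;
    bench("getc", count_getc, argv[1]);
    bench("fread", count_fread, argv[1]);
    return 0;
}

Note that whichever version runs second will usually find the file already
in the operating system's cache, so run each version several times.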
 -- Pete




Author: dave@echologic.com (Dave Hayden 908- 946-1107)
Date: Mon, 20 Dec 1993 13:26:56 GMT
glenn@catt.citri.edu.au (Glenn Jayaputera) writes:

>Hi....I would like to count the number of lines in a text file.  The text
>file is composed of lines of arbitrary length; each line ends with a
>'\n' char. I guess my question is, what is the best (in terms of speed)
>way to count them...

>thanks
>  glenn jayaputera

>PS: reading the lines one by one is too slow in my opinion

I think this problem will be limited by the way you read the data, not the
way you search for the lines.  Using fread(), you'll get at least 2
copies of the data being made (one internal to fread, one that it
passes back to you).  On a pc, use _dosread() instead.  On the Sun
(and I think other UNIX boxes), you can map an input file right into
your address space - definitely the fastest way to read it.

Dave




Author: kjb@cgl.citri.edu.au (Kendall Bennett)
Date: 19 Dec 93 23:27:43 GMT
glenn@catt.citri.edu.au (Glenn Jayaputera) writes:

>Hi....I would like to count the number of lines in a text file.  The text
>file is composed of lines of arbitrary length; each line ends with a
>'\n' char. I guess my question is, what is the best (in terms of speed)
>way to count them...

Hey Glenn!

The fastest way to count the number of lines in a text file is to read the
file in large chunks (say about 20-30K in size), and scan the large buffer
in memory for the '\n' character. Counting these gives you the number of lines
in the file. When you go off the end of the buffer, simply toss out the
old buffer and read in a fresh one. This is _fast_ and you can even do the
'\n' scanning on a PC using the fast string instructions for real speed. However
the biggest bottleneck is likely to be getting the data off the disk into
memory, so ensure you use the open/read/close calls on UNIX, or if you use
the C STDIO routines turn off all input buffering (since you are doing
the buffering yourself). If you wish I can post the source code to my text
translation program which runs under both MSDOS and Unix and uses this
approach.
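
Roughly like this (a sketch using read() with a 28K buffer; memchr() does
the scan here, which a decent C library may already implement with the fast
string instructions, otherwise you can hand-code that part as described):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK (28 * 1024)       /* somewhere in the 20-30K range */

int main(int argc, char **argv)
{
    char buf[CHUNK];
    long count = 0;
    ssize_t n;
    int fd;

    if (argc < 2 || (fd = open(argv[1], O_RDONLY)) == -1)
        return 1;

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        char *p = buf, *end = buf + n;

        /* let memchr do the scanning through each chunk */
        while ((p = memchr(p, '\n', end - p)) != NULL) {
            count++;
            p++;
        }
    }
    close(fd);

    printf("%ld\n", count);
    return 0;
}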

+-------------------------------------------+------------------------------+
| Kendall Bennett                           | Internet:   kjb@citri.edu.au |
| Software Engineer, SciTech Software       | Compuserve: 100237,2213      |
| G.P.O Box 4216NN                          | Fax:        +61 3 690 2137   |
| Melbourne 3001 AUSTRALIA                  |                              |
+-------------------------------------------+------------------------------+




Author: kpv@ulysses.att.com (Phong Vo)
Date: Mon, 20 Dec 1993 15:54:16 GMT
In article <dave.756394016@bini> dave@echologic.com (Dave Hayden 908- 946-1107) writes:
>glenn@catt.citri.edu.au (Glenn Jayaputera) writes:
>
>>Hi....I would like to count the number of lines in a text file.
>>  glenn jayaputera
>
>I think this problem will be limited by the way you read the data, not the
>way you search for the lines.  Using fread(), you'll get at least 2
>copies of the data being made (one internal to fread, one that it
>passes back to you).  On a pc, use _dosread() instead.  On the Sun
>(and I think other UNIX boxes), you can map an input file right into
>your address space - definitely the fastest way to read it.
>
>Dave

This is right. But if you care about portability you don't really want to
use mmap directly. If you have the sfio library, the code below will do it:
 #include <sfio.h>

 main()
 {
  int nlines = sfmove(sfstdin, (Sfio_t*)0, SF_UNBOUND, '\n');
  sfprintf(sfstdout, "%d\n", nlines);
 }
Below are some timing differences using the above code (sfio.nlines),
a simple stdio getchar() program (stdio.nlines) and /usr/ucb/wc.
This was done on a Solbourne running SunOS.
The sfio version wins big for 2 reasons: (1) sfio uses mmap (if it is available)
for reading files, and (2) sfmove avoids the complex dereferences of getchar
to read bytes. If you must use stdio, you can achieve (2) by reading large
chunks of data with fread. For most stdio implementations
this would probably be about twice as slow as the sfio version because
fread must copy from stdin's buffer to your buffer.

g:work:work/src/lib/sfio/TEST$ time sfio.nlines < /vmunix
8379

real    0m0.56s
user    0m0.38s
sys     0m0.13s
g:work:work/src/lib/sfio/TEST$ time stdio.nlines < /vmunix
8379

real    0m1.98s
user    0m1.61s
sys     0m0.25s
g:work:work/src/lib/sfio/TEST$ time /usr/ucb/wc -l < /vmunix
    8379

real    0m2.95s
user    0m2.60s
sys     0m0.16s

Phong Vo, kpv@research.att.com




Author: richardm@cd.com (Richard Masoner)
Date: Mon, 20 Dec 1993 16:46:19 GMT
In article <2evta3$ebq@crcnis1.unl.edu> tbills@ponca.unl.edu (Trent Bills) writes:
>In article <1993Dec18.175558.8044@dxcern.cern.ch>, danpop@cernapo.cern.ch (Dan Pop) writes:
>|> If it's not a macro at all, you have the overhead of a function call per
>|> character, which is catastrophic on a modern architecture....
>
>Since when is a function call catastrophic on a modern architecture?  Most
>modern architectures have highly optimized function call methods such as
>register windows.  The overhead of a function call on most current architectures
>can be as little as one clock cycle and usually no more than a few....

I'm reading this post in an msdos group.  Lots of ms-dos compiler
code that I've had the pleasure of looking at does old-fashioned
passing of parameters on the stack.  With x86 protected mode programming,
function calls can take dozens of clock cycles, what with all the
selector registers getting loaded and all.

Also, am I mistaken in thinking that only RISC architectures use
register windows, or are other architectures borrowing this technique?

Please note that followups are to alt.msdos.programmer.
--
Richard F. Masoner   | Seen on a magazine cover:
Central Data Corporation  |
1602 Newton Dr., Champaign, IL 61821 | "Guns don't kill people,
(217) 359-8010 x251   |  television does."




Author: hiro@lynx.com (Hiro Sugawara)
Date: Thu, 23 Dec 1993 18:05:57 GMT
In article <CI7FC7.LLD@druid.com> darcy@druid.com (D'Arcy J.M. Cain) writes:
>frampton@vicuna.ocunix.on.ca (Steve Frampton) writes:
>>glenn@catt.citri.edu.au (Glenn Jayaputera) writes:
>>> Hi....I would like to count the number of lines in a text file.  The
>>>PS: reading the lines one by one is too slow in my opinion
>
>I'm afraid the only alternative is magic.  :-)
>
>>I would allocate a rather large buffer, say 32K, read in a chunk of the file
>>into the buffer at a time, and then do a quick sequential search through the
>>buffer.  Since the searching will be in fast RAM it shouldn't be too bad.
>
>Why get so complicated?
>
>#include <stdio.h>
>int
>main(void)
>{
>  int c;
>  int count = 0;
>
>  while ((c = getchar()) != EOF)
>    if (c == '\n')
>      count++;
>
>  printf("Total number of lines: %d\n", count);
>  return(0);
>}
>
>If you need a big buffer use setvbuf(3) first.  The stdio routines do all
>your buffering for you and hopefully have been optimised for your system.
>Even getchar() should be a macro in most cases.

In general, the standard stream functions (and macros) do more than you
need and are very slow. Allocating a huge buffer (ideally as big as the
file itself), reading the entire file, then counting \n's in memory is
the *fastest*.

hiro@lynx





Author: hiro@lynx.com (Hiro Sugawara)
Date: Thu, 23 Dec 1993 18:16:33 GMT
In article <2evta3$ebq@crcnis1.unl.edu> tbills@ponca.unl.edu (Trent Bills) writes:
>In article <1993Dec18.175558.8044@dxcern.cern.ch>, danpop@cernapo.cern.ch (Dan Pop) writes:
>|> If it's not a macro at all, you have the overhead of a function call per
>|> character, which is catastrophic on a modern architecture. So, doing your own
>|> buffering might make sense, if speed is a concern. Anyway, the best solution
>|> on a certain platform can be found only by doing a little bit of benchmarking.
>
>Since when is a function call catastrophic on a modern architecture?  Most
>modern architectures have highly optimized function call methods such as
>register windows.  The overhead of a function call on most current architectures
>can be as little as one clock cycle and usually no more than a few.  I highly
>doubt that you would notice a drastic difference between the getchar method
>and doing your own buffer, but then again I could be wrong.

Avoiding function calls is always a good way to make a program run faster.

"Register windows" (I assume the second poster refers to the SPAR
architecture) is a kind of two-blade sword: If you have more windows than
your function call nesting levels, you do not need to save your register
variables across function calls and your return address is saved in one
of the general registers, thus minimal memory access and instrucntion
execution; if a window overflow occurs, however, a trap initiates a whole
bunch of instructions executed in the kernel and thus slow.

hiro@lynx.com




Author: sriram@glock.tcs.com (Sriram Srinivasan)
Date: 23 Dec 1993 22:02:12 GMT
If you are on a Unix or Windows NT system, mmap the file, and count the
number of newline characters. This avoids the buffered reads. There's no way anything
else is going to be faster.

Sriram
(sriram@tcs.com)




Author: fred@genesis.demon.co.uk (Lawrence Kirby)
Date: Thu, 23 Dec 1993 23:32:20 +0000
In article <1993Dec23.180557.122@lynx.com> hiro@lynx.com "Hiro Sugawara" writes:

>In general, the standard stream functions (and macros) do more than you
>need and are very slow. Allocating a huge buffer (ideally as big as the
>file itself), reading the entire file, then counting \n's in memory is
>the *fastest*.

Making the buffer too large can reduce performance. You'll do a lot
better if it fits into the CPU cache. At the extremes a large buffer will
cause paging which will *really* slow the system down. I did some performance
tests a little while back and came to the conclusion that on my particular
system the optimal buffer size is about 10-12K. Which is perhaps smaller than
you might expect.

I agree that finctions/macros such as getchar can be greatly improved upon
in terms of speed. They're fast enough for most situations though.

-----------------------------------------
Lawrence Kirby | fred@genesis.demon.co.uk
Wilts, England | 70734.126@compuserve.com
-----------------------------------------




Author: glenn@catt.citri.edu.au (Glenn Jayaputera)
Date: Thu, 16 Dec 93 12:13:11 GMT
Hi....I would like to count the number of lines in a text file.  The text
file is composed of lines of arbitrary length; each line ends with a
'\n' char. I guess my question is, what is the best (in terms of speed)
way to count them...

thanks
  glenn jayaputera

PS: reading the lines one by one is too slow in my opinion




Author: frampton@vicuna.ocunix.on.ca (Steve Frampton)
Date: Fri, 17 Dec 1993 02:14:03 GMT
glenn@catt.citri.edu.au (Glenn Jayaputera) writes:

> Hi....I would like to count the number of lines in a text file.  The
> text file is composed of lines of arbitrary length; each line ends with a
> '\n' char. I guess my question is, what is the best (in terms of
> speed) way to count them...
>
>PS: reading the lines one by one is too slow in my opinion

I would allocate a rather large buffer, say 32K, read in a chunk of the file
into the buffer at a time, and then do a quick sequential search through the
buffer.  Since the searching will be in fast RAM it shouldn't be too bad.

--< Anybody know a site where I can FTP the Pelican Brief? >--
Steve Frampton          E-mail: <frampton@vicuna.ocunix.on.ca>




Author: grumpy@cbnewse.cb.att.com (Paul J Lucas)
Date: 17 Dec 93 04:59:42 GMT


Author: edlee@chinet.chinet.com (Edward Lee)
Date: Fri, 17 Dec 1993 05:40:22 GMT
In article <1993Dec16.121311.8419@etrog.se.citri.edu.au>,
Glenn Jayaputera <glenn@catt.citri.edu.au> wrote:
>Hi....I would like to count the number of lines in a text file.  The text
>file is composed of lines of arbitrary length; each line ends with a
>'\n' char. I guess my question is, what is the best (in terms of speed)
>way to count them...

If you are using Borland Pascal, then you can use BlockRead() to read in
up to 64512 characters in one pass into a buffer and then count the
number of times that '\n' occurs in the buffer, then do another BlockRead(),
and count the number of '\n' characters again, and keep doing this until
the parameter of BlockRead() which returns the number of bytes read
becomes zero.  Single character input/output is up to ten times slower
than multiple character reads in my experience.  You could also use
SetTextBuf() to increase the speed of input/output.

If you are using C, then you can use read() to do something similar.
Keeping the indexes and buffers local to your program gives good locality
of reference, which may make this faster than using variations of getc()
and putc().


-Ed L






Author: darcy@druid.com (D'Arcy J.M. Cain)
Date: Sat, 18 Dec 1993 00:14:29 GMT
frampton@vicuna.ocunix.on.ca (Steve Frampton) writes:
>glenn@catt.citri.edu.au (Glenn Jayaputera) writes:
>> Hi....I would like to count the number of lines in a text file.  The
>>PS: reading the lines one by one is too slow in my opinion

I'm afraid the only alternative is magic.  :-)

>I would allocate a rather large buffer, say 32K, read in a chunk of the file
>into the buffer at a time, and then do a quick sequential search through the
>buffer.  Since the searching will be in fast RAM it shouldn't be too bad.

Why get so complicated?

#include <stdio.h>
int
main(void)
{
  int c;
  int count = 0;

  while ((c = getchar()) != EOF)
    if (c == '\n')
      count++;

  printf("Total number of lines: %d\n");
  return(0);
}

If you need a big buffer use setvbuf(3) first.  The stdio routines do all
your buffering for you and hopefully have been optimised for your system.
Even getchar() should be a macro in most cases.
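
For example (a sketch only; the 32K figure is arbitrary, and setvbuf()
has to be called before any other operation on the stream):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  static char big[32 * 1024];   /* give stdin a 32K buffer */
  long count = 0;
  int c;

  if (setvbuf(stdin, big, _IOFBF, sizeof big) != 0)
    return EXIT_FAILURE;

  while ((c = getchar()) != EOF)
    if (c == '\n')
      count++;

  printf("Total number of lines: %ld\n", count);
  return 0;
}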

--
D'Arcy J.M. Cain (darcy@druid.com)  |
D'Arcy Cain Consulting              |   There's no government
Toronto, Ontario, Canada            |   like no government!
+1 416 424 2871  (DoD#0082) (eNTP)  |




Author: danpop@cernapo.cern.ch (Dan Pop)
Date: Sat, 18 Dec 1993 17:55:58 GMT
In <CI7FC7.LLD@druid.com> darcy@druid.com (D'Arcy J.M. Cain) writes:

>frampton@vicuna.ocunix.on.ca (Steve Frampton) writes:
>
>>I would allocate a rather large buffer, say 32K, read in a chunk of the file
>>into the buffer at a time, and then do a quick sequential search through the
>>buffer.  Since the searching will be in fast RAM it shouldn't be too bad.
>
>Why get so complicated?
>
>#include <stdio.h>
>int
>main(void)
>{
>  int c;
>  int count = 0;
>
>  while ((c = getchar()) != EOF)
>    if (c == '\n')
>      count++;
>
>  printf("Total number of lines: %d\n", count);
>  return(0);
>}
>
>If you need a big buffer use setvbuf(3) first.  The stdio routines do all
>your buffering for you and hopefully have been optimised for your system.
>Even getchar() should be a macro in most cases.
>
Yes, getchar is almost always a macro, defined like this:

#define getchar()       getc(stdin)

But getc may or may not be a macro, and even if it is a macro, it may be
something hairy like this:

#define getc(p)        (--(p)->__cnt < 0 ? __filbuf(p) : (int)*(p)->__ptr++)

If it's not a macro at all, you have the overhead of a function call per
character, which is catastrophic on a modern architecture. So, doing your own
buffering might make sense, if speed is a concern. Anyway, the best solution
on a certain platform can be found only by doing a little bit of benchmarking.

Just my $0.02,
Dan
--
Dan Pop
CERN, L3 Experiment
Email: danpop@cernapo.cern.ch
Mail:  CERN - PPE, Bat. 21 1-023, CH-1211 Geneve 23, Switzerland




Author: joe@bftsi0.UUCP (Joe Foster of Borg)
Date: 19 Dec 93 00:01:43 GMT
In article <1993Dec16.121311.8419@etrog.se.citri.edu.au>, glenn@catt.citri.edu.au (Glenn Jayaputera) writes:
> Hi....I would like to count the number of lines in a text file.  The text
> file is composed of lines of arbitrary length; each line ends with a
> '\n' char. I guess my question is, what is the best (in terms of speed)
> way to count them...

> thanks
>   glenn jayaputera

> PS: reading the lines one by one is too slow in my opinion

You'll just have to read the file and count the newline characters.
There's no other way save voodoo!

Sorry I posted this, but my email bounced. Please
include a valid email address in your signature!

(comp.msdos.programmer removed from Newsgroups line,
since my news software doesn't recognize that one)

--
Joe Foster (joe@bftsi0.uucp)
WARNING: I cannot be held responsible for the above        They're   coming  to
because  my cats have  apparently  learned to type.        take me away, ha ha!




Author: tbills@ponca.unl.edu (Trent Bills)
Date: 18 Dec 1993 21:39:15 GMT
In article <1993Dec18.175558.8044@dxcern.cern.ch>, danpop@cernapo.cern.ch (Dan Pop) writes:
|> If it's not a macro at all, you have the overhead of a function call per
|> character, which is catastrophic on a modern architecture. So, doing your own
|> buffering might make sense, if speed is a concern. Anyway, the best solution
|> on a certain platform can be found only by doing a little bit of benchmarking.

Since when is a function call catastrophic on a modern architecture?  Most
modern architectures have highly optimized function call methods such as
register windows.  The overhead of a function call on most current architectures
can be as little as one clock cycle and usually no more than a few.  I highly
doubt that you would notice a drastic difference between the getchar method
and doing your own buffer, but then again I could be wrong.

|>
|> Just my $0.02,
|> Dan

Trent Bills
Graduate Slave
Department of Computer Science and Engineering
University of Nebraska--Lincoln




Author: danpop@cernapo.cern.ch (Dan Pop)
Date: Sun, 19 Dec 1993 18:30:28 GMT
In <2evta3$ebq@crcnis1.unl.edu> tbills@ponca.unl.edu (Trent Bills) writes:

>Since when is a function call catastrophic on a modern architecture?  Most
>modern architectures have highly optimized function call methods such as
>register windows.  The overhead of a function call on most current architectures
>can be as little as one clock cycle and usually no more than a few.  I highly
>doubt that you would notice a drastic difference between the getchar method
>and doing your own buffer, but then again I could be wrong.
>
A function call has a bad effect at compile time as well as run time.

At compile time it will have a negative effect on the optimizer's
performance.

At run time it will disrupt the pipeline(s) and the function call/return
overhead may be >> one clock cycle if the cpu has a limited (and not too
large) number of registers.

The "inline" feature of C++ is meant to overcome these inconveniences,
where it makes sense.

Dan
--
Dan Pop
CERN, L3 Experiment
Email: danpop@cernapo.cern.ch
Mail:  CERN - PPE, Bat. 21 1-023, CH-1211 Geneve 23, Switzerland