Repeatedly open file or save entire file to memory?

I want to make sure I do what is most efficient when dealing with
multiple and potentially large files.

I need to take row(n) and row(n+1) from a file and use the data to do
things in other parts of my program. Then the program will iterate by
incrementing n. I may have up to 30 files, each having 50,000 rows.

My question is should I read row(n) and row(n+1), accessing the file
again and again on each iteration of the main program? Or should I just
read the whole file into memory (say, an array) then just grab items
from the array by index in the main program?
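
In code, the two options I am weighing look roughly like this (a minimal
sketch; "data.txt" and the loop body are placeholders):

  # Option 1: hit the disk on every lookup. Each call re-scans the file
  # from the top, so iterating over a whole file this way re-reads
  # earlier rows again and again.
  def row_pair(path, n)
    pair = []
    File.foreach(path).with_index do |line, i|
      pair << line.chomp if i == n || i == n + 1
      break if i > n + 1
    end
    pair
  end

  # Option 2: slurp the whole file into an array once, then index into it.
  # 30 files x 50,000 rows of modest width fits comfortably in memory.
  lines = File.readlines("data.txt").map { |l| l.chomp }
  lines.each_cons(2) do |row_n, row_next|
    # ... use row_n and row_next in the rest of the program ...
  end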

2009/9/17 Jason L. [email protected]:

> from the array by index in the main program?
Other schemes can be devised too:

  1. read the file once, remembering indexes for every file and row
     (IO#tell), and then access rows via IO#seek (see the sketch below)

  2. since you are incrementing n, read row n, remember the position,
     read row n+1; next time round, #seek to the position and continue
     reading

  3. as 2, but remember line n+1 so you do not have to read it again

  4. if the access pattern to files is not round robin but different,
     you might get better results by storing more information in memory
     for least recently accessed files

  5. read files in chunks of x lines and remember them in memory, thus
     reducing file accesses

It really depends on what you do with those files, what your access
patterns are, etc.
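
For illustration, scheme 1 might be sketched like this (a minimal
sketch; the file name is a placeholder and error handling is omitted):

  # Build a byte-offset index for every row in a single pass.
  offsets = []
  file = File.open("data.txt")     # placeholder file name
  until file.eof?
    offsets << file.tell           # byte offset where row i starts
    file.gets                      # consume row i
  end

  # Later, jump straight to any row without rescanning the file.
  def read_row(file, offsets, n)
    file.seek(offsets[n], IO::SEEK_SET)
    file.gets.chomp
  end

  n = 4711                         # arbitrary example index
  row_n    = read_row(file, offsets, n)
  row_next = read_row(file, offsets, n + 1)
  file.close

The index costs one full pass per file up front, but every access after
that is a single seek plus one read.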

Kind regards

robert

Jason,

> I want to make sure I do what is most efficient when dealing with
> multiple and potentially large files.

you can use ruby-prof for profiling of your code. It’s available as
a gem.
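
A minimal session looks roughly like this (the profiled block is only a
stand-in for your real code):

  require 'ruby-prof'

  result = RubyProf.profile do
    # ... the file-handling code you want to measure ...
    File.readlines("data.txt").each_cons(2) { |a, b| }
  end

  # Flat report: which methods the time was actually spent in.
  RubyProf::FlatPrinter.new(result).print(STDOUT)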

Best regards,

Axel

2009/9/17 Axel E. [email protected]:

> Jason,
>
>> I want to make sure I do what is most efficient when dealing with
>> multiple and potentially large files.
>
> you can use ruby-prof for profiling of your code. It’s available as
> a gem.

I consider Jason’s question a design-level question. That’s not
something a profiler can really help with. Of course you can code up
alternatives and measure performance. But that can only tell you which
of several versions is fastest - it cannot tell you how you should
change your design to improve it.

In this case the performance bottlenecks are more in the area of disk
IO, and all a profiler can tell you is how much of your time you spend
in IO - but not how to minimize that.

Kind regards

robert

Robert K. wrote:

> In this case the performance bottlenecks are more in the area of disk
> IO, and all a profiler can tell you is how much of your time you
> spend in IO - but not how to minimize that.

I agree, although that argument doesn’t make much sense.

A profiler can never tell you how to minimize anything; it can only
show you where to look for optimizations. In this case, of course,
that’s futile, since we already know where to optimize: the IO.

Greetz!

You could put the data into a database, which should be performant
enough and still very easy to use, even if your lookup pattern changes
in the future.
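
For example, with the sqlite3 gem a one-time load might look roughly
like this (table layout and file names are only an illustration):

  require 'sqlite3'

  db = SQLite3::Database.new("rows.db")
  db.execute("CREATE TABLE IF NOT EXISTS rows " \
             "(file TEXT, line_no INTEGER, content TEXT)")

  # One-time load; a transaction makes the bulk insert much faster.
  db.transaction do
    Dir.glob("data/*.txt").each do |path|
      File.foreach(path).with_index do |line, i|
        db.execute("INSERT INTO rows VALUES (?, ?, ?)",
                   [path, i, line.chomp])
      end
    end
  end

  # Fetch rows n and n+1 of one file with a single query.
  n = 4711
  pair = db.execute("SELECT content FROM rows WHERE file = ? " \
                    "AND line_no IN (?, ?) ORDER BY line_no",
                    ["data/a.txt", n, n + 1])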

Greetz!

Fabian S. wrote:

> You could put the data into a database, which should be performant
> enough and still very easy to use, even if your lookup pattern
> changes in the future.
>
> Greetz!

That is a good idea. Do you recommend ruby DBI or ActiveRecord? I need
ease of use and simplicity. My interface is the command line.

Actually, I like DataMapper the most. It’s very intuitive. You should
check it out: http://datamapper.org/doku.php

I definitely like the way DataMapper handles things better than
ActiveRecord, but that’s a matter of taste.
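
A minimal DataMapper sketch along those lines (the model, its
properties, and the SQLite connection string are assumptions, not
anything agreed in this thread; it needs dm-core and the SQLite adapter
gems installed):

  require 'rubygems'
  require 'dm-core'
  require 'dm-migrations'   # for auto_migrate! on newer DataMapper

  DataMapper.setup(:default, "sqlite3://#{Dir.pwd}/rows.db")

  class Row
    include DataMapper::Resource

    property :id,      Serial    # auto-incrementing primary key
    property :file,    String
    property :line_no, Integer
    property :content, Text
  end

  DataMapper.auto_migrate!      # creates the table (drops existing data!)

  # Load a file once ...
  File.foreach("data/a.txt").with_index do |line, i|
    Row.create(:file => "data/a.txt", :line_no => i,
               :content => line.chomp)
  end

  # ... then fetch rows n and n+1 by index.
  n = 4711
  pair = Row.all(:file => "data/a.txt", :line_no => (n..n + 1),
                 :order => [:line_no.asc])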

Greetz!

-------- Original Message --------

Date: Fri, 18 Sep 2009 15:30:38 +0900
From: Robert K. <[email protected]>
To: [email protected]
Subject: Re: repeatedly open file or save entire file to memory?

> 2009/9/17 Axel E. [email protected]:
>> Jason,
>>
>>> I want to make sure I do what is most efficient when dealing with
>>> multiple and potentially large files.
>>
>> you can use ruby-prof for profiling of your code. It’s available as
>> a gem.

Dear Robert,

> I consider Jason’s question a design-level question. That’s not
> something a profiler can really help with. Of course you can code up
> alternatives and measure performance. But that can only tell you
> which of several versions is fastest - it cannot tell you how you
> should change your design to improve it.

I agree with you. I proposed this precisely to see how long several
alternatives take. One always has to think about design oneself :-)

Best regards,

Axel