Correct way to process a file in Ruby?

Hi

What is the correct way to code the following in Ruby?

open file
read field from file
determine from field the type of file
case file type in
type 1) do this (lots of stuff)
type 2) do that (even more stuff)
type 3) do something else (you get the picture)
*) error
esac
close file

I presently have this coded as one big method, but the number of cases
is large and the complexity of the processing is increasing. This is going
to end up as one enormous method.

I then thought about breaking it into a number of methods, but that
resulted in opening and closing the file a number of times (once in each
method), which feels bad.

Would it be right to use a global variable as a file descriptor: open
the file in one method that returns an fd, have a number of methods
process the file using that fd, and finally have a method that closes
the file?

Sorry if I seem to be grasping for the correct Ruby language to
express this. I’m very new to Ruby, coming from a C and shell
background.

Daveh

Recently coded something similar.
I just slurp the file into an array (File.readlines) and then
split the array into separate arrays of each record type.
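
A minimal sketch of that approach, assuming a line-oriented text file in which the first comma-separated field names the record type. The file format and the `group_records` helper are made up for illustration, not taken from my actual code:

```ruby
require 'tempfile'

# Hypothetical helper: slurp the whole file and bucket its lines by
# record type. Assumes the type is the first comma-separated field.
def group_records(path)
  File.readlines(path, chomp: true).group_by do |line|
    line.split(',', 2).first
  end
end

# Demo against a throwaway file.
Tempfile.create('records') do |f|
  f.write("A,1\nB,2\nA,3\n")
  f.flush
  group_records(f.path).each do |type, lines|
    puts "#{type}: #{lines.length} record(s)"
  end
end
```

This holds the entire file in memory, so it only suits files that fit comfortably in RAM.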

cheers
Chris

Hi Chris,

I just slurp the file into an array (File.readlines) and then
split the array into separate arrays of each record type.

I guess I should have mentioned that these files are in a binary format
and can be quite large, anywhere from 3MB to 50MB, so I was hoping to
process them on disk.

Would reading into an array still be appropriate in Ruby?

On 1/30/07, Dave H. wrote:

If you are on a Unix system, look into the Ruby mmap library.

http://raa.ruby-lang.org/project/mmap/0.2.6

This will only read in the parts of the file you need. There is a
similar library for the windows platform (but I have not used that
one).

My solution to handling stuff like this is to create a parser class.

require 'mmap'

class FooParser

  def initialize( filename )
    # Map the file read-only.
    @mmap = Mmap.new(filename, 'r')
  end

  def close
    return if @mmap.nil?
    @mmap.unmap
    @mmap = nil
  end

  def parse_info_type1
    # do stuff here to parse one type of information from the mmap object
  end

  def parse_info_type2
    # etc …
  end

end

It works very well, and mmap allows me to handle gigabyte sized files
without hogging all the system memory.
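
If mmap is not available, the same class shape should also work with a plain File opened in binary mode, seeking to and reading only the bytes each method needs. The following is a rough sketch under that assumption; the class name `BinaryFooParser` and the header layout (a 1-byte type tag followed by a 4-byte big-endian length) are hypothetical:

```ruby
require 'tempfile'

# Sketch only: the header layout (1-byte type tag, then a 4-byte
# big-endian payload length) is invented for this example.
class BinaryFooParser
  def initialize(filename)
    @io = File.open(filename, 'rb')
  end

  # Safe to call more than once.
  def close
    return if @io.nil?
    @io.close
    @io = nil
  end

  # Read the 1-byte type tag at offset 0.
  def file_type
    @io.seek(0)
    @io.read(1).unpack1('C')
  end

  # Read the 4-byte big-endian length stored at offset 1.
  def payload_length
    @io.seek(1)
    @io.read(4).unpack1('N')
  end
end

# Demo: write a tiny header, then parse it back.
Tempfile.create('foo') do |f|
  f.binmode
  f.write([2, 10].pack('CN'))
  f.flush
  parser = BinaryFooParser.new(f.path)
  begin
    puts "type=#{parser.file_type} length=#{parser.payload_length}"
  ensure
    parser.close
  end
end
```

Only the bytes requested by `read` are pulled into memory, so the file size matters much less than with slurping.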

Blessings,
TwP

Dave H. wrote:

What is the correct way to code the following in Ruby?

open file
read field from file
determine from field the type of file
case file type in
type 1) do this (lots of stuff)
type 2) do that (even more stuff)
type 3) do something else (you get the picture)
*) error
esac
close file

You shouldn’t need to open/close the file more than once. Unless I’m
misunderstanding something, you should be able to just do something like
this:

File.open('huge.txt') do |file|
  first_line = file.gets
  file_type = determine_file_type(first_line)
  case file_type
  when 'csv' then process_csv(file)
  when 'tab_delimited' then process_tab(file)
  else
    puts "Error! OMG!"
  end
end

# Note: in a script, this definition must appear before the
# File.open block above is run.
def process_csv(file)
  file.each do |line|
    # do something with the line
  end
end

The main File.open block handles the opening and closing of the file,
and you just pass the file handle to the methods that do the actual
processing. Nice and simple.
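
For binary files with many types, the case statement could be swapped for a lookup table so that each type keeps its own small method. This is only a sketch: the 4-byte magic values, the `HANDLERS` table, and the `process_*` method names are all assumptions made for illustration.

```ruby
# Hypothetical 4-byte magic numbers mapped to handler method names.
HANDLERS = {
  'TYP1' => :process_type1,
  'TYP2' => :process_type2
}.freeze

# Each handler receives the already-open file, positioned just
# after the magic number.
def process_type1(file)
  "type1: #{file.read.bytesize} payload bytes"
end

def process_type2(file)
  "type2: #{file.read.bytesize} payload bytes"
end

def process_file(path)
  # The block form closes the file even if a handler raises.
  File.open(path, 'rb') do |file|
    magic = file.read(4)
    handler = HANDLERS.fetch(magic) do
      raise ArgumentError, "unknown file type: #{magic.inspect}"
    end
    send(handler, file)
  end
end
```

The file is still opened and closed exactly once, and adding a new type means adding one table entry and one method.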