Forum: Ruby Correct way to process a file in Ruby?

Dave Hatton (daveh)
on 2007-01-30 18:28
Hi

What is the correct way to code the following in Ruby?

open file
read field from file
determine from field the type of file
case file type in
   type 1) do this (lots of stuff)
   type 2) do that (even more stuff)
   type 3) do something else (you get the picture)
        *) error
esac
close file

I presently have this coded as one big method, but the number of cases
is large and the complexity of the processing is increasing. This is
going to end up as one hell of a large method.

I then thought about breaking it into a number of methods, but that
resulted in opening and closing the file a number of times (once in each
method), which feels bad.

Would it be right to use a global variable as a file descriptor: open
the file in one method which returns an fd, have a number of methods do
the processing using that fd, and finally have a method that closes the
file?

Sorry if I seem to be grasping for the correct Ruby language to
express this - I'm very new to Ruby, coming from a C and shell
background.

Daveh
ChrisH (Guest)
on 2007-01-30 18:35
(Received via mailing list)
Recently coded something similar.
I just slurp the file into an array (File.readlines) and then
split the array into separate arrays of each record type.
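
For small text files, that approach might look roughly like the sketch below. The record layout (a type tag in the first comma-separated field) and the method name split_by_record_type are assumptions made for illustration only; they are not from the original post.

```ruby
require 'tempfile'

# Hypothetical layout: the first comma-separated field of each line
# is the record type.
def split_by_record_type(path)
  # Slurp every line into an array, then group the lines by their type field.
  File.readlines(path, chomp: true).group_by { |line| line.split(',', 2).first }
end

# Small demonstration using a temporary file.
Tempfile.create('records') do |tmp|
  tmp.write("A,1\nB,2\nA,3\n")
  tmp.flush
  split_by_record_type(tmp.path).each do |type, lines|
    puts "#{type}: #{lines.length} record(s)"
  end
end
```

Note that this holds the whole file in memory, so it only suits inputs that comfortably fit there.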

cheers
Chris
Dave Hatton (daveh)
on 2007-01-30 18:40
Hi Chris,

> I just slurp the file into an array (File.readlines) and then
> split the array into separate arrays of each record type.

I guess I should have mentioned that these files are in a binary format
and can be quite large - anywhere from 3 MB to 50 MB - so I was hoping
to process them on disk.

Would reading into an array still be appropriate in Ruby?
Tim Pease (Guest)
on 2007-01-30 18:49
(Received via mailing list)
On 1/30/07, Dave Hatton <rubylist@davehatton.it> wrote:
>
If you are on a Unix system, look into the Ruby mmap library.

<http://raa.ruby-lang.org/project/mmap/0.2.6>

This will only read in the parts of the file you need.  There is a
similar library for the Windows platform (but I have not used that
one).

My solution to handling stuff like this is to create a parser class.


require 'mmap'  # the third-party mmap library linked above

class FooParser

  def initialize( filename )
    @mmap = Mmap.new(filename, 'r')
  end

  def close
    return if @mmap.nil?
    @mmap.unmap
    @mmap = nil
  end

  def parse_info_type1
    # do stuff here to parse one type of information from the mmap object
  end

  def parse_info_type2
    # etc ...
  end

end


It works very well, and mmap allows me to handle gigabyte-sized files
without hogging all the system memory.
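
If the mmap library is not available, the same parser-class shape can be sketched with a plain File from the standard library. This is a swapped-in variant, not the mmap-based class above. The class name FileBackedParser, the 4-byte tag, and the 32-bit length field are all made up for illustration.

```ruby
require 'tempfile'

# Sketch only: same shape as FooParser, but backed by a plain File opened
# in binary mode instead of Mmap. Offsets and field sizes are hypothetical.
class FileBackedParser
  # Open the parser, yield it, and always close it afterwards.
  def self.open(filename)
    parser = new(filename)
    begin
      yield parser
    ensure
      parser.close
    end
  end

  def initialize(filename)
    @io = File.open(filename, 'rb')
  end

  def close
    return if @io.nil?
    @io.close
    @io = nil
  end

  # Hypothetical field: a 4-byte ASCII tag at the start of the file.
  def parse_tag
    @io.seek(0)
    @io.read(4)
  end

  # Hypothetical field: a 32-bit big-endian length right after the tag.
  def parse_length
    @io.seek(4)
    @io.read(4).unpack1('N')
  end
end

# Small demonstration with a temporary binary file.
Tempfile.create('demo') do |tmp|
  tmp.binmode
  tmp.write('TYP1' + [42].pack('N'))
  tmp.flush
  FileBackedParser.open(tmp.path) do |parser|
    puts parser.parse_tag
    puts parser.parse_length
  end
end
```

Seeking and reading only the fields you need keeps memory use small, though unlike mmap it goes through ordinary read calls.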

Blessings,
TwP
Chris Gernon (kabigon)
on 2007-01-30 18:49
Dave Hatton wrote:
> What is the correct way to code the following in Ruby?
>
> open file
> read field from file
> determine from field the type of file
> case file type in
>    type 1) do this (lots of stuff)
>    type 2) do that (even more stuff)
>    type 3) do something else (you get the picture)
>         *) error
> esac
> close file

You shouldn't need to open/close the file more than once. Unless I'm
misunderstanding something, you should be able to just do something like
this:

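# The helpers must be defined before the File.open block runs;
# determine_file_type and process_tab are assumed to be defined similarly.
def process_csv(file)
  file.each do |line|
    # do something with the line
  end
end

File.open('huge.txt') do |file|
  first_line = file.gets
  file_type = determine_file_type(first_line)
  case file_type
  when 'csv' then process_csv(file)            # "when x:" is not valid since Ruby 1.9
  when 'tab_delimited' then process_tab(file)
  else
    puts "Error! OMG!"
  end
end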

The main File.open block handles the opening and closing of the file,
and you just pass the file handle to the methods that do the actual
processing. Nice and simple.
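
For the binary files mentioned earlier in the thread, the same idea might be sketched as follows: open the file once in binary mode, read a type field, and pass the open handle to one small method per type. The 4-byte magic values and every method name here are hypothetical, chosen only to make the example self-contained.

```ruby
require 'stringio'

# Hypothetical handlers, one per file type; each receives the open IO
# positioned just after the type field.
def process_type1(io)
  "type1: #{io.read.bytesize} payload bytes"
end

def process_type2(io)
  "type2: #{io.read.bytesize} payload bytes"
end

# Made-up 4-byte magic values mapped to their handlers.
HANDLERS = {
  'TYP1' => method(:process_type1),
  'TYP2' => method(:process_type2)
}.freeze

# Read the type field, then hand the already-open IO to the matching handler.
def process(io)
  magic = io.read(4)
  handler = HANDLERS.fetch(magic) do
    raise ArgumentError, "unknown file type: #{magic.inspect}"
  end
  handler.call(io)
end

# The file is opened and closed exactly once, by the File.open block.
def process_file(path)
  File.open(path, 'rb') { |file| process(file) }
end

# Demonstration with an in-memory IO standing in for a real file.
puts process(StringIO.new('TYP1abc'))
```

A lookup table like this keeps the dispatch short as the number of file types grows, and no global file descriptor is needed because the handle is passed as an argument.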