File open, read and store in Hash, efficient?

Kev · March 9, 2007, 10:15am

Hello,

I am writing a class and I require it to open a file, and store the
contents in key, value pairs.
This is my first

def initialize()
  @@store = Hash.new
end

def read_file
  if File.exists?("LocationCopy.csv")
    f = File.open("LocationCopy.csv","r")
    f.each do |line|
      temp = line.split(",")
      @@store[temp[0]] = temp[1]
    end
    f.close
  end
  #puts @@store
end

Kev · March 9, 2007, 10:20am

On 9 Mar, 09:12, "Kev" wrote:

def read_file
  if File.exists?("LocationCopy.csv")
    f = File.open("LocationCopy.csv","r")
    f.each do |line|
      temp = line.split(",")
      @@store[temp[0]] = temp[1]
    end
    f.close
  end
  #puts @@store
end

Unfortunately that's what I call finger trouble. As I was saying, this
is my first attempt at a Ruby application, and I was wondering if there
is a more efficient method for what I am trying to achieve. Would
using f.each_line and using a block be better?

Thanks,
Kev

Robert · March 9, 2007, 10:47am

2007/3/9, Kev:

Unfortunately that's what I call finger trouble. As I was saying, this
is my first attempt at a Ruby application, and I was wondering if there
is a more efficient method for what I am trying to achieve. Would
using f.each_line and using a block be better?

Efficiency is OK. Using the block form of File.open is safer: the
file is always closed, even if an error occurs. But you should not
use a class variable; use @store instead.
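
A minimal sketch of that advice. The class name LocationStore and the path parameter are made up for illustration; only LocationCopy.csv, @store and the block form of File.open come from the thread. It assumes two comma-separated fields per line.

```ruby
# Hypothetical class name; the thread only shows the two methods.
class LocationStore
  attr_reader :store

  def initialize
    @store = {}   # instance variable instead of the class variable @@store
  end

  def read_file(path = "LocationCopy.csv")
    # Block form: the file is closed when the block exits,
    # even if an exception is raised inside it.
    File.open(path, "r") do |f|
      f.each_line do |line|
        key, value = line.chomp.split(",")
        @store[key] = value
      end
    end
    @store
  end
end
```

Each LocationStore instance then keeps its own hash, whereas @@store would be shared by every instance of the class.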

And you can make your life easier by using the CSV library. Then it
becomes a one-liner:

10:41:07 [~]: cat x
a,b
d,b;c

10:41:08 [~]: ruby -r csv -r enumerator -e 'p CSV.to_enum(:open, "x",
"r", ";").inject({}) {|h,(k,v)| h[k]=v; h}'
{"a,b"=>nil, "d,b"=>"c"}

10:41:32 [~]: ruby -r csv -r enumerator -e 'p CSV.to_enum(:open, "x",
"r", ",").inject({}) {|h,(k,v)| h[k]=v; h}'
{"a"=>"b", "d"=>"b;c"}

CSV.foreach uses "," as default separator:

10:41:49 [~]: ruby -r csv -r enumerator -e 'p CSV.to_enum(:foreach,
"x").inject({}) {|h,(k,v)| h[k]=v; h}'
{"a"=>"b", "d"=>"b;c"}

Explanation: CSV.foreach yields every record to the block. By using
to_enum (which is part of "enumerator") you can treat the CSV reader
like any Enumerable. With #inject, a value is passed as the first
parameter to the block, and the block result is passed to the next
invocation of the block. In this case the hash that is stuffed into
#inject is simply passed on and on and is ultimately the result of
#inject. "p" then prints it.
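
On current Ruby versions the same folding idea can be sketched without to_enum. This relies on two things that are not shown in the thread: CSV.foreach returning an Enumerator when called without a block, and the separator being passed as the col_sep: keyword. The helper name csv_to_hash is made up; the temporary file mirrors the contents of the file x shown earlier.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: fold each record into a hash, first field as key.
def csv_to_hash(path, col_sep: ",")
  CSV.foreach(path, col_sep: col_sep).inject({}) do |h, (k, v)|
    h[k] = v
    h   # the block result is handed to the next iteration as h
  end
end

Tempfile.create("x") do |f|
  f.write("a,b\nd,b;c\n")
  f.flush
  p csv_to_hash(f.path)                # keys "a" and "d"
  p csv_to_hash(f.path, col_sep: ";")  # keys "a,b" and "d,b"
end
```

The block has to end with h because h[k] = v on its own evaluates to v, which would then be passed to the next iteration in place of the hash.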

Kind regards

robert

Kev · March 9, 2007, 11:25am

Excellent.

Thank you Robert.

gga · March 9, 2007, 11:30am

Well, your code is more or less okay. It may be buggy in that you are
also storing the \n (end of line) character. You probably need
something like:
  @@store[temp[0]] = temp[1].chomp
to remove it.
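
A small sketch of the problem, using a made-up sample line and a hypothetical helper name parse_line. Chomping the whole line before splitting is a variation on the fix above; it is used here because it also copes with a line that contains no comma.

```ruby
# Hypothetical helper: strip the line terminator, then split on commas.
def parse_line(line)
  line.chomp.split(",")
end

line = "London,LHR\n"        # made-up sample record

p line.split(",")            # second field still ends in "\n"
p parse_line(line)           # newline removed before splitting
p parse_line("London\n")     # single field; a value taken from it is nil
```

With temp[1].chomp, a line without a comma would raise NoMethodError, because temp[1] is nil in that case.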

You can avoid checking if the file exists (if it does not, an Errno
exception will be raised and propagated upstream). Let the
application, instead of your class, deal with what's probably a user
error (providing a missing file).
You can also avoid the file close by doing it in a block (let Ruby's C
code automatically do the file close), and you can use IO.foreach
(File.foreach) to iterate through each line more easily.
If you know you won't have files that won't fit in memory, you can
read all your text into a string or array in a single go (this is
usually called slurping), which can also speed things up a little in
some cases.
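
A sketch of the first point, assuming the missing-file error is Errno::ENOENT. The method name load_pairs and the file name no_such_file.csv are made up for illustration.

```ruby
# Hypothetical loader: no existence check, so errors propagate to the caller.
def load_pairs(path)
  pairs = {}
  File.foreach(path) do |line|   # opens, iterates and closes the file
    key, value = line.chomp.split(",")
    pairs[key] = value
  end
  pairs
end

# Application-level code decides what a missing file means.
begin
  load_pairs("no_such_file.csv")   # assumed not to exist
rescue Errno::ENOENT => e
  warn "Cannot read input: #{e.message}"
end
```

The class stays free of error-handling policy; a command-line tool might print a message and exit here, while a test suite would simply let the exception fail the test.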

Here are some examples of doing the same thing written in different
ways:

require 'yaml'

class ReaderYAML
  def initialize(file)
    # slurp the whole file into a string
    lines = File.read(file)
    # change commas to ": " (yaml hash representation; the space
    # after the colon is what makes each line a key/value pair)
    lines.gsub!(/,/, ': ')
    # create the hash through yaml
    @h = YAML.load(lines)
  end
end

require 'csv'

class ReaderCSV
  def initialize(file)
    # read the file as a CSV file, flatten the resulting array and
    # make it a hash
    @h = Hash[*(CSV.read(file).flatten)]
  end
end

class ReaderCommas
  def initialize(file)
    @h = {}
    # slurp the file into an array
    lines = File.readlines(file)
    # process each line
    lines.each { |line|
      key, value = line.chomp.split(',')
      @h[key] = value
    }
  end
end

class ReaderCommasBigFile
  def initialize(file)
    @h = {}
    File.foreach(file) do |line|
      key, val = line.chomp.split(',')
      @h[key] = val
    end
  end
end

h = ReaderYAML.new('csv.txt')
p h

h2 = ReaderCSV.new('csv.txt')
p h2

h3 = ReaderCommas.new('csv.txt')
p h3

h4 = ReaderCommasBigFile.new('csv.txt')
p h4

require 'benchmark'

n = 5000
Benchmark.bm(5) do |b|
  b.report('big')  { n.times { ReaderCommasBigFile.new('csv.txt') } }
  b.report('file') { n.times { ReaderCommas.new('csv.txt') } }
  b.report('csv')  { n.times { ReaderCSV.new('csv.txt') } }
  b.report('yaml') { n.times { ReaderYAML.new('csv.txt') } }
end

The YAML version does not do exactly the same as the others, but
depending on your data, it might still be what you want. It also
works for a very simple key/value pair per line. Although YAML involves
a little more work, it is still pretty optimized and will turn
numeric data automatically into the appropriate Ruby numeric class.
CSV automatically deals with comma-separated files for you, although it
is somewhat slow.
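
A short sketch of the type conversion mentioned above, with made-up sample data and a hypothetical helper name yaml_pairs. It assumes the commas are replaced with a colon followed by a space, which current YAML parsers need in order to read each line as a key/value pair.

```ruby
require "yaml"

# Hypothetical helper: treat "key,value" lines as a YAML mapping.
def yaml_pairs(text)
  YAML.load(text.gsub(",", ": "))
end

data = yaml_pairs("apples,3\nprice,2.5\nname,pear\n")   # made-up data

# Print the Ruby class YAML picked for each value.
data.each { |key, value| puts "#{key}: #{value.class}" }
```

The split-based readers would leave every value as a String, so this is the main behavioural difference to keep in mind when picking between them.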

Anyway, hope that gives you some ideas. Overall, unless you are
dealing with huge files, you should not worry too much about speed
while writing your class.

Kev · March 9, 2007, 11:55am

gga,

Thank you for the code,
I will go away and digest.

Cheers,
Kev