Forum: Ruby File open, read and store in Hash, efficient?

Kev (Guest)
on 2007-03-09 10:15
(Received via mailing list)
Hello,

  I am writing a class and I require it to open a file, and store the
contents in key, value pairs.
This is my first

  def initialize()
    @@store = Hash.new
  end

  def read_file
    if File.exists?("LocationCopy.csv")
      f = File.open("LocationCopy.csv","r")
      f.each do |line|
        temp = line.split(",")
        @@store[temp[0]] = temp[1]
      end
      f.close
    end
    #puts @@store
  end
Kev (Guest)
on 2007-03-09 10:20
(Received via mailing list)
On 9 Mar, 09:12, "Kev" <griffin....@gmail.com> wrote:
>   def read_file
>     if File.exists?("LocationCopy.csv")
>       f = File.open("LocationCopy.csv","r")
>       f.each do |line|
>         temp = line.split(",")
>         @@store[temp[0]] = temp[1]
>       end
>       f.close
>     end
>     #puts @@store
>   end

Unfortunately that's what I call finger trouble. As I was saying, this
is my first attempt at a Ruby application, and I was wondering if there
is a more efficient method for what I am trying to achieve. Would
using f.each_line with a block be better?

Thanks,
Kev
Robert Klemme (Guest)
on 2007-03-09 10:47
(Received via mailing list)
2007/3/9, Kev <griffin.kev@gmail.com>:
> Unfortunately thats what I call finger trouble, as I was saying this
> is my first attempt at a Ruby application and was wondering if there
> is a more efficient method for what I am trying to achieve. Would
> using f.each_line and using a block be better?

Efficiency is ok.  Using the block form of File.open is safer, i.e.
the file is always closed - even in case of error. But you should not
use a class variable, use @store instead.
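
A minimal sketch of those two suggestions, the block form of File.open and an instance variable in place of the class variable. The class name LocationStore and the path argument are hypothetical and not from the original post:

```ruby
# Hypothetical class; the name and the path argument are illustrative.
class LocationStore
  attr_reader :store

  def initialize
    @store = {}   # instance variable: one hash per object
  end

  def read_file(path)
    # Block form: the file is closed when the block exits,
    # even if an exception is raised inside it.
    File.open(path, "r") do |f|
      f.each_line do |line|
        key, value = line.chomp.split(",")
        @store[key] = value
      end
    end
    @store
  end
end
```

With @@store, every instance of the class would share a single hash; with @store, each reader object keeps its own.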

And you can make your life easier by using the CSV lib.  Then it
becomes a one-liner:

10:41:07 [~]: cat x
a,b
d,b;c

10:41:08 [~]: ruby -r csv -r enumerator -e 'p CSV.to_enum(:open, "x",
"r", ";").inject({}) {|h,(k,v)| h[k]=v; h}'
{"a,b"=>nil, "d,b"=>"c"}

10:41:32 [~]: ruby -r csv -r enumerator -e 'p CSV.to_enum(:open, "x",
"r", ",").inject({}) {|h,(k,v)| h[k]=v; h}'
{"a"=>"b", "d"=>"b;c"}

CSV.foreach uses "," as default separator:

10:41:49 [~]: ruby -r csv -r enumerator -e 'p CSV.to_enum(:foreach,
"x").inject({}) {|h,(k,v)| h[k]=v; h}'
{"a"=>"b", "d"=>"b;c"}

Explanation: CSV.foreach yields every record to the block.  By using
to_enum (which is part of "enumerator") you can treat the CSV reader
like any Enumerable.  With #inject, a value is passed as the first
parameter to the block, and the block result is passed to the next
invocation of the block. In this case the hash which is stuffed into
#inject is simply passed on and on and is ultimately the result of
#inject. "p" then prints it.
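
For readers on a current Ruby, here is a hedged sketch of the same accumulation. It assumes the csv library is installed (it ships as a gem in recent releases) and relies on CSV.foreach returning an Enumerator when called without a block, so to_enum and the separate enumerator require are not needed. The helper names are hypothetical:

```ruby
require 'csv'

# Hypothetical helper: reads a two-column CSV file into a Hash.
def csv_to_hash(path, col_sep: ",")
  # Without a block, CSV.foreach returns an Enumerator, so inject works
  # directly. The block returns h so it is handed to the next call.
  CSV.foreach(path, col_sep: col_sep).inject({}) { |h, (k, v)| h[k] = v; h }
end

# Same result; each_with_object hands the hash back by itself,
# so the block does not need to return it.
def csv_to_hash_ewo(path, col_sep: ",")
  CSV.foreach(path, col_sep: col_sep).each_with_object({}) { |(k, v), h| h[k] = v }
end
```

Calling csv_to_hash("x") on the sample file above should give the same hash as the "," runs, and csv_to_hash("x", col_sep: ";") the same as the ";" run.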

Kind regards

robert
Kev (Guest)
on 2007-03-09 11:25
(Received via mailing list)
Excellent.

Thank you Robert.
gga (Guest)
on 2007-03-09 11:30
(Received via mailing list)
Well, your code is more or less okay.  It may be buggy in that you are
also storing the \n (end of line) character.  You probably need
something like:
 @@store[temp[0]] = temp[1].chomp
to remove it.
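
A small illustration of the newline problem. The helper name parse_pair and the sample line are made up; chomping the whole line before splitting is one possible variant of the fix, chosen here because temp[1].chomp raises NoMethodError when a line contains no comma:

```ruby
# Hypothetical helper: split one "key,value" line into its two fields.
def parse_pair(line)
  line.chomp.split(",", 2)   # limit of 2 keeps any further commas in the value
end

p "London,UK\n".split(",")   # last field still ends in "\n"
p parse_pair("London,UK\n")  # newline removed before splitting
p parse_pair("London\n")     # no comma: one field, so the value would be nil
```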

You can avoid checking if the file exists (if it does not, an Errno
exception will be raised and propagated upstream).  Let the
application, instead of your class, deal with what's probably a user
error (providing a missing file).
You can also avoid the file close by doing it in a block (let ruby's C
code automatically do the file close) and you can use IO.foreach
(File.foreach) for iterating thru each line more easily.
If you know your files will always fit in memory, you can
read all your text into a string or array in a single go (this is
usually called slurping), which can also speed things up a little in
some cases.
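
A rough sketch of the first point, letting the exception propagate and handling it in the application. The function name load_pairs, the file name, and the warning text are all illustrative:

```ruby
# Hypothetical reader: no existence check, no explicit close.
def load_pairs(path)
  # File.foreach opens the file, yields each line, and closes it again.
  File.foreach(path).each_with_object({}) do |line, h|
    key, value = line.chomp.split(",", 2)
    h[key] = value
  end
end

# The application, not the reader, decides what a missing file means.
begin
  pairs = load_pairs("no_such_file.csv")   # illustrative file name
rescue Errno::ENOENT => e
  warn "Cannot read locations: #{e.message}"
  pairs = {}
end
```

Rescuing only Errno::ENOENT keeps other failures, such as a permissions error, visible instead of silently swallowing them.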

Here are some examples of doing the same thing written in different
ways:


require 'yaml'

class ReaderYAML
  def initialize(file)
    # slurp the whole file into a string
    lines = File.read(file)
    # change commas to ": " (yaml hash representation; the space
    # after the colon is needed for a key/value mapping)
    lines.gsub!(/,/, ': ')
    # create the hash thru yaml
    @h = YAML.load(lines)
  end
end

require 'csv'

class ReaderCSV
  def initialize(file)
    # read the file as a CSV file, flatten the resulting array and
    # make it a hash
    @h = Hash[*(CSV.read(file).flatten)]
  end
end

class ReaderCommas
  def initialize(file)
    @h = {}
    # slurp the file into an array
    lines = File.readlines(file)
    # process each line
    lines.each { |line|
      key, value = line.chomp.split(',')
      @h[key] = value
    }
  end
end

class ReaderCommasBigFile
  def initialize(file)
    @h = {}
    File.foreach(file) do |line|
      key, val = line.chomp.split(',')
      @h[key] = val
    end
  end
end

h = ReaderYAML.new('csv.txt')
p h

h2 = ReaderCSV.new('csv.txt')
p h2

h3 = ReaderCommas.new('csv.txt')
p h3

h4 = ReaderCommasBigFile.new('csv.txt')
p h4


require 'benchmark'

n = 5000
Benchmark.bm(5) do |b|
  b.report('big')  { n.times do ReaderCommasBigFile.new('csv.txt'); end }
  b.report('file') { n.times do ReaderCommas.new('csv.txt'); end }
  b.report('csv')  { n.times do ReaderCSV.new('csv.txt'); end }
  b.report('yaml') { n.times do ReaderYAML.new('csv.txt'); end }
end


The YAML version does not do exactly the same as the others, but
depending on your data, it might still be what you want.  It expects
a very simple key/value pair per line.  Although YAML involves
a little bit more work, it is still pretty optimized and will turn
numeric data automatically into the appropriate ruby numeric class.
CSV automatically deals with comma separated files for you, although it
is somewhat slow.
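
To make the difference concrete, here is a small sketch comparing the YAML route with a plain split. The helper names and the sample data are invented for illustration, and the sketch assumes simple values with no commas or colons inside them:

```ruby
require 'yaml'

# Hypothetical helper: turn "key,value" lines into a Hash via YAML.
def pairs_via_yaml(text)
  # YAML needs ": " (colon plus space) between key and value.
  YAML.load(text.gsub(",", ": "))
end

# Hypothetical helper: the same lines via a plain split.
def pairs_via_split(text)
  text.each_line.map { |l| l.chomp.split(",", 2) }.to_h
end

text = "width,10\nratio,2.5\nname,box\n"   # illustrative data

p pairs_via_yaml(text)    # numeric values come back as Integer and Float
p pairs_via_split(text)   # every value stays a String
```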

Anyway, hope that gives you some ideas.  Overall, unless you are
dealing with huge files, you should not worry too much about speed
while writing your class.
Kev (Guest)
on 2007-03-09 11:55
(Received via mailing list)
gga,

Thank you for the code,
I will go away and digest.

Cheers,
Kev