Seeking the Ruby way


#1

I’m just getting my feet wet with Ruby and would like some advice on how you “old-timers” would write the following script using Ruby idioms.

The intent of the script is to parse a CSV file that contains 2 fields per row, sorted on the second field. There may be multiple rows for each value of field 2. I want to get a list of all of the unique values of field 2 that have more than 1 value for the first 6 characters of field 1.

Here’s what I did:

require 'csv'

last_account_id = ''
last_adv_id = ''
parent_co_ids = []
cntr = 0
first = true
CSV::Reader.parse(File.open('e:\tmp\20060201\bsa.csv', 'r')) do |row|
  if row[1] == last_account_id
    parent_co_ids << last_adv_id[0, 6] unless parent_co_ids.include?(last_adv_id[0, 6])
  else
    if !first
      parent_co_ids << last_adv_id[0, 6] unless parent_co_ids.include?(last_adv_id[0, 6])
      if parent_co_ids.size > 1
        puts "#{last_account_id} - (#{parent_co_ids.join(',')})"
        cntr = cntr + 1
      end
      parent_co_ids.clear
    else
      first = false
    end
  end
  last_account_id = row[1]
  last_adv_id = row[0]
end
# the last account's group is still pending when the loop ends
unless first
  parent_co_ids << last_adv_id[0, 6] unless parent_co_ids.include?(last_adv_id[0, 6])
  if parent_co_ids.size > 1
    puts "#{last_account_id} - (#{parent_co_ids.join(',')})"
    cntr = cntr + 1
  end
end
puts "Found #{cntr} accounts with multiple parent companies"
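Because the input is sorted on field 2, the same single-pass grouping can be sketched with `Enumerable#chunk`, which yields each run of consecutive elements that share a block value. This is a sketch assuming Ruby 2.7+ (for `filter_map`) and rows already parsed into two-element arrays; the method name `accounts_with_multiple_parents` is hypothetical, and the sample rows are the ones William J. posts later in the thread.

```ruby
# Hypothetical helper: rows are [field1, field2] pairs, sorted on field2.
# chunk groups each run of consecutive rows sharing the same field2,
# building one group at a time.
def accounts_with_multiple_parents(rows)
  rows.chunk { |row| row[1] }.filter_map do |account_id, group|
    # Distinct 6-character prefixes of field1 within this account's rows.
    prefixes = group.map { |row| row[0][0, 6] }.uniq
    [account_id, prefixes] if prefixes.size > 1
  end
end

rows = [
  ["123456ab", "900"],
  ["123456cd", "900"],
  ["123456ef", "909"],
  ["012345gh", "909"]
]

found = accounts_with_multiple_parents(rows)
found.each { |id, prefixes| puts "#{id} - (#{prefixes.join(',')})" }
puts "Found #{found.size} accounts with multiple parent companies"
```

With a real file, `CSV.read(path)` returns the rows as arrays of strings and could be passed in directly.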

Thanks in advance!

Todd B.


#2

On Fri, 3 Feb 2006, Todd B. wrote:

harp:~ > cat a.rb
require "csv"
require "yaml"

path = ARGV.shift
sum = Hash::new{|h,k| h[k] = 0}
count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r"){|row| count[row]}
y sum.delete_if{|k,v| v == 1}

harp:~ > cat in.csv
0,aaaaaa___
1,aaaaaa___
2,aaabbb___
3,aaabbb___
4,aaabbb___
5,aaaccc___

harp:~ > ruby a.rb in.csv

aaaaaa: 2
aaabbb: 3
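On Ruby 2.7 or later, the counting step can also be sketched with `Enumerable#tally`, which builds the same key-to-count hash without a default-value block. As in Ara's script, the key is the first six characters of the last field; the rows are his `in.csv` sample written inline as arrays rather than read from a file.

```ruby
# Ara's in.csv sample, as already-parsed rows.
rows = [
  %w[0 aaaaaa___], %w[1 aaaaaa___],
  %w[2 aaabbb___], %w[3 aaabbb___], %w[4 aaabbb___],
  %w[5 aaaccc___]
]

# Count rows per 6-character prefix of the last field.
counts = rows.map { |row| row.last.to_s[0, 6] }.tally

# Keep only prefixes seen more than once (like delete_if{|k,v| v == 1}).
repeated = counts.reject { |_prefix, n| n == 1 }
repeated.each { |prefix, n| puts "#{prefix}: #{n}" }
```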

hth. regards.

-a


#3

On 2/2/06, removed_email_address@domain.invalid wrote:

require "csv"
require "yaml"

path = ARGV.shift
sum = Hash::new{|h,k| h[k] = 0}
count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r"){|row| count[row]}
y sum.delete_if{|k,v| v == 1}

I’m curious why you decided to make count its own lambda when:

  1. It’s only ever used once
  2. The block that uses it has only one statement, namely the call to count
  3. count and the block to CSV::open have the same signature

I think at a minimum, given 2) and 3), I’d just replace the block to CSV::open with count itself:

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r", &count)
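The `&count` form works because `&` converts the lambda into the block the method yields to. A small sketch, with `Array#each` standing in for `CSV::open` (both yield one row per iteration) and three rows taken from the `in.csv` sample above:

```ruby
sum = Hash.new { |h, k| h[k] = 0 }
count = lambda { |row| sum[row.last.to_s[0, 6]] += 1 }

rows = [%w[0 aaaaaa___], %w[1 aaaaaa___], %w[2 aaabbb___]]

# & turns the lambda into the block; each yields one row array per call,
# which matches the lambda's single |row| parameter.
rows.each(&count)

sum.each { |prefix, n| puts "#{prefix}: #{n}" }
```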

Then, since count isn’t used anywhere else, I’d join those together:

CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}

After those transformations:

galadriel:~ lukfugl$ cat a.rb
require "csv"
require "yaml"

path = ARGV.shift
sum = Hash::new{|h,k| h[k] = 0}
CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}
y sum.delete_if{|k,v| v == 1}

galadriel:~ lukfugl$ cat in.csv
0,aaaaaa___
1,aaaaaa___
2,aaabbb___
3,aaabbb___
4,aaabbb___
5,aaaccc___

galadriel:~ lukfugl$ ruby a.rb in.csv

aaaaaa: 2
aaabbb: 3

Just seems a little clearer to me over having an extra one-time use lambda.

Jacob F.


#4

Jacob F. wrote:

sum = Hash::new{|h,k| h[k] = 0}

And for some reason, I tend to write

sum = Hash.new(0)

when dealing with an immediate value. (But maybe it’s a better practice
to use Ara’s form, so that if you ever replace 0 with, say, a matrix,
you don’t reuse the same object for each key in the hash.)
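The pitfall in the parenthetical can be shown directly: with `Hash.new(obj)` every missing key returns the same `obj` and nothing is stored, while the block form can build and store a fresh object per key. Integers are safe with `Hash.new(0)` because `+=` assigns a new value instead of mutating the default. A minimal sketch, with arrays standing in for the "matrix" case:

```ruby
# Default object: every missing key returns the very same array,
# and << mutates that shared array without storing any key.
shared = Hash.new([])
shared[:a] << 1
shared[:b] << 2

# Default block: a fresh array is created and stored for each new key.
fresh = Hash.new { |h, k| h[k] = [] }
fresh[:a] << 1
fresh[:b] << 2

# Integer default: += reads the default, then assigns the sum,
# so each key ends up with its own stored value.
counts = Hash.new(0)
counts[:x] += 1
counts[:x] += 1
counts[:y] += 1

puts "shared: #{shared.size} keys, default now #{shared.default.inspect}"
puts "fresh:  #{fresh.inspect}"
puts "counts: #{counts.inspect}"
```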


#5

On Fri, 3 Feb 2006, Jacob F. wrote:

I’m curious why you decided to make count its own lambda when:

  1. It’s only ever used once
  2. The block that uses it has only one statement, namely the call to count
  3. count and the block to CSV::open have the same signature

it’s for abstraction only. i wrote how to count before writing the csv open line. when i wrote it, it ended up with something like

CSV::open(path,"r"){|row| p row; count[row]}

during editing - as i always seem to for debugging :wink:

basically i find

{{{{}}}}

tough to read sometimes and factor out things using lambda. it’s rare that it actually ends up being the only thing left as in this case - but here you are quite right that it can be compacted.

I think at a minimum, given 2) and 3), I’d just replace the block to
CSV::open with count itself:

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r", &count)

Then, since count isn’t used anywhere else, I’d join those together:

CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}

but i disagree here. people, esp newbies, will look at that and say - what? whereas reading

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}

… count[row] …

is pretty clear. i often use variables as comments to others and myself. eg. what does this do:

password = "#{ sifname }_#{ eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect ) }"

hard to say huh?

how about this?

four_random_printable_chars = eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect )
password = "#{ sifname }_#{ four_random_printable_chars }"

ugly (yes i’m hacking like crazy today) but at least anyone reading it (most importantly me) knows what i’m trying to do if not how!
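For comparison, a sketch of the same idea without the `eval(... .inspect)` round trip, which appears only to reproduce the joined string. It assumes "printable" means printable ASCII (codes 32 to 126), which may differ from the original for bytes above 127, and uses `Array#sample(4)` to pick four distinct characters at random, as `sort_by{rand}` followed by `[0,4]` does. `sifname` is never defined in the thread, so the value below is a hypothetical placeholder.

```ruby
# Hypothetical placeholder; sifname is not defined in the original snippet.
sifname = "example"

# Printable ASCII characters: space (32) through tilde (126).
printable = (32..126).map(&:chr)

# sample(4) returns four distinct elements chosen at random.
four_random_printable_chars = printable.sample(4).join
password = "#{sifname}_#{four_random_printable_chars}"
puts password
```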

anyhow - same goes with ‘count’: it’s all good until you start cutting and pasting - then you want vars not wicked expressions to move around.

Just seems a little clearer to me over having an extra one-time use lambda.

iff you are good at reading ruby :wink:

cheers.

-a


#6

On 2/2/06, removed_email_address@domain.invalid wrote:

On Fri, 3 Feb 2006, Jacob F. wrote:

I’m curious why you decided to make count its own lambda when:

  1. It’s only ever used once
  2. The block that uses it has only one statement, namely the call to count
  3. count and the block to CSV::open have the same signature

it’s for abstraction only.

basically i find

{{{{}}}}

tough to read sometimes and factor out things using lambda. it’s rare that it actually ends up being the only thing left as in this case - but here you are quite right that it can be compacted.

Yeah, I agree. I often use similar abstraction techniques for
readability. My brain just has the tendency to refactor code inwards
as well as outwards when an abstraction seems extraneous.

CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}

but i disagree here. people, esp newbies, will look at that and say - what?
whereas reading

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}

… count[row] …

is pretty clear. i often use variables as comments to others and myself.

Again, agreed. In this case though I don’t think the abstraction of
naming sum[…] += 1 as count is a necessary one. If I were to
refactor part of the complex expression

sum[row.last.to_s[0,6]] += 1

to improve readability, it would be the index:

identifier_prefix = lambda{ |row| row.last.to_s[0,6] }
… sum[identifier_prefix[row]] += 1 …
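Jacob's variant, which names the key extraction instead of the whole update, can be sketched end to end like this. The rows are a few lines of the `in.csv` sample from earlier in the thread, and `Hash.new(0)` is used for the counter since the default is an Integer.

```ruby
# Name the "what is the key?" step rather than the whole update.
identifier_prefix = lambda { |row| row.last.to_s[0, 6] }

sum = Hash.new(0)
rows = [%w[0 aaaaaa___], %w[1 aaaaaa___], %w[2 aaabbb___], %w[5 aaaccc___]]
rows.each { |row| sum[identifier_prefix[row]] += 1 }

# Drop prefixes seen only once, as in the original script.
sum.delete_if { |_prefix, n| n == 1 }
sum.each { |prefix, n| puts "#{prefix}: #{n}" }
```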

what does this do:

password = "#{ sifname }_#{ eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect ) }"

hard to say huh?

Ick, yes, I’d definitely split that into chunks. :slight_smile:

how about this?

four_random_printable_chars = eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect )
password = "#{ sifname }_#{ four_random_printable_chars }"

ugly (yes i’m hacking like crazy today) but at least anyone reading it (most
importantly me) knows what i’m trying to do if not how!

If you say so… :wink:

Jacob F.


#7

Todd B. wrote:

I’m just getting my feet wet with Ruby and would like some advice on how you
“old-timers” would write the following script using Ruby idioms.

The intent of the script is to parse a CSV file that contains 2 fields per
row, sorted on the second field. There may be multiple rows for field 2. I
want to get a list of all of the unique values of field2 that have more than
1 value for the 1st 6 characters of field 1.

--- input data -----
123456ab,900
123456cd,900
123456ef,909
012345gh,909
--- end of input -----

--- Using a hash of arrays:

require 'csv'

h = Hash.new{ [] }
CSV::Reader.parse(File.open( ARGV.first )) { |row|
  h[row.last] |= [ row.first[0,6] ] }
p h.delete_if{|k,v| v.size == 1 }

--- output -----
{"909"=>["123456", "012345"]}
--- end of output -----

--- Using a hash of hashes:

require 'csv'

h = Hash.new{|h,k| h[k] = {} }
CSV::Reader.parse(File.open( ARGV.first )) { |row|
  h[row.last][ row.first[0,6] ] = 8 }
p h.delete_if{|k,v| v.size == 1 }

--- output -----
{"909"=>{"012345"=>8, "123456"=>8}}
--- end of output -----


#8

William J. wrote:

h = Hash.new{ [] }

I wonder how this works since the Hash never stores these arrays.

CSV::Reader.parse(File.open( ARGV.first )) { |row|
h[row.last] |= [ row.first[0,6] ] }
p h.delete_if{|k,v| v.size == 1 }

--- output -----
{"909"=>["123456", "012345"]}
--- end of output -----

Is this really the output of the script above?

robert
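One way to check the mechanism in question: a default block that returns `[]` without assigning it stores nothing on a plain read, but `h[k] |= v` expands to `h[k] = h[k] | v`, and that assignment does store the union under the key. A minimal sketch using the values from William's sample input:

```ruby
# Default block returns a new empty array but does not store it.
h = Hash.new { [] }

h["900"]                           # a plain read returns [] ...
stored_after_read = h.key?("900")  # ... but adds no key

# h[k] |= v is shorthand for h[k] = h[k] | v, so the assignment stores it.
h["909"] |= ["123456"]
h["909"] |= ["012345"]
h["909"] |= ["123456"]             # already present; the union is unchanged

puts "key stored by read? #{stored_after_read}"
puts h.inspect
```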

#9

Robert K. wrote:

There are two possible interpretations of what you state here:

  1. You want all values for field 2 that occur more than once.

Just remembered that the file is sorted. Then this implementation of case 1 is even more efficient, as it does not store values in memory and works on arbitrarily large files:

require 'csv'

last = nil
CSV::Reader.parse(ARGF) do |row|
  last, k = row[1], last
  puts k if last == k
end
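As written, the loop prints a key once for every row after the first in its run, so a field-2 value that appears three times is printed twice. A sketch that reports each repeated key once while still holding only one run at a time, assuming sorted input and using `Enumerable#chunk`; the method name and sample rows are hypothetical:

```ruby
# Hypothetical helper: yields each field-2 value that occurs in more than
# one consecutive row. Input must be sorted on field 2. Note that chunk
# drops rows whose key is nil.
def each_repeated_key(rows)
  return enum_for(__method__, rows) unless block_given?

  rows.chunk { |row| row[1] }.each do |key, group|
    yield key if group.size > 1
  end
end

rows = [%w[a 900], %w[b 900], %w[c 900], %w[d 909], %w[e 909], %w[f 910]]
each_repeated_key(rows) { |key| puts key }
```

With a file, the rows could come from `CSV.foreach(path)`, which in recent versions of the csv library returns an Enumerator when called without a block.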

Kind regards

robert

#10

Todd B. wrote:

I’m just getting my feet wet with Ruby and would like some advice on
how you “old-timers” would write the following script using Ruby
idioms.

The intent of the script is to parse a CSV file that contains 2
fields per row, sorted on the second field. There may be multiple
rows for field 2. I want to get a list of all of the unique values of
field2 that have more than 1 value for the 1st 6 characters of field 1.

There are two possible interpretations of what you state here:

  1. You want all values for field 2 that occur more than once.

  2. You want all values for field 2 that have more than one distinct field 1 value.

Implementations:

ad 1.

require 'csv'

h = Hash.new(0)
CSV::Reader.parse(ARGF) {|row| h[row[1]] += 1}
h.each {|k,v| puts k if v > 1}

ad 2.

require 'csv'
require 'set'

h = Hash.new {|h,k| h[k] = Set.new}
CSV::Reader.parse(ARGF) {|row| h[row[1]] << row[0]}
h.each {|k,v| puts k if v.size > 1}
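Interpretation 2 combined with the 6-character prefix from the original question can be sketched the same way. The prefix matters: on William J.'s sample input both accounts have two distinct full field-1 values, but only 909 has two distinct prefixes. The method name is hypothetical.

```ruby
require "set"

# Hypothetical helper: field-2 values whose rows carry more than one
# distinct 6-character prefix of field 1. Input order does not matter.
def accounts_with_distinct_prefixes(rows)
  h = Hash.new { |hash, key| hash[key] = Set.new }
  rows.each { |field1, field2| h[field2] << field1[0, 6] }
  h.select { |_key, prefixes| prefixes.size > 1 }.keys
end

rows = [
  ["123456ab", "900"],
  ["123456cd", "900"],
  ["123456ef", "909"],
  ["012345gh", "909"]
]

puts accounts_with_distinct_prefixes(rows)
```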

Note: CSV::Reader can use ARGF which makes it easy to read from stdin as
well as multiple files.

Kind regards

robert