Seeking the Ruby way

toadkicker · February 2, 2006, 10:57pm

I’m just getting my feet wet with Ruby and would like some advice on how you
“old-timers” would write the following script using Ruby idioms.

The intent of the script is to parse a CSV file that contains 2 fields per
row, sorted on the second field. There may be multiple rows for each value of
field 2. I want to get a list of all of the unique values of field 2 that have
more than 1 value for the first 6 characters of field 1.

Here’s what I did:

require 'csv'

last_account_id = ''
last_adv_id = ''
parent_co_ids = []
cntr = 0
first = true
CSV::Reader.parse(File.open('e:\tmp\20060201\bsa.csv', 'r')) do |row|
  if row[1] == last_account_id
    parent_co_ids << last_adv_id[0, 6] unless parent_co_ids.include?(last_adv_id[0, 6])
  else
    if !first
      parent_co_ids << last_adv_id[0, 6] unless parent_co_ids.include?(last_adv_id[0, 6])
      if parent_co_ids.size > 1
        puts "#{last_account_id} - (#{parent_co_ids.join(',')})"
        cntr = cntr + 1
      end
      parent_co_ids.clear
    else
      first = false
    end
  end
  last_account_id = row[1]
  last_adv_id = row[0]
end
puts "Found #{cntr} accounts with multiple parent companies"

Thanks in advance!

Todd B.

toadkicker · February 2, 2006, 11:24pm

harp:~ > cat a.rb
require "csv"
require "yaml"

path = ARGV.shift
sum = Hash::new{|h,k| h[k] = 0}
count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r"){|row| count[row]}
y sum.delete_if{|k,v| v == 1}

harp:~ > cat in.csv
0,aaaaaa___
1,aaaaaa___
2,aaabbb___
3,aaabbb___
4,aaabbb___
5,aaaccc___

harp:~ > ruby a.rb in.csv

aaaaaa: 2
aaabbb: 3

hth. regards.

-a

toadkicker · February 3, 2006, 12:10am

On 2/2/06, [email protected] wrote:

require "csv"
require "yaml"

path = ARGV.shift
sum = Hash::new{|h,k| h[k] = 0}
count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r"){|row| count[row]}
y sum.delete_if{|k,v| v == 1}

I’m curious why you decided to make count its own lambda when:

1) It’s only ever used once
2) The block that uses it has only one statement, namely the call to count
3) count and the block to CSV::open have the same signature

I think at a minimum, given 2) and 3), I’d just replace the block to
CSV::open with count itself:

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r", &count)

Then, since count isn’t used anywhere else, I’d join those together:

CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}

After those transformations:

galadriel:~ lukfugl$ cat a.rb
require "csv"
require "yaml"

path = ARGV.shift
sum = Hash::new{|h,k| h[k] = 0}
CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}
y sum.delete_if{|k,v| v == 1}

galadriel:~ lukfugl$ cat in.csv
0,aaaaaa___
1,aaaaaa___
2,aaabbb___
3,aaabbb___
4,aaabbb___
5,aaaccc___

galadriel:~ lukfugl$ ruby a.rb in.csv

aaaaaa: 2
aaabbb: 3

Just seems a little clearer to me over having an extra one-time use
lambda.

Jacob F.
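
The `&count` step can be tried without a CSV file. In this rough sketch the rows are a hard-coded array standing in for whatever the CSV reader would yield; `rows` and `repeated` are names made up for the illustration, not part of the scripts above.

```ruby
# Stand-in for the rows a CSV reader would yield (same data as in.csv).
rows = [
  %w[0 aaaaaa___], %w[1 aaaaaa___], %w[2 aaabbb___],
  %w[3 aaabbb___], %w[4 aaabbb___], %w[5 aaaccc___]
]

sum = Hash.new(0)
count = lambda { |row| sum[row.last.to_s[0, 6]] += 1 }

# & hands the lambda to each as its block, so no wrapper block is needed.
rows.each(&count)

# Keep only the prefixes seen more than once.
repeated = sum.reject { |_prefix, n| n == 1 }
repeated.each { |prefix, n| puts "#{prefix}: #{n}" }
# prints:
# aaaaaa: 2
# aaabbb: 3
```

This works because the lambda takes exactly one argument and `each` yields exactly one value per row, which is the "same signature" point made above.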

toadkicker · February 3, 2006, 12:37am

Jacob F. wrote:

sum = Hash::new{|h,k| h[k] = 0}

And for some reason, I tend to write

sum = Hash.new(0)

when dealing with an immediate value. (But maybe it’s a better practice
to use Ara’s form, so that if you ever replace 0 with, say, a matrix,
you don’t reuse the same object for each key in the hash.)
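
The difference in that parenthetical can be seen in a small sketch. The variable names `counts`, `shared` and `fresh` are invented for the illustration; only the two `Hash.new` forms come from the thread.

```ruby
# Immediate values: Hash.new(0) is safe, because += rebinds the key to a
# new Integer rather than mutating the default object.
counts = Hash.new(0)
%w[a b a].each { |key| counts[key] += 1 }

# Mutable default: every missing key hands back the SAME array object,
# and << mutates it without ever storing anything in the hash.
shared = Hash.new([])
shared[:x] << 1
shared[:y] << 2
p shared[:z]    # => [1, 2]
p shared.size   # => 0

# Block form: a new array is created and stored for each missing key.
fresh = Hash.new { |hash, key| hash[key] = [] }
fresh[:x] << 1
fresh[:y] << 2
p fresh[:x]     # => [1]
p fresh.size    # => 2
```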

toadkicker · February 3, 2006, 12:58am

On Fri, 3 Feb 2006, Jacob F. wrote:

I’m curious why you decided to make count its own lambda when:

1) It’s only ever used once
2) The block that uses it has only one statement, namely the call to count
3) count and the block to CSV::open have the same signature

it’s for abstraction only. i wrote how to count before writing the csv open
line. when i wrote it, it ended up with something like

CSV::open(path,"r"){|row| p row; count[row]}

during editing - as i always seem to for debugging

basically i find

{{{{}}}}

tough to read sometimes and factor out things using lambda. it’s rare that it
actually ends up being the only thing left as in this case - but here you
are quite right that it can be compacted.

I think at a minimum, given 2) and 3), I’d just replace the block to
CSV::open with count itself:

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r", &count)

Then, since count isn’t used anywhere else, I’d join those together:

CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}

but i disagree here. people, esp newbies, will look at that and say - what?
whereas reading

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}

… count[row] …

is pretty clear. i often use variables as comments to others and myself. eg.
what does this do:

password = "#{ sifname }_#{ eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect ) }"

hard to say huh?

how about this?

four_random_printable_chars = eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect )
password = "#{ sifname }_#{ four_random_printable_chars }"

ugly (yes i’m hacking like crazy today) but at least anyone reading it (most
importantly me) knows what i’m trying to do if not how!

anyhow - same goes with ‘count’: it’s all good until you start cutting and
pasting - then you want vars not wicked expressions to move around.

Just seems a little clearer to me over having an extra one-time use lambda.

iff you are good at reading ruby

cheers.

-a
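
For what it’s worth, the named-variable idea reads even more easily if the expression behind the name is simple too. The sketch below is one possible way to pick four random printable characters without the eval/inspect round trip. It assumes "printable" means ASCII 32 through 126, and the value given to `sifname` is purely hypothetical, since the post never shows what it holds.

```ruby
# Assumption: "printable" means ASCII 32..126 (space through tilde).
printable_chars = (32..126).map(&:chr)

# Array#sample(4) picks four distinct elements at random, which matches
# shuffling the candidates and taking the first four.
four_random_printable_chars = printable_chars.sample(4).join

sifname = "eth0"  # hypothetical value, not from the post
password = "#{sifname}_#{four_random_printable_chars}"
puts password
```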

toadkicker · February 3, 2006, 1:19am

On 2/2/06, [email protected] wrote:

On Fri, 3 Feb 2006, Jacob F. wrote:

I’m curious why you decided to make count its own lambda when:

1) It’s only ever used once
2) The block that uses it has only one statement, namely the call to count
3) count and the block to CSV::open have the same signature

it’s for abstraction only.

basically i find

{{{{}}}}

tough to read sometimes and factor out things using lambda. it’s rare that it
actually ends up being the only thing left as in this case - but here you
are quite right that it can be compacted.

Yeah, I agree. I often use similar abstraction techniques for
readability. My brain just has the tendency to refactor code inwards
as well as outwards when an abstraction seems extraneous.

CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}

but i disagree here. people, esp newbies, will look at that and say - what?
whereas reading

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}

… count[row] …

is pretty clear. i often use variables as comments to others and myself.

Again, agreed. In this case though I don’t think the abstraction of
naming sum[…] += 1 as count is a necessary one. If I were to
refactor part of the complex expression

sum[row.last.to_s[0,6]] += 1

to improve readability, it would be the index:

identifier_prefix = lambda{ |row| row.last.to_s[0,6] }
… sum[identifier_prefix[row]] += 1 …
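
A runnable version of that idea might look like the sketch below. The `rows` array is made-up sample data, and the second half uses `Enumerable#tally`, which needs Ruby 2.7 or later and did not exist when this thread was written.

```ruby
# Made-up rows standing in for what the CSV reader would yield.
rows = [%w[0 aaaaaa___], %w[1 aaaaaa___], %w[2 aaabbb___]]

# Only the key extraction gets a name; the counting stays inline.
identifier_prefix = lambda { |row| row.last.to_s[0, 6] }

sum = Hash.new(0)
rows.each { |row| sum[identifier_prefix[row]] += 1 }

# With the key named, the counting can also be left to tally (Ruby 2.7+).
tallied = rows.map(&identifier_prefix).tally

puts sum == tallied   # prints: true
```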

what does this do:

password = "#{ sifname }_#{ eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect ) }"

hard to say huh?

Ick, yes, I’d definitely split that into chunks.

how about this?

four_random_printable_chars = eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect )
password = "#{ sifname }_#{ four_random_printable_chars }"

ugly (yes i’m hacking like crazy today) but at least anyone reading it (most
importantly me) knows what i’m trying to do if not how!

If you say so…

Jacob F.

toadkicker · February 3, 2006, 3:41am

Todd B. wrote:

I’m just getting my feet wet with Ruby and would like some advice on how you
“old-timers” would write the following script using Ruby idioms.

The intent of the script is to parse a CSV file that contains 2 fields per
row, sorted on the second field. There may be multiple rows for field 2. I
want to get a list of all of the unique values of field2 that has more than
1 value for the 1st 6 characters of field 1.

--- input data -----
123456ab,900
123456cd,900
123456ef,909
012345gh,909
--- end of input -----

--- Using a hash of arrays:

require 'csv'

h = Hash.new{ [] }
CSV::Reader.parse(File.open( ARGV.first )) { |row|
  h[row.last] |= [ row.first[0,6] ] }
p h.delete_if{|k,v| v.size == 1 }

--- output -----
{"909"=>["123456", "012345"]}
--- end of output -----

--- Using a hash of hashes:

require 'csv'

h = Hash.new{|h,k| h[k] = {} }
CSV::Reader.parse(File.open( ARGV.first )) { |row|
  h[row.last][ row.first[0,6] ] = 8 }
p h.delete_if{|k,v| v.size == 1 }

--- output -----
{"909"=>{"012345"=>8, "123456"=>8}}
--- end of output -----

toadkicker · February 3, 2006, 11:19am

William J. wrote:

h = Hash.new{ [] }

I wonder how this works since the Hash never stores these arrays.

CSV::Reader.parse(File.open( ARGV.first )) { |row|
h[row.last] |= [ row.first[0,6] ] }
p h.delete_if{|k,v| v.size == 1 }

--- output -----
{"909"=>["123456", "012345"]}
--- end of output -----

Is this really the output of the script above?

robert
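
A quick way to check this is to run the hash part on its own. As far as I can tell it does work, because `h[k] |= [x]` is shorthand for `h[k] = h[k] | [x]`: the block only supplies the starting array, and the assignment is what stores the result. The sketch below uses the keys and prefixes from the sample input; `lookup_only` is a name made up for the illustration.

```ruby
h = Hash.new { [] }       # the block returns a new array but stores nothing

lookup_only = h["900"]    # a plain lookup leaves the hash empty
p lookup_only             # => []
p h.size                  # => 0

# h[k] |= [x] expands to h[k] = h[k] | [x]:
# the block supplies [], | builds a new array, and []= stores it.
h["909"] |= ["123456"]
h["909"] |= ["012345"]
h["909"] |= ["123456"]    # already present, so the union changes nothing

p h["909"]                # => ["123456", "012345"]
p h.size                  # => 1
```

An in-place `h["909"] << "123456"` would not work with this default block, since the array it mutates is never stored.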

toadkicker · February 3, 2006, 11:40am

Robert K. wrote:

There are two possible interpretations of what you state here:

1. You want all values for row2 that occur more than once.

Just remembered that the file is sorted. Then this implementation of case 1
is even more efficient, as it does not store values in memory and works on
arbitrarily large files:

require 'csv'

last = nil
CSV::Reader.parse(ARGF) do |row|
  last, k = row[1], last
  puts k if last == k
end

Kind regards

robert
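
On a current Ruby, where `CSV::Reader` no longer exists, the same sorted-input idea could be sketched with `Enumerable#chunk`, which groups consecutive rows sharing a key and so holds only one group in memory at a time. The method name `each_repeated_key` and the sample data are made up for the illustration. Unlike the loop above, this reports each repeated value once rather than once per extra row, and `chunk` silently drops rows whose key is nil (for example, rows with no second field).

```ruby
require "csv"
require "stringio"

# Yields each field-2 value that appears on two or more consecutive rows.
# Relies on the input being sorted on field 2, as the original post states.
def each_repeated_key(rows)
  rows.chunk { |row| row[1] }.each do |key, group|
    yield key if group.size > 1
  end
end

data = <<~ROWS
  123456ab,900
  123456cd,900
  123456ef,909
  012345gh,909
  555555zz,910
ROWS

# For a real file, CSV.foreach(path) without a block returns an enumerator
# that could be passed in place of the in-memory reader used here.
each_repeated_key(CSV.new(StringIO.new(data))) { |key| puts key }
# prints:
# 900
# 909
```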

toadkicker · February 3, 2006, 11:25am

Todd B. wrote:

I’m just getting my feet wet with Ruby and would like some advice on
how you “old-timers” would write the following script using Ruby
idioms.

The intent of the script is to parse a CSV file that contains 2
fields per row, sorted on the second field. There may be multiple
rows for field 2. I want to get a list of all of the unique values of
field2 that has more than 1 value for the 1st 6 characters of field 1.

There are two possible interpretations of what you state here:

1. You want all values for row2 that occur more than once.
2. You want all values for row2 that have more than one distinct row1 value.

Implementations:

ad 1.

require 'csv'

h = Hash.new(0)
CSV::Reader.parse(ARGF) {|row| h[row[1]] += 1}
h.each {|k,v| puts k if v > 1}

ad 2.

require 'csv'
require 'set'

h = Hash.new {|h,k| h[k] = Set.new}
CSV::Reader.parse(ARGF) {|row| h[row[1]] << row[0]}
h.each {|k,v| puts k if v.size > 1}

Note: CSV::Reader can use ARGF which makes it easy to read from stdin as
well as multiple files.

Kind regards

robert
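
For readers on a current Ruby: `CSV::Reader` was part of the 1.8 library and is gone from later versions. A rough equivalent of interpretation 2, narrowed to the first 6 characters of field 1 as the original question asks, might look like the sketch below. It parses an in-memory sample instead of a file, and the names `adv_id`, `account_id` and `prefixes_by_account` are borrowed or adapted from the original script rather than taken from any post above.

```ruby
require "csv"
require "set"

data = <<~ROWS
  123456ab,900
  123456cd,900
  123456ef,909
  012345gh,909
ROWS

# account id (field 2) => set of distinct 6-character prefixes of field 1
prefixes_by_account = Hash.new { |hash, key| hash[key] = Set.new }

CSV.parse(data) do |adv_id, account_id|
  prefixes_by_account[account_id] << adv_id[0, 6]
end

# Keep only the accounts that have more than one distinct prefix.
multiple = prefixes_by_account.select { |_account, prefixes| prefixes.size > 1 }

multiple.each do |account, prefixes|
  puts "#{account} - (#{prefixes.to_a.join(',')})"
end
puts "Found #{multiple.size} accounts with multiple parent companies"
# prints:
# 909 - (123456,012345)
# Found 1 accounts with multiple parent companies
```

Because everything is grouped in a hash, this version does not depend on the file being sorted, at the cost of keeping every account in memory.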