Seeking the Ruby way

I’m just getting my feet wet with Ruby and would like some advice on how
you “old-timers” would write the following script using Ruby idioms.

The intent of the script is to parse a CSV file that contains 2 fields
per row, sorted on the second field. There may be multiple rows for each
value of field 2. I want to get a list of all of the unique values of
field 2 that have more than 1 value for the first 6 characters of field 1.

Here’s what I did:

require 'csv'

last_account_id = ''
last_adv_id = ''
parent_co_ids = []
cntr = 0
first = true
CSV::Reader.parse(File.open('e:\tmp\20060201\bsa.csv', 'r')) do |row|
  if row[1] == last_account_id
    parent_co_ids << last_adv_id[0, 6] unless
      parent_co_ids.include?(last_adv_id[0, 6])
  else
    if !first
      parent_co_ids << last_adv_id[0, 6] unless
        parent_co_ids.include?(last_adv_id[0, 6])
      if parent_co_ids.size > 1
        puts "#{last_account_id} - (#{parent_co_ids.join(',')})"
        cntr = cntr + 1
      end
      parent_co_ids.clear
    else
      first = false
    end
  end
  last_account_id = row[1]
  last_adv_id = row[0]
end
puts "Found #{cntr} accounts with multiple parent companies"

Thanks in advance!

Todd B.

On Fri, 3 Feb 2006, Todd B. wrote:

[snip]

harp:~ > cat a.rb
require "csv"
require "yaml"

path = ARGV.shift
sum = Hash::new{|h,k| h[k] = 0}
count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r"){|row| count[row]}
y sum.delete_if{|k,v| v == 1}

harp:~ > cat in.csv
0,aaaaaa___
1,aaaaaa___
2,aaabbb___
3,aaabbb___
4,aaabbb___
5,aaaccc___

harp:~ > ruby a.rb in.csv

aaaaaa: 2
aaabbb: 3

hth. regards.

-a
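
For anyone reading this on a current Ruby: `CSV.open` with a block now yields the CSV object rather than one row at a time, so the script above will not run unchanged. A rough equivalent of the same counting idea might look like the sketch below. It assumes Ruby 2.7 or later for `tally`, inlines the sample data instead of reading `ARGV`, and `repeated_prefixes` is a made-up name.

```ruby
require "csv"

# Hypothetical helper: count the first six characters of each row's last
# field and keep only the prefixes that were seen more than once.
def repeated_prefixes(csv_text)
  CSV.parse(csv_text)
     .map { |row| row.last.to_s[0, 6] }
     .tally
     .reject { |_prefix, count| count == 1 }
end

data = <<~ROWS
  0,aaaaaa___
  1,aaaaaa___
  2,aaabbb___
  3,aaabbb___
  4,aaabbb___
  5,aaaccc___
ROWS

repeated_prefixes(data).each { |prefix, count| puts "#{prefix}: #{count}" }
```

With the sample rows this prints `aaaaaa: 2` and `aaabbb: 3`, matching the run above.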

On 2/2/06, [email protected] wrote:

require "csv"
require "yaml"

path = ARGV.shift
sum = Hash::new{|h,k| h[k] = 0}
count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r"){|row| count[row]}
y sum.delete_if{|k,v| v == 1}

I’m curious why you decided to make count its own lambda when:

  1. It’s only ever used once
  2. The block that uses it has only one statement, namely the call to count
  3. count and the block to CSV::open have the same signature

I think at a minimum, given 2) and 3), I’d just replace the block to
CSV::open with count itself:

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r", &count)

Then, since count isn’t used anywhere else, I’d join those together:

CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}

After those transformations:

galadriel:~ lukfugl$ cat a.rb
require "csv"
require "yaml"

path = ARGV.shift
sum = Hash::new{|h,k| h[k] = 0}
CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}
y sum.delete_if{|k,v| v == 1}

galadriel:~ lukfugl$ cat in.csv
0,aaaaaa___
1,aaaaaa___
2,aaabbb___
3,aaabbb___
4,aaabbb___
5,aaaccc___

galadriel:~ lukfugl$ ruby a.rb in.csv

aaaaaa: 2
aaabbb: 3

Just seems a little clearer to me over having an extra one-time use
lambda.

Jacob F.
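
The two forms being compared can be tried without any CSV file. The sketch below uses a plain array of rows in place of `CSV::open` (an assumption made only so it runs standalone), and both method names are invented for the comparison.

```ruby
# Named lambda handed over with &, as in the intermediate refactoring step.
def count_with_named_lambda(rows)
  sum = Hash.new(0)
  count = lambda { |row| sum[row.last.to_s[0, 6]] += 1 }
  rows.each(&count)
  sum
end

# The same logic written directly in the block.
def count_with_inline_block(rows)
  sum = Hash.new(0)
  rows.each { |row| sum[row.last.to_s[0, 6]] += 1 }
  sum
end

rows = [["0", "aaaaaa___"], ["1", "aaaaaa___"], ["2", "aaabbb___"]]
p count_with_named_lambda(rows) == count_with_inline_block(rows)
```

Both return the same hash, so the choice is about readability rather than behavior.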

Jacob F. wrote:

sum = Hash::new{|h,k| h[k] = 0}

And for some reason, I tend to write

sum = Hash.new(0)

when dealing with an immediate value. (But maybe it’s a better practice
to use Ara’s form, so that if you ever replace 0 with, say, a matrix,
you don’t reuse the same object for each key in the hash.)
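
A small sketch of the difference hinted at above, using an array instead of a matrix as the mutable default. The method names are made up for illustration.

```ruby
# Hash.new(obj): every missing key returns the very same object.
def shared_default_demo
  h = Hash.new([])
  h[:a] << 1          # mutates the shared default; no key is stored
  [h[:b], h.keys]     # the other key sees the mutation, and the hash is empty
end

# Hash.new { ... }: the block runs per missing key and stores a fresh object.
def block_default_demo
  h = Hash.new { |hash, key| hash[key] = [] }
  h[:a] << 1
  [h[:b], h.keys]
end

p shared_default_demo   # [[1], []]
p block_default_demo    # [[], [:a, :b]]
```

With an integer default such as `Hash.new(0)` the sharing is harmless, because `h[k] += 1` assigns a new value rather than mutating the default.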

On Fri, 3 Feb 2006, Jacob F. wrote:

I’m curious why you decided to make count its own lambda when:

  1. It’s only ever used once
  2. The block that uses it has only one statement, namely the call to count
  3. count and the block to CSV::open have the same signature

it’s for abstraction only. i wrote how to count before writing the csv
open line. when i wrote it, it ended up as something like

CSV::open(path,"r"){|row| p row; count[row]}

during editing - as it always seems to for debugging ;-)

basically i find

{{{{}}}}

tough to read sometimes and factor out things using lambda. it’s rare
that it actually ends up being the only thing left as in this case - but
here you are quite right that it can be compacted.

I think at a minimum, given 2) and 3), I’d just replace the block to
CSV::open with count itself:

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r", &count)

Then, since count isn’t used anywhere else, I’d join those together:

CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}

but i disagree here. people, esp nubies, will look at that and say -
what? whereas reading

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}

… count[row] …

is pretty clear. i often use variables as comments to others and myself.
eg. what does this do:

password = "#{ sifname }_#{ eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect ) }"

hard to say huh?

how about this?

four_random_printable_chars = eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect )
password = "#{ sifname }_#{ four_random_printable_chars }"

ugly (yes i’m hacking like crazy today) but at least anyone reading it
(most importantly me) knows what i’m trying to do if not how!

anyhow - same goes with ‘count’: it’s all good until you start cutting
and pasting - then you want vars not wicked expressions to move around.

Just seems a little clearer to me over having an extra one-time use lambda.

iff you are good at reading ruby ;-)

cheers.

-a
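
The "variable as comment" point can be taken one step further by giving the named piece its own method. The sketch below is not the original expression: it drops the `eval`, assumes printable ASCII (codes 32 through 126) is what was wanted, and uses a placeholder value for `sifname`, which is never defined in this thread.

```ruby
# Four distinct random characters from the printable ASCII range.
# Restricting to 32..126 is an assumption, not the original's [[:print:]] test.
def four_random_printable_chars
  printable = (32..126).map(&:chr)
  printable.sample(4).join
end

sifname = "example"   # hypothetical placeholder value
password = "#{sifname}_#{four_random_printable_chars}"
puts password.length  # 12: "example", "_", and four random characters
```

The name still tells the reader what is being built, and the body is short enough to show how.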

On 2/2/06, [email protected] wrote:

On Fri, 3 Feb 2006, Jacob F. wrote:

I’m curious why you decided to make count its own lambda when:

  1. It’s only ever used once
  2. The block that uses it has only one statement, namely the call to count
  3. count and the block to CSV::open have the same signature

it’s for abstraction only.

basically i find

{{{{}}}}

tough to read sometimes and factor out things using lambda. it’s rare that it
actually ends up being the only thing left as in this case - but here you
are quite right that it can be compacted.

Yeah, I agree. I often use similar abstraction techniques for
readability. My brain just has the tendency to refactor code inwards
as well as outwards when an abstraction seems extraneous.

CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}

but i disagree here. people, esp nubies will look at that and say - what?
whereas reading

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}

… count[row] …

is pretty clear. i often use variables as comments to others and myself.

Again, agreed. In this case though I don’t think the abstraction of
naming sum[…] += 1 as count is a necessary one. If I were to
refactor part of the complex expression

sum[row.last.to_s[0,6]] += 1

to improve readability, it would be the index:

identifier_prefix = lambda{ |row| row.last.to_s[0,6] }
… sum[identifier_prefix[row]] += 1 …

what does this do:

password = "#{ sifname }_#{ eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect ) }"

hard to say huh?

Ick, yes, I’d definitely split that into chunks. :-)

how about this?

four_random_printable_chars = eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect )
password = "#{ sifname }_#{ four_random_printable_chars }"

ugly (yes i’m hacking like crazy today) but at least anyone reading it (most
importantly me) knows what i’m trying to do if not how!

If you say so… ;-)

Jacob F.

Todd B. wrote:

I’m just getting my feet wet with Ruby and would like some advice on how you
“old-timers” would write the following script using Ruby idioms.

The intent of the script is to parse a CSV file that contains 2 fields per
row, sorted on the second field. There may be multiple rows for field 2. I
want to get a list of all of the unique values of field2 that has more than
1 value for the 1st 6 characters of field 1.

--- input data -----
123456ab,900
123456cd,900
123456ef,909
012345gh,909
--- end of input -----

--- Using a hash of arrays:

require 'csv'

h = Hash.new{ [] }
CSV::Reader.parse(File.open( ARGV.first )) { |row|
  h[row.last] |= [ row.first[0,6] ] }
p h.delete_if{|k,v| v.size == 1 }

--- output -----
{"909"=>["123456", "012345"]}
--- end of output -----

--- Using a hash of hashes:

require 'csv'

h = Hash.new{|h,k| h[k] = {} }
CSV::Reader.parse(File.open( ARGV.first )) { |row|
  h[row.last][ row.first[0,6] ] = 8 }
p h.delete_if{|k,v| v.size == 1 }

--- output -----
{"909"=>{"012345"=>8, "123456"=>8}}
--- end of output -----

William J. wrote:

h = Hash.new{ [] }

I wonder how this works since the Hash never stores these arrays.

CSV::Reader.parse(File.open( ARGV.first )) { |row|
  h[row.last] |= [ row.first[0,6] ] }
p h.delete_if{|k,v| v.size == 1 }

--- output -----
{"909"=>["123456", "012345"]}
--- end of output -----

Is this really the output of the script above?

robert
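
On the question of how `Hash.new{ [] }` can work at all: the default block indeed stores nothing, but `h[k] |= [x]` is shorthand for `h[k] = h[k] | [x]`, and that assignment is what puts the array into the hash. A sketch that skips the CSV parsing and feeds in (field 2, prefix) pairs directly, with a made-up method name:

```ruby
def union_into_default_hash(pairs)
  h = Hash.new { [] }   # returns a fresh array on a miss, stores nothing
  pairs.each do |key, value|
    # Expands to: h[key] = h[key] | [value]
    # The read may hit the default block; the write stores the result.
    h[key] |= [value]
  end
  h
end

pairs = [["900", "123456"], ["900", "123456"],
         ["909", "123456"], ["909", "012345"]]
p union_into_default_hash(pairs).delete_if { |_k, v| v.size == 1 }

# By contrast, a mutating call on the default is simply lost:
probe = Hash.new { [] }
probe["x"] << "lost"
p probe.empty?   # true
```

Run against the sample input from the post, this leaves only the `"909"` entry with its two prefixes, so the quoted output looks plausible.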

Robert K. wrote:

There are two possible interpretations of what you state here:

  1. You want all values for row2 that occur more than once.

Just remembered that the file is sorted. Then this implementation of
case 1 is even more efficient, as it does not store values in memory and
works on arbitrarily large files:

require 'csv'

last = nil
CSV::Reader.parse(ARGF) do |row|
  last, k = row[1], last
  puts k if last == k
end

Kind regards

robert
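
The same "rely on the sort order" idea can be applied to the original question (distinct six-character prefixes of field 1 per value of field 2). The sketch below is one way to do it on a current Ruby, assuming 2.7 or later for `filter_map`; the method name is invented and the data is inlined. As far as I can tell, the loop in the original post only reports an account when the next account starts, so the last group in the file is never checked; grouping with `slice_when` avoids that.

```ruby
require "csv"

# Hypothetical helper. Rows must already be sorted on the second field.
# Returns [account, prefixes] pairs for accounts with more than one prefix.
def accounts_with_multiple_parents(rows)
  rows.slice_when { |a, b| a[1] != b[1] }.filter_map do |group|
    prefixes = group.map { |row| row[0][0, 6] }.uniq
    [group.first[1], prefixes] if prefixes.size > 1
  end
end

data = <<~ROWS
  123456ab,900
  123456cd,900
  123456ef,909
  012345gh,909
ROWS

found = accounts_with_multiple_parents(CSV.parse(data))
found.each { |account, prefixes| puts "#{account} - (#{prefixes.join(',')})" }
puts "Found #{found.size} accounts with multiple parent companies"
```

For a large file, passing the enumerator returned by `CSV.foreach(path)` instead of a parsed array should keep only one group of rows in memory at a time, though I have not measured that.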

Todd B. wrote:

I’m just getting my feet wet with Ruby and would like some advice on
how you “old-timers” would write the following script using Ruby
idioms.

The intent of the script is to parse a CSV file that contains 2
fields per row, sorted on the second field. There may be multiple
rows for field 2. I want to get a list of all of the unique values of
field2 that has more than 1 value for the 1st 6 characters of field 1.

There are two possible interpretations of what you state here:

  1. You want all values for row2 that occur more than once.

  2. You want all values for row2 that have more than one distinct row1
    value.

Implementations:

ad 1.

require 'csv'

h = Hash.new(0)
CSV::Reader.parse(ARGF) {|row| h[row[1]] += 1}
h.each {|k,v| puts k if v > 1}

ad 2.

require 'csv'
require 'set'

h = Hash.new {|h,k| h[k] = Set.new}
CSV::Reader.parse(ARGF) {|row| h[row[1]] << row[0]}
h.each {|k,v| puts k if v.size > 1}

Note: CSV::Reader can use ARGF which makes it easy to read from stdin as
well as multiple files.

Kind regards

robert
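
Note that "ad 2" collects the whole of field 1, whereas the original question asked about its first 6 characters. A variant that takes the prefix, written against the current `CSV` API rather than `CSV::Reader`, might look like this. The method name is made up, and it parses a string instead of `ARGF` so that it runs standalone.

```ruby
require "csv"
require "set"

# Hypothetical helper: map each field-2 value to the set of distinct
# six-character prefixes of field 1, keeping only those with more than one.
# Unlike the sorted-input approach, row order does not matter here.
def accounts_by_distinct_prefix(csv_text)
  seen = Hash.new { |hash, key| hash[key] = Set.new }
  CSV.parse(csv_text) { |row| seen[row[1]] << row[0][0, 6] }
  seen.select { |_account, prefixes| prefixes.size > 1 }
end

data = <<~ROWS
  123456ab,900
  012345gh,909
  123456cd,900
  123456ef,909
ROWS

accounts_by_distinct_prefix(data).each_key { |account| puts account }
```

With these four rows only `909` is printed, since `900` has a single prefix.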
