Seeking the Ruby way

Todd B. (Guest)
on 2006-02-02 23:57
(Received via mailing list)
I'm just getting my feet wet with Ruby and would like some advice on how you
"old-timers" would write the following script using Ruby idioms.

The intent of the script is to parse a CSV file that contains 2 fields per
row, sorted on the second field. There may be multiple rows for field 2. I
want to get a list of all of the unique values of field 2 that have more than
1 value for the 1st 6 characters of field 1.

Here's what I did:

require 'csv'

last_account_id = ''
last_adv_id = ''
parent_co_ids = []
cntr = 0
first = true
CSV::Reader.parse(File.open('e:\\tmp\\20060201\\bsa.csv', 'r')) do |row|
    if row[1] == last_account_id
        parent_co_ids << last_adv_id[0, 6] unless parent_co_ids.include?(last_adv_id[0, 6])
    else
        if !first
            parent_co_ids << last_adv_id[0, 6] unless parent_co_ids.include?(last_adv_id[0, 6])
            if parent_co_ids.size > 1
                puts "#{last_account_id} - (#{parent_co_ids.join(',')})"
                cntr = cntr + 1
            end
            parent_co_ids.clear
        else
            first = false
        end
    end
    last_account_id = row[1]
    last_adv_id = row[0]
end
puts "Found #{cntr} accounts with multiple parent companies"

Thanks in advance!

Todd B.
unknown (Guest)
on 2006-02-03 00:24
(Received via mailing list)
On Fri, 3 Feb 2006, Todd B. wrote:

> require 'csv'
>    else
>        end
>    end
>    last_account_id = row[1]
>    last_adv_id = row[0]
> end
> puts "Found #{cntr} accounts with multiple parent companies"
>
> Thanks in advance!
>
> Todd B.

   harp:~ > cat a.rb
   require "csv"
   require "yaml"

   path = ARGV.shift
   sum = Hash::new{|h,k| h[k] = 0}
   count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
   CSV::open(path,"r"){|row| count[row]}
   y sum.delete_if{|k,v| v == 1}



   harp:~ > cat in.csv
   0,aaaaaa___
   1,aaaaaa___
   2,aaabbb___
   3,aaabbb___
   4,aaabbb___
   5,aaaccc___




   harp:~ > ruby a.rb in.csv
   ---
   aaaaaa: 2
   aaabbb: 3


hth.  regards.

-a
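
For readers on a current Ruby, the counting step above can be sketched without the lambda or the YAML dump. This is only an illustration: the rows are hard-coded to mirror in.csv, and the names `rows`, `counts` and `repeated` are made up for the example. It assumes Ruby 2.7 or later for Enumerable#tally.

```ruby
# Rows as a CSV parser would yield them for the in.csv shown above
# (hard-coded here so the sketch runs without a file).
rows = [
  ["0", "aaaaaa___"], ["1", "aaaaaa___"], ["2", "aaabbb___"],
  ["3", "aaabbb___"], ["4", "aaabbb___"], ["5", "aaaccc___"]
]

# Count how often each 6-character prefix of the last field occurs.
# Enumerable#tally needs Ruby 2.7 or later.
counts = rows.map { |row| row.last.to_s[0, 6] }.tally

# Keep only the prefixes that were seen more than once.
repeated = counts.reject { |_prefix, n| n == 1 }
p repeated
```

For this data `repeated` holds aaaaaa with a count of 2 and aaabbb with a count of 3, matching the output in the post.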
Jacob F. (Guest)
on 2006-02-03 01:10
(Received via mailing list)
On 2/2/06, removed_email_address@domain.invalid 
<removed_email_address@domain.invalid> wrote:
>    require "csv"
>    require "yaml"
>
>    path = ARGV.shift
>    sum = Hash::new{|h,k| h[k] = 0}
>    count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
>    CSV::open(path,"r"){|row| count[row]}
>    y sum.delete_if{|k,v| v == 1}

I'm curious why you decided to make `count` its own lambda when:

  1) It's only ever used once
  2) The block that uses it has only one statement, namely the call to `count`
  3) count and the block to CSV::open have the same signature

I think at a minimum, given 2) and 3), I'd just replace the block to
CSV::open with count itself:

  count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
  CSV::open(path,"r", &count)

Then, since count isn't used anywhere else, I'd join those together:

  CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}

After those transformations:

  galadriel:~ lukfugl$ cat a.rb
  require "csv"
  require "yaml"

  path = ARGV.shift
  sum = Hash::new{|h,k| h[k] = 0}
  CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}
  y sum.delete_if{|k,v| v == 1}

  galadriel:~ lukfugl$ cat in.csv
  0,aaaaaa___
  1,aaaaaa___
  2,aaabbb___
  3,aaabbb___
  4,aaabbb___
  5,aaaccc___

  galadriel:~ lukfugl$ ruby a.rb in.csv
  ---
  aaaaaa: 2
  aaabbb: 3

Just seems a little clearer to me over having an extra one-time use lambda.

Jacob F.
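
The two forms compared in this post (a block that only forwards to the lambda, and the lambda itself passed with `&`) can be checked side by side. A small sketch, assuming an in-memory array in place of the CSV file; `rows`, `via_block` and `via_ampersand` are names invented for the illustration.

```ruby
# Stand-in rows; in the thread these come from CSV::open.
rows = [["0", "aaaaaa___"], ["1", "aaaaaa___"], ["2", "aaabbb___"]]

sum = Hash.new(0)
count = lambda { |row| sum[row.last.to_s[0, 6]] += 1 }

# Form 1: a block whose only statement is the call to the lambda.
rows.each { |row| count[row] }
via_block = sum.dup
sum.clear

# Form 2: the lambda passed directly as the block with &.
rows.each(&count)
via_ampersand = sum.dup

puts via_block == via_ampersand
```

Both passes leave the same counts in `sum`, so the choice between them is about readability only.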
Joel VanderWerf (Guest)
on 2006-02-03 01:37
(Received via mailing list)
Jacob F. wrote:

>   sum = Hash::new{|h,k| h[k] = 0}

And for some reason, I tend to write

sum = Hash.new(0)

when dealing with an immediate value. (But maybe it's a better practice
to use Ara's form, so that if you ever replace 0 with, say, a matrix,
you don't reuse the same object for each key in the hash.)
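
The difference between the two forms only shows up with a mutable default. A minimal sketch of the pitfall, using an array instead of a matrix; the variable names `shared` and `fresh` are made up for the example.

```ruby
# Hash.new(obj): every missing key returns the SAME object.
shared = Hash.new([])
shared[:a] << 1          # mutates the one default array; nothing is stored under :a
shared[:b] << 2          # mutates the same array again
shared_keys = shared.keys
shared_default = shared[:anything]

# Hash.new { ... }: the block runs per missing key and stores a new object.
fresh = Hash.new { |hash, key| hash[key] = [] }
fresh[:a] << 1
fresh[:b] << 2

p shared_keys
p shared_default
p fresh
```

With an immediate value such as 0 the sharing is harmless, because `sum[k] += 1` assigns a new integer to the key instead of mutating the default.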
unknown (Guest)
on 2006-02-03 01:58
(Received via mailing list)
On Fri, 3 Feb 2006, Jacob F. wrote:

> I'm curious why you decided to make `count` its own lambda when:
>
>  1) It's only ever used once
>  2) The block that uses it has only one statement, namely the call to `count`
>  3) count and the block to CSV::open have the same signature

it's for abstraction only.  i wrote how to count before writing the csv open
line.  when i wrote it, it ended up as something like

   CSV::open(path,"r"){|row| p row; count[row]}

during editing - as i always seem to for debugging ;-)

basically i find

   {{{{}}}}

tough to read sometimes and factor out things using lambda.  it's rare that it
actually ends up being the only thing left as in this case - but here you
are quite right that it can be compacted.

> I think at a minimum, given 2) and 3), I'd just replace the block to
> CSV::open with count itself:

>
>  count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
>  CSV::open(path,"r", &count)
>
> Then, since count isn't used anywhere else, I'd join those together:
>
>  CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}

but i disagree here.  people, esp newbies, will look at that and say - what?
whereas reading

   count = lambda{|row| sum[row.last.to_s[0,6]] += 1}

   ... count[row] ...

is pretty clear.  i often use variables as comments to others and myself.  eg.
what does this do:

   password = "#{ sifname }_#{ eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect ) }"

hard to say huh?

how about this?

   four_random_printable_chars = eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect )
   password = "#{ sifname }_#{ four_random_printable_chars }"

ugly (yes i'm hacking like crazy today) but at least anyone reading it (most
importantly me) knows what i'm trying to do if not how!

anyhow - same goes with 'count': it's all good until you start cutting and
pasting - then you want vars not wicked expressions to move around.

> Just seems a little clearer to me over having an extra one-time use lambda.

__iff__ you are good at reading ruby ;-)

cheers.

-a
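
The "variables as comments" idea can be pushed one step further by naming the intermediate values too. A sketch only, under stated assumptions: `sifname` gets a hypothetical value here because the post never defines it, "printable" is narrowed to ASCII 32..126, and Array#sample replaces the shuffle-and-slice (and the eval) of the original.

```ruby
# Hypothetical stand-in for the `sifname` variable used in the post.
sifname = "eth0"

# Assumption: "printable" means ASCII 32..126 (space through tilde).
printable_chars = (32..126).map(&:chr)

# Array#sample(4) picks four distinct elements at random.
four_random_printable_chars = printable_chars.sample(4).join

password = "#{sifname}_#{four_random_printable_chars}"
puts password
```

Each name states the intent of its line, which is the point being argued: a reader can follow what is built without decoding how.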
Jacob F. (Guest)
on 2006-02-03 02:19
(Received via mailing list)
On 2/2/06, removed_email_address@domain.invalid 
<removed_email_address@domain.invalid> wrote:
> On Fri, 3 Feb 2006, Jacob F. wrote:
> > I'm curious why you decided to make `count` its own lambda when:
> >
> >  1) It's only ever used once
> >  2) The block that uses it has only one statement, namely the call to `count`
> >  3) count and the block to CSV::open have the same signature
>
> it's for abstraction only.

<snip>

> basically i find
>
>    {{{{}}}}
>
> tough to read sometimes and factor out things using lambda.  it's rare that it
> actually ends up being the only thing left as in this case - but here you
> are quite right that it can be compacted.

Yeah, I agree. I often use similar abstraction techniques for
readability. My brain just has the tendency to refactor code inwards
as well as outwards when an abstraction seems extraneous.

> >  CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}
>
> but i disagree here.  people, esp newbies, will look at that and say - what?
> whereas reading
>
>    count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
>
>    ... count[row] ...
>
> is pretty clear.  i often use variables as comments to others and myself.

Again, agreed. In this case though I don't think the abstraction of
naming sum[...] += 1 as count is a necessary one. If I were to
refactor part of the complex expression

  sum[row.last.to_s[0,6]] += 1

to improve readability, it would be the index:

  identifier_prefix = lambda{ |row| row.last.to_s[0,6] }
  ... sum[identifier_prefix[row]] += 1 ...

> what does this do:
>
>    password = "#{ sifname }_#{ eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect ) }"
>
> hard to say huh?

Ick, yes, I'd definitely split that into chunks. :)

> how about this?
>
>    four_random_printable_chars = eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect )
>    password = "#{ sifname }_#{ four_random_printable_chars }"
>
> ugly (yes i'm hacking like crazy today) but at least anyone reading it (most
> importantly me) knows what i'm trying to do if not how!

If you say so... ;)

Jacob F.
William J. (Guest)
on 2006-02-03 04:41
(Received via mailing list)
Todd B. wrote:
> I'm just getting my feet wet with Ruby and would like some advice on how you
> "old-timers" would write the following script using Ruby idioms.
>
> The intent of the script is to parse a CSV file that contains 2 fields per
> row, sorted on the second field. There may be multiple rows for field 2. I
> want to get a list of all of the unique values of field2 that has more than
> 1 value for the 1st 6 characters of field 1.

---  input data  -----
123456ab,900
123456cd,900
123456ef,909
012345gh,909
---  end of input  -----

---  Using a hash of arrays:

require 'csv'

h = Hash.new{ [] }
CSV::Reader.parse(File.open( ARGV.first )) { |row|
  h[row.last] |= [ row.first[0,6] ] }
p h.delete_if{|k,v| v.size == 1 }

---  output  -----
{"909"=>["123456", "012345"]}
---  end of output  -----


---  Using a hash of hashes:

require 'csv'

h = Hash.new{|h,k| h[k] = {} }
CSV::Reader.parse(File.open( ARGV.first )) { |row|
  h[row.last][ row.first[0,6] ] = 8 }
p h.delete_if{|k,v| v.size == 1 }

---  output  -----
{"909"=>{"012345"=>8, "123456"=>8}}
---  end of output  -----
Robert K. (Guest)
on 2006-02-03 12:19
(Received via mailing list)
William J. wrote:
>
>
> h = Hash.new{ [] }

I wonder how this works since the Hash never stores these arrays.

> CSV::Reader.parse(File.open( ARGV.first )) { |row|
>   h[row.last] |= [ row.first[0,6] ] }
> p h.delete_if{|k,v| v.size == 1 }
>
> ---  output  -----
> {"909"=>["123456", "012345"]}
> ---  end of output  -----

Is this really the output of the script above?

    robert
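
On the question of how `Hash.new{ [] }` can work: the block does not store anything, but `h[k] |= x` expands to `h[k] = h[k] | x`, and that assignment does. A small sketch with the keys and prefixes from the sample data above; `after_append` is a name introduced for the illustration.

```ruby
h = Hash.new { [] }       # the block returns a new array but never stores it

h["900"] << "123456"      # appends to a throwaway array; the hash stays empty
after_append = h.dup

h["909"] |= ["123456"]    # same as: h["909"] = h["909"] | ["123456"]
h["909"] |= ["012345"]
h["909"] |= ["123456"]    # Array#| drops the duplicate

p after_append
p h
```

So the quoted script does store its arrays, through the assignment rather than through the default block, and the output shown for it is plausible for the given input.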
Robert K. (Guest)
on 2006-02-03 12:25
(Received via mailing list)
Todd B. wrote:
> I'm just getting my feet wet with Ruby and would like some advice on
> how you "old-timers" would write the following script using Ruby
> idioms.
>
> The intent of the script is to parse a CSV file that contains 2
> fields per row, sorted on the second field. There may be multiple
> rows for field 2. I want to get a list of all of the unique values of
> field2 that has more than 1 value for the 1st 6 characters of field 1.

There are two possible interpretations of what you state here:

1. You want all values for field 2 that occur more than once.

2. You want all values for field 2 that have more than one distinct field 1
value.

Implementations:

ad 1.

require 'csv'

h = Hash.new(0)
CSV::Reader.parse(ARGF) {|row| h[row[1]] += 1}
h.each {|k,v| puts k if v > 1}


ad 2.

require 'csv'
require 'set'

h = Hash.new {|h,k| h[k] = Set.new}
CSV::Reader.parse(ARGF) {|row| h[row[1]] << row[0]}
h.each {|k,v| puts k if v.size > 1}

Note: CSV::Reader can use ARGF which makes it easy to read from stdin as
well as multiple files.

Kind regards

    robert
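
CSV::Reader belongs to the csv library of Ruby 1.8 and is not part of the CSV class shipped since Ruby 1.9. A sketch of interpretation 2 against the current API follows, with two assumptions: the sample data from earlier in the thread is inlined instead of read from ARGF, and the key is the 6-character prefix of field 1, as in the original question. The names `data`, `prefixes_by_account` and `multiple` are invented for the example.

```ruby
require "csv"
require "set"

# Sample data from earlier in the thread, inlined so the sketch is self-contained.
data = <<~CSV
  123456ab,900
  123456cd,900
  123456ef,909
  012345gh,909
CSV

# One Set of distinct prefixes per value of field 2.
prefixes_by_account = Hash.new { |hash, key| hash[key] = Set.new }

# CSV.parse with a block yields each row as an array of strings.
CSV.parse(data) do |row|
  prefixes_by_account[row[1]] << row[0][0, 6]
end

# Keep only the accounts that have more than one distinct prefix.
multiple = prefixes_by_account.select { |_account, prefixes| prefixes.size > 1 }
multiple.each do |account, prefixes|
  puts "#{account} - (#{prefixes.to_a.join(',')})"
end
puts "Found #{multiple.size} accounts with multiple parent companies"
```

To read from a file instead, CSV.foreach(path) yields rows the same way without loading the whole file first.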
Robert K. (Guest)
on 2006-02-03 12:40
(Received via mailing list)
Robert K. wrote:
>
> There are two possible interpretations of what you state here:
>
> 1. You want all values for field 2 that occur more than once.

Just remembered that the file is sorted.  Then this implementation of case
1 is even more efficient as it does not store values in memory and works on
arbitrarily large files:

require 'csv'

last = nil
CSV::Reader.parse(ARGF) do |row|
  last, k = row[1], last
  puts k if last == k
end

Kind regards

    robert
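
One thing to be aware of with the loop above: it prints a key once for every adjacent pair of equal rows, so a key that appears three times is printed twice. A sketch of a variant that reports each repeated key once, still relying on the input being sorted; it uses Enumerable#chunk on an in-memory array standing in for the parsed rows, and the sample rows and the name `repeated` are made up for the illustration.

```ruby
# Stand-in for rows sorted on the second field.
rows = [
  ["a1", "900"], ["a2", "900"], ["a3", "900"],
  ["b1", "901"],
  ["c1", "909"], ["c2", "909"]
]

repeated = []

# chunk groups consecutive rows with the same key, so only one run of
# rows is held at a time. Note that chunk drops rows whose key is nil.
rows.chunk { |row| row[1] }.each do |key, group|
  repeated << key if group.size > 1
end

puts repeated
```

Here "900" is reported once even though it occurs three times.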