Forum: Ruby Efficient Ruby equivalent to Excel's "SUMIF"?

Posted by Joel Pearson (virtuoso)
on 2013-03-04 15:21
I'm getting some relatively slow performance when doing calculations
against data, and I'm wondering whether there's a more efficient way to
do this.

I'm taking a multidimensional array and then totalling a "column" based
on multiple criteria. It looks like a much longer version of this:

data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

The way I'm doing this at present is to take the main key: "Ref"
and unique it, then loop through all those values and select chunks of
the data which match. In order to accelerate the process I'm also
diminishing the source array each time so by the time I get to the end
the process is much faster.

Something like this:
_________________
#Make a 2D array of unique refs (+header) to store the output
data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

#Set up a 2D array to hold the output, with 1 of each Ref + a header
output = [ data.transpose[0].uniq ].transpose

output.map.with_index { |ar,idx|
  if idx == 0 #Headers
    [ ar[0], 'Total', 'Here', 'There' ]
  else
    #Get the current set to look at
    current_data = data.select { |a| a[0] =~ /#{ ar[0] }/i }
    #Remove it from the main pool
    data -= current_data
    #Populate the current row with totals
    b = [
      ar[0],
      nil,
      current_data.select { |a| a[2] == 'Here' }.inject(0) { |sum,a| sum
+ a[1].to_i },
      current_data.select { |a| a[2] == 'There' }.inject(0) { |sum,a|
sum + a[1].to_i }
    ]
    #Add a total
    b[1] = b[2] + b[3]
    #This one's done
    b
  end
}
_________________

I'm doing this with around 9000 rows of data using over 2000 unique
references. The way I'm doing it at the moment seems quite slow. Is
there a faster way to do this?

Thanks for reading this.
Posted by Adam Prescott (Guest)
on 2013-03-04 15:43
(Received via mailing list)
I think part of the problem is you're using arrays quite heavily. You 
can
do it in one pass if you pick a Hash that you later convert to a more
appropriate structure, or simply define your data as a class and
instantiate relevant objects, etc.

Anyway, here's an example:


data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

desired_output = [
  ["Ref", "Total", "Here", "There"],
  ["Part1", 5, 3, 2],
  ["Part2", 1, 1, 0]
]

# bump off the first element
headers = data.shift

references = {}

data.each do |reference, count, location|
  # if the references hash hasn't "seen" this reference yet,
  # create a hash to keep track of totals and locations.
  references[reference] ||= { :total => 0, :locations => {} }
  r = references[reference]

  # contribute to the total for this reference.
  r[:total] += count.to_i

  # if this location hasn't been seen for this reference yet,
  # start it off at zero. could also use Hash.new(0) in the
  # above definition of references[reference]
  r[:locations][location] ||= 0

  # now contribute to the location's count
  r[:locations][location] += count.to_i
end

# references now looks something like:
#
# {
#  "Part1" => { :total      => 5,
#               :locations => { "Here" => 3, "There" => 2 } },
#  "Part2" => { :total      => 1,
#               :locations  => { "Here" => 1 } }
# }

# This corresponds to desired_output, just in a different way.
#
# You can convert this to the original desired array if you wish, by
# iterating over references, or you can pick a more appropriate Hash
# structure and just use that.
Posted by Love U Ruby (my-ruby)
on 2013-03-04 16:09
Nice thread it is.
Posted by Joel Pearson (virtuoso)
on 2013-03-04 16:09
Thanks Adam, that's a great way of looking at this!
I've been using arrays quite a lot in Ruby since they're so much easier 
to use than in VBA. When your only tool is a hammer...
I knew the logical answer was some sort of incremental system, but I 
hadn't been able to visualise a way to implement it. Hashes look like a 
great solution.
Posted by Hans Mackowiak (hanmac)
on 2013-03-04 16:12
i made you this:


selects = ["Here", "There"]

data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]


data[1..-1].group_by(&:first).map {|k,d|
[k,d.inject(0){|s,v|s+v[1].to_i}] + selects.map {|s| 
d.select{|v|v[2]==s}.inject(0){|s,v|s+v[1].to_i} }
}

#=> [["Part1", 5, 3, 2], ["Part2", 1, 1, 0]]
Posted by Joel Pearson (virtuoso)
on 2013-03-04 16:24
Ooh, pretty. Nice one-liner Hans. I can just about follow it.
I hadn't seen "group_by" before. Looks like hashes are definitely the 
right direction to be working towards for this.
Posted by Joel Pearson (virtuoso)
on 2013-03-04 17:01
I incorporated a mixture of these suggestions into my own data 
construct, and it completes the task in a couple of seconds now rather 
than in minutes. I'll have to see if there's anywhere else I could use 
this kind of approach. Thanks for the help :)
Please log in before posting. Registration is free and takes only a minute.
Existing account (Switch to SSL-encrypted connection)
NEW: Do you have a Google/GoogleMail or Yahoo account? No registration required!
Log in with Google account | Log in with Yahoo account
No account? Register here.