 # Efficient Ruby equivalent to Excel's "SUMIF"?

I’m getting some relatively slow performance when doing calculations
against data, and I’m wondering whether there’s a more efficient way to
do this.

I’m taking a multidimensional array and then totalling a “column” based
on multiple criteria. It looks like a much longer version of this:

data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

The way I'm doing this at present is to take the main key ("Ref"),
get its unique values, and then loop through those, selecting the chunk
of data that matches each one. To speed the process up I also shrink
the source array on each pass, so by the time I get to the end each
search is much faster.

Something like this:

# Source data
data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

# Set up an array of unique Refs, with the header's 'Ref' as row 0
output = data.transpose.first.uniq

output.map.with_index { |ar, idx|
  if idx == 0
    # Header row
    [ ar, 'Total', 'Here', 'There' ]
  else
    # Get the current set of rows to look at
    current_data = data.select { |a| a.first =~ /#{ ar }/i }
    # Remove it from the main pool
    data -= current_data
    # Populate the current row with totals
    b = [
      ar,
      nil,
      current_data.select { |a| a.last == 'Here' }.inject(0) { |sum, a| sum + a[1].to_i },
      current_data.select { |a| a.last == 'There' }.inject(0) { |sum, a| sum + a[1].to_i }
    ]
    # Total = Here + There
    b[1] = b[2] + b[3]
    # This one's done
    b
  end
}

I’m doing this with around 9000 rows of data using over 2000 unique
references. The way I’m doing it at the moment seems quite slow. Is
there a faster way to do this?

I think part of the problem is that you're using arrays quite heavily.
You can do it in one pass if you use a Hash that you later convert to a
more appropriate structure, or simply define your data as a class and
instantiate the relevant objects, etc.

Anyway, here’s an example:

data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

desired_output = [
  [ "Ref", "Total", "Here", "There" ],
  [ "Part1", 5, 3, 2 ],
  [ "Part2", 1, 1, 0 ]
]

# bump off the first element (the header row)
data.shift
references = {}

data.each do |reference, count, location|

# create a hash to keep track of totals and locations.

references[reference] ||= { :total => 0, :locations => {} }
r = references[reference]

# contribute to the total for this reference.

r[:total] += count.to_i

# make sure this location has a counter, as with the
# definition of references[reference] above

r[:locations][location] ||= 0

# now contribute to the location’s count

r[:locations][location] += count.to_i
end

# references now holds everything needed to build the desired_output
# structure, or you can just use the hash as-is.
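
To spell out that last step, here is one way the hash could be turned
back into the tabular desired_output. This is just a sketch; the
location list is hard-coded, as it is in the question:

```ruby
data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

# Build the hash of totals, as in the loop above (header skipped)
references = {}
data[1..-1].each do |reference, count, location|
  references[reference] ||= { :total => 0, :locations => {} }
  r = references[reference]
  r[:total] += count.to_i
  r[:locations][location] ||= 0
  r[:locations][location] += count.to_i
end

# Convert the hash back into the desired table layout
locations = [ 'Here', 'There' ]
table = [ [ 'Ref', 'Total' ] + locations ]
references.each do |ref, r|
  # a location with no rows for this ref contributes 0
  table << ([ ref, r[:total] ] + locations.map { |l| r[:locations][l] || 0 })
end
```

Since the hash preserves insertion order, the rows come out in the
order the refs first appear in the data.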

Thanks Adam, that’s a great way of looking at this!
I’ve been using arrays quite a lot in Ruby since they’re so much easier
to use than in VBA. When your only tool is a hammer…
I knew the logical answer was some sort of incremental system, but I
hadn’t been able to visualise a way to implement it. Hashes look like a
great solution.

selects = [ "Here", "There" ]

data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

data[1..-1].group_by(&:first).map { |k, d|
  [ k, d.inject(0) { |sum, row| sum + row[1].to_i } ] + selects.map { |s|
    d.select { |row| row.last == s }.inject(0) { |sum, row| sum + row[1].to_i }
  }
}

#=> [["Part1", 5, 3, 2], ["Part2", 1, 1, 0]]

I incorporated a mixture of these suggestions into my own data
construct, and it completes the task in a couple of seconds now rather
than in minutes. I’ll have to see if there’s anywhere else I could use
this kind of approach. Thanks for the help!
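
For anyone landing on this thread later, a combined single-pass version
along the lines discussed above might look something like the sketch
below (this is an illustration, not the poster's actual final code): it
uses a Hash with a default-value block as the accumulator and discovers
the location columns from the data instead of hard-coding them.

```ruby
data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

rows = data[1..-1]                     # drop the header
locations = rows.map(&:last).uniq      # column order = order of first appearance

# each ref maps to an inner hash whose missing keys default to 0
totals = Hash.new { |h, k| h[k] = Hash.new(0) }
rows.each do |ref, count, locn|
  totals[ref][:total] += count.to_i    # overall total for this ref
  totals[ref][locn]   += count.to_i    # per-location subtotal
end

# Assemble the header row plus one row per ref
output = [ [ 'Ref', 'Total' ] + locations ] +
         totals.map { |ref, t| [ ref, t[:total] ] + locations.map { |l| t[l] } }

#=> [["Ref", "Total", "Here", "There"], ["Part1", 5, 3, 2], ["Part2", 1, 1, 0]]
```

One pass over the rows, no repeated `select` over a shrinking array, so
it stays fast even at 9000 rows and 2000 refs.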