 # Efficient Ruby equivalent to Excel's "SUMIF"?

I’m getting some relatively slow performance when doing calculations
against data, and I’m wondering whether there’s a more efficient way to
do this.

I’m taking a multidimensional array and then totalling a “column” based
on multiple criteria. It looks like a much longer version of this:

data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

The way I'm doing this at present is to take the main key ("Ref"),
get its unique values, and then loop through those, selecting the chunk
of data that matches each one. To speed the process up I also shrink
the source array on each pass, so by the time I get to the end each
search is much faster.

Something like this:

# Source data
data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

# Set up an array of unique Refs, with the header's 'Ref' as row 0
output = data.transpose.first.uniq

output.map.with_index { |ar, idx|
  if idx == 0
    # Header row
    [ ar, 'Total', 'Here', 'There' ]
  else
    # Get the current set of rows to look at
    current_data = data.select { |a| a.first =~ /#{ ar }/i }
    # Remove it from the main pool
    data -= current_data
    # Populate the current row with totals
    b = [
      ar,
      nil,
      current_data.select { |a| a.last == 'Here' }.inject(0) { |sum, a| sum + a[1].to_i },
      current_data.select { |a| a.last == 'There' }.inject(0) { |sum, a| sum + a[1].to_i }
    ]
    # Total = Here + There
    b[1] = b[2] + b[3]
    # This one's done
    b
  end
}

I’m doing this with around 9000 rows of data using over 2000 unique
references. The way I’m doing it at the moment seems quite slow. Is
there a faster way to do this?

I think part of the problem is that you're using arrays quite heavily.
You can do it in one pass if you use a Hash that you later convert to a
more appropriate structure, or simply define your data as a class and
instantiate the relevant objects, etc.

Anyway, here’s an example:

data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

desired_output = [
  [ "Ref", "Total", "Here", "There" ],
  [ "Part1", 5, 3, 2 ],
  [ "Part2", 1, 1, 0 ]
]

# bump off the first element (the header row)
data.shift
references = {}

data.each do |reference, count, location|

# create a hash to keep track of totals and locations.

references[reference] ||= { :total => 0, :locations => {} }
r = references[reference]

# contribute to the total for this reference.

r[:total] += count.to_i

# make sure this location has a counter, as with the
# definition of references[reference] above

r[:locations][location] ||= 0

# now contribute to the location’s count

r[:locations][location] += count.to_i
end

# references now holds everything needed to build the desired_output
# structure, or you can just use the hash as-is.
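
To spell out that last step, here is one way the hash could be turned
back into the tabular desired_output. This is just a sketch; the
location list is hard-coded, as it is in the question:

```ruby
data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

# Build the hash of totals, as in the loop above (header skipped)
references = {}
data[1..-1].each do |reference, count, location|
  references[reference] ||= { :total => 0, :locations => {} }
  r = references[reference]
  r[:total] += count.to_i
  r[:locations][location] ||= 0
  r[:locations][location] += count.to_i
end

# Convert the hash back into the desired table layout
locations = [ 'Here', 'There' ]
table = [ [ 'Ref', 'Total' ] + locations ]
references.each do |ref, r|
  # a location with no rows for this ref contributes 0
  table << ([ ref, r[:total] ] + locations.map { |l| r[:locations][l] || 0 })
end
```

Since the hash preserves insertion order, the rows come out in the
order the refs first appear in the data.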

Thanks Adam, that’s a great way of looking at this!
I’ve been using arrays quite a lot in Ruby since they’re so much easier
to use than in VBA. When your only tool is a hammer…
I knew the logical answer was some sort of incremental system, but I
hadn’t been able to visualise a way to implement it. Hashes look like a
great solution.

selects = [ "Here", "There" ]

data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

data[1..-1].group_by(&:first).map { |k, d|
  [ k, d.inject(0) { |sum, row| sum + row[1].to_i } ] + selects.map { |s|
    d.select { |row| row.last == s }.inject(0) { |sum, row| sum + row[1].to_i }
  }
}

#=> [["Part1", 5, 3, 2], ["Part2", 1, 1, 0]]

I incorporated a mixture of these suggestions into my own data
construct, and it completes the task in a couple of seconds now rather
than in minutes. I’ll have to see if there’s anywhere else I could use
this kind of approach. Thanks for the help!
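
For anyone landing on this thread later, a combined single-pass version
along the lines discussed above might look something like the sketch
below (this is an illustration, not the poster's actual final code): it
uses a Hash with a default-value block as the accumulator and discovers
the location columns from the data instead of hard-coding them.

```ruby
data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

rows = data[1..-1]                     # drop the header
locations = rows.map(&:last).uniq      # column order = order of first appearance

# each ref maps to an inner hash whose missing keys default to 0
totals = Hash.new { |h, k| h[k] = Hash.new(0) }
rows.each do |ref, count, locn|
  totals[ref][:total] += count.to_i    # overall total for this ref
  totals[ref][locn]   += count.to_i    # per-location subtotal
end

# Assemble the header row plus one row per ref
output = [ [ 'Ref', 'Total' ] + locations ] +
         totals.map { |ref, t| [ ref, t[:total] ] + locations.map { |l| t[l] } }

#=> [["Ref", "Total", "Here", "There"], ["Part1", 5, 3, 2], ["Part2", 1, 1, 0]]
```

One pass over the rows, no repeated `select` over a shrinking array, so
it stays fast even at 9000 rows and 2000 refs.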