I’m getting some relatively slow performance when doing calculations against data, and I’m wondering whether there’s a more efficient way to do this.

I’m taking a multidimensional array and then totalling a “column” based on multiple criteria. It looks like a much longer version of this:

data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]
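
For the sample above, the result I’m after is one row per Ref with its counts totalled overall and by location, i.e. something like:

[
  [ 'Ref', 'Total', 'Here', 'There' ],
  [ 'Part1', 5, 3, 2 ],
  [ 'Part2', 1, 1, 0 ],
]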

The way I’m doing this at present is to take the main key, “Ref”, and unique it, then loop through all those values and select the chunks of the data which match. In order to accelerate the process I’m also diminishing the source array each time, so by the time I get to the end the process is much faster. Something like this:

# Sample source data, as above
data = [
  [ 'Ref', 'Count', 'Locn' ],
  [ 'Part1', '3', 'Here' ],
  [ 'Part2', '1', 'Here' ],
  [ 'Part1', '2', 'There' ],
]

# Set up a 2D array to hold the output, with 1 of each Ref + a header:
# take the first column, unique it, and transpose it back into a
# column of single-element rows.
output = [ data.transpose[0].uniq ].transpose

output = output.map.with_index { |ar, idx|
  if idx == 0 # Headers
    [ ar[0], 'Total', 'Here', 'There' ]
  else
    # Get the current set to look at (case-insensitive match on Ref)
    current_data = data.select { |a| a[0] =~ /#{ ar[0] }/i }

    # Remove it from the main pool so later passes scan less data
    data -= current_data

    # Populate the current row with per-location totals
    b = [
      ar[0],
      nil,
      current_data.select { |a| a[2] == 'Here' }.inject(0) { |sum, a| sum + a[1].to_i },
      current_data.select { |a| a[2] == 'There' }.inject(0) { |sum, a| sum + a[1].to_i }
    ]

    # Add a total
    b[1] = b[2] + b[3]

    # This one's done
    b
  end
}

I’m doing this with around 9000 rows of data and over 2000 unique references, and the way I’m doing it at the moment seems quite slow. Is there a faster way to do this?
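
For comparison, here’s a rough sketch of what I imagine a single-pass version might look like: accumulate the totals per Ref in a Hash instead of repeatedly select-ing against the whole array. It’s untested, and it assumes ‘Locn’ only ever holds ‘Here’ or ‘There’:

# Accumulate per-Ref totals in one pass over the data.
totals = Hash.new { |h, k| h[k] = { 'Here' => 0, 'There' => 0 } }
data.drop(1).each do |ref, count, locn|  # drop(1) skips the header row
  totals[ref][locn] += count.to_i
end

# Rebuild the output table: a header, then one row per unique Ref.
output = [ [ 'Ref', 'Total', 'Here', 'There' ] ]
totals.each do |ref, t|
  output << [ ref, t['Here'] + t['There'], t['Here'], t['There'] ]
end

Would something along those lines be the right direction?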

Thanks for reading this.