MAIN QUESTION
Is there a nice and fast way to do set operations on arrays of
non-trivial objects?
BACKGROUND
Set operations are beautiful:
a = [1,2,3,4,5]
b = [2,4,6,8,10]
(a & b)
=> [2,4]
When comparing lists, which is more or less what administration of
pension schemes is all about, this is very, very nice.
I very often use stuff like
(a - (a&b))
=> [1,3,5]
The array library is blazingly fast, and I’m really happy with it.
The challenge arises when my lists consist not of Fixnums but of
more specialized objects. To simplify:
class Person
  def initialize(name, ssid)
    @name = name
    @ssid = ssid
  end
  attr_reader :name, :ssid
end
list = []
list << Person.new('Peter Zapffe', 1)
list << Person.new('Peter Pan', 2)
list << Person.new('Saint Peter', 3)
Let’s say I want to intersect that with (b = [2,4,6,8,10]) from above,
using ssid as the key. It should in that case return the ‘Peter Pan’
person, since he is the only one whose ssid is included in the list.
(list & b)
=> []
I can do stuff like:
list_ssid = list.collect{|p| p.ssid}
=> [1,2,3]
union = list_ssid & b
=> [2]
list.select{|p| union.include? p.ssid}
=> [#<Person:0x2b8b300 @name="Peter Pan", @ssid=2>]
This works, and correctness is always nice, but it doesn’t scale very
well, basically because union.include? scans the union list from the
start for every person.
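One way to avoid the repeated scan is sketched below. It assumes the keys hash sensibly (they are plain integers here) and puts them in a Set from Ruby's standard library, so each membership test is a hash lookup instead of a walk through an array. The helper name select_by_ssid is made up for illustration and is not part of Array.

```ruby
require 'set'

# Person as defined earlier in this post.
class Person
  attr_reader :name, :ssid

  def initialize(name, ssid)
    @name = name
    @ssid = ssid
  end
end

# Hypothetical helper: keep the people whose ssid appears in keys.
# The Set is built once; each include? is then a hash lookup rather
# than a linear scan of an array.
def select_by_ssid(people, keys)
  wanted = keys.to_set
  people.select { |p| wanted.include?(p.ssid) }
end

list = [
  Person.new('Peter Zapffe', 1),
  Person.new('Peter Pan', 2),
  Person.new('Saint Peter', 3)
]
b = [2, 4, 6, 8, 10]

matches = select_by_ssid(list, b)
puts matches.map(&:name).inspect   # prints ["Peter Pan"]
```

This keeps the order of list and needs no sorting. Whether it is fast enough for really large lists would have to be measured.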
I have implemented this before, by sorting both lists and stepping
through them one at a time. That’s still correct and much faster, but
I have a feeling