Set operations on arrays of non-trivial objects

MAIN QUESTION

Is there a nice and fast way to do set operations on arrays of

non-trivial objects.

BACKGROUND

Set operations are beatiful:

a = [1,2,3,4,5]
b = [2,4,6,8,10]

(a & b)

=> [2,4]

When comparing lists, which is more or less what administration of

pension schemes is all about, this is very, very nice.

I very often use stuff like

(a - (a&b))

=> [1,3,5]

The array library is blazingly fast, I’m really happy with it.

The challenge arises when my lists don’t comprise of fixnums, but

instead of

more specified objects. To simplify:

class Person
def initialize(name, ssid)
@name = name
@ssid = ssid
end

attr_reader :name, :ssid
end

list = []
list << Person.new(‘Peter Zapffe’, 1)
list << Person.new(‘Peter Pan’, 2)
list << Person.new(‘Saint Peter’, 3)

Let’s say I want to UNION that to (b = [2,4,6,8,10]) from above, using

ssid

as key.It should in that case return the ‘Peter Pan’-person, since he

is the

only one with an ssid included i the list.

(list & b)

=> []

I can do stuff like:

list_ssid = list.collect{|p| p.ssid}

=> [1,2,3]

union = list_ssid & b

=> [2]

list.select{|p| union.include? p.ssid}

=> [#<Person:0x2b8b300 @name=“Peter Pan”, @ssid=2>]

This works, and correctness is always nice, but it doesn’t scale very

nice,

basicly because union.include? scans the union-list from scratch

every time.

I have implemented this before, by sorting both lists and stepping

through

them one at a time. That’s still correct and much faster, but I have

a feeling

it’s possible to do it in a much more elegant way, using some sort of

set-operations.

I would appreciate any hints and or pointers.

Thank’s for reading.

JE

Robert

Thank you, I will heed your advice. Set is a nice library, and it’s
use of Hash for blazingly fast lookup scales very well.

On 07.03.2007 08:56, Jon Egil Stand wrote:

a = [1,2,3,4,5]
(a - (a&b))
class Person
list << Person.new(‘Peter Pan’, 2)

I can do stuff like:

list_ssid = list.collect{|p| p.ssid}

=> [1,2,3]

union = list_ssid & b

=> [2]

list.select{|p| union.include? p.ssid}

=> [#<Person:0x2b8b300 @name=“Peter Pan”, @ssid=2>]

You can do this simpler:

list.select {|p| b.include? p.ssid}

Note, make b a Set for more efficiency (see below).

set-operations.

I would appreciate any hints and or pointers.

Thank’s for reading.

JE

First, when you work with sets you should probably use Set - if your
sets get large you will notice the speed difference.

irb(main):002:0> a = [1,2,3,4,5].to_set
=> #<Set: {5, 1, 2, 3, 4}>
irb(main):003:0> b = [2,4,6,8,10].to_set
=> #<Set: {6, 2, 8, 4, 10}>
irb(main):004:0> a & b
=> #<Set: {2, 4}>

Secondly, it seems you’re basically creating subsets based on some
criteria. If you just had a single key, you could define your class’s
equivalence (#hash, #eql?, #==) to reflect that and use the approach you
tried, i.e. using set unions.

The ad hoc solution is to actually use #select as you did. But if the
criteria change, you rather need a structure that resembles an RDBMS
table which can have any number of indexes. If you have to do this
multiple times probably also with varying criteria you might want to
create a wrapper around a Set that can support arbitrary number of
indexes (Hashes) and keeps those consistent much like an RDBMS does.

You might find some more information in the archives of this list.

Kind regards

robert