Forum: Ruby Set operations on arrays of non-trivial objects

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
Jon Egil Stand (Guest)
on 2007-03-07 09:57
(Received via mailing list)
# MAIN QUESTION

# Is there a nice and fast way to do set operations on arrays of
# non-trivial objects.


# BACKGROUND

# Set operations are beatiful:

a = [1,2,3,4,5]
b = [2,4,6,8,10]

(a & b)
# => [2,4]


# When comparing lists, which is more or less what administration of
# pension schemes is all about, this is very, very nice.

# I very often use stuff like
(a - (a&b))

# => [1,3,5]

# The array library is blazingly fast, I'm really happy with it.


# The challenge arises when my lists don't comprise of fixnums, but
instead of
# more specified objects. To simplify:

class Person
  def initialize(name, ssid)
    @name = name
    @ssid = ssid
  end

  attr_reader :name, :ssid
end

list = []
list << Person.new('Peter Zapffe', 1)
list << Person.new('Peter Pan', 2)
list << Person.new('Saint Peter', 3)

# Let's say I want to UNION that to (b = [2,4,6,8,10]) from above, using
ssid
# as key.It should in that case return the 'Peter Pan'-person, since he
is the
# only one with an ssid included i the list.

(list & b)
# => []

# I can do stuff like:

list_ssid = list.collect{|p| p.ssid}
# => [1,2,3]

union = list_ssid & b
# => [2]

list.select{|p| union.include? p.ssid}
# => [#<Person:0x2b8b300 @name="Peter Pan", @ssid=2>]


# This works, and correctness is always nice, but it doesn't scale very
nice,
# basicly because  union.include? scans the union-list from scratch
every time.

# I have implemented this before, by sorting both lists and stepping
through
# them one at a time. That's still correct and much faster, but I have
a feeling
# it's possible to do it in a much more elegant way, using some sort of
# set-operations.

# I would appreciate any hints and or pointers.

# Thank's for reading.
# JE
Robert K. (Guest)
on 2007-03-07 10:30
(Received via mailing list)
On 07.03.2007 08:56, Jon Egil Stand wrote:
> a = [1,2,3,4,5]
> (a - (a&b))
> class Person
> list << Person.new('Peter Pan', 2)
>
> # I can do stuff like:
>
> list_ssid = list.collect{|p| p.ssid}
> # => [1,2,3]
>
> union = list_ssid & b
> # => [2]
>
> list.select{|p| union.include? p.ssid}
> # => [#<Person:0x2b8b300 @name="Peter Pan", @ssid=2>]

You can do this simpler:

list.select {|p| b.include? p.ssid}

Note, make b a Set for more efficiency (see below).

> # set-operations.
>
> # I would appreciate any hints and or pointers.
>
> # Thank's for reading.
> # JE

First, when you work with sets you should probably use Set - if your
sets get large you will notice the speed difference.

irb(main):002:0> a = [1,2,3,4,5].to_set
=> #<Set: {5, 1, 2, 3, 4}>
irb(main):003:0> b = [2,4,6,8,10].to_set
=> #<Set: {6, 2, 8, 4, 10}>
irb(main):004:0> a & b
=> #<Set: {2, 4}>

Secondly, it seems you're basically creating subsets based on some
criteria.  If you just had a single key, you could define your class's
equivalence (#hash, #eql?, #==) to reflect that and use the approach you
tried, i.e. using set unions.

The ad hoc solution is to actually use #select as you did.  But if the
criteria change, you rather need a structure that resembles an RDBMS
table which can have any number of indexes.  If you have to do this
multiple times probably also with varying criteria you might want to
create a wrapper around a Set that can support arbitrary number of
indexes (Hashes) and keeps those consistent much like an RDBMS does.

You might find some more information in the archives of this list.

Kind regards

  robert
Jon Egil Stand (Guest)
on 2007-09-26 01:04
(Received via mailing list)
Robert

 Thank you, I will heed your advice. Set is a nice library, and it's
 use of Hash for blazingly fast lookup scales very well.
This topic is locked and can not be replied to.