Set intersection with a specific criteria: a better idiom?


#1

Hi,

I have been trying to intersect 2 sets of objects that where I want the
“intersected” objects to share a common property.

This is what I came up with:

Here, I want to get all the Errors having the same file name:

def intersection_set
Set.new(event_errors).&(Set.new(socket_channel_errors)) {|x|
file_name(x)}
end

To allow that I extended the Set class with:

def &(enum, &property)
n = self.class.new
if block_given?
mapped = self.map{|e| property.call(e)}
enum.each { |o| n.add(o) if mapped.include?(property.call(o)) }
else
enum.each { |o| n.add(o) if include?(o) }
end
n
end

I have met this requirement quite some times while scripting and I may
miss something here.
Do you know a more idiomatic, Ruby-way of doing that?

Thanks,

Eric.


Eric TORREBORRE
tel: +81 (0)90 5580 3280
e-mail: removed_email_address@domain.invalid / removed_email_address@domain.invalid
blog: http://etorreborre.blogspot.com

  _____________________________________________________________________________

Ne gardez plus qu’une seule adresse mail ! Copiez vos mails vers Yahoo!
Mail


#2

Eric T. wrote:

end
end
n
end

I have met this requirement quite some times while scripting and I may miss something here.
Do you know a more idiomatic, Ruby-way of doing that?

Hm, it’s like Schwartzian transform: intersect_by instead of sort_by.

Your implementation looks good, but might be more efficient with:

   mapped = self.class.new(map{|e| property.call(e)})

which would make mapped a set, for faster lookups.

Also, using yield will probably be more efficient than instantiating a
proc and calling it:

def &(enum)
n = self.class.new
if block_given?
mapped = self.class.new(map{|e| yield e})
enum.each { |o| n.add(o) if mapped.include?(yield o) }
else
enum.each { |o| n.add(o) if include?(o) }
end
n
end

It would be more elegant to use select than to iterate through enum
using #each and #add to the set, but Set#select returns an Array. At the
cost of using two iterations instead of once, you could do this:

require ‘set’

class Set
def intersect_by(other)
props = self.class.new(other.map {|t| yield t})
self.class.new(select {|t| props.include? t.prop})
end

Actually the following is closer to the origial:

def intersect_by(other)

props = self.class.new(map {|t| yield t})

self.class.new(other.select {|t| props.include? t.prop})

end

alias old_intersect &

def &(other)
if block_given?
intersect_by(other) {|t| yield t}
else
old_intersect(other) {|t| yield t}
end
end
end

class Test
attr_accessor :prop
def initialize prop
@prop = prop
end
end

s1 = Set.new([
Test.new(1),
Test.new(2),
Test.new(3)
])

s2 = Set.new([
Test.new(2),
Test.new(3),
Test.new(4)
])

p(s1.intersect_by(s2) {|t| t.prop})
p(Set[1,2,3]&Set[2,3,4])

END

#<Set: {#<Test:0xb79f73e4 @prop=2>, #<Test:0xb79f73d0 @prop=3>}>
#<Set: {2, 3}>


#3

Eric T. wrote:

end
end
n
end

I have met this requirement quite some times while scripting and I may miss something here.
Do you know a more idiomatic, Ruby-way of doing that?

Hm, it’s like Schwartzian transform: intersect_by instead of sort_by.

Your implementation looks good, but might be more efficient with:

   mapped = self.class.new(map{|e| property.call(e)})

which would make mapped a set, for faster lookups.

Also, using yield will probably be more efficient than instantiating a
proc and calling it:

def &(enum)
n = self.class.new
if block_given?
mapped = self.class.new(map{|e| yield e})
enum.each { |o| n.add(o) if mapped.include?(yield o) }
else
enum.each { |o| n.add(o) if include?(o) }
end
n
end

It would be more elegant to use select than to iterate through enum
using #each and #add to the set, but Set#select returns an Array. At the
cost of using two iterations instead of once, you could do this:

require ‘set’

class Set
def intersect_by(other)
props = self.class.new(other.map {|t| yield t})
self.class.new(select {|t| props.include? t.prop})
end

Actually the following is closer to the origial:

def intersect_by(other)

props = self.class.new(map {|t| yield t})

self.class.new(other.select {|t| props.include? t.prop})

end

alias old_intersect &

def &(other)
if block_given?
intersect_by(other) {|t| yield t}
else
old_intersect(other) {|t| yield t}
end
end
end

class Test
attr_accessor :prop
def initialize prop
@prop = prop
end
end

s1 = Set.new([
Test.new(1),
Test.new(2),
Test.new(3)
])

s2 = Set.new([
Test.new(2),
Test.new(3),
Test.new(4)
])

p(s1.intersect_by(s2) {|t| t.prop})
p(Set[1,2,3]&Set[2,3,4])

END

Output:

#<Set: {#<Test:0xb79f73e4 @prop=2>, #<Test:0xb79f73d0 @prop=3>}>
#<Set: {2, 3}>


#4

On 31.05.2007 19:21, Joel VanderWerf wrote:

def intersection_set
enum.each { |o| n.add(o) if mapped.include?(property.call(o)) }
else
enum.each { |o| n.add(o) if include?(o) }
end
n
end

I have met this requirement quite some times while scripting and I may
miss something here.
Do you know a more idiomatic, Ruby-way of doing that?

Basically you want to do the intersection based on a specific criterion.
My first choice in such a scenario would be a Hash based solution,
i.e. you create an index based on what you consider the key field. See
attachment for a sample implementation.

Regarding your question whether to include your code in std lib or
facets lib: I vote “no”. Here’s why: a set intersection will leave you
with a single uniform set of items, namely all items present in all sets
that take part in the intersection. Your solution is not directly an
intersection but it creates a copy of one set with all elements from the
copy removed that do not fit a certain criterion. More specifically,
you kind of reversed the logic because the resulting set has only
elements from the /parameter/ (i.e. the right hand side of the operator)
and not from the left hand side. So while for intersection order of
arguments does not matter, it does for substraction:

irb(main):001:0> require ‘set’
=> true
irb(main):002:0> s1 = (1…10).to_set
=> #<Set: {5, 6, 1, 7, 2, 8, 3, 9, 4, 10}>
irb(main):003:0> s2 = (5…15).to_set
=> #<Set: {5, 11, 6, 12, 7, 13, 8, 14, 9, 15, 10}>
irb(main):004:0> s1 - s2
=> #<Set: {1, 2, 3, 4}>
irb(main):005:0> s2 - s1
=> #<Set: {11, 12, 13, 14, 15}>

I would rewrite your original code like this:

require ‘set’

use a set for efficiency reasons

keys = event_errors.map {|x| file_name x}.to_set

or, more efficient:

keys = event_errors.inject(Set.new) {|s,x| s << file_name(x)}

create the result

result = socket_channel_errors.select {|x| keys.include? file_name(x)}

Since this is basically a two liner I don’t see any necessity to change
std libs.

Kind regards

robert