Forum: Ruby Set intersection with a specific criteria: a better idiom?

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
5de8b254c711059975670fba5ec10ab1?d=identicon&s=25 Eric Torreborre (Guest)
on 2007-05-31 05:32
(Received via mailing list)
Hi,

I have been trying to intersect 2 sets of objects that where I want the
"intersected" objects to share a common property.

This is what I came up with:

Here, I want to get all the Errors having the same file name:

def intersection_set
  Set.new(event_errors).&(Set.new(socket_channel_errors)) {|x|
file_name(x)}
end

To allow that I extended the Set class with:

def &(enum, &property)
    n = self.class.new
    if block_given?
      mapped = self.map{|e| property.call(e)}
      enum.each { |o| n.add(o) if mapped.include?(property.call(o)) }
    else
      enum.each { |o| n.add(o) if include?(o) }
    end
    n
end

I have met this requirement quite some times while scripting and I may
miss something here.
Do you know a more idiomatic, Ruby-way of doing that?

Thanks,

Eric.


---------------------------------------------------------------------------
Eric TORREBORRE
tel: +81 (0)90 5580 3280
e-mail: etorreborre@yahoo.com / etorreborre@docomo.ne.jp
blog: http://etorreborre.blogspot.com
---------------------------------------------------------------------------




      _____________________________________________________________________________
Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo!
Mail
47b1910084592eb77a032bc7d8d1a84e?d=identicon&s=25 Joel VanderWerf (Guest)
on 2007-05-31 19:21
(Received via mailing list)
Eric Torreborre wrote:
> end
>     end
>     n
> end
>
> I have met this requirement quite some times while scripting and I may miss something 
here.
> Do you know a more idiomatic, Ruby-way of doing that?

Hm, it's like Schwartzian transform: intersect_by instead of sort_by.

Your implementation looks good, but might be more efficient with:

       mapped = self.class.new(map{|e| property.call(e)})

which would make mapped a set, for faster lookups.

Also, using yield will probably be more efficient than instantiating a
proc and calling it:

   def &(enum)
     n = self.class.new
     if block_given?
       mapped = self.class.new(map{|e| yield e})
       enum.each { |o| n.add(o) if mapped.include?(yield o) }
     else
       enum.each { |o| n.add(o) if include?(o) }
     end
     n
   end

It would be more elegant to use select than to iterate through enum
using #each and #add to the set, but Set#select returns an Array. At the
cost of using two iterations instead of once, you could do this:


require 'set'

class Set
   def intersect_by(other)
     props = self.class.new(other.map {|t| yield t})
     self.class.new(select {|t| props.include? t.prop})
   end

# Actually the following is closer to the origial:
#  def intersect_by(other)
#    props = self.class.new(map {|t| yield t})
#    self.class.new(other.select {|t| props.include? t.prop})
#  end

   alias old_intersect &

   def &(other)
     if block_given?
       intersect_by(other) {|t| yield t}
     else
       old_intersect(other) {|t| yield t}
     end
   end
end

class Test
   attr_accessor :prop
   def initialize prop
     @prop = prop
   end
end

s1 = Set.new([
   Test.new(1),
   Test.new(2),
   Test.new(3)
])

s2 = Set.new([
   Test.new(2),
   Test.new(3),
   Test.new(4)
])

p(s1.intersect_by(s2) {|t| t.prop})
p(Set[1,2,3]&Set[2,3,4])

__END__

#<Set: {#<Test:0xb79f73e4 @prop=2>, #<Test:0xb79f73d0 @prop=3>}>
#<Set: {2, 3}>
47b1910084592eb77a032bc7d8d1a84e?d=identicon&s=25 Joel VanderWerf (Guest)
on 2007-05-31 19:23
(Received via mailing list)
Eric Torreborre wrote:
> end
>     end
>     n
> end
>
> I have met this requirement quite some times while scripting and I may miss something 
here.
> Do you know a more idiomatic, Ruby-way of doing that?

Hm, it's like Schwartzian transform: intersect_by instead of sort_by.

Your implementation looks good, but might be more efficient with:

       mapped = self.class.new(map{|e| property.call(e)})

which would make mapped a set, for faster lookups.

Also, using yield will probably be more efficient than instantiating a
proc and calling it:

   def &(enum)
     n = self.class.new
     if block_given?
       mapped = self.class.new(map{|e| yield e})
       enum.each { |o| n.add(o) if mapped.include?(yield o) }
     else
       enum.each { |o| n.add(o) if include?(o) }
     end
     n
   end

It would be more elegant to use select than to iterate through enum
using #each and #add to the set, but Set#select returns an Array. At the
cost of using two iterations instead of once, you could do this:


require 'set'

class Set
   def intersect_by(other)
     props = self.class.new(other.map {|t| yield t})
     self.class.new(select {|t| props.include? t.prop})
   end

# Actually the following is closer to the origial:
#  def intersect_by(other)
#    props = self.class.new(map {|t| yield t})
#    self.class.new(other.select {|t| props.include? t.prop})
#  end

   alias old_intersect &

   def &(other)
     if block_given?
       intersect_by(other) {|t| yield t}
     else
       old_intersect(other) {|t| yield t}
     end
   end
end

class Test
   attr_accessor :prop
   def initialize prop
     @prop = prop
   end
end

s1 = Set.new([
   Test.new(1),
   Test.new(2),
   Test.new(3)
])

s2 = Set.new([
   Test.new(2),
   Test.new(3),
   Test.new(4)
])

p(s1.intersect_by(s2) {|t| t.prop})
p(Set[1,2,3]&Set[2,3,4])

__END__

Output:

#<Set: {#<Test:0xb79f73e4 @prop=2>, #<Test:0xb79f73d0 @prop=3>}>
#<Set: {2, 3}>
E0d864d9677f3c1482a20152b7cac0e2?d=identicon&s=25 Robert Klemme (Guest)
on 2007-06-01 13:25
(Received via mailing list)
On 31.05.2007 19:21, Joel VanderWerf wrote:
>> def intersection_set
>>       enum.each { |o| n.add(o) if mapped.include?(property.call(o)) }
>>     else
>>       enum.each { |o| n.add(o) if include?(o) }
>>     end
>>     n
>> end
>>
>> I have met this requirement quite some times while scripting and I may
>> miss something here.
>> Do you know a more idiomatic, Ruby-way of doing that?

Basically you want to do the intersection based on a specific criterion.
  My first choice in such a scenario would be a Hash based solution,
i.e. you create an index based on what you consider the key field.  See
attachment for a sample implementation.

Regarding your question whether to include your code in std lib or
facets lib: I vote "no".  Here's why: a set intersection will leave you
with a single uniform set of items, namely all items present in all sets
that take part in the intersection.  Your solution is not directly an
intersection but it creates a copy of one set with all elements from the
copy removed that do not fit a certain criterion.  More specifically,
you kind of reversed the logic because the resulting set has only
elements from the /parameter/ (i.e. the right hand side of the operator)
and not from the left hand side.  So while for intersection order of
arguments does not matter, it does for substraction:

irb(main):001:0> require 'set'
=> true
irb(main):002:0> s1 = (1..10).to_set
=> #<Set: {5, 6, 1, 7, 2, 8, 3, 9, 4, 10}>
irb(main):003:0> s2 = (5..15).to_set
=> #<Set: {5, 11, 6, 12, 7, 13, 8, 14, 9, 15, 10}>
irb(main):004:0> s1 - s2
=> #<Set: {1, 2, 3, 4}>
irb(main):005:0> s2 - s1
=> #<Set: {11, 12, 13, 14, 15}>

I would rewrite your original code like this:

require 'set'
...
# use a set for efficiency reasons
keys = event_errors.map {|x| file_name x}.to_set
# or, more efficient:
keys = event_errors.inject(Set.new) {|s,x| s << file_name(x)}
# create the result
result = socket_channel_errors.select {|x| keys.include? file_name(x)}

Since this is basically a two liner I don't see any necessity to change
std libs.

Kind regards

  robert
This topic is locked and can not be replied to.