Finding shared elements between to arrays

casper_the_ghost · October 9, 2007, 11:46pm

Hi
I am wondering if there is a really smart built in way to get an array
which has the elements shared by two other arrays? I made two
“solutions” myself, but I am wondering what other people are using, and
if there is a preferred solution to the task? array.union(other_array)
or something like that?

Well… here are the two solutions that seemed the most obvious to me. I
profiled them and the second one performed a little better:

a1 = [‘a’, ‘b’, ‘c’, ‘d’]
a2 = [‘c’, ‘d’, ‘e’, ‘f’]
a3 = []

#Solution 1
a1.each { |e| a3 << e if a2.include?(e) }

#Solution 2
a3 = a1.map { |e| e if a2.include?(e) }
a3.compact!

Info from the profiler:
% cumulative self self total
time seconds seconds calls ms/call ms/call name
41.56 29.29 29.29 400000 0.07 0.11 Array#include?
28.31 49.24 19.95 100000 0.20 0.68 Array#each
22.06 64.79 15.55 1100000 0.01 0.01 String#==
4.19 67.74 2.95 200000 0.01 0.01 Array#<<
3.89 70.48 2.74 1 2740.00 70480.00 Integer#upto
0.00 70.48 0.00 1 0.00 70480.00 #toplevel

% cumulative self self total
time seconds seconds calls ms/call ms/call name
43.82 28.24 28.24 400000 0.07 0.11 Array#include?
23.32 43.27 15.03 1100000 0.01 0.01 String#==
22.25 57.61 14.34 100000 0.14 0.58 Array#map
8.50 63.09 5.48 1 5480.00 64450.00 Integer#upto
2.11 64.45 1.36 100000 0.01 0.01 Array#compact!
0.00 64.45 0.00 1 0.00 64450.00 #toplevel

Thanks in advance!

Best regards
Sebastian

casper_the_ghost · October 10, 2007, 12:03am

I think the operation you are looking for is a set intersection. In
Ruby, it is the & method.

[‘a’, ‘b’, ‘c’, ‘d’] & [‘c’, ‘d’, ‘e’, ‘f’]
=> [“c”, “d”]

Carl

On Oct 9, 2:46 pm, Sebastian probst Eide

casper_the_ghost · October 10, 2007, 12:05am

Nice Carl!
Thank you very much!

Sebastian

casper_the_ghost · October 10, 2007, 12:24am

On Oct 9, 5:46 pm, Sebastian probst Eide
[email protected] wrote:

Hi
I am wondering if there is a really smart built in way to get an array
which has the elements shared by two other arrays? I made two
“solutions” myself, but I am wondering what other people are using, and
if there is a preferred solution to the task? array.union(other_array)
or something like that?

The & operator, as defined on Arrays, performs an intersection
operation.

From the RDoc:

array & other_array

 Set Intersection---Returns a new array containing elements common
 to the two arrays, with no duplicates.

    [ 1, 1, 3, 5 ] & [ 1, 2, 3 ]   #=> [ 1, 3 ]

Also, another tip: rather than using a combination of map and
compact!, you could instead use select, as in:

a3 = a1.select { |e| a2.include?(e) }

Best,

Eric

Are you interested in on-site Ruby training that’s been highly
reviewed by former students? http://LearnRuby.com

casper_the_ghost · October 10, 2007, 12:29am

Yeah, I was wondering if there wasn’t a better way than using map and
compact!
Ruby has so many functions that it’s often hard for me to know which one
to chose. I guess I’ll learn as I use Ruby more!

Thanks for the reply Eric!

Sebastian

casper_the_ghost · October 10, 2007, 3:35pm

I’ll check it out!
Thanks Robert.

By the way: What a great forum this is! So many helpful, friendly and
knowledgeable people!

Thanks to all of you!

Best regards
Sebastian

casper_the_ghost · October 10, 2007, 3:57pm

On 10/10/07, Eric I. [email protected] wrote:

operation.
Also, another tip: rather than using a combination of map and
compact!, you could instead use select, as in:
a3 = a1.select { |e| a2.include?(e) }
Best,

Eric

Oops I just made a fool out of myself in a different thread, funny
nobody hollered at me so far, please let me repair the damage here
a3 = a1.select { |e| a2.include?(e) }
is not a1 & a2
[1,1,2].select{ |x| [1,1,3].include? x }
is not
[1,1,2] & [1,1,3]

Well so much for being clever, I believe it was Kent Back, “clever is
an insult”.

Cheers
Robert

casper_the_ghost · October 10, 2007, 10:13am

2007/10/10, Eric I. [email protected]:

operation.
Also, another tip: rather than using a combination of map and
compact!, you could instead use select, as in:
a3 = a1.select { |e| a2.include?(e) }

Just a caveat: for large Arrays it might be inefficient (I don’t know
how it’s implemented). You might want to look at class Set:

irb(main):001:0> require ‘set’
=> true
irb(main):002:0> [ 1, 1, 3, 5 ].to_set.intersection [ 1, 2, 3 ]
=> #<Set: {1, 3}>

Kind regards

robert

casper_the_ghost · October 10, 2007, 4:33pm

On 10/10/07, Robert D. [email protected] wrote:

Oops I just made a fool out of myself in a different thread, funny
nobody hollered at me so far, please let me repair the damage here
a3 = a1.select { |e| a2.include?(e) }
is not a1 & a2
[1,1,2].select{ |x| [1,1,3].include? x }
is not
[1,1,2] & [1,1,3]

The difference being that & eliminates duplicates from the result.

Another semantic equivalent for a1.select{ |e| a2.include?(e) }

is either
a1 - (a1 - a2)
or
a2 - (a2 - a1)

Whether or not one of these performs better than the select depends on
the ‘statistical’ properites of the arrays:

require “benchmark”

TESTS = 10_000
a1 = [1, 2, 3, 1, 1, 4, 5] * 100
a2 = [1, 3, 5]

r1 = a1.select { |e| a2.include?(e) }
r2 = a2.select { |e| a1.include?(e) }
r3 = a1 - (a1 - a2)
r4 = a2 - (a2 - a1)

puts r1 = r2 ? “Good” : “Bad”
puts r2 = r3 ? “Good” : “Bad”
puts r3 = r4 ? “Good” : “Bad”

Benchmark.bmbm do |results|
results.report(“select a1 in a2:”) { TESTS.times { a1.select { |e|
a2.include?(e) } } }
results.report(“select a2 in a1:”) { TESTS.times { a2.select { |e|
a1.include?(e) } } }
results.report(“a1 - (a1 - a2) :”) { TESTS.times { a1 - (a1 - a2 ) } }
results.report(“a2 - (a2 - a1) :”) { TESTS.times { a2 - (a2 - a1) } }
end

Produces:

Good
Good
Good
Rehearsal ----------------------------------------------------
select a1 in a2: 3.360000 0.020000 3.380000 ( 3.417788)
select a2 in a1: 0.030000 0.000000 0.030000 ( 0.026397)
a1 - (a1 - a2) : 0.670000 0.000000 0.670000 ( 0.680342)
a2 - (a2 - a1) : 0.240000 0.010000 0.250000 ( 0.250795)
------------------------------------------- total: 4.330000sec

                   user     system      total        real

select a1 in a2: 3.360000 0.010000 3.370000 ( 3.386324)
select a2 in a1: 0.020000 0.000000 0.020000 ( 0.024740)
a1 - (a1 - a2) : 0.660000 0.000000 0.660000 ( 0.670034)
a2 - (a2 - a1) : 0.240000 0.000000 0.240000 ( 0.242316)

–
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

casper_the_ghost · October 10, 2007, 6:38pm

On 10/10/07, Robert D. [email protected] wrote:

a1 - (a1 - a2 )
which is the best effort one can make ( I guess ) just seems not so
readable anymore, and it is not an idiom frequently used so one does
not get easily acquainted to it

Heck, I use it so infrequently that I just thought of it.

–
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

casper_the_ghost · October 10, 2007, 4:47pm

On 10/10/07, Rick DeNatale [email protected] wrote:

The difference being that & eliminates duplicates from the result.

Another semantic equivalent for a1.select{ |e| a2.include?(e) }

is either
a1 - (a1 - a2)
or
a2 - (a2 - a1)
ah interesting
I do however feel very bad about the duplicate elimination of Array#&
this really seems to be worth a RCR, Arrays are just not sets from a
conceptional POV and from a practical POV we have uniq if we want to
be the result unique.

a1 - (a1 - a2 )
which is the best effort one can make ( I guess ) just seems not so
readable anymore, and it is not an idiom frequently used so one does
not get easily acquainted to it
Maybe Array#intersect would be nice to have too?

Cheers
Robert

casper_the_ghost · October 11, 2007, 6:30pm

On Oct 10, 10:47 am, “Robert D.” [email protected] wrote:

I do however feel very bad about the duplicate elimination of Array#&
this really seems to be worth a RCR, Arrays are just not sets from a
conceptional POV and from a practical POV we have uniq if we want to
be the result unique.

a1 - (a1 - a2 )
which is the best effort one can make ( I guess ) just seems not so
readable anymore, and it is not an idiom frequently used so one does
not get easily acquainted to it
Maybe Array#intersect would be nice to have too?

Well a1 - (a1 - a2) is not necessarily the same as a2 - (a2 - a1).
The difference is whether the duplicates in a1 are maintained (the
former) or the duplicates is a2 are maintained.

For example, suppose a1 is [1, 1, 2, 3, 4] and a2 is [1, 2, 2, 3, 5].
The result would be either [1, 1, 2, 3] or [1, 2, 2, 3].

If you wanted the & operator to maintain duplicates, would it maintain
duplicates on the left-hand side, the right-hand side, or both? And
if it is the LHS or RHS, then we’ve lost commuatativity (a1 & a2 is
not necessarily equal to a2 & a1).

Eric

P.S. What is “RCR”?

Interested in hands-on, on-site Ruby training? See http://LearnRuby.com
for information about a well-reviewed class.

casper_the_ghost · October 12, 2007, 8:34pm

On 10/11/07, Eric I. [email protected] wrote:

Well a1 - (a1 - a2) is not necessarily the same as a2 - (a2 - a1).
The difference is whether the duplicates in a1 are maintained (the
former) or the duplicates is a2 are maintained.

For example, suppose a1 is [1, 1, 2, 3, 4] and a2 is [1, 2, 2, 3, 5].
The result would be either [1, 1, 2, 3] or [1, 2, 2, 3].

Yep, you’re right, not enough test cases!

I realize now that the OPs request for “an array which has the
elements shared by two other arrays” is a little ambiguous as to
duplications.
Note that the OPs solutions and the solutions using select produce the
same different results if you reverse the two arrays.

a1 = [1,1,2,3,4]
a2 = [1,2,2,3,5]
a1 - (a1-a2) => [1, 1, 2, 3]
a2 - (a2-a1)=> [1, 2, 2, 3]
a1.select {|e| a2.include? e} => [1, 1, 2, 3]
a2.select {|e| a1.include? e}=> [1, 2, 2, 3]

Should the elements shared between [1,1,2,3,4] and [1,2,2,3,5] be
different than the elements shared between [1,2,2,3,5] and
[1,1,2,3,4]?

–
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

casper_the_ghost · October 11, 2007, 10:58pm

Eric I. wrote:

P.S. What is “RCR”?

A “Ruby change request” - see http://rcrchive.net/

Wolfgang NÃ¡dasi-Donner

casper_the_ghost · October 12, 2007, 11:44pm

I realize now that the OPs request for “an array which has the
elements shared by two other arrays” is a little ambiguous as to
duplications.
Note that the OPs solutions and the solutions using select produce the
same different results if you reverse the two arrays.
Well, I never really thought about the possibility of duplicates. In my
case the arrays wouldn’t contain any dupes, but I wouldn’t want my
resulting array to have any anyway!

S

casper_the_ghost · October 13, 2007, 8:06pm

On 12.10.2007 23:44, Sebastian probst Eide wrote:

I realize now that the OPs request for “an array which has the
elements shared by two other arrays” is a little ambiguous as to
duplications.
Note that the OPs solutions and the solutions using select produce the
same different results if you reverse the two arrays.
Well, I never really thought about the possibility of duplicates. In my
case the arrays wouldn’t contain any dupes, but I wouldn’t want my
resulting array to have any anyway!

In that case you should probably use Set anyway as typical set
operations are more efficient with Set. It is just the right data type
to model a set, which seems what you are trying to do.

Kind regards

robert