How would you accomplish this?

Given an array of arrays for example
[[“A”, “B”], [“A”, “B”, “D”, “C”], [“B”, “D”], [“B”, “C”, “F”, “E”],
[“C”, “F”], [“C”, “E”], [“G”]]

what would be a nice way of “merging” the items that seem to already be
contained in another array. e.g.
[“A”,“B”] and [“B”, “D”] are subsets of [“A”, “B”, “D”, “C”] I would
like to drop the subsets and be left with [“A”, “B”, “D”, “C”] only.

My approach was to use the set module in Ruby, and look for set
membership using a pairwise comparison.
I also thought maybe i should look for the largest set. and then check
whether the rest of the sets are subsets of this set. if they are, then
delete them.

However i thought a better solution, or strategy may already exist.
How would you accomplish this?

2009/11/4 George G. [email protected]:

membership using a pairwise comparison.
I also thought maybe i should look for the largest set. and then check
whether the rest of the sets are subsets of this set. if they are, then
delete them.

However i thought a better solution, or strategy may already exist.
How would you accomplish this?

The approach using set size seems reasonable to reduce the number of
set comparisons. You could do

require ‘set’

arrs = [[“A”, “B”], [“A”, “B”, “D”, “C”], [“B”, “D”], [“B”, “C”, “F”,
“E”],
[“C”, “F”], [“C”, “E”], [“G”]]

result = []

arrs.sort_by {|a| -a.size}.each do |a|
s = a.to_set
result << s unless result.any? {|rs| rs.superset? s}
end

p result

→ [#<Set: {“B”, “C”, “F”, “E”}>, #<Set: {“A”, “B”, “D”, “C”}>, #<Set:
{“G”}>]

Kind regards

robert

On Wed, Nov 4, 2009 at 11:49 AM, George G.
[email protected] wrote:

Given an array of arrays for example
[[“A”, “B”], [“A”, “B”, “D”, “C”], [“B”, “D”], [“B”, “C”, “F”, “E”],
[“C”, “F”], [“C”, “E”], [“G”]]

what would be a nice way of “merging” the items that seem to already be
contained in another array. e.g.
[“A”,“B”] and [“B”, “D”] are subsets of [“A”, “B”, “D”, “C”] I would
like to drop the subsets and be left with [“A”, “B”, “D”, “C”] only.

This is one way:

irb(main):025:0> a = [[“A”, “B”], [“A”, “B”, “D”, “C”], [“B”, “D”],
[“B”, “C”, “F”, “E”],[“C”, “F”], [“C”, “E”], [“G”]]
=> [[“A”, “B”], [“A”, “B”, “D”, “C”], [“B”, “D”], [“B”, “C”, “F”,
“E”], [“C”, “F”], [“C”, “E”], [“G”]]
irb(main):026:0> a.delete_if {|x| a.any? {|y| (x != y) && ((x&y).size
== x.size)}}
=> [[“A”, “B”, “D”, “C”], [“B”, “C”, “F”, “E”], [“G”]]

Maybe there would be a way to optimize this sorting by size and
checking smaller ones against bigger ones only.
Left as an excercise for the reader :slight_smile:

Jesus.

2009/11/4 Jesús Gabriel y Galán [email protected]:

Maybe there would be a way to optimize this sorting by size and
checking smaller ones against bigger ones only.
Left as an excercise for the reader :slight_smile:

Done. :slight_smile:

robert

George G. wrote:

Given an array of arrays for example
[[“A”, “B”], [“A”, “B”, “D”, “C”], [“B”, “D”], [“B”, “C”, “F”, “E”],
[“C”, “F”], [“C”, “E”], [“G”]]

what would be a nice way of “merging” the items that seem to already be
contained in another array.

Try this:

arr = [[“A”, “B”], [“A”, “B”, “D”, “C”], [“B”, “D”], [“B”, “C”, “F”,
“E”],
[“C”, “F”], [“C”, “E”], [“G”]]

arr = arr.flatten.sort

str = arr.to_s.squeeze

arr = str.split(//)

This takes the array of arrays, flattens it into a single array
and sorts it. Then turns it into a string so we can use the
squeeze method that eliminates duplicate characters. Then
it changes it back into an array. =)

Greg B. wrote:

George G. wrote:

Given an array of arrays for example
[[“A”, “B”], [“A”, “B”, “D”, “C”], [“B”, “D”], [“B”, “C”, “F”, “E”],
[“C”, “F”], [“C”, “E”], [“G”]]

what would be a nice way of “merging” the items that seem to already be
contained in another array.

Try this:

arr = [[“A”, “B”], [“A”, “B”, “D”, “C”], [“B”, “D”], [“B”, “C”, “F”,
“E”],
[“C”, “F”], [“C”, “E”], [“G”]]

arr = arr.flatten.sort

str = arr.to_s.squeeze

arr = str.split(//)

This takes the array of arrays, flattens it into a single array
and sorts it. Then turns it into a string so we can use the
squeeze method that eliminates duplicate characters. Then
it changes it back into an array. =)

Well, this will just yield the discrete strings – not what the OP
asked. But what you’re talking about can be done more efficiently with
arr.flatten.uniq.sort . No need for the intermediate string
representation.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

On Wed, Nov 4, 2009 at 4:19 PM, George G.
[email protected] wrote:

Given an array of arrays for example
[[“A”, “B”], [“A”, “B”, “D”, “C”], [“B”, “D”], [“B”, “C”, “F”, “E”],
[“C”, “F”], [“C”, “E”], [“G”]]

what would be a nice way of “merging” the items that seem to already be
contained in another array. e.g.
[“A”,“B”] and [“B”, “D”] are subsets of [“A”, “B”, “D”, “C”] I would
like to drop the subsets and be left with [“A”, “B”, “D”, “C”] only.

Here are two ways:

require ‘set’

--------------------------------------------------------------

method 1:

sets = [[“A”, “B”], [“A”, “B”, “D”, “C”], [“B”, “D”], [“B”, “C”, “F”,
“E”], [“C”, “F”], [“C”, “E”], [“G”]]

sets = sets.map {|x| x.to_set}.sort_by {|x| -x.size}

(0…(sets.length)).each do |i|
next unless sets[i]
((i+1)…(sets.length)).each do |j|
sets[j] = nil if sets[j] && sets[j].subset?(sets[i])
end
end

sets = sets.compact.map {|x| x.to_a}
p sets

--------------------------------------------------------------

method 2:

sets = [[“A”, “B”], [“A”, “B”, “D”, “C”], [“B”, “D”], [“B”, “C”, “F”,
“E”], [“C”, “F”], [“C”, “E”], [“G”]]

sets = sets.map {|x| x.to_set}.sort_by {|x| -x.size}

i = 0
while i < sets.length
((i+1)…(sets.length)).each do |j|
sets[j] = nil if sets[j].subset?(sets[i])
end
sets.compact!
i += 1
end

sets = sets.map {|x| x.to_a}
p sets

martin