How to remove dups from 2 lists?


#1

I’m trying to write some code that removes all elements from 2 lists that
are in both lists. However, I don’t want the duplicates within each list
deleted as well (which is what the array “-” operator does). The code I have
now doesn’t handle restarting the current iteration for both loops when a
match is found and deleted in both loops. Here’s the code:

def RemoveDupsFromLists ( list1 , list2 )
  list1.each_index do | i |
    list2.each_index do | j |
      if list1[i] == list2[j]
        list1.delete_at ( i )
        list2.delete_at ( j )
      end
    end
  end
  return [ list1 , list2 ]
end

What’s weird is that doing this is easy in C (my first language), but
difficult in Ruby. Everything else I’ve seen has been MUCH easier in
Ruby.

Mike S.


#2

On May 10, 2007, at 10:18 PM, Mike S. wrote:

list1.each_index do | i |
What’s weird is that doing this is easy in C (my first language), but
difficult in Ruby. Everything else I’ve seen has been MUCH easier
in Ruby.

Mike S.

Having some examples might help, but is something like this what
you’re after?

a = %w[ a a a b b c ]
=> ["a", "a", "a", "b", "b", "c"]

b = %w[ b c c c d ]
=> ["b", "c", "c", "c", "d"]

a + b
=> ["a", "a", "a", "b", "b", "c", "b", "c", "c", "c", "d"]

a | b
=> ["a", "b", "c", "d"]

a - b
=> ["a", "a", "a"]

a & b
=> ["b", "c"]

a - (a & b)
=> ["a", "a", "a"]

b - (a & b)
=> ["d"]

In particular, which do you expect from RemoveDupsFromLists( a, b )?

What I have for the last two expressions above:
=> [ ["a", "a", "a"], ["d"] ]

Or:
=> [ ["a", "a", "a", "b"], ["c", "c", "d"] ]
because it works like canceling terms in a fraction:

["a", "a", "a", "b", "b", "c"]
------------------------------
["b", "c", "c", "c", "d"]

Basically, write a test:

require 'test/unit'

class RemoveDupsTest < Test::Unit::TestCase
  def test_simple
    list1 = ["a", "a", "a", "b", "b", "c"]
    list2 = ["b", "c", "c", "c", "d"]
    expects = [ [], [] ] # <=== fill me in!
    assert_equal expects, RemoveDupsFromLists(list1, list2)
  end
end

Then you’ll know what you want (and so will we!) and you’ll be sure
when you get it working.

-Rob

Rob B. http://agileconsultingllc.com
removed_email_address@domain.invalid


#3

On 5/11/07, Mike S. removed_email_address@domain.invalid wrote:

I’m trying to write some code that removes all elements from 2 lists that
are in both lists. However, I don’t want any duplicates from each list
deleted also (which is what the array “-” operator does). The code I have

If I understand correctly, I think this does what you want.

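arr1 = %w[a a a b b b c d d e e]
arr2 = %w[a b b b c c c d d d d d e a b]

# Array#to_s joined the elements in Ruby 1.8; join does so in every version.
str1 = arr1.join
str2 = arr2.join

# Remove one occurrence of each common element from both strings.
(arr1 & arr2).each do |x|
  str1.sub!(x,"")
  str2.sub!(x,"")
end

p str1.split(//).to_a
p str2.split(//).to_a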

Harry


#4

On 5/11/07, Harry K. removed_email_address@domain.invalid wrote:

If I understand correctly, I think this does what you want.

Oops! Correction.

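arr1 = %w[a a a b b b c d d e e]
arr2 = %w[a b b b c c c d d d d d e a b]

# Array#to_s joined the elements in Ruby 1.8; join does so in every version.
str1 = arr1.join
str2 = arr2.join

# Remove one occurrence of each common element from both strings.
(arr1 & arr2).each do |x|
  str1.sub!(x,"")
  str2.sub!(x,"")
end

p str1.split(//)
p str2.split(//)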


#5

On 11 May 2007, at 11:18, Mike S. wrote:

end
Here is how I would write it: (it didn’t give quite the same results
as your method but seems to be closer to what you asked for so…):

def remove_duplicates(list1,list2)
  (list1 & list2).each do |x|
    list1.delete(x)
    list2.delete(x)
  end
  return list1,list2
end

irb(main):028:0> RemoveDupsFromLists([1,1,2,3,4],[2,2,3,5,5])
=> [[1, 1, 4], [2, 5, 5]]
irb(main):029:0> remove_duplicates([1,1,2,3,4],[2,2,3,5,5])
=> [[1, 1, 4], [5, 5]]

Be careful with your method since it alters the Arrays as you iterate
through them and will give hard to understand results sometimes. In
the example above ‘2’ is a duplicate but is only deleted from the
second array once because of this problem/feature.

Alex G.

Bioinformatics Center
Kyoto University


#6

I’m also not real sure of what exactly you wanted but…
I assumed the following :

def remove_dups_from_both_lists (list1, list2)
  # Keep a copy: list1 is pruned in place before list2 is processed.
  list1_dup = list1.dup
  remove_dups_from_first_list list1, list2
  remove_dups_from_first_list list2, list1_dup
end

def remove_dups_from_first_list(list_to_prune, list_with_dups)
  hash_with_occurrence_count = get_hash_with_occurrence_count list_to_prune
  decrement_count_for_dups(list_with_dups, hash_with_occurrence_count)
  get_remaining_item_list(list_to_prune, hash_with_occurrence_count)
end

def get_hash_with_occurrence_count(list)
  hsh = Hash.new { |h,k| h[k] = 0}
  list.each { |item| hsh[item] += 1 }
  hsh
end

def decrement_count_for_dups(list, other_list_as_hash)
  list.each { |item| other_list_as_hash[item] -= 1 }
end

def get_remaining_item_list(list, list_as_hash)
  list.each_with_index do |item, idx|
    if (list_as_hash[item] > 0)
      list_as_hash[item] -= 1
    else
      list[idx] = nil
    end
  end
  # compact! (not compact) so the caller's array is actually modified
  list.compact!
end

list1 = %w{one one two three four four five}
list2 = %w{one three three four five five five}
puts "before -- list1:#{list1}"
puts "before -- list2:#{list2}"
remove_dups_from_both_lists list1, list2
puts "after -- list1:#{list1}"
puts "after -- list2:#{list2}"


#7

On 11.05.2007 06:46, Alex G. wrote:

end
end
example above ‘2’ is a duplicate but is only deleted from the second
array once because of this problem/feature.

Sets also come in handy - especially if those lists are large.
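
As a rough sketch of what a Set-based version might look like (the helper name remove_common_with_set is made up for illustration, and it assumes every element that occurs in both lists should be dropped entirely):

```ruby
require 'set'

# Hypothetical helper: drops every element that occurs in both lists.
# Membership checks against a Set stay cheap even for large lists.
def remove_common_with_set(list1, list2)
  common = list1.to_set & list2 # Set#& accepts any enumerable
  [list1.reject { |x| common.include?(x) },
   list2.reject { |x| common.include?(x) }]
end

result = remove_common_with_set([1, 1, 2, 3, 4], [2, 2, 3, 5, 5])
p result # => [[1, 1, 4], [5, 5]]
```

The inputs are left untouched because reject returns new arrays.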

Kind regards

robert


#8

On Fri, May 11, 2007 at 04:55:14PM +0900, Enrique Comba R. wrote:

def RemoveDupsFromLists ( list1 , list2 )
lists = SyncEnumerator.new(list1, list2)
lists.each { |element_list1, element_list2|

if list1[element_list1] == list2[element_list2]
  list1.delete(element_list1)
  list2.delete(element_list2)
end

}
end

Is it safe to delete from lists while you’re enumerating through them?
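
A small experiment hints at the answer. This is only a sketch with made-up data: Array#each walks the array by index, so removing the current element shifts the next one into a slot that has already been visited.

```ruby
list = [1, 2, 2, 3]
seen = []

list.each do |x|
  seen << x
  # Deleting the current element moves every later element one slot left,
  # so the element right after it is never yielded.
  list.delete_at(list.index(x)) if x == 2
end

p seen # => [1, 2, 3]  (the second 2 was skipped)
p list # => [1, 2, 3]  (and so it was never deleted)

# Building a new array avoids the problem.
cleaned = [1, 2, 2, 3].reject { |x| x == 2 }
p cleaned # => [1, 3]
```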


#9

On 11 May 2007, at 09:10, Robert K. wrote:

Sets also come in handy - especially if those lists are large.

Kind regards

robert

I would actually need to know what you really want to do :( Let me
explain. In a list you can put different elements with the same
values, unlike in a hash, where the keys must be unique.

Do you want to remove the elements that are in the same position on
the lists and are equal?

If so I would say:

require 'generator'

list1 = [1,1,2,3,4,6] # => I included the 6 to show what I mean…
list2 = [2,2,3,5,5,6] # => I included the 6 to show what I mean…

def RemoveDupsFromLists ( list1 , list2 )
  lists = SyncEnumerator.new(list1, list2)
  lists.each { |element_list1, element_list2|
    if list1[element_list1] == list2[element_list2]
      list1.delete(element_list1)
      list2.delete(element_list2)
    end
  }
end
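
For readers on a current Ruby: as far as I know, SyncEnumerator came with the 1.8 'generator' library and is no longer shipped. The sketch below is one possible way to express the "same position and equal" idea without it. The name remove_equal_positions is hypothetical, and it compares the elements themselves rather than using them as indexes.

```ruby
# Hypothetical helper: removes elements that are equal AND sit at the
# same index in both lists. Returns new arrays; the inputs are not modified.
def remove_equal_positions(list1, list2)
  shared = [list1.size, list2.size].min
  same = (0...shared).select { |i| list1[i] == list2[i] }
  [list1.reject.with_index { |_, i| same.include?(i) },
   list2.reject.with_index { |_, i| same.include?(i) }]
end

result = remove_equal_positions([1, 1, 2, 3, 4, 6], [2, 2, 3, 5, 5, 6])
p result # => [[1, 1, 2, 3, 4], [2, 2, 3, 5, 5]]
```

Only the trailing 6 is removed, since it is the only value that matches at the same position.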

Cheers,

Enrique Comba R.


#10

On 11 May 2007, at 10:23, Brian C. wrote:

list2 = [2,2,3,5,5,6] # => I included the 6 to show what I mean…
end

Is it safe to delete from lists while you’re enumerating through them?

Actually not ;) It depends if other objects are trying to access
those lists at the same time though…


#11

Assuming my previous assumption of what exactly is needed… at this point
it’s academic anyway, right :>

This will produce the same result with much cleaner code than my previous
post:

APPENDAGE_START = "_"

# Tag each item with its running occurrence number, e.g. "one_1", "one_2".
def make_items_unique(list)
  hsh = Hash.new { |h,k| h[k] = 0}
  list.collect do |x|
    hsh[x] += 1
    x.to_s + APPENDAGE_START + hsh[x].to_s
  end
end

list1 = %w{one one two three four four five}
list2 = %w{one three three four five five five}

puts "before -- list1:#{list1}"
puts "before -- list2:#{list2}"

list1_mod = make_items_unique(list1)
list2_mod = make_items_unique(list2)

list3 = list1_mod - list2_mod
list4 = list2_mod - list1_mod

list1 = list3.collect { |x| x.split(APPENDAGE_START)[0] }
list2 = list4.collect { |x| x.split(APPENDAGE_START)[0] }

puts "after -- list1:#{list1}"
puts "after -- list2:#{list2}"


output=
before -- list1:oneonetwothreefourfourfive
before -- list2:onethreethreefourfivefivefive
after -- list1:onetwofour
after -- list2:threefivefive


#12

I envision using sets:

require 'set'

a = %w[a b c d e e]
b = %w[d e f f g]
dups = (a.to_set & b).to_a # Array#- needs an Array, not a Set

a -= dups # => ["a", "b", "c"]
b -= dups # => ["f", "f", "g"]

Or how about just using #-?

a = %w[a b c d e e]
b = %w[d e f f g]
c = a.dup
d = b.dup

b -= c # => ["f", "f", "g"]
a -= d # => ["a", "b", "c"]

Do these fit?

Aur

P.S. have you had a look at http://RubyMentor.rubyforge.org/


#13

I am new to Ruby but I am wondering why no one is using the uniq call
that gets rid of duplicates in an array. Couldn’t you join the two
arrays, then call MyJoinedArray.uniq!, then take the resulting set and
format it as you please? I know that you are doing more than just that.
I was mostly wondering why you would not use the built-in call.
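
For what it’s worth, here is a quick sketch of what joining and calling uniq gives, reusing the sample arrays from earlier in the thread:

```ruby
a = %w[a a a b b c]
b = %w[b c c c d]

merged = (a + b).uniq
p merged # => ["a", "b", "c", "d"]
```

The result is a single list holding one copy of every value (the same as a | b), so it no longer shows which list an element came from or how many times it occurred, and that seems to be the information the original question depends on.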


#14

I think a big part of it is that there are variations in what we assume the
questioner wanted.

In my case, I interpreted it as:

  1. remove items that appear in both lists from both lists
    For instance, removeDups ([a, b, c, d], [b, d, f, g]) => [a, c], [f, g]
  2. don’t go so far as to remove more than the common count of dups
    For instance, removeDups ([a, a, b, b, c, d, d], [b, d, d, d, f, g, g])
    => [a, a, b, c], [d, f, g, g]
  3. keep the lists in original order (probably not required but I’m not sure)

I guess it would have been nice to have the need demonstrated via example or
clearly stated…
but it’s been fun.
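
The three points above could be sketched roughly as follows. The name cancel_common is made up for illustration, and the sketch assumes that when an element is cancelled, its earliest occurrences are the ones removed.

```ruby
# Hypothetical helper: cancels elements pairwise, like terms in a fraction.
# Each occurrence in one list cancels at most one occurrence in the other,
# and the surviving elements keep their original order.
def cancel_common(list1, list2)
  counts1 = Hash.new(0)
  counts2 = Hash.new(0)
  list1.each { |item| counts1[item] += 1 }
  list2.each { |item| counts2[item] += 1 }

  prune = lambda do |list, other_counts|
    remaining = other_counts.dup # how many occurrences may still be cancelled
    list.reject do |item|
      if remaining[item] > 0
        remaining[item] -= 1
        true
      else
        false
      end
    end
  end

  [prune.call(list1, counts2), prune.call(list2, counts1)]
end

p cancel_common(%w[a b c d], %w[b d f g])
# => [["a", "c"], ["f", "g"]]
p cancel_common(%w[a a b b c d d], %w[b d d d f g g])
# => [["a", "a", "b", "c"], ["d", "f", "g", "g"]]
```

Both inputs are left unmodified, which sidesteps the delete-while-iterating problem entirely.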


#15

On 5/11/07, Todd B. removed_email_address@domain.invalid wrote:

a.uniq.each { |i| a.each { |j| ha[i] += 1 if i == j } }
Of course, you can only compare 2 lists.

Todd

Sorry, the puts lines should read:

puts "list1: #{ca.inspect}"
puts "list2: #{cb.inspect}"


#16

This is how I would do it (that is, if I understand the OP’s request).

a = %w( a a b b c e f)
b = %w( b c c c d e e g)

ha = {}; hb = {}; dif = {}
ha.default = hb.default = 0
ca = []; cb = []

# Count occurrences per list, then take the per-key difference.
a.uniq.each { |i| a.each { |j| ha[i] += 1 if i == j } }
b.uniq.each { |i| b.each { |j| hb[j] += 1 if i == j } }
(ha.merge hb).keys.each { |k| dif[k] = ha[k] - hb[k] }
# k*v repeats the string; split(//) only works for single-character items.
dif.each { |k,v| v>0 ? (ca += (k*v).split(//)) : (cb += (k*-v).split(//)) }

puts "list1: #{ca.inspect}" # ["a", "a", "b", "f"]
puts "list2: #{cb.inspect}" # "c", "c", "d", "e", "g" (order depends on hash ordering)

Of course, you can only compare 2 lists.

Todd


#17

On 5/11/07, Mike S. removed_email_address@domain.invalid wrote:

            list1.delete_at ( i )

Mike S.

After thinking about your question again, I think you meant something
a little different than what I was thinking before, I think :) :)
This is no shorter than your code, just different.

arr1 = %w[a b car car b c r c car c c r d]
arr2 = %w[a r1 a b c c car d r r r d]

counts1 = Hash.new(0)
arr1.each {|x| counts1[x] += 1}

counts2 = Hash.new(0)
arr2.each {|x| counts2[x] += 1}

new1 = []
new2 = []

arr1.uniq.each do |x|
  (counts1[x] - counts2[x]).times {new1 << x} if counts1[x] > counts2[x]
end

arr2.uniq.each do |x|
  (counts2[x] - counts1[x]).times {new2 << x} if counts2[x] > counts1[x]
end

p new1 #["b","car","car", "c", "c"]
p new2 #["a","r1","d","r"]

Harry


#18

Kevin C. wrote:

I think a big part of it is that there are variations in what we assume the
questioner wanted.

In my case, I interpreted it as:

  1. remove items that appear in both lists from both lists
    For instance, removeDups ([a, b, c, d], [b, d, f, g]) => [a, c], [f, g]
  2. don’t go so far as to remove more than the common count of dups
    For instance, removeDups ([a, a, b, b, c, d, d], [b, d, d, d, f, g, g])
    => [a, a, b, c], [d, f, g, g]
  3. keep the lists in original order (probably not required but I’m not sure)

I guess it would have been nice to have the need demonstrated via example or
clearly stated…
but it’s been fun.

I have to say that I am almost certainly being simplistic here but why
cannot we do something like this:

a = [1, 2, 3, 4]
b = [2, 4, 6, 8]

p a
p b

c = a & b
a = a - c
b = b - c

p c
p a
p b

result:
[1, 2, 3, 4]
[2, 4, 6, 8]
[2, 4]
[1, 3]
[6, 8]
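
One case worth trying with this approach is a list with repeated values. The data below is made up for illustration:

```ruby
a = [1, 5, 5, 5]
b = [5, 7]

c = a & b
a_rest = a - c
b_rest = b - c

p c      # => [5]
p a_rest # => [1]
p b_rest # => [7]
```

Array#- removes every occurrence of each value in c, so all three 5s disappear from a even though b contains only one. Whether that is acceptable depends on which interpretation of the question is intended.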


#19

Harry K. wrote:

What happens here? :)
All 5’s are deleted.

If you are asking about my example, there were no 5s in it to begin
with. As was observed earlier, I think that better examples would have
resulted in better code. I was mostly interested in finding a rubyish
way to make it happen with great simplicity. If there is something
slightly different needed, perhaps a tweak or two to the arrays before
applying the differences could handle it.

Anyway, this is my first attempt at writing code to solve a question in
here. I am happy that I was able to come up with something that worked
and post it. :)

yay ruby!


#20

Wow - this question generated lots of replies! I’m still reading through
all of them.

I have another idea, but not sure if this will work:

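def RemoveDupsFromLists(list1, list2)
  # Array has no each_item; each_index yields the positions used below.
  list1.each_index do | i |
    list2.each_index do | j |
      # nil marks a slot that has already been cancelled
      if !(list2[j].nil?) and list1[i] == list2[j]
        list1[i] = nil
        list2[j] = nil
      end
    end
  end
  list1.compact!
  list2.compact!
  return [ list1 , list2 ]
end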