How to remove dups from 2 lists?

mschwab · May 11, 2007, 4:19am

I’m trying to write some code that removes from 2 lists all elements
that appear in both lists. However, I don’t want duplicates within each
list deleted as well (which is what the array “-” operator does). The
code I have now doesn’t handle restarting the current iteration for both
loops when a match is found and deleted in both loops. Here’s the code:

def RemoveDupsFromLists(list1, list2)
  list1.each_index do |i|
    list2.each_index do |j|
      if list1[i] == list2[j]
        list1.delete_at(i)
        list2.delete_at(j)
      end
    end
  end
  return [list1, list2]
end

What’s weird is that doing this is easy in C (my first language), but
difficult in Ruby. Everything else I’ve seen has been MUCH easier in
Ruby.

Mike S.

mschwab · May 11, 2007, 5:11am

Having some examples might help, but is something like this what
you’re after?

a = %w[ a a a b b c ]
=> ["a", "a", "a", "b", "b", "c"]
b = %w[ b c c c d ]
=> ["b", "c", "c", "c", "d"]
a + b
=> ["a", "a", "a", "b", "b", "c", "b", "c", "c", "c", "d"]
a | b
=> ["a", "b", "c", "d"]
a - b
=> ["a", "a", "a"]
a & b
=> ["b", "c"]
a - (a & b)
=> ["a", "a", "a"]
b - (a & b)
=> ["d"]

In particular, which do you expect from RemoveDupsFromLists( a, b )?

What I have for the last two expressions above:
=> [ ["a", "a", "a"], ["d"] ]

Or:
=> [ ["a", "a", "a", "b"], ["c", "c", "d"] ]
because it works like canceling terms in a fraction:

["a", "a", "a", "b", "b", "c"]
------------------------------
["b", "c", "c", "c", "d"]

Basically, write a test:

require 'test/unit'

class RemoveDupsTest < Test::Unit::TestCase
  def test_simple
    list1 = ["a", "a", "a", "b", "b", "c"]
    list2 = ["b", "c", "c", "c", "d"]
    expects = [ [], [] ]  # <=== fill me in!
    assert_equal expects, RemoveDupsFromLists(list1, list2)
  end
end

Then you’ll know what you want (and so will we!) and you’ll be sure
when you get it working.

-Rob

Rob B. http://agileconsultingllc.com

mschwab · May 11, 2007, 5:53am

If I understand correctly, I think this does what you want.

arr1 = %w[a a a b b b c d d e e]
arr2 = %w[a b b b c c c d d d d d e a b]

# join concatenates the elements; Array#to_s only did that in Ruby 1.8
str1 = arr1.join
str2 = arr2.join

(arr1 & arr2).each do |x|
  str1.sub!(x, "")
  str2.sub!(x, "")
end

p str1.split(//).to_a
p str2.split(//).to_a

Harry

mschwab · May 11, 2007, 5:57am

Oops! Correction.

arr1 = %w[a a a b b b c d d e e]
arr2 = %w[a b b b c c c d d d d d e a b]

# join concatenates the elements; Array#to_s only did that in Ruby 1.8
str1 = arr1.join
str2 = arr2.join

(arr1 & arr2).each do |x|
  str1.sub!(x, "")
  str2.sub!(x, "")
end

p str1.split(//)
p str2.split(//)

mschwab · May 11, 2007, 6:47am

Here is how I would write it. It doesn’t give quite the same results as
your method, but it seems closer to what you asked for:

def remove_duplicates(list1, list2)
  (list1 & list2).each do |x|
    list1.delete(x)
    list2.delete(x)
  end
  return list1, list2
end

irb(main):028:0> RemoveDupsFromLists([1,1,2,3,4],[2,2,3,5,5])
=> [[1, 1, 4], [2, 5, 5]]
irb(main):029:0> remove_duplicates([1,1,2,3,4],[2,2,3,5,5])
=> [[1, 1, 4], [5, 5]]

Be careful with your method, since it alters the Arrays as you iterate
through them and will sometimes give hard-to-understand results. In
the example above ‘2’ is a duplicate but is only deleted from the
second array once because of this problem/feature.
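
A minimal sketch of that pitfall, separate from either method above (the
helper names are made up for illustration): deleting inside each_index
shifts the remaining elements left, so the element that slides into the
current slot is never examined.

```ruby
# Hypothetical helper: tries to delete every occurrence of `value`
# while iterating over the same array with each_index.
def delete_while_iterating(list, value)
  list.each_index do |i|
    # After delete_at(i) the next element slides into slot i,
    # but the loop moves on to i + 1 and never looks at it.
    list.delete_at(i) if list[i] == value
  end
  list
end

# Safer variant: build a new array instead of mutating during iteration.
def delete_without_mutating(list, value)
  list.reject { |x| x == value }
end

skipped = delete_while_iterating([2, 2, 3, 5, 5], 2)
clean   = delete_without_mutating([2, 2, 3, 5, 5], 2)
p skipped  # one 2 survives
p clean
```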

Alex G.

Bioinformatics Center
Kyoto University

mschwab · May 11, 2007, 7:29am

I’m also not really sure what exactly you wanted, but
I assumed the following:

def remove_dups_from_both_lists(list1, list2)
  list1_dup = list1.dup  # list1 is pruned in place, so keep a copy for the second pass
  remove_dups_from_first_list list1, list2
  remove_dups_from_first_list list2, list1_dup
end

def remove_dups_from_first_list(list_to_prune, list_with_dups)
  hash_with_occurrence_count = get_hash_with_occurrence_count(list_to_prune)
  decrement_count_for_dups(list_with_dups, hash_with_occurrence_count)
  get_remaining_item_list(list_to_prune, hash_with_occurrence_count)
end

def get_hash_with_occurrence_count(list)
  hsh = Hash.new { |h, k| h[k] = 0 }
  list.each { |item| hsh[item] += 1 }
  hsh
end

def decrement_count_for_dups(list, other_list_as_hash)
  list.each { |item| other_list_as_hash[item] -= 1 }
end

def get_remaining_item_list(list, list_as_hash)
  list.each_with_index do |item, idx|
    if list_as_hash[item] > 0
      list_as_hash[item] -= 1
    else
      list[idx] = nil
    end
  end
  list.compact!  # remove the nil markers in place so the caller sees the result
  list
end

list1 = %w{one one two three four four five}
list2 = %w{one three three four five five five}
puts "before -- list1:#{list1.inspect}"
puts "before -- list2:#{list2.inspect}"
remove_dups_from_both_lists list1, list2
puts "after -- list1:#{list1.inspect}"
puts "after -- list2:#{list2.inspect}"

mschwab · May 11, 2007, 9:10am

Sets also come in handy - especially if those lists are large.

Kind regards

robert

mschwab · May 11, 2007, 10:23am

On Fri, May 11, 2007 at 04:55:14PM +0900, Enrique Comba R. wrote:

def RemoveDupsFromLists(list1, list2)
  lists = SyncEnumerator.new(list1, list2)
  lists.each { |element_list1, element_list2|
    # compare the paired elements themselves, not list1[element]
    if element_list1 == element_list2
      list1.delete(element_list1)
      list2.delete(element_list2)
    end
  }
end

Is it safe to delete from lists while you’re enumerating through them?

mschwab · May 11, 2007, 9:55am

I would actually need to know what you really want to do. Let me
explain: in a list you can put different elements with the same value,
unlike in a hash, where the keys must be unique.

Do you want to remove the elements that are in the same position in
both lists and are equal?

If so I would say:

require 'generator'  # Ruby 1.8 standard library; removed in Ruby 1.9

list1 = [1,1,2,3,4,6] # => I included the 6 to show what I mean…
list2 = [2,2,3,5,5,6] # => I included the 6 to show what I mean…

def RemoveDupsFromLists(list1, list2)
  lists = SyncEnumerator.new(list1, list2)
  lists.each { |element_list1, element_list2|
    # compare the paired elements themselves, not list1[element]
    if element_list1 == element_list2
      list1.delete(element_list1)
      list2.delete(element_list2)
    end
  }
end

Cheers,

Enrique Comba R.

mschwab · May 11, 2007, 10:28am

On 11 May 2007, at 10:23, Brian C. wrote:

Is it safe to delete from lists while you’re enumerating through them?

Actually not. It depends on whether other objects are trying to access
those lists at the same time, though…

mschwab · May 11, 2007, 1:51pm

Assuming my previous assumption of what exactly is needed… at this
point it’s academic anyway, right :>

This will produce the same result with much cleaner code than my
previous post:

APPENDAGE_START = "_"

# Tag each item with its running occurrence number ("one_1", "one_2", ...)
# so that Array#- only cancels matching occurrences.
def make_items_unique(list)
  hsh = Hash.new { |h, k| h[k] = 0 }
  list.collect do |x|
    hsh[x] += 1
    x.to_s + APPENDAGE_START + hsh[x].to_s
  end
end

list1 = %w{one one two three four four five}
list2 = %w{one three three four five five five}

puts "before -- list1:#{list1.inspect}"
puts "before -- list2:#{list2.inspect}"

list1_mod = make_items_unique(list1)
list2_mod = make_items_unique(list2)

list3 = list1_mod - list2_mod
list4 = list2_mod - list1_mod

list1 = list3.collect { |x| x.split(APPENDAGE_START)[0] }
list2 = list4.collect { |x| x.split(APPENDAGE_START)[0] }

puts "after -- list1:#{list1.inspect}"
puts "after -- list2:#{list2.inspect}"

output=
before -- list1:["one", "one", "two", "three", "four", "four", "five"]
before -- list2:["one", "three", "three", "four", "five", "five", "five"]
after -- list1:["one", "two", "four"]
after -- list2:["three", "five", "five"]

mschwab · May 11, 2007, 2:49pm

I envision using sets:

require 'set'

a = %w[a b c d e e]
b = %w[d e f f g]
dups = (a.to_set & b).to_a  # Array#- needs an Array, not a Set

a -= dups # => ["a", "b", "c"]
b -= dups # => ["f", "f", "g"]

Or how about just using #-?

a = %w[a b c d e e]
b = %w[d e f f g]
c = a.dup
d = b.dup

b -= c # => ["f", "f", "g"]
a -= d # => ["a", "b", "c"]

Do these fit?

Aur

P.S. Have you had a look at http://RubyMentor.rubyforge.org/ ?

mschwab · May 11, 2007, 3:17pm

I am new to Ruby but I am wondering why it is that no one is using the
uniq call that gets rid of duplicates in an array. Couldn’t you join
the two arrays, then call MyJoinedArray.uniq!, then take the resulting
set and format it as you please? I know that you are doing more than
just that. I was mostly wondering why you would not use the built-in
call.
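
A short sketch, with made-up sample arrays, of what joining and calling
uniq produces. It yields a single list of distinct values, so the shared
elements are still present and the information about which list each
element came from is gone, which is probably why the other replies went
a different way.

```ruby
list1 = [1, 1, 2, 3, 4]
list2 = [2, 2, 3, 5, 5]

# Join, then drop repeated values: one list of distinct elements.
joined_uniq = (list1 + list2).uniq
p joined_uniq   # => [1, 2, 3, 4, 5]

# The shared values 2 and 3 are still there, and the repeated 1s and 5s
# that the original poster wanted to keep have been collapsed.
```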

mschwab · May 11, 2007, 3:33pm

I think a big part of it is that there are variations in what we assume
the questioner wanted.

In my case, I interpreted it as:

remove items that appear in both lists from both lists
For instance, removeDups([a, b, c, d], [b, d, f, g]) => [a, c], [f, g]

don’t go so far as to remove more than the common count of dups
For instance, removeDups([a, a, b, b, c, d, d], [b, d, d, d, f, g, g])
=> [a, a, b, c], [d, f, g, g]

keep the lists in original order (probably not required but I’m not
sure)

I guess it would have been nice to have the need demonstrated via
example or clearly stated…
but it’s been fun.
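
The three rules above can be sketched roughly as follows. This is only
one possible reading, and cancel_common and remove_common are
hypothetical names, not code from anyone in the thread.

```ruby
# Hypothetical helper: removes from `list` one occurrence for each
# matching occurrence in `other`, keeping the original order.
def cancel_common(list, other)
  # How many copies of each item `other` is able to cancel.
  budget = Hash.new(0)
  other.each { |item| budget[item] += 1 }

  # Scan from the end so that the earliest occurrences are the ones kept.
  kept = list.reverse.reject do |item|
    if budget[item] > 0
      budget[item] -= 1
      true    # cancelled against a copy in `other`
    else
      false
    end
  end
  kept.reverse
end

# Neither input array is modified; two new arrays are returned.
def remove_common(list1, list2)
  [cancel_common(list1, list2), cancel_common(list2, list1)]
end

p remove_common(%w[a b c d], %w[b d f g])
# => [["a", "c"], ["f", "g"]]
p remove_common(%w[a a b b c d d], %w[b d d d f g g])
# => [["a", "a", "b", "c"], ["d", "f", "g", "g"]]
```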

mschwab · May 11, 2007, 4:41pm

On 5/11/07, Todd B. wrote:

Sorry, the puts lines should read:

puts "list1: #{ca.inspect}"
puts "list2: #{cb.inspect}"

mschwab · May 11, 2007, 4:39pm

This is how I would do it (that is, if I understand the OP’s request).

a = %w( a a b b c e f)
b = %w( b c c c d e e g)

ha = {}; hb = {}; dif = {}
ha.default = hb.default = 0
ca = []; cb = []

a.uniq.each { |i| a.each { |j| ha[i] += 1 if i == j } }
b.uniq.each { |i| b.each { |j| hb[i] += 1 if i == j } }
(ha.merge hb).keys.each { |k| dif[k] = ha[k] - hb[k] }
dif.each { |k,v| v>0 ? ca += (k * v).split(//) : cb += (k * -v).split(//) }

puts "list1: #{ca.inspect}" # ["a", "a", "b", "f"]
puts "list2: #{cb.inspect}" # ["c", "c", "d", "e", "g"] (order may vary)

Of course, you can only compare 2 lists.

Todd

mschwab · May 11, 2007, 4:43pm

After thinking about your question again, I think you meant something
a little different from what I was thinking before.
This is no shorter than your code, just different.

arr1 = %w[a b car car b c r c car c c r d]
arr2 = %w[a r1 a b c c car d r r r d]

counts1 = Hash.new(0)
arr1.each { |x| counts1[x] += 1 }

counts2 = Hash.new(0)
arr2.each { |x| counts2[x] += 1 }

new1 = []
new2 = []

arr1.uniq.each do |x|
  (counts1[x] - counts2[x]).times { new1 << x } if counts1[x] > counts2[x]
end

arr2.uniq.each do |x|
  (counts2[x] - counts1[x]).times { new2 << x } if counts2[x] > counts1[x]
end

p new1 # ["b", "car", "car", "c", "c"]
p new2 # ["a", "r1", "d", "r"]

Harry

mschwab · May 11, 2007, 4:51pm

I have to say that I am almost certainly being simplistic here, but why
can’t we do something like this:

a = [1, 2, 3, 4]
b = [2, 4, 6, 8]

p a
p b

c = a & b
a = a - c
b = b - c

p c
p a
p b

result:
[1, 2, 3, 4]
[2, 4, 6, 8]
[2, 4]
[1, 3]
[6, 8]
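
One caveat, sketched here with made-up arrays that contain repeats:
Array#- removes every element equal to something in the right-hand
array, so any value the two lists share disappears completely from both,
however many copies each list held. Whether that is acceptable depends
on which reading of the question is intended.

```ruby
a = [1, 1, 2, 5, 5, 5]
b = [2, 2, 5, 8]

c = a & b        # values present in both lists, without repeats
only_a = a - c   # every 2 and every 5 is dropped, not just one copy
only_b = b - c

p c       # => [2, 5]
p only_a  # => [1, 1]
p only_b  # => [8]
```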

mschwab · May 11, 2007, 5:20pm

Harry K. wrote:

What happens here?
All 5’s are deleted.

If you are asking about my example, there were no 5s in it to begin
with. As was observed earlier, I think that better examples would have
resulted in better code. I was mostly interested in finding a rubyish
way to make it happen with great simplicity. If there is something
slightly different needed, perhaps a tweak or two to the arrays before
applying the differences could handle it.

Anyway, this is my first attempt at writing code to solve a question in
here. I am happy that I was able to come up with something that worked
and post it.

yay ruby!

mschwab · May 11, 2007, 5:36pm

Wow - this question generated lots of replies! I’m still reading through
all of them.

I have another idea, but not sure if this will work:

def RemoveDupsFromLists(list1, list2)
  # each_index yields positions; Array has no each_item method
  list1.each_index do |i|
    list2.each_index do |j|
      # nil marks an element that has already been cancelled
      if !list2[j].nil? and list1[i] == list2[j]
        list1[i] = nil
        list2[j] = nil
      end
    end
  end
  list1.compact!
  list2.compact!
  return [list1, list2]
end