Detecting duplicates in an array, anything in the standard l

thbar · August 21, 2007, 7:20pm

I just thought I would put in my 2 cents. I actually had to create a
script that would run through a file and find all the duplicate account
numbers and the number of times they were duplicated and write that to a
new file.

@lines = Hash.new(0)
@group = Array.new
IO.readlines("C:/test/" + @file).each { |line|
  @lines[line.split(';')[5].chomp] += 1 }
@lines.each_pair { |k,v| @group << k.to_s + " => " + v.to_s if v > 1 }

This is the part of the script that reads the file and grabs the duplicates.

~Jeremy
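
For anyone who wants to try the counting idea outside the original script, here is a minimal self-contained sketch. It assumes semicolon-separated lines with the account number in the sixth field, as in the snippet above. The method name `duplicate_counts` and the sample data are made up for illustration, and the nil guard for short lines is an addition that the original does not have.

```ruby
require 'stringio'

# Count how often each account number occurs and keep only the repeats.
# Assumes ';'-separated fields with the account number at field index 5.
def duplicate_counts(io, field = 5)
  counts = Hash.new(0)
  io.each_line do |line|
    key = line.chomp.split(';')[field]
    counts[key] += 1 unless key.nil?   # skip lines with too few fields
  end
  counts.select { |_key, count| count > 1 }
end

# Made-up sample data standing in for the real file.
sample = StringIO.new(<<~DATA)
  a;b;c;d;e;1001
  a;b;c;d;e;1002
  a;b;c;d;e;1001
DATA

duplicate_counts(sample).each { |acct, n| puts "#{acct} => #{n}" }
```

Passing an IO object rather than a path keeps the sketch testable; with a real file, something like `File.open(path) { |f| duplicate_counts(f) }` should work the same way.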

thbar · August 22, 2007, 12:28pm

Jimmy K. wrote:

Jeremy W. wrote:
I actually had to … find all the duplicate account
numbers and the number of times they were duplicated and … .
…
~Jeremy

A much less verbose ‘nil’ fix of the original version would be to use
[v] instead of v:

a = [nil,1,2,2,3,nil]
p a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? [v] :
nil}.compact.flatten
=> [nil, 2]

This fix does not work for a = [nil,1,2,[7],2,[7],3,nil], but the
previous version using “(a.size - a.nitems > 1) ? …” does. Ruby 1.9
though is said to introduce a non-greedy Array#flatten:

Ruby 1.9

a = [nil,1,[7],2,2,[7],3,nil]
p a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? [v] :
nil}.compact.flatten(1)
=> [nil, [7], 2]
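
A possible alternative that avoids both compact and flatten is to select on the occurrence count directly. This is only a sketch, assuming a Ruby new enough to have Array#count with an argument; the method name `repeated_values` is hypothetical.

```ruby
# Keep each distinct value that occurs more than once.
# Array#count(obj) compares with ==, so nil and nested arrays
# need no special casing.
def repeated_values(ary)
  ary.uniq.select { |v| ary.count(v) > 1 }
end

a = [nil, 1, [7], 2, 2, [7], 3, nil]
p repeated_values(a)   # prints [nil, [7], 2]
```

Because nothing is wrapped in an extra array, nested arrays such as [7] come back intact.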

Cheers,

j.k.

thbar · September 25, 2007, 11:04pm

On 19.08.2007 23:15, Robert K. wrote:

end
end

which will give:

%w(a b c c).dups
=> ["c"]

Actually you are not deleting duplicates as far as I can see.

Did I say it’s too late? Man, I should’ve worn my glasses…

Here’s another one

irb(main):012:0> a.inject(Hash.new(0)) {|h,x|
h[x]+=1;h}.inject([]){|h,(k,v)|h<<k if v>1;h}
=> ["c"]

You could even change that to need just one iteration through the
original array but it’s too late and I’m too lazy.
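
One way the single-iteration idea might look, as a sketch rather than what Robert had in mind: keep one set of elements seen so far and a second set of elements seen again. The method name `dups_single_pass` is hypothetical.

```ruby
require 'set'

# One pass over the array: `seen` holds everything met so far,
# `dups` collects elements met a second time (each reported once).
def dups_single_pass(ary)
  seen = Set.new
  dups = Set.new
  ary.each do |x|
    # Set#add? returns nil when x was already in the set.
    dups << x unless seen.add?(x)
  end
  dups.to_a
end

p dups_single_pass(%w(a b c c))   # prints ["c"]
```

This trades the second iteration for a second set, so memory use grows with the number of distinct elements.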

Cheers

robert

thbar · September 25, 2007, 11:07pm

2007/8/21, Peña, Botp [email protected]:

From: Jimmy K. [mailto:[email protected]]

uniq.map {|v| (self - [v]).size < (self.size - 1) ? v : nil}.compact

cool.
could we simplify it like,

irb(main):014:0> a
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):015:0> a.select{|e| (a-[e]).size < a.size - 1}.uniq
=> [1, 2]

Nice! But I’d think this is more efficient:

irb(main):001:0> a = [1, 1, 2, 2, 2, 4, 3]
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):002:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}
=> [1, 2]

Kind regards

robert

thbar · September 25, 2007, 11:10pm

On Aug 21, 2007, at 4:59 AM, Peña, Botp wrote:

From: Robert K. [mailto:[email protected]]

irb(main):002:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}

compare also,

irb(main):056:0> b=a.dup
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):057:0> b.uniq.select{|e| (b.reject!{|f| f == e}).size > 1}
=> [1, 2]

I still think it’s easier just to union itself…

a = [1,2,3,2,1]
b = a & a
b == [1,2,3]
---------------------------------------------------------------|
~Ari
“I don’t suffer from insanity. I enjoy every minute of it” --1337est
man alive

thbar · September 25, 2007, 11:10pm

From: Robert K. [mailto:[email protected]]

irb(main):002:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}

compare also,

irb(main):056:0> b=a.dup
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):057:0> b.uniq.select{|e| (b.reject!{|f| f == e}).size > 1}
=> [1, 2]

thbar · September 25, 2007, 11:11pm

On Aug 21, 2007, at 01:59 , Peña, Botp wrote:

From: Robert K. [mailto:[email protected]]

irb(main):002:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}

compare also,

irb(main):056:0> b=a.dup
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):057:0> b.uniq.select{|e| (b.reject!{|f| f == e}).size > 1}
=> [1, 2]

I came up with something vaguely similar:

class Array
def dupes
a = self.dup
self.partition { |o| a.delete(o) }.last
end
end

[1,2,2,3,4,4].dupes
=> [2, 4]

thbar · September 25, 2007, 11:12pm

Hi –

On Tue, 21 Aug 2007, Ryan D. wrote:

irb(main):057:0> b.uniq.select{|e| (b.reject!{|f| f == e}).size > 1}

[1,2,2,3,4,4].dupes
=> [2, 4]

You’d want to throw a .uniq on there; otherwise, non-consecutive dupes
get processed twice:

[1,2,2,3,4,4,2].dupes
=> [2, 4, 2]
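
A sketch of the same partition trick with .uniq appended, written as a plain method instead of reopening Array (the name `dupes_once` is made up for this example):

```ruby
# Array#delete removes every element equal to o and returns nil when
# nothing matched, so the first occurrence of a value lands in the
# first partition and later occurrences land in the second.
# The trailing .uniq reports each duplicated value once.
def dupes_once(ary)
  remaining = ary.dup
  ary.partition { |o| remaining.delete(o) }.last.uniq
end

p dupes_once([1, 2, 2, 3, 4, 4, 2])   # prints [2, 4]
```

One caveat: nil and false always end up in the second partition, because the value Array#delete returns for them is falsy whether or not anything was removed, so this approach only suits arrays without such elements.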

David

thbar · September 25, 2007, 11:12pm

On Aug 19, 5:16 pm, Robert K. [email protected] wrote:

You could even change that to need just one iteration through the
original array but it’s too late and I’m too lazy.

Cheers
    robert

or…

require 'set'

new_ary = ary.to_set.to_a #set strips dups.

thbar · September 25, 2007, 11:13pm

On Aug 21, 10:04 am, Ari B. [email protected] wrote:

=> [1, 2]

I still think it’s easier just to union itself…

a = [1,2,3,2,1]
b = a & a
b == [1,2,3]

…but that’s not what the OP wanted. What you’ve written is the same
as the #uniq method.

Don’t feel bad, this thread has been filled with people answering the
wrong question. The original question was roughly “How do I find
out all the elements in the array that are duplicates?”

Solutions to that question would not include ‘3’ in the above results.
It’s unclear to me if %w| a b b b | should include ‘b’ once or twice
in the output, though, and the original poster has not clarified that,
that I can see.
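
To make the two readings concrete, here is a rough sketch of both, built on one shared count. It assumes a Ruby with each_with_object and flat_map, and all three method names are hypothetical.

```ruby
# Count occurrences once, then read the counts two different ways.
def occurrence_counts(ary)
  ary.each_with_object(Hash.new(0)) { |x, counts| counts[x] += 1 }
end

# Reading 1: each duplicated value appears once in the result.
def dups_listed_once(ary)
  occurrence_counts(ary).select { |_x, n| n > 1 }.keys
end

# Reading 2: every occurrence beyond the first is reported.
def dups_each_repeat(ary)
  occurrence_counts(ary).flat_map { |x, n| [x] * (n - 1) }
end

p dups_listed_once(%w| a b b b |)   # prints ["b"]
p dups_each_repeat(%w| a b b b |)   # prints ["b", "b"]
```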

thbar · September 25, 2007, 11:14pm

On 20 Aug 2007, at 13:45, [email protected] wrote:

library to remove duplicates from an array (or an enumerable).
how about calling the uniq method:

[1,2,2,3].uniq

or did I miss the point again?

thbar · September 25, 2007, 11:14pm

Hi –

On Wed, 22 Aug 2007, Phrogz wrote:

irb(main):057:0> b.uniq.select{|e| (b.reject!{|f| f == e}).size > 1}

Don’t feel bad, this thread has been filled with people answering the
wrong question. The original question was roughly “How do I find
out all the elements in the array that are duplicates?”

Solutions to that question would not include ‘3’ in the above results.
It’s unclear to me if %w| a b b b | should include ‘b’ once or twice
in the output, though, and the original poster has not clarified that,
that I can see.

I think once, since it’s just the quality of being non-unique in
the array that qualifies an object for inclusion. At least, that’s my
understanding, though as one of the people who reimplemented
Array#uniq, I may not be the right person to listen to.

David

thbar · September 25, 2007, 11:20pm

On 20 Aug 2007, at 14:50, Robert K. wrote:

robert

I’m a n00b, sorry if I’m poking my nose in. Couldn’t the OP do something
using &, like so:

[1,2,3] & [2,3,4] == [2,3]

?

Regards Gabe

thbar · September 25, 2007, 11:14pm

2007/8/20, [email protected] [email protected]:

seen and used various approaches, like:
=> ["c"]

new_ary = ary.to_set.to_a #set strips dups.

It does, but as far as I can see OP wanted exactly the duplicates back.

Cheers

robert

thbar · September 25, 2007, 11:20pm

Hi –

On Tue, 21 Aug 2007, Gabriel D. wrote:

Cheers

robert

I’m a n00b, sorry if I’m poking my nose in. Couldn’t the OP do something using
&, like so:

[1,2,3] & [2,3,4] == [2,3]

The original question was how to get all dups occurring in one array:

[1,2,3,2,4,5,5,6] => [2,5]
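
For reference, one short way to get that result is to group equal elements and keep the groups with more than one member. This is a sketch assuming a Ruby that has group_by; the method name `duplicates` is not from the standard library.

```ruby
# Group equal elements together, then keep the keys of every group
# that has more than one member. Hash insertion order means the
# result follows the order of first occurrence.
def duplicates(ary)
  ary.group_by { |x| x }.select { |_x, group| group.size > 1 }.keys
end

p duplicates([1, 2, 3, 2, 4, 5, 5, 6])   # prints [2, 5]
```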

David