Detecting duplicates in an array, anything in the standard l

thbar · August 19, 2007, 12:40pm

Hi!

Just wondering if there is something simple already built in the std
library to remove duplicates from an array (or an enumerable). I’ve
seen and used various approaches, like:

module Enumerable
def dups
inject({}) {|h,v| h[v]=h[v].to_i+1; h}.reject{|k,v| v==1}.keys
end
end

which will give:

%w(a b c c).dups
=> ["c"]

Anything more elegant ?

cheers

Thibaut
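
For readers following along, here is a minimal sketch of what the snippet above does, unrolled into steps. The stand-alone method name `duplicates` is made up for illustration; it is not part of the standard library.

```ruby
# Hypothetical stand-alone helper (named here for illustration only).
def duplicates(enum)
  counts = enum.inject({}) do |h, v|
    h[v] = h[v].to_i + 1   # h[v] is nil on first sight, and nil.to_i is 0
    h                      # inject needs the accumulator returned
  end
  # Drop everything seen exactly once; the remaining keys are the duplicates.
  counts.reject { |_, n| n == 1 }.keys
end

p duplicates(%w(a b c c))   # => ["c"]
```

The whole thing is one pass over the input plus one pass over the counts, so it should scale roughly linearly with the size of the collection.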

thbar · August 19, 2007, 1:20pm

Thibaut Barrère wrote:

Anything more elegant ?

No! :-)) - I tried it only using Arrays…

a = [1,2,3,4,5,4,2,2]
p a.inject([[],a[1..-1]]){|r,e|r[1].include?(e) ? [r[0]<<e, r[1][1..-1]]
: [r[0], r[1][1..-1]]}[0].uniq # => [2, 4]
b = %w(a b c c)
p b.inject([[],b[1..-1]]){|r,e|r[1].include?(e) ? [r[0]<<e, r[1][1..-1]]
: [r[0], r[1][1..-1]]}[0].uniq # => ["c"]

Wolfgang Nádasi-Donner

thbar · August 19, 2007, 1:34pm

Hi –

On Sun, 19 Aug 2007, Wolfgang Nádasi-Donner wrote:

: [r[0], r[1][1..-1]]}[0].uniq # => ["c"]
How about:

a = [1,2,3,4,5,4,2,2]
=> [1, 2, 3, 4, 5, 4, 2, 2]

a.inject([]) {|acc,e| acc << e unless acc.include?(e); acc }
=> [1, 2, 3, 4, 5]

David

thbar · August 19, 2007, 2:34pm

Hi –

On Sun, 19 Aug 2007, Wolfgang Nádasi-Donner wrote:

a.inject([]) {|acc,e| acc << e unless acc.include?(e); acc }
=> [1, 2, 3, 4, 5]

David

The problem is that he wants all non-unique elements. Unfortunately the
difference of two arrays doesn’t care about double elements,

Sorry, just ignore me. I’ve reinvented Array#uniq /me reaches for
coffee…

David

thbar · August 19, 2007, 2:06pm

David A. Black wrote:

Hi –

On Sun, 19 Aug 2007, Wolfgang Nádasi-Donner wrote:

: [r[0], r[1][1..-1]]}[0].uniq # => ["c"]
How about:

a = [1,2,3,4,5,4,2,2]
=> [1, 2, 3, 4, 5, 4, 2, 2]

a.inject([]) {|acc,e| acc << e unless acc.include?(e); acc }
=> [1, 2, 3, 4, 5]

David

The problem is that he wants all non-unique elements. Unfortunately the
difference of two arrays doesn’t care about double elements,
otherwise…

irb(main):004:0> a
=> [1, 2, 3, 4, 5, 4, 2, 2]
irb(main):005:0> b
=> [1, 2, 3, 4, 5]
irb(main):006:0> a-b
=> []

…would work. My solution is not recommended at all. It’s Sunday after
lunchtime, and I had to choose between washing the dishes and doing
something nice first…

Wolfgang Nádasi-Donner
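
To illustrate the point about array difference: Array#- removes every occurrence of each element found in the right-hand array, which is why a - b comes out empty. A rough sketch of the "subtract each element only once" behaviour that would have worked is below; `subtract_once` is a made-up helper for illustration, not an existing method.

```ruby
a = [1, 2, 3, 4, 5, 4, 2, 2]
b = a.uniq                 # [1, 2, 3, 4, 5]

# Array#- drops all copies of anything that appears in b.
p a - b                    # => []

# Hypothetical helper: remove only ONE occurrence per element of `other`.
def subtract_once(array, other)
  result = array.dup
  other.each do |x|
    i = result.index(x)        # position of the first remaining copy, if any
    result.delete_at(i) if i
  end
  result
end

leftover = subtract_once(a, b)
p leftover                 # => [4, 2, 2]
p leftover.uniq            # => [4, 2]
```

Note that the duplicates come out in order of their second occurrence, so the result is [4, 2] rather than [2, 4].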

thbar · August 19, 2007, 3:01pm

On Aug 19, 2007, at 6:39 AM, Thibaut Barrère wrote:

end

which will give:

%w(a b c c).dups
=> ["c"]

Anything more elegant ?

Couldn’t you also just do a union with itself?

a = %w(a b c b a)
b = a & a #=> ["a", "b", "c"]

Score one for me :-))
~ Ari
English is like a pseudo-random number generator - there are a
bajillion rules to it, but nobody cares.

thbar · August 19, 2007, 3:25pm

On Aug 19, 2007, at 9:06 AM, David A. Black wrote:

I think that just reinvents uniq (see my previous reinvention

The only reason I’ll accept that

is because you wrote the book I’m reading.

---------------------------------------------------------------|
~Ari
“I don’t suffer from insanity. I enjoy every minute of it” --1337est
man alive

thbar · August 19, 2007, 9:36pm

On Aug 19, 5:38 am, Thibaut Barrère wrote:

end

Thibaut

Here’s a modification of a technique used by
Simon Kroger:

class Array
def dups
values_at( * (0...size).to_a - uniq.map{|x| index(x)} )
end
end
==>nil

%w(a b a c c d).dups
==>["a", "c"]
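
The index trick above is compact, so here is a sketch that spells it out with intermediate variables (the variable names are mine, chosen for illustration). The idea, as far as I can tell, is: collect the index of the first occurrence of every value, subtract those from all indexes, and what is left points at the repeats.

```ruby
a = %w(a b a c c d)

first_indexes  = a.uniq.map { |x| a.index(x) }   # [0, 1, 3, 5]
all_indexes    = (0...a.size).to_a               # [0, 1, 2, 3, 4, 5]
repeat_indexes = all_indexes - first_indexes     # [2, 4]

repeats = a.values_at(*repeat_indexes)
p repeats                                        # => ["a", "c"]

# A value occurring three times is reported twice, so a trailing
# uniq may be wanted depending on the use case.
b = %w(a a a b)
b_repeats = b.values_at(*((0...b.size).to_a - b.uniq.map { |x| b.index(x) }))
p b_repeats                                      # => ["a", "a"]
```

Since index does a linear scan for each unique value, this is likely quadratic in the worst case.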

thbar · August 19, 2007, 3:08pm

Hi –

On Sun, 19 Aug 2007, Ari B. wrote:

def dups

Couldn’t you also just do a union with itself?

a = %w(a b c b a)
b = a & a #=> ["a", "b", "c"]

Score one for me :-))

I think that just reinvents uniq (see my previous reinvention

For what it’s worth, here’s a nice-looking but probably very
inefficient version:

module ArrayStuff
def count(e)
select {|f| f == e }.size
end

def dups
select {|e| count(e) > 1 }.uniq
end
end

a = [1,2,3,3,4,5,2].extend(ArrayStuff)

p a.dups # [2,3]

David
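
A small sketch of the same idea without the extension module, assuming a Ruby where Array#count is built in (1.8.7 and later). It keeps the "nice-looking but probably inefficient" character: count rescans the whole array for every element, so the work grows quadratically.

```ruby
a = [1, 2, 3, 3, 4, 5, 2]

# select keeps every element that occurs more than once (repeats included),
# uniq then collapses them to one entry each.
dups = a.select { |e| a.count(e) > 1 }.uniq
p dups   # => [2, 3]
```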

thbar · August 19, 2007, 10:06pm

On Aug 19, 12:34 pm, William J. wrote:

module Enumerable

def dups
values_at( * (0...size).to_a - uniq.map{|x| index(x)} )
end
end
==>nil

Does everyone agree that #dups is the best name for this? I recently
added this to Facets as #duplicates to avoid proximity to #dup. Is
that reasonable?

(Facets already had #nonuniq, btw.)

T.

thbar · August 19, 2007, 10:31pm

On Aug 19, 3:05 pm, Trans wrote:

module Enumerable
def dups
inject({}) {|h,v| h[v]=h[v].to_i+1; h}.reject{|k,v| v==1}.keys
end
end

which will give:

%w(a b c c).dups

=> ["c"]

I recently
added this to Facets as #duplicates to avoid proximity to #dup. Is
that reasonable?

+1

thbar · August 19, 2007, 10:01pm

Thanks for all your replies!

thbar · August 21, 2007, 5:35am

From: Thibaut Barrère

inject({}) {|h,v| h[v]=h[v].to_i+1; h}.reject{|k,v| v==1}.keys

sshhh, in ruby1.9, i think you just do

group_by{|e|e}.select{|_,v| v.size>1}.keys

yes, yes, hash#select now hopefully returns hash.
can’t we have group_by now ?

kind regards -botp
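
A runnable sketch of the group_by idea, assuming Ruby 1.9 or later, where Enumerable#group_by exists and Hash#select returns a Hash (on 1.8, Hash#select returned an array of pairs, so the trailing keys call would not work there).

```ruby
a = %w(a b c c)

# Group equal elements together: "c" maps to ["c", "c"],
# "a" and "b" map to one-element arrays.
groups = a.group_by { |e| e }

# Keep only the groups with more than one member; their keys are the duplicates.
dups = groups.select { |_, v| v.size > 1 }.keys
p dups   # => ["c"]
```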

thbar · August 19, 2007, 11:23pm

On 19.08.2007 12:38, Thibaut Barrère wrote:

end

which will give:

%w(a b c c).dups
=> ["c"]

Actually you are not deleting duplicates as far as I can see. Here’s
another one

irb(main):012:0> a.inject(Hash.new(0)) {|h,x|
h[x]+=1;h}.inject([]){|h,(k,v)|h<<k if v>1;h}
=> ["c"]

You could even change that to need just one iteration through the
original array but it’s too late and I’m too lazy.

Kind regards

robert
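
One possible shape for the single-iteration version mentioned above, sketched with Set from the standard library. The method name `dups_one_pass` is hypothetical. It relies on Set#add? returning nil when the element is already present.

```ruby
require 'set'

# Hypothetical helper: one pass over the input, no counting hash.
def dups_one_pass(enum)
  seen = Set.new
  dups = Set.new
  enum.each do |x|
    # add? returns nil if x was already in `seen`, i.e. x is a repeat.
    dups << x unless seen.add?(x)
  end
  dups.to_a
end

p dups_one_pass(%w(a b c c))              # => ["c"]
p dups_one_pass([nil, 1, 2, 2, 3, nil])   # => [2, nil]
```

Duplicates are reported in the order in which their second occurrence is seen, and nil needs no special treatment.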

thbar · August 21, 2007, 10:01am

Duplicates can also be extracted from an array like this:

class Array

def find_dups
uniq.map {|v| (self - [v]).size < (self.size - 1) ? v : nil}.compact
end

end

(The faster, the better; http://snippets.dzone.com/posts/show/4148 )

Cheers,

j.k.

thbar · August 21, 2007, 10:25am

From: Jimmy K.

uniq.map {|v| (self - [v]).size < (self.size - 1) ? v :

nil}.compact

cool.
could we simplify it like,

irb(main):014:0> a
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):015:0> a.select{|e| (a-[e]).size < a.size - 1}.uniq
=> [1, 2]

kind regards -botp

thbar · August 21, 2007, 1:46pm

Hi –

On Tue, 21 Aug 2007, Jimmy K. wrote:

Duplicates can also be extracted from an array like this:

class Array

def find_dups
uniq.map {|v| (self - [v]).size < (self.size - 1) ? v : nil}.compact
end

end

It’s buggy, though:

[nil,1,2,2,3,nil].find_dups
=> [2]

David
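
A short sketch of where the nil goes missing, using the same test array. The block uses nil as its "not a duplicate" marker, so a duplicated nil is indistinguishable from a non-duplicate and compact throws it away. The lambda name `is_dup` is mine, for illustration.

```ruby
a = [nil, 1, 2, 2, 3, nil]

# Same predicate as in find_dups: removing v shrinks the array by more than one.
is_dup = lambda { |v| (a - [v]).size < (a.size - 1) }

mapped = a.uniq.map { |v| is_dup.call(v) ? v : nil }
p mapped           # => [nil, nil, 2, nil]   (first nil is a real duplicate)
p mapped.compact   # => [2]

# Asking the predicate directly shows that nil does qualify.
p is_dup.call(nil) # => true
```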

thbar · August 21, 2007, 10:31am

From: Peña, Botp

irb(main):015:0> a.select{|e| (a-[e]).size < a.size - 1}.uniq

=> [1, 2]

oops,

irb(main):014:0> a
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):015:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}
=> [1, 2]

thbar · August 21, 2007, 3:55pm

Posted by Peña, Botp (Guest) on 21.08.2007 10:31

could we simplify it like

irb(main):014:0> a
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):015:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}
=> [1, 2]

Sure.

ruby -e 'a = [nil,1,2,2,3,nil]' -e 'p a.uniq.select{|e| (a-[e]).size <
a.size - 1}'
=> [nil, 2]

So we do not need to fix the original version to handle nil correctly:

ruby -e 'a = [nil,1,2,2,3,nil]' -e 'p (a.size - a.nitems > 1) ? ([nil] +
a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? v : nil}.compact) :
(a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? v : nil}.compact)'
=> [nil, 2]

Cheers,

j.k.

thbar · August 21, 2007, 9:41pm

Jeremy W. wrote:
I actually had to … find all the duplicate account
numbers and the number of times they were duplicated and … .
…
~Jeremy

A much less verbose ‘nil’ fix of the original version would be to use
[v] instead of v:

a = [nil,1,2,2,3,nil]
p a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? [v] :
nil}.compact.flatten
=> [nil, 2]

And with this fixed version it’s also possible to count & grab duplicate
array items in one go:

a = [nil,1,2,2,3,nil,nil]
a = (a * 5 << "unique_obj1" << "unique_obj2").sort_by { rand }

p a.uniq.map {|v| diff = (a.size - (a-[v]).size); (diff > 1) ? [v, diff]
: nil}.compact

=> [[2, 10], [3, 5], [nil, 15], [1, 5]]

Cheers,

j.k.
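
For the "count and grab in one go" case, a sketch assuming a much newer Ruby than this thread (2.7 or later), where Enumerable#tally is available. It counts every element in a single pass instead of computing an array difference per unique value.

```ruby
a = [nil, 1, 2, 2, 3, nil, nil]

# tally builds a hash of element => number of occurrences in one pass.
counts = a.tally

# Keep only entries seen more than once, as [value, count] pairs.
dup_counts = counts.select { |_, n| n > 1 }.to_a
p dup_counts   # => [[nil, 3], [2, 2]]
```

The pairs come out in order of first appearance, so shuffling the input as in the post above would change the order but not the counts.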