Detecting duplicates in an array, anything in the standard l

Hi!

Just wondering if there is something simple already built in the std
library to remove duplicates from an array (or an enumerable). I’ve
seen and used various approaches, like:

module Enumerable
def dups
inject({}) {|h,v| h[v]=h[v].to_i+1; h}.reject{|k,v| v==1}.keys
end
end

which will give:

%w(a b c c).dups
=> ["c"]

Anything more elegant ?

cheers

Thibaut

Thibaut Barrère wrote:

Anything more elegant ?

No! :-)) - I tried it only using Arrays…

a = [1,2,3,4,5,4,2,2]
p a.inject([[],a[1..-1]]){|r,e|r[1].include?(e) ? [r[0]<<e, r[1][1..-1]] : [r[0], r[1][1..-1]]}[0].uniq # => [2, 4]
b = %w(a b c c)
p b.inject([[],b[1..-1]]){|r,e|r[1].include?(e) ? [r[0]<<e, r[1][1..-1]] : [r[0], r[1][1..-1]]}[0].uniq # => ["c"]

Wolfgang Nádasi-Donner

Hi –

On Sun, 19 Aug 2007, Wolfgang Nádasi-Donner wrote:

: [r[0], r[1][1..-1]]}[0].uniq # => ["c"]
How about:

a = [1,2,3,4,5,4,2,2]
=> [1, 2, 3, 4, 5, 4, 2, 2]

a.inject([]) {|acc,e| acc << e unless acc.include?(e); acc }
=> [1, 2, 3, 4, 5]

David

Hi –

On Sun, 19 Aug 2007, Wolfgang Nádasi-Donner wrote:

a.inject([]) {|acc,e| acc << e unless acc.include?(e); acc }
=> [1, 2, 3, 4, 5]

Sorry, just ignore me. I’ve reinvented Array#uniq :-) /me reaches for
coffee…

David

David A. Black wrote:

Hi –

On Sun, 19 Aug 2007, Wolfgang Nádasi-Donner wrote:

: [r[0], r[1][1..-1]]}[0].uniq # => ["c"]
How about:

a = [1,2,3,4,5,4,2,2]
=> [1, 2, 3, 4, 5, 4, 2, 2]

a.inject([]) {|acc,e| acc << e unless acc.include?(e); acc }
=> [1, 2, 3, 4, 5]

David

The problem is that he wants all non-unique elements. Unfortunately, the
difference of two arrays removes every occurrence of an element, repeats
included; otherwise…

irb(main):004:0> a
=> [1, 2, 3, 4, 5, 4, 2, 2]
irb(main):005:0> b
=> [1, 2, 3, 4, 5]
irb(main):006:0> a-b
=> []

…would work. My solution is not recommended at all: it’s Sunday after
lunch, and I had to choose between washing the dishes and doing
something nicer first…

Wolfgang Nádasi-Donner
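
A note on why the difference comes back empty: Array#- drops every occurrence of each element found in the right-hand array, not just one. Below is a minimal sketch of a "remove one occurrence per match" difference, which would leave the surplus copies behind. The helper name subtract_once is made up for illustration and is not part of the standard library.

```ruby
# Hypothetical helper (not in the standard library): remove ONE occurrence
# of each element of `other` from `array`, instead of every occurrence
# as Array#- does.
def subtract_once(array, other)
  remaining = array.dup
  other.each do |e|
    i = remaining.index(e)
    remaining.delete_at(i) if i
  end
  remaining
end

a = [1, 2, 3, 4, 5, 4, 2, 2]
b = a.uniq                      # [1, 2, 3, 4, 5]

p a - b                         # => []
p subtract_once(a, b)           # => [4, 2, 2]
p subtract_once(a, b).uniq      # => [4, 2]
```

This is quadratic in the worst case because of the repeated index lookups, so it is only meant to illustrate the idea.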

On Aug 19, 2007, at 6:39 AM, Thibaut Barrère wrote:

end

which will give:

%w(a b c c).dups
=> ["c"]

Anything more elegant ?

Couldn’t you also just do an intersection with itself?

a = %w(a b c b a)
b = a & a #=> ["a", "b", "c"]

Score one for me :-))
~ Ari
English is like a pseudo-random number generator - there are a
bajillion rules to it, but nobody cares.

On Aug 19, 2007, at 9:06 AM, David A. Black wrote:

I think that just reinvents uniq (see my previous reinvention :-)

The only reason I’ll accept that is because you wrote the book I’m reading.

---------------------------------------------------------------|
~Ari
“I don’t suffer from insanity. I enjoy every minute of it” --1337est
man alive

On Aug 19, 5:38 am, Thibaut Barrère wrote:

end

Thibaut

Here’s a modification of a technique used by
Simon Kroger:

class Array
def dups
values_at( * (0...size).to_a - uniq.map{|x| index(x)} )
end
end
==>nil

%w(a b a c c d).dups
==>["a", "c"]

Hi –

On Sun, 19 Aug 2007, Ari B. wrote:

def dups

Couldn’t you also just do an intersection with itself?

a = %w(a b c b a)
b = a & a #=> ["a", "b", "c"]

Score one for me :-))

I think that just reinvents uniq (see my previous reinvention :-)

For what it’s worth, here’s a nice-looking but probably very
inefficient version:

module ArrayStuff
def count(e)
select {|f| f == e }.size
end

def dups
select {|e| count(e) > 1 }.uniq
end
end

a = [1,2,3,3,4,5,2].extend(ArrayStuff)

p a.dups # [2,3]

David

On Aug 19, 12:34 pm, William J. wrote:

module Enumerable

def dups
values_at( * (0...size).to_a - uniq.map{|x| index(x)} )
end
end
==>nil

Does everyone agree that #dups is the best name for this? I recently
added this to Facets as #duplicates to avoid proximity to #dup. Is
that reasonable?

(Facets already had #nonuniq, btw.)

T.

On Aug 19, 3:05 pm, Trans wrote:

module Enumerable
def dups
inject({}) {|h,v| h[v]=h[v].to_i+1; h}.reject{|k,v| v==1}.keys
end
end

which will give:

%w(a b c c).dups

=> ["c"]

I recently added this to Facets as #duplicates to avoid proximity to #dup. Is
that reasonable?

+1

Thanks for all your replies!

From: Thibaut Barrère

inject({}) {|h,v| h[v]=h[v].to_i+1; h}.reject{|k,v| v==1}.keys

sshhh, in ruby1.9, i think you just do

group_by{|e|e}.select{|_,v| v.size>1}.keys

yes, yes, hash#select now hopefully returns hash.
can’t we have group_by now ? :-)

kind regards -botp
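
For readers trying this on a current Ruby, here is a small sketch of the group_by approach wrapped in a method, next to a variant built on Enumerable#tally. The method names dups_by_group and dups_by_tally are placeholders chosen for this note. The first assumes Ruby 1.9 or later (where Hash#select returns a Hash), the second assumes Ruby 2.7 or later.

```ruby
# Assumes Ruby 1.9+, where Hash#select returns a Hash rather than an Array.
def dups_by_group(enum)
  enum.group_by { |e| e }.select { |_, group| group.size > 1 }.keys
end

# Assumes Ruby 2.7+ for Enumerable#tally, which counts occurrences per value.
def dups_by_tally(enum)
  enum.tally.select { |_, count| count > 1 }.keys
end

p dups_by_group(%w(a b c c))               # => ["c"]
p dups_by_tally([1, 2, 3, 4, 5, 4, 2, 2])  # => [2, 4]
```

Both walk the input once and then filter the resulting hash, so they avoid the repeated scans of the select/count style versions.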

On 19.08.2007 12:38, Thibaut Barrère wrote:

end

which will give:

%w(a b c c).dups
=> ["c"]

Actually you are not deleting duplicates as far as I can see. Here’s
another one

irb(main):012:0> a.inject(Hash.new(0)) {|h,x| h[x]+=1;h}.inject([]){|h,(k,v)| h << k if v > 1; h}
=> ["c"]

You could even change that to need just one iteration through the
original array but it’s too late and I’m too lazy. :-)

Kind regards

robert
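
One possible shape for the single-iteration version mentioned above, as a sketch only (this is not Robert's code, and the method name dups_single_pass is invented here). It keeps two sets: values seen so far, and values seen more than once.

```ruby
require 'set'

# One pass over the input. `seen` holds every value met so far;
# `dups` collects values met again, in order of their second appearance.
def dups_single_pass(enum)
  seen = Set.new
  dups = Set.new
  enum.each do |x|
    # Set#add? returns nil when x was already in the set.
    dups << x unless seen.add?(x)
  end
  dups.to_a
end

p dups_single_pass(%w(a b c c))               # => ["c"]
p dups_single_pass([1, 2, 3, 4, 5, 4, 2, 2])  # => [4, 2]
```

Note that the result order differs from the hash-counting versions: duplicates come out in the order they are first repeated, not in the order they first appear.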

Duplicates can also be extracted from an array like this:

class Array

def find_dups
uniq.map {|v| (self - [v]).size < (self.size - 1) ? v : nil}.compact
end

end

(The faster, the better; http://snippets.dzone.com/posts/show/4148 )

Cheers,

j.k.

From: Jimmy K.

uniq.map {|v| (self - [v]).size < (self.size - 1) ? v : nil}.compact

cool.
could we simplify it like,

irb(main):014:0> a
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):015:0> a.select{|e| (a-[e]).size < a.size - 1}.uniq
=> [1, 2]

kind regards -botp

Hi –

On Tue, 21 Aug 2007, Jimmy K. wrote:

Duplicates can also be extracted from an array like this:

class Array

def find_dups
uniq.map {|v| (self - [v]).size < (self.size - 1) ? v : nil}.compact
end

end

It’s buggy, though:

[nil,1,2,2,3,nil].find_dups
=> [2]

David
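
To spell out where the nil goes missing: the method marks non-duplicates with nil and then calls compact, which cannot tell a marker nil from a nil that really was duplicated. The sketch below reproduces that step by step and shows one way around it, assuming Array#count with an argument is available (Ruby 1.8.7 or later).

```ruby
a = [nil, 1, 2, 2, 3, nil]

# Original approach: non-duplicates are mapped to nil, then compact
# removes every nil, including the nil that really was a duplicate.
marked = a.uniq.map { |v| (a - [v]).size < (a.size - 1) ? v : nil }
p marked          # => [nil, nil, 2, nil]
p marked.compact  # => [2]

# Filtering with select never uses nil as a marker, so nil survives.
nil_safe = a.uniq.select { |v| a.count(v) > 1 }
p nil_safe        # => [nil, 2]
```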

From: Peña, Botp

irb(main):015:0> a.select{|e| (a-[e]).size < a.size - 1}.uniq

=> [1, 2]

oops,

irb(main):014:0> a
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):015:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}
=> [1, 2]

Posted by Peña, Botp (Guest) on 21.08.2007 10:31

could we simplify it like

irb(main):014:0> a
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):015:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}
=> [1, 2]

Sure.

ruby -e 'a = [nil,1,2,2,3,nil]' -e 'p a.uniq.select{|e| (a-[e]).size <
a.size - 1}'
=> [nil, 2]

So we do not need to fix the original version to handle nil correctly:

ruby -e 'a = [nil,1,2,2,3,nil]' -e 'p (a.size - a.nitems > 1) ? ([nil] +
a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? v : nil}.compact) :
(a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? v : nil}.compact)'
=> [nil, 2]

Cheers,

j.k.

Jeremy W. wrote:
I actually had to … find all the duplicate account
numbers and the number of times they were duplicated and … .

~Jeremy

A much less verbose ‘nil’ fix of the original version would be to use
[v] instead of v:

a = [nil,1,2,2,3,nil]
p a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? [v] :
nil}.compact.flatten
=> [nil, 2]

And with this fixed version it’s also possible to count & grab duplicate
array items in one go:

a = [nil,1,2,2,3,nil,nil]
a = (a * 5 << "unique_obj1" << "unique_obj2").sort_by { rand }

p a.uniq.map {|v| diff = (a.size - (a-[v]).size); (diff > 1) ? [v, diff]
: nil}.compact

=> [[2, 10], [3, 5], [nil, 15], [1, 5]]

Cheers,

j.k.
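
For comparison, the same count-and-grab idea can be sketched with Enumerable#tally, assuming Ruby 2.7 or later. The method name dup_counts is made up for this note, and the input is left unshuffled so the output order is predictable.

```ruby
# Assumes Ruby 2.7+ for Enumerable#tally.
# Returns a Hash of value => count for every value that occurs more than once.
def dup_counts(enum)
  enum.tally.select { |_, n| n > 1 }
end

a = [nil, 1, 2, 2, 3, nil, nil] * 5 << "unique_obj1" << "unique_obj2"

p dup_counts(a).to_a
# => [[nil, 15], [1, 5], [2, 10], [3, 5]]
```

Because tally builds the counts in a single pass, this avoids computing a - [v] once per distinct value.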