thbar
August 19, 2007, 12:40pm
1
Hi!
Just wondering if there is something simple already built in the std
library to remove duplicates from an array (or an enumerable). I’ve
seen and used various approaches, like:
module Enumerable
  def dups
    inject({}) {|h,v| h[v]=h[v].to_i+1; h}.reject{|k,v| v==1}.keys
  end
end
which will give:
%w(a b c c).dups
=> ["c"]
Anything more elegant?
cheers
Thibaut
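For anyone unpacking the one-liner above, here is a minimal sketch of the same counting idea written out step by step. The method name duplicates_of is hypothetical (it is not part of the standard library). The sketch relies on the fact that nil.to_i is 0, which is what lets the original start counting from an empty hash.

```ruby
# Hypothetical helper, not in the standard library: returns each value
# that occurs more than once, using the same counting idea as #dups above.
def duplicates_of(items)
  counts = {}
  items.each do |v|
    # counts[v] is nil the first time a value is seen; nil.to_i is 0,
    # so the count starts at 1.
    counts[v] = counts[v].to_i + 1
  end
  # Drop the values seen exactly once and keep the remaining keys.
  counts.reject { |_, n| n == 1 }.keys
end

p duplicates_of(%w(a b c c))
```

Using Hash.new(0) instead of {} would make the .to_i step unnecessary; that is a matter of taste.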
thbar
August 19, 2007, 1:20pm
2
Thibaut Barrère wrote:
Anything more elegant ?
No! :-)) - I tried it only using Arrays…
a = [1,2,3,4,5,4,2,2]
p a.inject([[],a[1..-1]]){|r,e|r[1].include?(e) ? [r[0]<<e, r[1][1..-1]]
: [r[0], r[1][1..-1]]}[0].uniq # => [2, 4]
b = %w(a b c c)
p b.inject([[],b[1..-1]]){|r,e|r[1].include?(e) ? [r[0]<<e, r[1][1..-1]]
: [r[0], r[1][1..-1]]}[0].uniq # => ["c"]
Wolfgang Nádasi-Donner
thbar
August 19, 2007, 1:34pm
3
Hi –
On Sun, 19 Aug 2007, Wolfgang Nádasi-Donner wrote:
: [r[0], r[1][1..-1]]}[0].uniq # => ["c"]
How about:
a = [1,2,3,4,5,4,2,2]
=> [1, 2, 3, 4, 5, 4, 2, 2]
a.inject([]) {|acc,e| acc << e unless acc.include?(e); acc }
=> [1, 2, 3, 4, 5]
David
thbar
August 19, 2007, 2:34pm
4
Hi –
On Sun, 19 Aug 2007, Wolfgang Nádasi-Donner wrote:
a.inject([]) {|acc,e| acc << e unless acc.include?(e); acc }
=> [1, 2, 3, 4, 5]
David
The problem is, that he wants all non unique elements. Unfortunately the
difference of two arrays doesn’t care about double elements,
Sorry, just ignore me. I’ve reinvented Array#uniq. /me reaches for
coffee…
David
thbar
August 19, 2007, 2:06pm
5
David A. Black wrote:
Hi –
On Sun, 19 Aug 2007, Wolfgang Nádasi-Donner wrote:
: [r[0], r[1][1..-1]]}[0].uniq # => ["c"]
How about:
a = [1,2,3,4,5,4,2,2]
=> [1, 2, 3, 4, 5, 4, 2, 2]
a.inject([]) {|acc,e| acc << e unless acc.include?(e); acc }
=> [1, 2, 3, 4, 5]
David
The problem is that he wants all the non-unique elements. Unfortunately,
the difference of two arrays removes every occurrence of an element, not
just one per match; otherwise…
irb(main):004:0> a
=> [1, 2, 3, 4, 5, 4, 2, 2]
irb(main):005:0> b
=> [1, 2, 3, 4, 5]
irb(main):006:0> a-b
=> []
…would work. My solution is not recommended at all. It’s Sunday after
lunch, and I had to choose between doing the dishes and doing
something nice first…
Wolfgang Nádasi-Donner
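The point about array difference can be shown in a short sketch. Array#- drops every occurrence of each element found in the right-hand array, so a - a.uniq is always empty. The helper subtract_once below is a hypothetical name, written only to illustrate the "remove one occurrence per element" behaviour that would have made the idea work.

```ruby
# Hypothetical helper: removes at most ONE occurrence of each element of
# `others` from a copy of `list` (a multiset-style difference).
def subtract_once(list, others)
  rest = list.dup
  others.each do |x|
    i = rest.index(x)        # position of the first remaining occurrence
    rest.delete_at(i) if i   # remove just that one
  end
  rest
end

a = [1, 2, 3, 4, 5, 4, 2, 2]
p a - a.uniq                       # plain difference removes everything
p subtract_once(a, a.uniq).uniq    # leftovers are the repeated values
```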
thbar
August 19, 2007, 3:01pm
6
On Aug 19, 2007, at 6:39 AM, Thibaut Barrère wrote:
end
which will give:
%w(a b c c).dups
=> ["c"]
Anything more elegant?
Couldn’t you also just do an intersection with itself?
a = %w(a b c b a)
b = a & a #=> ["a", "b", "c"]
Score one for me :-))
~ Ari
English is like a pseudo-random number generator - there are a
bajillion rules to it, but nobody cares.
thbar
August 19, 2007, 3:25pm
7
On Aug 19, 2007, at 9:06 AM, David A. Black wrote:
I think that just reinvents uniq (see my previous reinvention
The only reason I’ll accept that
is because you wrote the book I’m reading.
~Ari
“I don’t suffer from insanity. I enjoy every minute of it” --1337est
man alive
thbar
August 19, 2007, 9:36pm
8
On Aug 19, 5:38 am, Thibaut Barrère wrote:
end
Thibaut
Here’s a modification of a technique used by
Simon Kroger:
class Array
  def dups
    values_at( * (0...size).to_a - uniq.map{|x| index(x)} )
  end
end
==>nil
%w(a b a c c d).dups
==>["a", "c"]
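To see why this works, the expression can be split into its intermediate steps. The sketch below uses a hypothetical stand-alone method, repeated_occurrences, rather than reopening Array. Note that it returns every occurrence after the first, so a value that appears three times shows up twice in the result.

```ruby
# Hypothetical stand-alone version of the values_at technique above.
def repeated_occurrences(list)
  all_indices   = (0...list.size).to_a                 # every position
  first_indices = list.uniq.map { |x| list.index(x) }  # first position of each value
  repeat_indices = all_indices - first_indices         # positions of later occurrences
  list.values_at(*repeat_indices)
end

p repeated_occurrences(%w(a b a c c d))
p repeated_occurrences(%w(x x x))
```

Appending .uniq to the result gives each duplicated value once.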
thbar
August 19, 2007, 3:08pm
9
Hi –
On Sun, 19 Aug 2007, Ari B. wrote:
def dups
Couldn’t you also just do an intersection with itself?
a = %w(a b c b a)
b = a & a #=> ["a", "b", "c"]
Score one for me :-))
I think that just reinvents uniq (see my previous reinvention
For what it’s worth, here’s a nice-looking but probably very
inefficient version:
module ArrayStuff
  def count(e)
    select {|f| f == e }.size
  end
  def dups
    select {|e| count(e) > 1 }.uniq
  end
end
a = [1,2,3,3,4,5,2].extend(ArrayStuff)
p a.dups # [2,3]
David
thbar
August 19, 2007, 10:06pm
10
On Aug 19, 12:34 pm, William J. wrote:
class Array
  def dups
    values_at( * (0...size).to_a - uniq.map{|x| index(x)} )
  end
end
==>nil
Does everyone agree that #dups is the best name for this? I recently
added this to Facets as #duplicates to avoid proximity to #dup. Is
that reasonable?
(Facets already had #nonuniq, btw.)
T.
thbar
August 19, 2007, 10:31pm
11
On Aug 19, 3:05 pm, Trans wrote:
module Enumerable
  def dups
    inject({}) {|h,v| h[v]=h[v].to_i+1; h}.reject{|k,v| v==1}.keys
  end
end
which will give:
%w(a b c c).dups
=> ["c"]
I recently
added this to Facets as #duplicates to avoid proximity to #dup. Is
that reasonable?
+1
thbar
August 19, 2007, 10:01pm
12
Thanks for all your replies!
thbar
August 21, 2007, 5:35am
13
From: Thibaut Barrère
inject({}) {|h,v| h[v]=h[v].to_i+1; h}.reject{|k,v| v==1}.keys
sshhh, in ruby1.9, i think you just do
group_by{|e|e}.select{|_,v| v.size>1}.keys
yes, yes, hash#select now hopefully returns hash.
can’t we have group_by now ?
kind regards -botp
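A runnable sketch of the group_by idea, assuming a Ruby where Enumerable#group_by exists and Hash#select returns a Hash (1.9 or later). The method name duplicates_by_grouping is hypothetical.

```ruby
# Hypothetical helper: groups equal elements together, keeps the groups
# with more than one member, and returns the group keys.
def duplicates_by_grouping(items)
  items.group_by { |e| e }              # e.g. {"a"=>["a"], "c"=>["c", "c"]}
       .select { |_, group| group.size > 1 }
       .keys
end

p duplicates_by_grouping(%w(a b c c))
p duplicates_by_grouping([nil, 1, 2, 2, 3, nil])
```

Because nil is an ordinary hash key here, repeated nil values are reported like any other duplicate.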
thbar
August 19, 2007, 11:23pm
14
On 19.08.2007 12:38, Thibaut Barrère wrote:
end
which will give:
%w(a b c c).dups
=> ["c"]
Actually you are not deleting duplicates as far as I can see. Here’s
another one
irb(main):012:0> a.inject(Hash.new(0)) {|h,x|
h[x]+=1;h}.inject([]){|h,(k,v)| h << k if v > 1; h}
=> ["c"]
You could even change that to need just one iteration through the
original array but it’s too late and I’m too lazy.
Kind regards
robert
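The single-iteration variant mentioned above was not posted, so what follows is only one possible sketch of it, not the author's code. It keeps two hashes, one for values already seen and one for values seen again, and walks the array once. The name duplicates_single_pass is hypothetical.

```ruby
# Hypothetical single-pass version: each element is visited exactly once.
def duplicates_single_pass(items)
  seen = {}   # values encountered so far
  dups = {}   # values encountered at least twice
  items.each do |x|
    if seen.key?(x)
      dups[x] = true
    else
      seen[x] = true
    end
  end
  dups.keys   # in order of first repetition (Ruby 1.9+ keeps insertion order)
end

p duplicates_single_pass(%w(a b c c))
p duplicates_single_pass([1, 2, 3, 4, 5, 4, 2, 2])
```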
thbar
August 21, 2007, 10:01am
15
Duplicates can also be extracted from an array like this:
class Array
  def find_dups
    uniq.map {|v| (self - [v]).size < (self.size - 1) ? v : nil}.compact
  end
end
(The faster, the better; http://snippets.dzone.com/posts/show/4148 )
Cheers,
j.k.
thbar
August 21, 2007, 10:25am
16
From: Jimmy K.
uniq.map {|v| (self - [v]).size < (self.size - 1) ? v : nil}.compact
cool.
could we simplify it like,
irb(main):014:0> a
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):015:0> a.select{|e| (a-[e]).size < a.size - 1}.uniq
=> [1, 2]
kind regards -botp
thbar
August 21, 2007, 1:46pm
17
Hi –
On Tue, 21 Aug 2007, Jimmy K. wrote:
Duplicates can also be extracted from an array like this:
class Array
  def find_dups
    uniq.map {|v| (self - [v]).size < (self.size - 1) ? v : nil}.compact
  end
end
It’s buggy, though:
[nil,1,2,2,3,nil].find_dups
=> [2]
David
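The cause of the bug is that nil does double duty: the map step uses nil to mean "not a duplicate", and compact then discards a legitimately duplicated nil along with the markers. The sketch below shows the intermediate array and, for comparison, a select-based form that never needs a marker value. Both method names are hypothetical.

```ruby
# Same logic as find_dups, written as a plain method for illustration.
def dups_via_compact(a)
  marked = a.uniq.map { |v| (a - [v]).size < (a.size - 1) ? v : nil }
  # For [nil, 1, 2, 2, 3, nil], marked is [nil, nil, 2, nil]: the first nil
  # is a real duplicate, the others are "not a duplicate" markers.
  marked.compact
end

# select keeps or drops each element directly, so nil survives.
def dups_via_select(a)
  a.uniq.select { |v| (a - [v]).size < (a.size - 1) }
end

sample = [nil, 1, 2, 2, 3, nil]
p dups_via_compact(sample)  # => [2]
p dups_via_select(sample)   # => [nil, 2]
```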
thbar
August 21, 2007, 10:31am
18
From: Peña, Botp
irb(main):015:0> a.select{|e| (a-[e]).size < a.size - 1}.uniq
=> [1, 2]
oops,
irb(main):014:0> a
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):015:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}
=> [1, 2]
thbar
August 21, 2007, 3:55pm
19
Posted by Peña, Botp (Guest) on 21.08.2007 10:31
could we simplify it like
irb(main):014:0> a
=> [1, 1, 2, 2, 2, 4, 3]
irb(main):015:0> a.uniq.select{|e| (a-[e]).size < a.size - 1}
=> [1, 2]
Sure.
ruby -e 'a = [nil,1,2,2,3,nil]' -e 'p a.uniq.select{|e| (a-[e]).size <
a.size - 1}'
=> [nil, 2]
So we do not need to fix the original version to handle nil correctly:
ruby -e 'a = [nil,1,2,2,3,nil]' -e 'p (a.size - a.nitems > 1) ? ([nil] +
a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? v : nil}.compact) :
(a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? v : nil}.compact)'
=> [nil, 2]
Cheers,
j.k.
thbar
August 21, 2007, 9:41pm
20
Jeremy W. wrote:
I actually had to … find all the duplicate account
numbers and the number of times they were duplicated and … .
…
~Jeremy
A much less verbose ‘nil’ fix of the original version would be to use
[v] instead of v:
a = [nil,1,2,2,3,nil]
p a.uniq.map {|v| (a - [v]).size < (a.size - 1) ? [v] :
nil}.compact.flatten
=> [nil, 2]
And with this fixed version it’s also possible to count & grab duplicate
array items in one go:
a = [nil,1,2,2,3,nil,nil]
a = (a * 5 << "unique_obj1" << "unique_obj2").sort_by { rand }
p a.uniq.map {|v| diff = (a.size - (a-[v]).size); (diff > 1) ? [v, diff]
: nil}.compact
=> [[2, 10], [3, 5], [nil, 15], [1, 5]]
Cheers,
j.k.
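Counting and grabbing duplicates in one go can also be done with a single counting pass instead of one array difference per unique value. This is a sketch of that alternative, not the code from the post above; the name duplicate_counts is hypothetical.

```ruby
# Hypothetical helper: returns [value, count] pairs for every value that
# occurs more than once, in order of first appearance (Ruby 1.9+).
def duplicate_counts(items)
  counts = Hash.new(0)                 # missing keys start at 0
  items.each { |x| counts[x] += 1 }
  counts.select { |_, n| n > 1 }.to_a
end

p duplicate_counts([nil, 1, 2, 2, 3, nil, nil])
```

Since nil is used only as a hash key and never as a marker, no special handling for it is needed.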