Finding duplicates in an array and its index number

Hi,

I have following array and I am trying to find if there are duplicates
in the array. I also want to find out their index.

e.g.

a=[1,2,3,1,3]

Here 1 is a duplicate which comes twice and its index is 0 and 3 in the
array.

Any suggestions?

Thanks

I’m curious what the purpose is. If it’s simply to remove the duplicates
you can call .uniq on the array to remove duplicates.

I dont have to remove the duplicates but just need to find them with
their index

If you really want to know how it finds the duplicates, look at the
source
code for the .uniq method as a starting point.

Hi,

As always, there are many ways to do this.

A highlevel approach: (short but not really inefficient)

x = [1, 2, 3, 2, 4, 5, 4, 4]

a = x.each_with_index.group_by(&:first).inject({}) do |result, (val,
group)|
next result if group.length == 1
result.merge val => group.map {|pair| pair[1]}
end

p a

A more efficient lowlevel solution would be to iterate over the array,
compare the values and save the indices in case of duplicates.

Jan E. wrote in post #1057484:

A more efficient lowlevel solution would be to iterate over the array,
compare the values and save the indices in case of duplicates.

… like this:

x = [1, 2, 3, 2, 4, 5, 4, 4]

duplicates = {}
x.each_with_index do |value, i|
(i + 1).upto x.length - 1 do |j|
if x[j] == value
duplicates[value] = [i] if duplicates[value].nil?
duplicates[value] << j
break
end
end
end

p duplicates

On 20/04/12 09:53, newto ruby wrote:

array.

Any suggestions?

Thanks

How about;

1.9.3p125 :001 > foo = [1,2,3,1,3]
=> [1, 2, 3, 1, 3]
1.9.3p125 :002 > dup_indices = Hash.new {|h,k| h[k]=[]}
=> {}
1.9.3p125 :003 > foo.each_index {|i| dup_indices[foo[i]] << i unless 1
== foo.count(foo[i])}
=> [1, 2, 3, 1, 3]
1.9.3p125 :004 > dup_indices
=> {1=>[0, 3], 3=>[2, 4]}

Sam

I do a lot of indexing of data. Usually they’re multi-field records, and
each data set will have multiple indexes. The technique I use is below.
This indexes all elements of the array. So, then to find duplicates, you
have to loop again to find array counts > 1. Anyway, it’s a slightly
different way to do it in the event an index of all entries would be
useful for lookups.

data = [1, 2, 3, 1, 3, 4, 6, 5, 4]
dataIndex = {}

data.each_with_index do |value, indx|

index_key = value
index_value = indx

if !dataIndex.has_key?(index_key)
dataIndex.store(index_key, [])
end

dataIndex[index_key].push(index_value)
end

p dataIndex

dataIndex.each do |key, indxs|
if indxs.count > 1
# do something
puts “#{key} appears at #{indxs.join(’, ')}”
end
end

– gw

On 04/19/2012 11:53 PM, newto ruby wrote:

Hi, I have following array and I am trying to find if there are
duplicates in the array. I also want to find out their index.

Generate a hash with unique items as keys, and arrays of indices as
values,
then select only those with more than one index:

array = [1, 2, 3, 4, 5, 3, 6, 7, 2, 8, 1, 9]

array.each_with_index.reduce({}) { |hash, (item, index)|
hash[item] = (hash[item] || []) << index
hash
}.select { |key, value|
value.size > 1
}

=> {1=>[0, 10], 2=>[1, 8], 3=>[2, 5]}

Greg W. wrote in post #1057552:

I do a lot of indexing of data. Usually they’re multi-field records, and
each data set will have multiple indexes. The technique I use is below.
This indexes all elements of the array. So, then to find duplicates, you
have to loop again to find array counts > 1. Anyway, it’s a slightly
different way to do it in the event an index of all entries would be
useful for lookups.

DOH. That first loop only needed to be:

data.each_with_index do |value, indx|
if !dataIndex.has_key?(key)
dataIndex.store(key, [])
end
dataIndex[key].push(value)
end

That’s what I get for doing a quick transcribe of project code.

– gw

On Apr 20, 2012, at 03:46 , Lars H. wrote:

hash
}.select { |key, value|
value.size > 1
}

=> {1=>[0, 10], 2=>[1, 8], 3=>[2, 5]}

Don’t use reduce/inject for non-reductive applications. Use something
more appropriate, like a plain each.

Also use Hash to its full capabilities:

hash = Hash.new { |h,k| h[k] = [] }

then it is as clean as:

array.each_with_index do |val, idx|
hash[val] << idx
end

Notice how you’re not constantly re-assigning hash for no good reason in
my version? That adds up, but more importantly it obfuscates the
original intent.

OK folks,

news from the service department. Here’s the shootout:

Turns out quickest is one of the less arcane solutions:

  dups = {}

  dat.each_with_index do |val, idx|
    (dups[val] ||= []) << idx
  end

  dups.delete_if {|k,v| v.size == 1}

A few remarks:

It’s amazing how many forms even the same algorithm can take in Ruby.
Just the different options for adding to a Hash (#default_proc,
#fetch, ||=) can make a huge visual difference although the underlying
strategy of all these algorithms is the same.

Downside of Jan’s first approach is that it has effort O(n**2) in
terms of elements in the array because of the nested iteration. Sam’s
approach has a similar quality although less obvious (it’s in the
#count).

Allocating multiple Hash instances is what makes Jan’s second approach
slow.

As always, #inject (or #reduce) approaches are slower than others.

The best strategy is to create a Hash with elements as keys and Arrays
of indexes as values. This approach won’t suffer for large inputs
from the O(n**2) issue.

For one off scripts and infrequent execution it doesn’t really matter
much. :slight_smile:

Kind regards

robert