Hi,
I have following array and I am trying to find if there are duplicates
in the array. I also want to find out their index.
e.g.
a=[1,2,3,1,3]
Here 1 is a duplicate which comes twice and its index is 0 and 3 in the
array.
Any suggestions?
Thanks
Hi,
I have following array and I am trying to find if there are duplicates
in the array. I also want to find out their index.
e.g.
a=[1,2,3,1,3]
Here 1 is a duplicate which comes twice and its index is 0 and 3 in the
array.
Any suggestions?
Thanks
I’m curious what the purpose is. If it’s simply to remove the duplicates
you can call .uniq on the array to remove duplicates.
I dont have to remove the duplicates but just need to find them with
their index
If you really want to know how it finds the duplicates, look at the
source
code for the .uniq method as a starting point.
Hi,
As always, there are many ways to do this.
A highlevel approach: (short but not really inefficient)
x = [1, 2, 3, 2, 4, 5, 4, 4]
a = x.each_with_index.group_by(&:first).inject({}) do |result, (val,
group)|
next result if group.length == 1
result.merge val => group.map {|pair| pair[1]}
end
p a
A more efficient lowlevel solution would be to iterate over the array,
compare the values and save the indices in case of duplicates.
Jan E. wrote in post #1057484:
A more efficient lowlevel solution would be to iterate over the array,
compare the values and save the indices in case of duplicates.
… like this:
x = [1, 2, 3, 2, 4, 5, 4, 4]
duplicates = {}
x.each_with_index do |value, i|
(i + 1).upto x.length - 1 do |j|
if x[j] == value
duplicates[value] = [i] if duplicates[value].nil?
duplicates[value] << j
break
end
end
end
p duplicates
On 20/04/12 09:53, newto ruby wrote:
array.
Any suggestions?
Thanks
How about;
1.9.3p125 :001 > foo = [1,2,3,1,3]
=> [1, 2, 3, 1, 3]
1.9.3p125 :002 > dup_indices = Hash.new {|h,k| h[k]=[]}
=> {}
1.9.3p125 :003 > foo.each_index {|i| dup_indices[foo[i]] << i unless 1
== foo.count(foo[i])}
=> [1, 2, 3, 1, 3]
1.9.3p125 :004 > dup_indices
=> {1=>[0, 3], 3=>[2, 4]}
Sam
I do a lot of indexing of data. Usually they’re multi-field records, and
each data set will have multiple indexes. The technique I use is below.
This indexes all elements of the array. So, then to find duplicates, you
have to loop again to find array counts > 1. Anyway, it’s a slightly
different way to do it in the event an index of all entries would be
useful for lookups.
data = [1, 2, 3, 1, 3, 4, 6, 5, 4]
dataIndex = {}
data.each_with_index do |value, indx|
index_key = value
index_value = indx
if !dataIndex.has_key?(index_key)
dataIndex.store(index_key, [])
end
dataIndex[index_key].push(index_value)
end
p dataIndex
dataIndex.each do |key, indxs|
if indxs.count > 1
# do something
puts “#{key} appears at #{indxs.join(’, ')}”
end
end
– gw
On 04/19/2012 11:53 PM, newto ruby wrote:
Hi, I have following array and I am trying to find if there are
duplicates in the array. I also want to find out their index.
Generate a hash with unique items as keys, and arrays of indices as
values,
then select only those with more than one index:
array = [1, 2, 3, 4, 5, 3, 6, 7, 2, 8, 1, 9]
array.each_with_index.reduce({}) { |hash, (item, index)|
hash[item] = (hash[item] || []) << index
hash
}.select { |key, value|
value.size > 1
}
Greg W. wrote in post #1057552:
I do a lot of indexing of data. Usually they’re multi-field records, and
each data set will have multiple indexes. The technique I use is below.
This indexes all elements of the array. So, then to find duplicates, you
have to loop again to find array counts > 1. Anyway, it’s a slightly
different way to do it in the event an index of all entries would be
useful for lookups.
DOH. That first loop only needed to be:
data.each_with_index do |value, indx|
if !dataIndex.has_key?(key)
dataIndex.store(key, [])
end
dataIndex[key].push(value)
end
That’s what I get for doing a quick transcribe of project code.
– gw
On Apr 20, 2012, at 03:46 , Lars H. wrote:
hash
}.select { |key, value|
value.size > 1
}=> {1=>[0, 10], 2=>[1, 8], 3=>[2, 5]}
Don’t use reduce/inject for non-reductive applications. Use something
more appropriate, like a plain each.
Also use Hash to its full capabilities:
hash = Hash.new { |h,k| h[k] = [] }
then it is as clean as:
array.each_with_index do |val, idx|
hash[val] << idx
end
Notice how you’re not constantly re-assigning hash for no good reason in
my version? That adds up, but more importantly it obfuscates the
original intent.
OK folks,
news from the service department. Here’s the shootout:
Turns out quickest is one of the less arcane solutions:
dups = {}
dat.each_with_index do |val, idx|
(dups[val] ||= []) << idx
end
dups.delete_if {|k,v| v.size == 1}
A few remarks:
It’s amazing how many forms even the same algorithm can take in Ruby.
Just the different options for adding to a Hash (#default_proc,
#fetch, ||=) can make a huge visual difference although the underlying
strategy of all these algorithms is the same.
Downside of Jan’s first approach is that it has effort O(n**2) in
terms of elements in the array because of the nested iteration. Sam’s
approach has a similar quality although less obvious (it’s in the
#count).
Allocating multiple Hash instances is what makes Jan’s second approach
slow.
As always, #inject (or #reduce) approaches are slower than others.
The best strategy is to create a Hash with elements as keys and Arrays
of indexes as values. This approach won’t suffer for large inputs
from the O(n**2) issue.
For one off scripts and infrequent execution it doesn’t really matter
much. ![]()
Kind regards
robert
This forum is not affiliated to the Ruby language, Ruby on Rails framework, nor any Ruby applications discussed here.
Sponsor our Newsletter | Privacy Policy | Terms of Service | Remote Ruby Jobs