A high-level approach (short, but not really inefficient):
x = [1, 2, 3, 2, 4, 5, 4, 4]
a = x.each_with_index.group_by(&:first).inject({}) do |result, (val, group)|
  next result if group.length == 1
  result.merge val => group.map { |pair| pair[1] }
end
p a  # => {2=>[1, 3], 4=>[4, 6, 7]}
A more efficient low-level solution would be to iterate over the array, compare the values, and save the indices in case of duplicates.
… like this:
x = [1, 2, 3, 2, 4, 5, 4, 4]
duplicates = {}
x.each_with_index do |value, i|
  # Scan ahead for the next occurrence of this value.
  (i + 1).upto(x.length - 1) do |j|
    if x[j] == value
      duplicates[value] = [i] if duplicates[value].nil?
      duplicates[value] << j
      break
    end
  end
end
p duplicates  # => {2=>[1, 3], 4=>[4, 6, 7]}
I do a lot of indexing of data. Usually they’re multi-field records, and each data set will have multiple indexes. The technique I use is below. It indexes all elements of the array, so to find duplicates you then have to loop again looking for index arrays with a count > 1. Anyway, it’s a slightly different way to do it, in the event an index of all entries would be useful for lookups.
data = [1, 2, 3, 1, 3, 4, 6, 5, 4]
dataIndex = {}
data.each_with_index do |value, indx|
  index_key = value
  index_value = indx
  if !dataIndex.has_key?(index_key)
    dataIndex.store(index_key, [])
  end
  dataIndex[index_key].push(index_value)
end
p dataIndex  # => {1=>[0, 3], 2=>[1], 3=>[2, 4], 4=>[5, 8], 6=>[6], 5=>[7]}
dataIndex.each do |key, indxs|
  if indxs.count > 1
    # do something
    puts "#{key} appears at #{indxs.join(', ')}"
  end
end
# Prints:
#   1 appears at 0, 3
#   3 appears at 2, 4
#   4 appears at 5, 8
DOH. That first loop only needed to be:
data.each_with_index do |value, indx|
  if !dataIndex.has_key?(value)
    dataIndex.store(value, [])
  end
  dataIndex[value].push(indx)
end
That’s what I get for doing a quick transcribe of project code.
Don’t use reduce/inject for non-reductive applications. Use something
more appropriate, like a plain each.
Also use Hash to its full capabilities:
hash = Hash.new { |h,k| h[k] = [] }
then it is as clean as:
array.each_with_index do |val, idx|
  hash[val] << idx
end
Notice how you’re not constantly re-assigning the hash for no good reason in my version? That adds up, but more importantly it obfuscates the original intent.
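Putting that together as a complete, runnable sketch (the sample data and the final select filter are my own additions, not part of the snippet above):

array = [1, 2, 3, 2, 4, 5, 4, 4]

# The default_proc creates an empty Array the first time a key is looked up.
hash = Hash.new { |h, k| h[k] = [] }
array.each_with_index do |val, idx|
  hash[val] << idx
end

# Keep only the values that occur more than once.
dups = hash.select { |_val, idxs| idxs.size > 1 }
p dups  # => {2=>[1, 3], 4=>[4, 6, 7]}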
News from the service department; here’s the shootout. Turns out the quickest is one of the less arcane solutions:
dups = {}
dat.each_with_index do |val, idx|
  (dups[val] ||= []) << idx
end
dups.delete_if { |k, v| v.size == 1 }
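For reference, a minimal harness for this kind of shootout, using the standard Benchmark module, might look like this (the size and contents of dat are made up):

require 'benchmark'

dat = Array.new(100_000) { rand(1_000) }  # assumed sample input

Benchmark.bm(14) do |b|
  b.report('||=') do
    dups = {}
    dat.each_with_index { |val, idx| (dups[val] ||= []) << idx }
    dups.delete_if { |_k, v| v.size == 1 }
  end
  b.report('default_proc') do
    dups = Hash.new { |h, k| h[k] = [] }
    dat.each_with_index { |val, idx| dups[val] << idx }
    dups.delete_if { |_k, v| v.size == 1 }
  end
end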
A few remarks:
- It’s amazing how many forms even the same algorithm can take in Ruby. Just the different options for adding to a Hash (#default_proc, #fetch, ||=) can make a huge visual difference, although the underlying strategy of all these algorithms is the same (see the side-by-side sketch after this list).
- The downside of Jan’s first approach is that it has effort O(n**2) in terms of elements in the array, because of the nested iteration. Sam’s approach has a similar quality, although less obviously so (it’s in the #count).
- Allocating multiple Hash instances is what makes Jan’s second approach slow.
- As always, #inject (or #reduce) approaches are slower than the others.
- The best strategy is to create a Hash with elements as keys and Arrays of indexes as values. This approach won’t suffer from the O(n**2) issue on large inputs.
- For one-off scripts and infrequent execution it doesn’t really matter much.
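Here are those three Hash-populating idioms side by side (variable and key names are made up for illustration):

h1 = Hash.new { |h, k| h[k] = [] }   # default_proc
h1[:a] << 0

h2 = {}
h2[:a] = h2.fetch(:a, []) << 0       # fetch with a default

h3 = {}
(h3[:a] ||= []) << 0                 # ||=

p [h1, h2, h3]  # => [{:a=>[0]}, {:a=>[0]}, {:a=>[0]}]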
Kind regards
robert