Determining uniqueness on a single array element

I am loading file names and mtimes into an array and then putting that
array inside an outer array. I have run into the situation where the
same file sometimes exists in different places in the file system and
occasionally with a different file name.

I need to ensure that I process the contents of each file only once.
So, in addition to the two elements originally captured I now create an
MD5 hexdigest of the file contents: [ f.mtime, f.name, f.hexdigest ]
and store that.
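
For readers following along, here is a minimal sketch of how such a triple could be built. The helper name `file_record` is hypothetical, and it works from a plain path using `File.mtime`, `File.basename`, and `Digest::MD5.file` rather than whatever object `f` is in the post above.

```ruby
require "digest/md5"
require "tmpdir"

# Hypothetical helper (not from the original post): build the
# [mtime, name, hexdigest] triple for a single path.
def file_record(path)
  [File.mtime(path), File.basename(path), Digest::MD5.file(path).hexdigest]
end

# Two files with different names but identical contents share a digest.
Dir.mktmpdir do |dir|
  first  = File.join(dir, "report.txt")
  second = File.join(dir, "report_copy.txt")
  File.write(first,  "same bytes")
  File.write(second, "same bytes")

  records = [first, second].map { |path| file_record(path) }
  records.each { |mtime, name, hexdigest| puts "#{name} #{hexdigest} #{mtime}" }
end
```

Because the digest depends only on the file contents, it stays the same when a copy lives in another directory or under another name.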

Now I wish to ensure that each distinct hexdigest is processed but once.
I can do this:

hex_array = []
outer_array.each do |inner_array|
  next if hex_array.include?( inner_array[2] )
  hex_array << inner_array[2]
  . . .

I wonder if there is a better way? Any suggestions?

On Feb 4, 2011, at 4:58 PM, James B. wrote:

> hex_array = []
> outer_array.each do |inner_array|
>   next if hex_array.include?( inner_array[2] )
>   hex_array << inner_array[2]
>   . . .

This assumes Ruby 1.9.2 where Array#uniq takes a block:

outer_array.uniq { |mtime, name, md5| md5 }.each do |mtime, name, md5|
  # do stuff here
end
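
A self-contained sketch of the same idea, with made-up sample data (the file names and shortened digests below are illustrative only, and `unique_by_digest` is a hypothetical helper name). When given a block, `Array#uniq` keeps the first element seen for each block value.

```ruby
# Hypothetical helper: keep the first record for each distinct digest.
def unique_by_digest(records)
  records.uniq { |_mtime, _name, md5| md5 }
end

# Illustrative data: [mtime, name, hexdigest]
outer_array = [
  [Time.at(100), "a.txt",      "aaa111"],
  [Time.at(200), "copy_a.txt", "aaa111"],  # same contents as a.txt
  [Time.at(300), "b.txt",      "bbb222"]
]

unique_by_digest(outer_array).each do |mtime, name, md5|
  puts "processing #{name} (#{md5}), modified #{mtime.to_i}"
end
```

Only `a.txt` and `b.txt` are processed here; `copy_a.txt` is skipped because its digest was already seen.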

Gary W.

James B. wrote in post #979694:

> Now I wish to ensure that each distinct hexdigest is processed but once.
> I can do this:
>
> hex_array = []
> outer_array.each do |inner_array|
>   next if hex_array.include?( inner_array[2] )
>   hex_array << inner_array[2]
>   . . .
>
> I wonder if there is a better way? Any suggestions?

(1) Let the block parameters destructure each inner array (auto-splat),
which avoids the [2] magic index:

outer_array.each do |mtime, name, hexdigest|

(2) Use a hash, rather than an array, to record the digests you’ve
already processed. A hash lookup avoids a linear search of the array on
every iteration:

seen = {}
outer_array.each do |mtime, name, hexdigest|
  next if seen[hexdigest]
  seen[hexdigest] = true

end
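
Putting both suggestions together, a runnable sketch might look like the following. The method name `each_unique_digest` and the sample data are assumptions for illustration; the original loop body is elided in the post, so here it is replaced by a `yield`.

```ruby
# Hypothetical wrapper around the "seen" hash approach: yields each
# record whose hexdigest has not been encountered before.
def each_unique_digest(records)
  seen = {}
  records.each do |mtime, name, hexdigest|
    next if seen[hexdigest]      # hash lookup instead of Array#include?
    seen[hexdigest] = true
    yield mtime, name, hexdigest
  end
end

# Illustrative data: [mtime, name, hexdigest]
outer_array = [
  [Time.at(100), "a.txt",      "aaa111"],
  [Time.at(200), "copy_a.txt", "aaa111"],
  [Time.at(300), "b.txt",      "bbb222"]
]

each_unique_digest(outer_array) do |_mtime, name, hexdigest|
  puts "processing #{name} (#{hexdigest})"
end
```

A `Set` from the standard library could play the same role as the `seen` hash; the hash version is shown because it is the one suggested above.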

Brian C. wrote in post #979745:

> (1) auto-splat to avoid the [2] magic index
>
> outer_array.each do |mtime, name, hexdigest|
>
> (2) Use a hash, rather than an array, to record ones you’ve processed.
> This avoids a linear search on every iteration
>
> seen = {}
> outer_array.each do |mtime, name, hexdigest|
>   next if seen[hexdigest]
>   seen[hexdigest] = true
>
> end

Very nice. Thank you.