Nuby - help with Ruby object references

unknown · February 24, 2006, 1:36am

I’m very new to Ruby (as in, just started yesterday). As a learning
exercise I decided to write a short program that would traverse a
directory tree and take note of all duplicate files in that tree.

I’m using a hash of arrays to track all references. The key is the
filename, and the value is an array onto which I push each path as its
own one-element array. If I run across a filename for which a key
already exists in the hash, I do a deeper equality check to see whether
the files are really the same or different. The “deeper check” just
compares file sizes.

If they are the same, I push this new path into my array of arrays so
I can check against it if I find yet another file with that name. If
they’re different, I push this new path onto a SECOND hash of arrays.
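
For reference, the bookkeeping described above can be sketched roughly as follows. This is only a sketch: it follows what the posted code does (same name and same size counts as a duplicate), and the names `same_name_same_size`, `by_name` and `dupes` are made up for illustration, not taken from the program below.

```ruby
require 'find'

# Hypothetical helper: returns { filename => [paths...] } for files under
# root that share both a basename and a size with at least one other file.
def same_name_same_size(root)
  # filename => array of [path, size] pairs
  by_name = Hash.new { |hash, key| hash[key] = [] }

  Find.find(root) do |path|
    next if File.directory?(path)
    by_name[File.basename(path)] << [path, File.size(path)]
  end

  dupes = {}
  by_name.each do |name, entries|
    # Within one filename, bucket the entries by size.
    entries.group_by { |_, size| size }.each_value do |group|
      next if group.size < 2
      (dupes[name] ||= []).concat(group.map { |path, _| path })
    end
  end
  dupes
end
```

Equal size is a weak test, of course; two files with the same name and size can still differ in content.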

This is where I have trouble. As soon as I push any value into this
second hash, it takes on the identity of the first hash. I don’t
understand why because I am not doing any explicit operation to make
hash1 = hash2. Maybe it’s a side effect of some other operation.
Anyway, enough talk… the code is below.

I appreciate any and all insight.

cr

— code here —

#!/usr/bin/env ruby

require 'find'

h = Hash.new { |h,k| h[k] = [] }
duplicates = Hash.new { |h,k| h[k] = [] }

working_path = ARGV[0] || ENV["PWD"]

Find.find(working_path) do |path|
  # if it's a dir, skip to the next path
  if File.directory?(path)
    next
  end
  file = File.basename(path)

  # if this key doesn't exist in the hash, add it
  if h.has_key?(file) == false
    h[file].push([path])
  else # key already exists in hash
    # add file size to hash unless it was already grabbed
    h[file].push([path])
    h[file].each do |subarray|
      subarray[1] = File.size(subarray[0]) unless subarray[1]
    end

    # now compare the current file's size to the prior check
    h[file].each do |subarray|
      puts "subarray[0] = #{subarray[0]} and path = #{path}"
      if subarray[0].eql?(path) == false && subarray[1] == File.size(path)
        # add to dupe hash
        puts "DUPLICATE DUPLICATE DUPLICATE DUPLICATE"
        puts "DUP BEFORE h.id = #{h.object_id} and duplicates.id = #{duplicates.object_id}"
        # at this point "h" and "duplicates" refer to the same object!
        duplicates[file].push([path])
        puts "DUP AFTER h.id = #{h.object_id} and duplicates.id = #{duplicates.object_id}"
      end
    end
  end
end

puts "\n\nThe duplicates are..."
duplicates.each do |key, value|
  puts "key = #{key}"
  value.each do |a|
    print "#{a[0]} #{a[1]} "
  end
  print "\n"
end

unknown · February 24, 2006, 2:57am

unknown wrote:

I appreciate any and all insight.

cr

— code here —

#!/usr/bin/env ruby

require 'find'

h = Hash.new { |h,k| h[k] = [] }
duplicates = Hash.new { |h,k| h[k] = [] }

I am pretty sure this is where the problem is. Blocks
are closures so the ‘h’ in the second block refers to
the first Hash. Just change the first variable name
from ‘h’ to ‘hash’ in the whole file (it is a bit
clearer anyway) and you should be OK.
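
Something like the sketch below shows the renamed version. It is only an illustration: the method name `track_paths`, the block parameter names `hsh` and `key`, and the sample paths are all made up. The point is that no outer local shares a name with a block parameter, so the two hashes stay separate objects.

```ruby
# Hypothetical wrapper so the sketch is self-contained.
def track_paths
  # Renamed as suggested: 'hash' instead of 'h', and block parameters
  # that do not match any outer local variable.
  hash       = Hash.new { |hsh, key| hsh[key] = [] }
  duplicates = Hash.new { |hsh, key| hsh[key] = [] }

  hash['a.txt'].push(['/tmp/one/a.txt'])
  duplicates['a.txt'].push(['/tmp/two/a.txt'])

  [hash, duplicates]
end

hash, duplicates = track_paths

# The two variables still point at two separate Hash objects.
puts hash.equal?(duplicates)   # prints false
```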

E

unknown · February 24, 2006, 3:55am

On Feb 23, 2006, at 7:58 PM, Eero S. wrote:

require 'find'

h = Hash.new { |h,k| h[k] = [] }
duplicates = Hash.new { |h,k| h[k] = [] }

I am pretty sure this is where the problem is. Blocks
are closures so the ‘h’ in the second block refers to
the first Hash. Just change the first variable name
from ‘h’ to ‘hash’ in the whole file (it is a bit
clearer anyway) and you should be OK.

Okay, I did as you suggested and it works now. But…

I thought the scoping rules were different. I just checked pages 105
and 106 in the Pickaxe book to confirm the rules and they match up
with what you said. I’m clear on this now.
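
For anyone trying this on a later Ruby, a small check of the rule may help. It is a sketch with a made-up method name, `outer_after_block`. Under Ruby 1.8 (current when this thread was written) a block parameter with the same name as an existing local reuses that local, so the method returns 3. From Ruby 1.9 onward block parameters are always local to the block, so it returns :outer and the original program no longer mixes up its two hashes.

```ruby
# Returns the value of the outer local 'h' after a block has run with a
# parameter of the same name.
def outer_after_block
  h = :outer
  # Block parameter deliberately named like the outer local.
  [1, 2, 3].each { |h| h }
  h
end

p outer_after_block
```

Either way, giving block parameters names that do not collide with outer locals avoids the question entirely.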

Thanks very much…

cr