Hash access

dubstep · December 30, 2011, 6:38am

Hi, was playing around with an idea after reading the thread about defining #hash. My understanding was that #hash gives a unique identifier, and that #eql? allows the hash to determine whether the two objects are equal in terms of being the same hash key. So I wrote some code that should take an equivalent instance, or a string for quick access. But it behaves in a way that I completely don’t understand. Hoping someone can help:

User = Struct.new :name, :age, :identifier do
  def hash
    name.hash
  end

  def eql?(other)
    puts "#{name} was asked if they were equal to #{other.inspect}"
    (other == name) || (other.name == name && other.age == age)
  end
end

josh = User.new 'Josh', 28, 'first Josh'
hash = {josh => josh}

hash[josh] # => #<struct User name="Josh", age=28, identifier="first Josh">
hash[User.new 'Josh', 28, 'second Josh'] # => #<struct User name="Josh", age=28, identifier="first Josh">
hash['Josh'] # => nil

>> Josh was asked if they were equal to #<struct User name="Josh", age=28, identifier="first Josh">

So I would have expected all three to go through eql? Instead, we see that only the case where the key was the same object goes through. However, it identifies that the second Josh is the same key, without invoking User#eql? How does it do this?

And why does the string “Josh” not find the instance?

This is all probably in my copy of the Pickaxe, but it’s in Chicago and I’m out of town.

jcain · December 30, 2011, 8:42am

Hi Josh,

Here is how Hash in Ruby works when it tries to determine if two keys
are equal:

1. The #hash method is called on both objects to calculate their hash codes.
2. If their hash codes are not equal, the objects are not equal.
3. If their hash codes are equal, then #== is called to determine if the two objects are equal.

In your example, all three objects actually return the same hash
codes, so #== (instead of eql?) is used to check their equality.

The “first Josh” and the “second Josh” are equal because their #==
(inherited from Object#==) simply calls #eql? which you have
overridden to make them equal.

The “first Josh” is not equal to “Josh” because they are of different
classes, and User#== (inherited from Object#==) does not allow objects
of different classes to be equal.

As a side note: you should always define #hash and #== together and
make sure that whenever #== returns true, #hash returns the same number
for both objects; otherwise, using these objects as hash keys will break
the hash semantics. Also, avoid using mutable objects as hash keys unless
their #hash number is immutable.
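
To illustrate the warning about mutable keys, here is a minimal sketch. Tag is a hypothetical class, not something from this thread, and its #hash returns the id directly only to keep the demonstration deterministic. It defines #eql? and aliases #== to it so both comparisons agree.

```ruby
# Hypothetical key class whose hash code depends on mutable state.
class Tag
  attr_accessor :id

  def initialize(id)
    @id = id
  end

  # Assumption for the demo: use the id itself as the hash code.
  def hash
    id
  end

  def eql?(other)
    other.is_a?(Tag) && other.id == id
  end
  alias_method :==, :eql?
end

tag = Tag.new(1)
lookup = { tag => 'stored' }

before = lookup[tag]  # found: the hash code matches the one recorded at insertion
tag.id = 2            # the key's hash code changes while it sits in the Hash
during = lookup[tag]  # not found: the entry is still filed under the old hash code
lookup.rehash         # re-index the entries using their current hash codes
after = lookup[tag]   # found again
```

Hash#rehash repairs the table after the fact, but it is usually simpler to keep the attributes that feed #hash immutable.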

I hope this helps

jcain · December 30, 2011, 10:00am

On Fri, Dec 30, 2011 at 1:41 AM, Yong Li [email protected] wrote:

I see. The confusion for me was that the comparison goes in the other direction. (i.e. hash["Josh"] turns into "Josh".eql?(#<struct User …>) but I was thinking it would be #<struct User …>.eql?("Josh")). This becomes apparent if I change the log line to

puts "#{inspect} was asked if they were #eql? to #{other.inspect}"

I just didn’t do that in the name of brevity, and it masked the discrepancy.

So it’s probably implemented something like this (ignoring nuances like
collisions and default values)

expectation

class Hash
  def [](key)
    # at_hash is a made-up helper returning the stored [key, value] pair for a hash code
    potential_key, potential_value = at_hash key.hash
    return potential_value if potential_key.eql? key
  end
end

actual

class Hash
  def [](key)
    # at_hash is a made-up helper returning the stored [key, value] pair for a hash code
    potential_key, potential_value = at_hash key.hash
    return potential_value if key.equal? potential_key
    return potential_value if key.eql? potential_key
  end
end
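
The “actual” pseudocode above can be turned into a runnable toy table. This is only a sketch under stated assumptions: TinyHash, Probe, and CALLS are made-up names, collisions are handled with plain bucket arrays, and the real Hash is implemented in C and also compares the stored hash codes before it ever calls eql?.

```ruby
# Toy hash table for illustration only; it is not how Ruby's Hash is written.
class TinyHash
  def initialize(bucket_count = 8)
    @buckets = Array.new(bucket_count) { [] }
  end

  def []=(key, value)
    bucket = bucket_for(key)
    entry = bucket.find { |stored_key, _| same_key?(key, stored_key) }
    if entry
      entry[1] = value
    else
      bucket << [key, value]
    end
    value
  end

  def [](key)
    entry = bucket_for(key).find { |stored_key, _| same_key?(key, stored_key) }
    entry && entry[1]
  end

  private

  def bucket_for(key)
    @buckets[key.hash % @buckets.size]
  end

  # Identity is checked first; otherwise the lookup key is the receiver of eql?.
  def same_key?(key, stored_key)
    key.equal?(stored_key) || key.eql?(stored_key)
  end
end

CALLS = []  # records who was asked what

Probe = Struct.new(:name, :label) do
  def hash
    name.hash
  end

  def eql?(other)
    CALLS << "#{label}.eql?(#{other.respond_to?(:label) ? other.label : other.inspect})"
    other.is_a?(Probe) && other.name == name
  end
end

first  = Probe.new('Josh', 'first')
second = Probe.new('Josh', 'second')

table = TinyHash.new
table[first] = :value

by_identity = table[first]   # => :value, identity short-circuits and eql? is not called
by_equal    = table[second]  # => :value, after second.eql?(first)
by_string   = table['Josh']  # => nil, String#eql? is asked and says no
CALLS                        # => ["second.eql?(first)"]
```

In this sketch the string lookup misses because the question is put to the String, whose own eql? knows nothing about Probe; Probe#eql? is never consulted for it.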

jcain · December 30, 2011, 11:03pm

On Dec 30, 2011, at 00:59 , Josh C. wrote:

I see. The confusion for me was that the comparison goes in the other
direction. (i.e. hash["Josh"] turns into "Josh".eql?(#<struct User …>) but
I was thinking it would be #<struct User …>.eql?("Josh")). This becomes
apparent if I change the log line to puts "#{inspect} was asked if they were #eql? to #{other.inspect}" I just didn’t do that in the name of
brevity, and it masked the discrepancy.

The other confusion for you is insisting it is eql? instead of ==. Yong
Li nailed the description of how it works. Please read it again. It is
as close to perfect as we’re going to get.

Also, as pointed out on the other hash thread… There needs to be a
1:1 correlation between the result of #== and the result of #hash. You
cannot simply use the “most relevant attribute”. You must use all
the attributes that you use against equality tests. Doing this is
fundamental to ruby (and computer science) and must be thoroughly
understood.

jcain · December 31, 2011, 2:23am

On Dec 30, 2011, at 15:05 , Gary W. wrote:

I think there are some errors and/or misleading statements in this
discussion.

First of all, the implementation of Hash depends on testing the
equality of two objects via #eql? and not via #==. This is easy
to see by using 1 and 1.0 in a hash:

You’re right. I really should have something that prevents me from even
opening my email until it detects I’ve had my second espresso. I wonder
if that can be done without a blood sample.

jcain · December 31, 2011, 12:05am

On Dec 30, 2011, at 4:59 PM, Ryan D. wrote:

The other confusion for you is insisting it is eql? instead of ==. Yong Li
nailed the description of how it works. Please read it again. It is as close to
perfect as we’re going to get.

Also, as pointed out on the other hash thread… There needs to be a 1:1
correlation between the result of #== and the result of #hash. You cannot simply
use the “most relevant attribute”. You must use all the attributes that you
use against equality tests. Doing this is fundamental to ruby (and computer
science) and must be thoroughly understood.

I think there are some errors and/or misleading statements in this
discussion.

First of all, the implementation of Hash depends on testing the
equality of two objects via #eql? and not via #==. This is easy
to see by using 1 and 1.0 in a hash:

1.hash        #=> 3943323080027384908
(1.0).hash    #=> -6757032739833615
1 == 1.0      #=> true
1.eql?(1.0)   #=> false
h = {}        #=> {}
h[1] = 'a'    #=> "a"
h[1.0] = 'b'  #=> "b"
h             #=> {1=>"a", 1.0=>"b"}

If #== was being used by Hash, the hash at the end of that sequence
would only have one entry with a key of 1.0.

I don’t think it is correct to call the relationship between equality
and #hash one-to-one.

(a.eql?(b)) implies (a.hash == b.hash)

but the reverse is not true.

(a.hash == b.hash) does not imply (a.eql?(b))

If two objects have the same hash, they may or may not be equal.
If they aren’t equal, you just have a hash collision that has to
be disambiguated by doing a full equality test via eql?.
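
A small sketch of that disambiguation step, assuming a hypothetical Collider class whose instances all report the same hash code. Both keys land on the same hash code, and eql? decides whether a lookup or assignment refers to an existing entry.

```ruby
# Hypothetical class: every instance collides on the hash code 1.
class Collider
  attr_reader :value

  def initialize(value)
    @value = value
  end

  def hash
    1
  end

  def eql?(other)
    other.is_a?(Collider) && other.value == value
  end
end

h = {}
h[Collider.new(:a)] = 'first'
h[Collider.new(:b)] = 'second'   # same hash code, but eql? says it is a different key
h[Collider.new(:a)] = 'updated'  # same hash code and eql?, so the entry is replaced

h.size               # => 2
h[Collider.new(:a)]  # => "updated"
h[Collider.new(:c)]  # => nil
```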

Finally, there is no hard requirement that a hash implementation
‘must use all the attributes’ used for the equality test. If there
is a subset of attributes that are generally different for non-equal
objects then the hash function will be more performant if it only
uses the subset of attributes.
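
As a rough sketch of hashing on a subset, assume a hypothetical Person record whose eql? compares every field while #hash looks only at the name. The Hash still behaves correctly; same-named people just share a hash code and are told apart by eql?.

```ruby
# Hypothetical record: equality uses every field, the hash code only one.
Person = Struct.new(:name, :age) do
  def hash
    name.hash
  end

  def eql?(other)
    other.is_a?(Person) && other.name == name && other.age == age
  end
end

people = {}
people[Person.new('Josh', 28)] = 'first'
people[Person.new('Josh', 29)] = 'second'  # collides on name, kept apart by eql?

people.size                     # => 2
people[Person.new('Josh', 28)]  # => "first"
people[Person.new('Jane', 28)]  # => nil
```

Whether the subset is a good choice depends on the data: if most keys shared a name, nearly every lookup would fall through to eql?.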

The important point is that you don’t want your hash function to
create too many collisions where non-equal objects have the same
hash value.

For example:

def hash; 1; end

will ‘work’ but will cause performance problems when those objects
are stored in a Hash:

require 'benchmark'

class A; end
class B; def hash; 1; end; end

n = 10000

Benchmark.bm(20) do |x|
  x.report('Object#hash') { h = {}; n.times { |i| h[A.new] = i } }
  x.report('1') { h = {}; n.times { |i| h[B.new] = i } }
end

                           user     system      total        real
Object#hash            0.010000   0.000000   0.010000 (  0.008124)
1                      2.300000   0.000000   2.300000 (  2.311228)

Gary W.