Robert K.:
2010/6/11 Shot (Piotr S.):
The difference is that in many, many cases it’s much faster to check
whether two objects are surely different (via a fast #hash function)
than whether they’re surely the same (#eql? can be quite slow).
This is not necessarily true. Any reasonable implementation
of #eql? will bail out as soon as it sees a difference.
Sure, reasonable implementations of #eql? will test object properties in
the decreasing order of probability of a given property being different
between two objects (and bail out as soon as possible), but there are
cases where a fast #hash might be useful (partly immutable objects
which cache the hash based on the immutable parts, perhaps?), exactly
because it doesn’t have to reliably tell whether two objects are surely
the same (just whether they surely differ).
I agree I might’ve gone over the top with the ‘many, many’
remark, though (but then I did not say ‘most’, just ‘many’…).
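To make the ‘partly immutable’ case concrete, here is a minimal sketch.
The Record class and its id/notes attributes are made up for
illustration; the assumption is that id never changes after
construction while notes may.

```ruby
# Hypothetical class: `id` is frozen at construction, `notes` may change.
class Record
  attr_reader :id, :notes

  def initialize(id, notes)
    @id = id.dup.freeze
    @notes = notes
    @hash = @id.hash # computed once, from the immutable part only
  end

  # Cheap: returns the cached value and never looks at @notes.
  def hash
    @hash
  end

  # Has to be fully reliable, so it compares everything that defines equality.
  def ==(other)
    other.is_a?(Record) && id == other.id && notes == other.notes
  end
  alias eql? ==
end

a = Record.new('x', [1])
b = Record.new('x', [1])
c = Record.new('x', [2])

a.hash == c.hash # equal hashes only mean "might be the same"
a.eql?(c)        # #eql? settles it: the notes differ
```

Equal objects still get equal hashes here (equal ids imply equal
cached hashes), which is all that Hash and Set require.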
On the contrary, you always need to look at the
complete state of an instance to calculate #hash.
This is definitely not true; you should only consider the parts
that differentiate two objects of a given class most often, but
you definitely do not ‘need’ to look at the complete state (even
a constant #hash is valid, albeit quite useless).
The whole point of #hash is that it acts only as a hint whether two
objects are ‘the same’ – it’s your choice how credible vs how performant
it needs to be. At the same time, #eql? has to be 100% credible
(although you’re right that it can take many of the same shortcuts
a given #hash takes, and that there are cases where #hash can be slower
than #eql?, as in your Array example, but it’s just because you want
that #hash to depend on the complete state of an instance).
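The constant-#hash remark above can be checked with a short sketch.
The Point class is hypothetical and exists only to show the contract:
the container still behaves correctly, it just has to call #eql? on
every candidate because #hash rules nothing out.

```ruby
require 'set'

# Hypothetical class used only to illustrate the #hash/#eql? contract.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Valid but useless: every instance reports the same hash,
  # so the hash never tells two points apart.
  def hash
    0
  end

  # The reliable check that Set and Hash fall back on.
  def ==(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
  alias eql? ==
end

set = Set.new([Point.new(1, 2), Point.new(1, 2), Point.new(3, 4)])
set.size # => 2, the duplicate was still detected via #eql?
```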
I can easily construct an example where #eql? beats #hash: […]
Notice also how #eql? with equal arrays is not much slower than #hash.
Sure, because Array#hash is implemented in the way you describe (its
hash depends on all of its elements). I was pointing out that there
are cases where it doesn’t make sense to implement #hash like this,
and having both #hash and #eql? gives you more control and more choices.
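class ImmutableString
 attr_reader :string
 def initialize string
  @string = string.dup.freeze
  @length = string.length
 end
 # cheap and cached, but collides for all strings of equal length
 def hash
  @length
 end
 # without this, eql? would alias Object#== (identity comparison)
 def == other
  other.is_a?(ImmutableString) && @string == other.string
 end
 alias eql? ==
end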
Bad hash implementation. Why don’t you use String#hash?
Because String#hash depends on the contents of the string and is
recomputed every time, while in this particular scenario (where the
vast majority of very long strings differ in length) it might be faster
to refer to the cached length. Of course with immutable strings you
probably should just cache the hash, but I made the example immutable
to not have to add that @length needs to be recomputed on mutations
(I was also quite explicit that this is not an optimal example, just
a simple one).
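For completeness, the ‘just cache the hash’ variant mentioned above
could look roughly like this. CachedHashString is a made-up name, and
the sketch assumes the wrapped string is never mutated after
construction (hence the dup.freeze).

```ruby
# Hypothetical wrapper: pays for String#hash once, at construction.
class CachedHashString
  attr_reader :string

  def initialize(string)
    @string = string.dup.freeze
    @hash = @string.hash # depends on the full contents, computed once
  end

  # Returns the cached value instead of rehashing the contents.
  def hash
    @hash
  end

  def ==(other)
    other.is_a?(CachedHashString) && @string == other.string
  end
  alias eql? ==
end

long = 'x' * 100_000
a = CachedHashString.new(long)
b = CachedHashString.new(long)
index = { a => :found }
index[b] # the lookup uses the cached hash, then #eql? to confirm
```

Unlike the length-based #hash, this one rarely collides for distinct
strings, at the price of one full pass over the contents per object.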
Of course in this case a sane #eql? implementation would also bail out
as soon as the lengths differ, but my point was that #hash doesn’t have
to be credible on whether two objects are really the same, while #eql?
has to, so in many cases #eql? has to start with checking all the
properties that the #hash value depends upon anyway (but Array#eql? and
Array#hash are a good counterexample where such checks can bail out
faster), plus it often should check the class of the ‘other’ as well
(which is quick, but one more check nevertheless).
If the set of attributes to be used for the specific comparison needed
in this thread is not the same as the set that we identify as keyish
for class User in general, one cannot use User#eql? and User#hash for
quick set intersection.
Sure, but I assume it’s not a very common situation; I’d think twice
before I designed an object with different ‘equality’ semantics. On
the other hand, crafting your own #==, #hash and #eql? is quite common
(at least I do it very often, because I often end up storing my objects
in Sets).
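As a rough sketch of what I mean by crafting #==, #hash and #eql?
together: the User class below is hypothetical, and treating login as
the only ‘keyish’ attribute is an assumption made purely for the
example.

```ruby
require 'set'

# Hypothetical User whose equality is defined by login alone.
class User
  attr_reader :login, :name

  def initialize(login, name)
    @login = login
    @name = name
  end

  def ==(other)
    other.is_a?(User) && login == other.login
  end
  alias eql? ==

  # Depends on exactly the attribute that #eql? compares.
  def hash
    login.hash
  end
end

admins  = Set[User.new('ann', 'Ann'), User.new('bob', 'Bob')]
editors = Set[User.new('bob', 'Robert'), User.new('cy', 'Cy')]

common = admins & editors # intersection driven by #hash and #eql?
common.map(&:login)       # => ["bob"]
```

If a particular comparison needs a different set of attributes than
the class-wide ones, this approach no longer applies directly, as
noted in the quoted remark.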
Note also that I was explicitly replying to the remark that it’s
‘odd that both [#hash and #eql?] are necessary’, not to the OP.
— Shot