Comparing objects

andersonleite · June 11, 2010, 6:48pm

On 10.06.2010 18:27, Robert D. wrote:

On Thu, Jun 10, 2010 at 6:10 PM, Robert K. wrote:

http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html
http://blog.rubybestpractices.com/posts/rklemme/019-Complete_Numeric_Class.html
You define #eql? and #hash for your convenience. So good, so bad. My
question simply was: Show me why not redefining #hash and #eql? will
cause problems, because that was Wilson’s statement. I am still
waiting :(.

The advice to implement #eql? and #hash really only makes sense if
equivalence can reasonably be defined for a class and if instances of
that class should be used as Hash keys or in a Set. If equivalence
cannot be defined other than via identity (which is the default), then
it is perfectly reasonable to not override either method and go with
the default implementation.
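
A minimal sketch of the two cases described above. The Point and Session classes are hypothetical and not taken from the thread: Point has a natural notion of equivalence and overrides #==, #eql? and #hash, while Session keeps the default identity-based behaviour.

```ruby
require 'set'

# Hypothetical value class: two Points are equivalent when their
# coordinates match.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def ==(other)
    other.class == self.class && x == other.x && y == other.y
  end

  # Hash and Set use #eql? (not #==) to confirm a match.
  def eql?(other)
    other.class == self.class && x.eql?(other.x) && y.eql?(other.y)
  end

  # Must agree with #eql?: objects that are eql? return the same hash.
  def hash
    [x, y].hash
  end
end

# Hypothetical class with no sensible equivalence other than identity,
# so it overrides nothing.
class Session
  def initialize(user)
    @user = user
  end
end

points   = Set.new([Point.new(1, 2), Point.new(1, 2)])
sessions = Set.new([Session.new("bob"), Session.new("bob")])
lookup   = { Point.new(1, 2) => :found }

puts points.size             # equivalent Points collapse into one entry
puts sessions.size           # distinct Session objects stay distinct
p lookup[Point.new(1, 2)]    # a fresh but equivalent Point finds the key
```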

Kind regards

robert

andersonleite · June 11, 2010, 7:41pm

Robert K.:

2010/6/11 Shot (Piotr S.):

The difference is that in many, many cases it’s much faster to check
whether two objects are surely different (via a fast #hash function)
than whether they’re surely the same (#eql? can be quite slow).

This is not necessarily true. Any reasonable implementation
of #eql? will bail out as soon as it sees a difference.

Sure, reasonable implementations of #eql? will test object properties in
the decreasing order of probability of a given property being different
between two objects (and bail out as soon as possible), but there are
cases where a fast #hash might be useful (partly immutable objects
which cache the hash based on the immutable parts, perhaps?), exactly
because it doesn’t have to reliably tell whether two objects are surely
the same (just whether they surely differ).

I agree I might’ve gone over the top with the ‘many, many’
remark, though (but then I did not say ‘most’, just ‘many’…).

On the contrary, you always need to look at the
complete state of an instance to calculate #hash.

This is definitely not true; you only should consider the parts
that differentiate two objects of a given class most often, but
you definitely do not ‘need’ to look at the complete state (even
a constant #hash is valid, albeit quite useless).

The whole point of #hash is that it acts only as a hint whether two
objects are ‘the same’ – it’s your choice how credible vs how performant
it needs to be. At the same time, #eql? has to be 100% credible
(although you’re right that it can take many of the same shortcuts
a given #hash takes, and that there are cases where #hash can be slower
than #eql?, as in your Array example, but it’s just because you want
that #hash to depend on the complete state of an instance).
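
To illustrate the "hint" role of #hash, here is a small sketch. The Label class and its call counter are hypothetical; its #hash deliberately looks only at the text length, while #eql? compares the full contents. Exactly when Ruby calls #eql? during a lookup is an implementation detail, so the sketch only relies on the fact that a lookup whose hash matches a stored key has to consult #eql?.

```ruby
# Hypothetical class with a deliberately coarse (but valid) #hash.
class Label
  attr_reader :text

  class << self
    attr_accessor :eql_calls   # counts #eql? invocations, for illustration
  end
  self.eql_calls = 0

  def initialize(text)
    @text = text
  end

  # Cheap hint: labels of different length can never be eql?.
  def hash
    text.length.hash
  end

  # Authoritative check: must look at the real contents.
  def eql?(other)
    Label.eql_calls += 1
    other.is_a?(Label) && text == other.text
  end
end

index = { Label.new("apple") => 1, Label.new("kiwi") => 2 }

Label.eql_calls = 0
found = index[Label.new("grape")]   # same length as "apple", other contents
calls_on_collision = Label.eql_calls

p found                       # nil: #eql? rejected the colliding key
p calls_on_collision >= 1     # true: the hash matched, so #eql? had to decide
p index[Label.new("apple")]   # 1: same hash and eql? contents
```

A coarser #hash is cheaper to compute but sends more lookups to #eql?; a constant #hash would send all of them there.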

I can easily construct an example where #eql? beats #hash: […]
Notice also how #eql? with equal arrays is not much slower than #hash.

Sure, because Array#hash is implemented in the way you describe (its
hash depends on all of its elements). I was pointing out that there
are cases where it doesn’t make sense to implement #hash like this,
and having both #hash and #eql? gives you more control and more choices.

class ImmutableString

  def initialize string
    @string = string.dup.freeze
    @length = string.length
  end

  def hash
    @length
  end

  alias eql? ==

end

Bad hash implementation. Why don’t you use String#hash?

Because String#hash depends on the contents of the string and is
recomputed every time, while in this particular scenario (where the
vast majority of very long strings differ in length) it might be faster
to refer to the cached length. Of course with immutable strings you
probably should just cache the hash, but I made the example immutable
to not have to add that @length needs to be recomputed on mutations
(I was also quite explicit that this is not an optimal example, just
a simple one).
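
The "just cache the hash" variant mentioned above could look roughly like this. FrozenText is a hypothetical name, and the sketch assumes the wrapped string is never replaced after construction.

```ruby
# Hypothetical wrapper around a frozen string that computes its hash once.
class FrozenText
  attr_reader :string

  def initialize(string)
    @string = string.dup.freeze
    @hash = @string.hash   # safe to cache: @string can no longer change
  end

  # Returns the cached value instead of rescanning the string.
  def hash
    @hash
  end

  def ==(other)
    other.is_a?(FrozenText) && string == other.string
  end
  alias eql? ==
end

a = FrozenText.new("x" * 10_000)
b = FrozenText.new("x" * 10_000)
seen = { a => true }

p seen.key?(b)                     # equal contents, so b finds a's entry
p seen.key?(FrozenText.new("y"))   # different contents are not found
```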

Of course in this case a sane #eql? implementation would also bail out
as soon as the lengths differ, but my point was that #hash doesn’t have
to be credible on whether two objects really differ, while #eql? has
to, so in many cases #eql? has to start with checking all the properties
that #hash value depends upon anyway (but Array#eql? and Array#hash are
a good counterexample where such checks can bail out faster), plus it
often should check the class of the ‘other’ as well (which is quick,
but one more check nevertheless).

If the set of attributes to be used for the specific comparison needed
in this thread is not the same as the set that we identify as keyish
for class User in general one cannot use User#eql? and User#hash for
quick set intersection.

Sure, but I assume it’s not a very common situation; I’d think twice
before I designed an object with different ‘equality’ semantics. On
the other hand, crafting your own #==, #hash and #eql? is quite common
(at least I do it very often, because I often end up storing my objects
in Sets).
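
For the less common case raised above, where the attributes relevant to one comparison differ from the class’s general notion of equality, one option is to build the index on an explicit key and leave #eql? and #hash alone. The helper intersect_by and the User struct below are hypothetical, a sketch rather than anything proposed in the thread.

```ruby
# Hypothetical User; only `email` matters for this particular comparison.
User = Struct.new(:name, :email)

# Intersect two lists by a caller-supplied key, without relying on
# User#eql? or User#hash. The block extracts the key from each item.
def intersect_by(left, right)
  keys = {}
  right.each { |item| keys[yield(item)] = true }
  left.select { |item| keys[yield(item)] }
end

old_users = [User.new("Ann", "ann@example.com"),
             User.new("Bob", "bob@example.com")]
new_users = [User.new("Ann B.", "ann@example.com")]

common = intersect_by(old_users, new_users) { |u| u.email }
p common.map(&:name)   # names from old_users whose email also occurs in new_users
```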

Note also that I was explicitly replying to the remark that it’s
‘odd that both [#hash and #eql?] are necessary’, not to the OP.

-- Shot

andersonleite · June 11, 2010, 8:16pm

On Fri, Jun 11, 2010 at 6:47 PM, Robert K. wrote:

You define #eql? and #hash for your convenience. So good, so bad. My
question simply was: Show me why not redefining #hash and #eql? will
cause problems, because that was Wilson’s statement. I am still
waiting :(.

The advice to implement #eql? and #hash really only makes sense if
equivalence can reasonably be defined for a class and if instances of that
class should be used as Hash keys or in a Set. If equivalence cannot be
defined other than via identity (which is the default), then it is
perfectly reasonable to not override either method and go with the default
implementation.
But that was exactly my point.

OP wanted to use Array#&, and Array#&, for a reason not too clear to
me, uses Object#eql? instead of Object#==. I did discourage the
overloading of Object#eql? and Object#hash for that purpose.

If you want to change Hash then it is the right thing to do.
Now I might strongly disagree about whether one should do that, but that is
rather OT and I would never have made such strong statements about
that issue.
However, the technique you suggest is not to be put into non-expert
hands, as I tried to show with the memory-leaking code above.

Cheers
Robert

andersonleite · June 11, 2010, 9:15pm

On 6/11/10, Robert D. wrote:

OP wanted to use Array#&, and Array#&, for a reason not too clear to
me, uses Object#eql? instead of Object#==. I did discourage the
overloading of Object#eql? and Object#hash for that purpose.

Array#& uses eql? instead of == because internally, it works something
like this:

class Array
  def &(other)
    h1 = {}
    other.each { |x| h1[x] = true }
    select { |x| h1[x] }
  end
end

In other words, it creates a (hash) index to get a speedup. (From
O(M*N) to O(M+N).)
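
A short sketch of the practical consequence: a class that overrides only #== is invisible to Array#&, while one that also defines #eql? and #hash intersects as expected. LooseTag and KeyedTag are hypothetical names, and the sketch does not depend on how a given Ruby version implements Array#& internally.

```ruby
# Hypothetical class that overrides only #==.
class LooseTag
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def ==(other)
    other.is_a?(LooseTag) && name == other.name
  end
end

# Same behaviour, plus the #eql?/#hash pair that hash-based lookups use.
class KeyedTag < LooseTag
  alias eql? ==

  def hash
    name.hash
  end
end

loose = [LooseTag.new("a")] & [LooseTag.new("a")]
keyed = [KeyedTag.new("a")] & [KeyedTag.new("a")]

p loose.size                                     # 0: #== alone is ignored by Array#&
p keyed.size                                     # 1: #eql? and #hash make the match
p [LooseTag.new("a")].include?(LooseTag.new("a"))  # true: Array#include? uses #==
```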

andersonleite · June 12, 2010, 10:55am

On 06/11/2010 08:15 PM, Robert D. wrote:

You define #eql? and #hash for your convenience. So good, so bad. My
question simply was: Show me why not redefining #hash and #eql? will
cause problems, because that was Wilson’s statement. I am still
waiting :(.
The advice to implement #eql? and #hash really only makes sense if
equivalence can reasonably be defined for a class and if instances of that
class should be used as Hash keys or in a Set. If equivalence cannot be
defined other than via identity (which is the default), then it is
perfectly reasonable to not override either method and go with the default
implementation.
But that was exactly my point.

I don’t think we disagree, nor do I argue with you. I just posted blog
links as illustration to Rein’s point about how to implement those
methods.

Kind regards

robert

andersonleite · June 12, 2010, 11:02am

On Sat, Jun 12, 2010 at 10:55 AM, Robert K. wrote:

I don’t think we disagree, nor do I argue with you. I just posted blog
links as illustration to Rein’s point about how to implement those methods.

Forgive my confusion then.
Cheers
Robert

andersonleite · June 17, 2010, 4:22pm

On 12.06.2010 11:01, Robert D. wrote:

On Sat, Jun 12, 2010 at 10:55 AM, Robert K. wrote:

I don’t think we disagree, nor do I argue with you. I just posted blog
links as illustration to Rein’s point about how to implement those methods.

Forgive my confusion then.

No problem. I think I fueled it by not including a comment in the
original posting. Sorry for that.

Kind regards

robert

andersonleite · June 11, 2010, 9:56pm

On Fri, Jun 11, 2010 at 9:11 PM, Caleb C. wrote:
I see, thanx