Comparing objects

andersonleite · June 7, 2010, 12:04am

Hi,

Just wanted to add my thoughts on this (I started a thread about it a
few months ago).

I searched a bit and concluded this:

Array methods using comparison

with #hash and #eql?
&, |, uniq(!), -
with #==
include?, (r)assoc, count, delete, (r,find_)index
(please tell me if I forgot one)

I think Array methods should never have to look at #hash and #eql?
methods.
I suppose this is done for performance.

I think this should change, because:

it violates POLS
it can cause unexpected behavior, because you have to define #hash and
#eql? for objects which should not need them (when you manage objects in
an Array, you do not expect to have to think about Hash keys).
it is not consistent with the other Array methods
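
The split in the list above can be reproduced with a small sketch. Tag is a hypothetical class (not from this thread) that defines only #==, so anything relying on #eql? and #hash still sees two distinct objects:

```ruby
# Hypothetical class for illustration: it defines #== but neither #eql? nor #hash.
class Tag
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def ==(other)
    other.is_a?(Tag) && name == other.name
  end
end

a = Tag.new("ruby")
b = Tag.new("ruby")

found  = [a].include?(b)  # goes through #==
common = [a] & [b]        # relies on #eql? (and #hash), which still compare identity
unique = [a, b].uniq      # same story as Array#&

p found, common.size, unique.size
```

So include? finds the object, while & and uniq treat a and b as unrelated.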

PS: Rein: I saw your implementation of #hash. I think the “add one” is
useless, because #eql? is always used (so it would work even if #hash
always returned the same value). It could maybe speed things up a bit,
but only if you have a lot of comparisons between User and instances of
User, which is very unlikely.

andersonleite · June 7, 2010, 2:40am

I agree that using eql? and hash for some methods is surprising. I
would not mind seeing this changed to at least just eql?. == seems too
general for the semantics of these methods.

On 2010-06-06 15:03:30 -0700, Benoit D. said:

PS: Rein: I saw your implementation of #hash. I think to “add one” is
useless, because #eql? is always used (so even if #hash was always the same,
it would work). It could maybe speed up a bit, but only if you have a lot of
comparison of User and User’s instances, which is very unlikely.

Without the + 1, User.new('').hash == User.hash. I don’t like the
existence of a bug even if I don’t yet have a way to exercise it in my
code.

andersonleite · June 7, 2010, 3:00am

I think that this is now well outside the scope of the original topic,
so I will briefly say that implementing a semantically appropriate
#eql? and #hash on your class is the way to make them behave as
expected when used as hash keys and with methods like Array#&, honoring
the principle of least surprise. It is as normal (and useful) as
implementing #== for simple comparisons.
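
For anyone landing here later, a minimal sketch of what that looks like. The User class and its name attribute are assumptions for the example, not the OP's actual code or the #hash implementation discussed above:

```ruby
# Hypothetical User class; the attribute name is an assumption for this example.
class User
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def ==(other)
    other.class == self.class && name == other.name
  end
  alias eql? ==

  # Objects that are eql? must return the same hash value.
  def hash
    name.hash
  end
end

admins  = [User.new("ann"), User.new("bob")]
authors = [User.new("bob"), User.new("cid")]

both   = admins & authors
visits = { User.new("ann") => 3 }

p both.map(&:name), visits[User.new("ann")]
```

With all three methods in place, Array#& and Hash#[] both compare users by name.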

I hope that other Rubyists that may stumble upon this thread will take
Robert’s FUD with a grain of salt and will feel free to determine the
usefulness and any potential dangers of implementing #eql? and #hash –
along with other Ruby idioms like #each (for Enumerable) and #<=> (for
Comparable) – on their own. An ounce of critical thinking is better
than a pound of dogma.

andersonleite · June 10, 2010, 4:00pm

On Thu, Jun 10, 2010 at 3:41 PM, Wilson B. wrote:

Even if your code lives in isolation, ensuring proper semantics via these
methods prevents a class of tricky bug that your successors may have
to deal with.

Hmm? Would you care to show an example where overloading those methods
(#eql? and #hash) is needed to ensure proper behavior? I am willing to
learn. But I am not willing to accept this statement as such.
Cheers
R.

andersonleite · June 10, 2010, 3:42pm

On Sun, Jun 6, 2010 at 9:00 PM, Rein H. wrote:

I hope that other Rubyists that may stumble upon this thread will take
Robert’s FUD with a grain of salt and will feel free to determine the
usefulness and any potential dangers of implementing #eql? and #hash –
along with other Ruby idioms like #each (for Enumerable) and #<=> (for
Comparable) – on their own. An ounce of critical thinking is better than a
pound of dogma.

Let me be clear, People of the Future: implement eql?, ===, and hash
on your own classes as appropriate. Doing so is the proper way to
allow your code to interact with other libraries and coders. Even if
your code lives in isolation, ensuring proper semantics via these
methods prevents a class of tricky bug that your successors may have
to deal with.

andersonleite · June 10, 2010, 4:15pm

Mark A. wrote:

[tl;dr]

Sorry, guys, didn’t notice how I used eql instead of eql?
Btw, without #hash it won’t work anyways which I consider weird at the
very least.

andersonleite · June 10, 2010, 4:07pm

Benoit D. wrote:

I searched a bit and concluded this:

Array methods using comparison

with #hash and #eql?
&, |, uniq(!), -

with #==
include?, (r)assoc, count, delete, (r,find_)index
(please say me if I forgot one)

I think Array methods should never have to look at #hash and #eql?
methods.
I suppose this is done for performance.

I think this should change, because:

it violates POLS

it can make unexpected behavior because you defined #hash and #eql? ,
for
objects which should not need that (when you manage objects in an Array,
you
do not expect to need to think about Hash’s keys).

it is not consistent with other Array’s methods

For me it doesn’t work anyway.
Unsure how to paste code here, you could see an example here:
http://pastie.org/999353
I am still like “WTF?”

$ ruby -v
ruby 1.8.7 (2009-06-12 patchlevel 174) [i686-darwin9.8.0]

andersonleite · June 10, 2010, 5:31pm

On 2010-06-10 07:20:03 -0700, Mark A. said:

Mark A. wrote:

[tl;dr]

Sorry, guys, didn’t notice how I used eql instead of eql?
Btw, without #hash it won’t work anyways which I consider weird at the
very least.

#hash makes sense for Hash#[] and etc. #eql? makes more sense for
Array#&. I too find it odd that both are necessary.

andersonleite · June 10, 2010, 5:30pm

On 2010-06-10 06:59:40 -0700, Robert D. said:

On Thu, Jun 10, 2010 at 3:41 PM, Wilson B. wrote:

Even if your code lives in isolation, ensuring proper semantics via these
methods prevents a class of tricky bug that your successors may have
to deal with.

Hmm? Would you care to show an example where overloading those methods
(#eql? and #hash) is needed to ensure proper behavior? I am willing to
learn. But I am not willing to accept this statement as such.
Cheers
R.

You have been presented with one in this very thread. The OP wants
objects of his class to have the correct semantics for Array#& and
Hash#[], etc. The correct answer is to implement #hash and #eql?, just
as implementing <=> provides objects of his class with the correct
semantics for Array#sort.
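
To make the #<=> half of that analogy concrete, here is a small sketch. Version is a hypothetical class made up for the example:

```ruby
# Hypothetical class: only #<=> is defined, which is all Array#sort needs.
class Version
  attr_reader :major, :minor

  def initialize(major, minor)
    @major = major
    @minor = minor
  end

  # Compare field by field, major first.
  def <=>(other)
    [major, minor] <=> [other.major, other.minor]
  end

  def to_s
    "#{major}.#{minor}"
  end
end

sorted = [Version.new(1, 10), Version.new(1, 2), Version.new(0, 9)].sort
puts sorted.map(&:to_s).join(", ")
```

Array#sort knows nothing about Version; it just asks the objects via #<=>, the same way Array#& asks via #eql? and #hash.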

andersonleite · June 10, 2010, 5:55pm

Rein H. wrote:

On 2010-06-10 07:20:03 -0700, Mark A. said:

Mark A. wrote:

[tl;dr]

Sorry, guys, didn’t notice how I used eql instead of eql?
Btw, without #hash it won’t work anyways which I consider weird at the
very least.

#hash makes sense for Hash#[] and etc. #eql? makes more sense for
Array#&. I too find it odd that both are necessary.

If two objects are eql?, their hash methods must also return
the same value. More details in The Ruby Programming Language book.

Thus, when you redefine eql?, the hash method should also be redefined.
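
The built-in classes follow the same rule, which can be checked in a few lines. This is only a sketch using core String, Integer and Float behavior:

```ruby
# Built-in types follow the rule: objects that are eql? share a hash value.
s1 = "ruby"
s2 = "ruby".dup
same_string = s1.eql?(s2)           # equal content, different objects
same_hash   = s1.hash == s2.hash

# 1 == 1.0 holds, but they are not eql?, so they are distinct Hash keys.
loose  = (1 == 1.0)
strict = 1.eql?(1.0)
lookup = { 1 => :int }[1.0]

p same_string, same_hash, loose, strict, lookup
```

The last lookup comes back empty because Hash asks #eql?, not #==.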

andersonleite · June 10, 2010, 5:57pm

Marcin W. wrote:

Rein H. wrote:

On 2010-06-10 07:20:03 -0700, Mark A. said:

Mark A. wrote:

[tl;dr]

Sorry, guys, didn’t notice how I used eql instead of eql?
Btw, without #hash it won’t work anyways which I consider weird at the
very least.

#hash makes sense for Hash#[] and etc. #eql? makes more sense for
Array#&. I too find it odd that both are necessary.

If two objects are set to be eql?, their hash methods must also return
the same value. More details in The Ruby Programming Language book.

Thus, when you redefine eql?, the hash methods also should be redefined.

Well, the core API docs don’t say much about it.

andersonleite · June 10, 2010, 6:10pm

On 06/10/2010 05:27 PM, Rein H. wrote:

learn. But I am not willing to accept this statement as such.
Cheers
R.

You have been presented with one in this very thread. The OP wants
objects of his class to have the correct semantics for Array#& and
Hash#[], etc. The correct answer is to implement #hash and #eql?, just
as implementing <=> provides objects of his class with the correct
semantics for Array#sort.

See also
http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html
http://blog.rubybestpractices.com/posts/rklemme/019-Complete_Numeric_Class.html

Cheers

robert

andersonleite · June 10, 2010, 6:27pm

On Thu, Jun 10, 2010 at 6:10 PM, Robert K. wrote:

http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html
http://blog.rubybestpractices.com/posts/rklemme/019-Complete_Numeric_Class.html
You define #eql? and #hash for your convenience. So good, so bad. My
question simply was: show me why not redefining #hash and #eql? will
cause problems, because that was Wilson’s statement. I am still
waiting :(.

Cheers
R.

andersonleite · June 10, 2010, 6:19pm

On Thu, Jun 10, 2010 at 5:30 PM, Rein H. wrote:

On 2010-06-10 06:59:40 -0700, Robert D. said:
You have been presented with one in this very thread. The OP wants objects
of his class to have the correct semantics for Array#& and Hash#[], etc. The
correct answer is to implement #hash and #eql?, just as implementing <=>
provides objects of his class with the correct semantics for Array#sort.

I guess you really do not know what I was talking about? Or do you
just repeat the same stuff over and over again in order to convince
me?
Overwriting #hash and #eql? breaks Hash! Why the heck should the OP’s
use case justify this?
And it does not answer my question: where would I want Hash to
behave according to the redefined #eql? and #hash? And BTW, I asked
Wilson, did I not?
Cheers
Robert

andersonleite · June 11, 2010, 7:52am

On Thu, Jun 10, 2010 at 6:48 PM, Mark A. wrote:

Robert D. wrote:

On Thu, Jun 10, 2010 at 5:30 PM, Rein H. wrote:
overwriting #hash and #eql? breaks Hash!
That’s not true, I think.
Judge for yourself

require "forwardable"

# Count the live instances of klass.
def count klass
  ObjectSpace.each_object( klass ).to_a.size
end

class N
  extend Forwardable
  attr_reader :n
  # Delegate #hash to the wrapped value so equal values hash alike.
  def_delegators :n, :hash
  def eql? otha
    n == otha.n
  end
  private
  def initialize n
    @n = n
  end
end # class N

h = { N.new( 42 ) => true }
h[ N.new( 42 ) ] = 42
p h
GC.start
p count(N)

Cheers
R.

andersonleite · June 10, 2010, 6:48pm

Robert D. wrote:

On Thu, Jun 10, 2010 at 5:30 PM, Rein H. wrote:
overwriting #hash and #eql? breaks Hash!
That’s not true, I think.

andersonleite · June 11, 2010, 8:15am

On 2010-06-10 22:52:14 -0700, Robert D. said:

ObjectSpace.each_object( klass ).to_a.size
@n = n
Cheers
R.

This breaks Hash? Quite the opposite!

This is precisely what is meant by “defining the semantics” of a class
for use by hashes and the very behavior you want when you define #eql?
and #hash in the first place!

You wouldn’t say that defining #<=> breaks Array#sort, so why would you
say that this “breaks Hash”? This doesn’t break Hash. If anything, it
fixes it when using N objects as keys!
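
A side-by-side sketch of the two behaviors, for comparison. Plain and Keyed are hypothetical classes; Keyed follows the lines of the class N posted earlier in the thread:

```ruby
# Hypothetical classes: Plain keeps Object's identity-based #eql? and #hash,
# Keyed compares by value, along the lines of class N in the earlier post.
class Plain
  attr_reader :n

  def initialize(n)
    @n = n
  end
end

class Keyed < Plain
  def eql?(other)
    other.is_a?(Keyed) && n == other.n
  end

  def hash
    n.hash
  end
end

plain = { Plain.new(42) => true }
plain[Plain.new(42)] = 42   # a second, unrelated key

keyed = { Keyed.new(42) => true }
keyed[Keyed.new(42)] = 42   # same key as far as Hash is concerned, value replaced

p plain.size, keyed.size, keyed.values
```

Which of the two is "correct" depends on whether two objects wrapping 42 are meant to be the same key; the methods just let the class say so.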

andersonleite · June 11, 2010, 12:10pm

Rein H.:

#hash makes sense for Hash#[] and etc. #eql? makes more
sense for Array#&. I too find it odd that both are necessary.

Both are necessary because #eql? says whether two objects are surely
the same, while #hash says whether they’re surely different – which,
perhaps counterintuitively, is not the same problem.

The difference is that in many, many cases it’s much faster to check
whether two objects are surely different (via a fast #hash function)
than whether they’re surely the same (#eql? can be quite slow).

The main difference between #eql? and #hash is that #hash can return the
same value for objects that are not #eql? (but if two objects are #eql?
then #hash must return the same value).

An untested, and definitely not optimal
(but hopefully simple) example follows.

Imagine that you want to implement a new immutable string class, one
which caches the string length (for performance reasons). Imagine also
that the vast majority of such strings you use are of different lengths,
and that you want to use them as Hash keys.

class ImmutableString

  def initialize string
    @string = string.dup.freeze
    @length = string.length
  end

end

Given the above assumptions, it might make sense for #hash to
return the @length, while #eql? makes the ‘proper’ comparison:

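class ImmutableString

  attr_reader :string

  def hash
    @length
  end

  # Compare by content; without this, == would be Object#== (identity).
  def == other
    other.is_a?(ImmutableString) && @string == other.string
  end

  alias eql? ==

end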

This way in the vast majority of cases, when your ImmutableStrings will
be considered for Hash keys, the check whether a given key exists will
be very quick; only when two objects #hash to the same value (i.e.,
when they’re not surely different) the #eql? is called to tell whether
they’re surely the same.
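
A rough way to watch this happen is to log the #eql? calls. The class below is a variation on the idea above; the LoggedString name, the call log and the reader are additions made for this sketch only:

```ruby
# Sketch of the length-as-hash idea with a log of #eql? calls added for
# illustration (the log and the reader are assumptions, not part of the idea).
class LoggedString
  attr_reader :string
  EQL_CALLS = []

  def initialize(string)
    @string = string.dup.freeze
    @length = string.length
  end

  def hash
    @length
  end

  def eql?(other)
    return false unless other.is_a?(LoggedString)
    EQL_CALLS << [string, other.string]  # remember which pair was compared
    string == other.string
  end
end

table = {
  LoggedString.new("abc")   => :short,
  LoggedString.new("hello") => :long
}

hit  = table[LoggedString.new("hello")]  # same length and same content
miss = table[LoggedString.new("world")]  # same length, different content

# Which pairs were actually compared with #eql?
p hit, miss, LoggedString::EQL_CALLS
```

The key of length 3 should never show up in the log, since its #hash already rules it out; only the keys of length 5 get the full comparison.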

-- Shot

andersonleite · June 11, 2010, 6:29pm

On Jun 3, 4:29 pm, Anderson L. wrote:

How can I compare two objects and get true if some of their attributes
are equal?

include Comparable ?

Regards,

Dan
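
A quick sketch of what Comparable gives you here and what it does not. Person and the choice of comparing by age are assumptions for the example:

```ruby
# Hypothetical class; comparing by age only is an assumption for the example.
class Person
  include Comparable
  attr_reader :name, :age

  def initialize(name, age)
    @name = name
    @age = age
  end

  def <=>(other)
    age <=> other.age
  end
end

ann = Person.new("Ann", 30)
bob = Person.new("Bob", 30)

equal_by_age  = (ann == bob)    # Comparable#== is built on #<=>
still_not_eql = ann.eql?(bob)   # #eql? is untouched by Comparable
common        = [ann] & [bob]

p equal_by_age, still_not_eql, common.size
```

So Comparable covers #== (and <, >, between?), but Array#& and Hash keys still need #eql? and #hash as discussed above.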

andersonleite · June 11, 2010, 3:25pm

2010/6/11 Shot (Piotr S.):

whether two objects are surely different (via a fast #hash function)
than whether they’re surely the same (#eql? can be quite slow).

This is not necessarily true. Any reasonable implementation of #eql?
will bail out as soon as it sees a difference. On the contrary, you
always need to look at the complete state of an instance to calculate
#hash. I can easily construct an example where #eql? beats #hash:

14:40:54 Temp$ ruby19 eql-test.rb
same
0.110000 0.000000 0.110000 ( 0.098000)
0.093000 0.000000 0.093000 ( 0.099000)
0.157000 0.000000 0.157000 ( 0.151000)
different early
0.093000 0.000000 0.093000 ( 0.101000)
0.094000 0.000000 0.094000 ( 0.096000)
0.000000 0.000000 0.000000 ( 0.000000)
different late
0.109000 0.000000 0.109000 ( 0.105000)
0.094000 0.000000 0.094000 ( 0.098000)
0.156000 0.000000 0.156000 ( 0.149000)
14:40:56 Temp$ cat eql-test.rb
require 'benchmark'
a1 = Array.new 1_000_000
a2 = Array.new 1_000_000
puts "same"
puts Benchmark.measure { a1.hash }
puts Benchmark.measure { a2.hash }
puts Benchmark.measure { a1.eql? a2 }
a1[0] = 1
a2[0] = 2
puts "different early"
puts Benchmark.measure { a1.hash }
puts Benchmark.measure { a2.hash }
puts Benchmark.measure { a1.eql? a2 }
a2[0] = a1[0]
a2[999_999] = 1
puts "different late"
puts Benchmark.measure { a1.hash }
puts Benchmark.measure { a2.hash }
puts Benchmark.measure { a1.eql? a2 }
14:40:58 Temp$

Notice also how #eql? with equal arrays is not much slower than #hash.

and that you want to use them as Hash keys.

@length
Bad hash implementation. Why don’t you use String#hash?

be very quick; only when two objects #hash to the same value (i.e.,
when they’re not surely different) the #eql? is called to tell whether
they’re surely the same.

If the set of attributes to be used for the specific comparison needed
in this thread is not the same as the set that we identify as keyish
for class User in general one cannot use User#eql? and User#hash for
quick set intersection. That’s why I suggested to use a Struct for
key fields (which has proper #hash and #eql? built in).
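
A minimal sketch of that Struct approach. Account, AccountKey and the field names are assumptions for the example, not the OP's code:

```ruby
# Hypothetical names: Account and its key fields are assumptions for the example.
AccountKey = Struct.new(:name, :email)

class Account
  attr_reader :name, :email, :last_login

  def initialize(name, email, last_login)
    @name = name
    @email = email
    @last_login = last_login
  end

  # Only the fields that matter for this particular comparison.
  def key
    AccountKey.new(name, email)
  end
end

old_list = [Account.new("ann", "ann@example.com", 2009)]
new_list = [Account.new("ann", "ann@example.com", 2010),
            Account.new("bob", "bob@example.com", 2010)]

# Intersect on the keys; Account#eql? and Account#hash stay untouched.
shared_keys = old_list.map(&:key) & new_list.map(&:key)
shared = new_list.select { |account| shared_keys.include?(account.key) }

p shared.map(&:name)
```

The class keeps its general identity semantics, and each comparison can pick its own key Struct.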

Kind regards

robert