Hi,
Just wanting to add my thoughts about this (I made a thread about this a few months ago).
I searched a bit and concluded this:
Array methods using comparison
- with #hash and #eql?
&, |, uniq(!), -
- with #==
include?, (r)assoc, count, delete, (r,find_)index
(please tell me if I forgot one)
I think Array methods should never have to look at #hash and #eql?
methods.
I suppose this is done for performance.
I think this should change, because:
- it violates POLS
- it can cause unexpected behavior, because you must define #hash and #eql? for objects which should not need them (when you manage objects in an Array, you do not expect to have to think about Hash keys).
- it is not consistent with the other Array methods
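The split listed above can be checked with a small sketch. The Tag class is hypothetical and deliberately defines only #==, so the methods in the first group fall back to the default identity-based #eql? and #hash:

```ruby
# Hypothetical class that defines only #== (no #eql?, no #hash).
class Tag
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def ==(other)
    other.is_a?(Tag) && name == other.name
  end
end

a = [Tag.new("x")]
b = [Tag.new("x")]

found        = a.include?(Tag.new("x"))  # compares with #==
position     = a.index(Tag.new("x"))     # compares with #==
intersection = a & b                     # does not use #==
deduplicated = (a + b).uniq              # does not use #==

p found               # true
p position            # 0
p intersection.size   # 0: the two Tags are not eql?
p deduplicated.size   # 2: nothing was recognised as a duplicate
```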
PS: Rein: I saw your implementation of #hash. I think that “adding one” is useless, because #eql? is always used (so even if #hash always returned the same value, it would still work). It could maybe speed things up a bit, but only if you often compare User with instances of User, which is very unlikely.
I agree that using eql? and hash for some methods is surprising. I
would not mind seeing this changed to at least just eql?. == seems too
general for the semantics of these methods.
On 2010-06-06 15:03:30 -0700, Benoit D. said:
PS: Rein: I saw your implementation of #hash. I think to “add one” is
useless, because #eql? is always used (so even if #hash was always the same,
it would work). It could maybe speed up a bit, but only if you have a lot of
comparison of User and User’s instances, which is very unlikely.
Without the + 1, User.new('').hash == User.hash. I don’t like the
existence of a bug even if I don’t yet have a way to exercise it in my
code.
I think that this is now well outside the scope of the original topic,
so I will briefly say that implementing a semantically appropriate
#eql? and #hash on your class is the way to make them behave as
expected when used as hash keys and with methods like Array#&, honoring
the principle of least surprise. It is as normal (and useful) as
implementing #== for simple comparisons.
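A minimal sketch of what that can look like. The User class here is hypothetical (the actual class from the thread is not shown), and basing equality on the name alone is an assumption:

```ruby
# Hypothetical User class; equality is based on the name only (an assumption).
class User
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def ==(other)
    other.is_a?(User) && name == other.name
  end
  alias eql? ==

  # Objects that are eql? must return the same hash value.
  def hash
    [User, name].hash
  end
end

admins  = [User.new("ann"), User.new("bob")]
editors = [User.new("bob"), User.new("cid")]

both   = admins & editors
either = admins | editors
roles  = { User.new("ann") => :admin }

p both.map(&:name)         # ["bob"]
p either.map(&:name)       # ["ann", "bob", "cid"]
p roles[User.new("ann")]   # :admin
```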
I hope that other Rubyists that may stumble upon this thread will take
Robert’s FUD with a grain of salt and will feel free to determine the
usefulness and any potential dangers of implementing #eql? and #hash –
along with other Ruby idioms like #each (for Enumerable) and #<=> (for
Comparable) – on their own. An ounce of critical thinking is better
than a pound of dogma.
On Thu, Jun 10, 2010 at 3:41 PM, Wilson B. wrote:
Even if
your code lives in isolation, ensuring proper semantics via these
methods prevents a class of tricky bug that your successors may have
to deal with.
Hmm? Would you care to show an example where overloading those methods
(#eql? and #hash) is needed to ensure proper behavior? I am willing to
learn. But I am not willing to accept this statement as such.
Cheers
R.
On Sun, Jun 6, 2010 at 9:00 PM, Rein H. wrote:
I hope that other Rubyists that may stumble upon this thread will take
Robert’s FUD with a grain of salt and will feel free to determine the
usefulness and any potential dangers of implementing #eql? and #hash –
along with other Ruby idioms like #each (for Enumerable) and #<=> (for
Comparable) – on their own. An ounce of critical thinking is better than a
pound of dogma.
Let me be clear, People of the Future: implement eql?, ===, and hash
on your own classes as appropriate. Doing so is the proper way to
allow your code to interact with other libraries and coders. Even if
your code lives in isolation, ensuring proper semantics via these
methods prevents a class of tricky bug that your successors may have
to deal with.
Mark A. wrote:
[tl;dr]
Sorry, guys, didn’t notice how I used eql instead of eql?
Btw, without #hash it won’t work anyways which I consider weird at the
very least.
Benoit D. wrote:
I searched a bit and concluded this:
Array methods using comparison
- with #hash and #eql?
&, |, uniq(!), -
- with #==
include?, (r)assoc, count, delete, (r,find_)index
(please tell me if I forgot one)
I think Array methods should never have to look at #hash and #eql?
methods.
I suppose this is done for performance.
I think this should change, because:
- it violates POLS
- it can cause unexpected behavior, because you must define #hash and #eql? for objects which should not need them (when you manage objects in an Array, you do not expect to have to think about Hash keys).
- it is not consistent with the other Array methods
For me it doesn’t work anyway.
Unsure how to paste code here, you could see an example here:
http://pastie.org/999353
I am still like “WTF?”
$ ruby -v
ruby 1.8.7 (2009-06-12 patchlevel 174) [i686-darwin9.8.0]
On 2010-06-10 07:20:03 -0700, Mark A. said:
Mark A. wrote:
[tl;dr]
Sorry, guys, didn’t notice how I used eql instead of eql?
Btw, without #hash it won’t work anyways which I consider weird at the
very least.
#hash makes sense for Hash#[] and etc. #eql? makes more sense for
Array#&. I too find it odd that both are necessary.
On 2010-06-10 06:59:40 -0700, Robert D. said:
On Thu, Jun 10, 2010 at 3:41 PM, Wilson B. wrote:
Even if
your code lives in isolation, ensuring proper semantics via these
methods prevents a class of tricky bug that your successors may have
to deal with.
Hmm? Would you care to show an example where overloading those methods
(#eql? and #hash) is needed to ensure proper behavior? I am willing to
learn. But I am not willing to accept this statement as such.
Cheers
R.
You have been presented with one in this very thread. The OP wants
objects of his class to have the correct semantics for Array#& and
Hash#[], etc. The correct answer is to implement #hash and #eql?, just
as implementing <=> provides objects of his class with the correct
semantics for Array#sort.
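For comparison, the #<=> half of that analogy can be sketched as follows. The Version class is hypothetical and only serves to show Array#sort picking up a user-defined ordering:

```ruby
# Hypothetical Version class ordered by its numeric parts.
class Version
  include Comparable
  attr_reader :parts

  def initialize(string)
    @parts = string.split(".").map { |s| s.to_i }
  end

  # Array#sort (and Comparable) rely on this method.
  def <=>(other)
    parts <=> other.parts
  end

  def to_s
    parts.join(".")
  end
end

sorted = %w[1.10.0 1.2.0 1.9.3].map { |s| Version.new(s) }.sort
p sorted.map(&:to_s)   # ["1.2.0", "1.9.3", "1.10.0"]
```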
Rein H. wrote:
On 2010-06-10 07:20:03 -0700, Mark A. said:
Mark A. wrote:
[tl;dr]
Sorry, guys, didn’t notice how I used eql instead of eql?
Btw, without #hash it won’t work anyways which I consider weird at the
very least.
#hash makes sense for Hash#[] and etc. #eql? makes more sense for
Array#&. I too find it odd that both are necessary.
If two objects are eql?, their hash methods must also return
the same value. More details are in the book The Ruby Programming Language.
Thus, when you redefine eql?, the hash method should also be redefined.
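A small sketch of that rule, using two hypothetical classes. HalfKey redefines only #eql?, so two equal objects still report different hash values and hash-based lookups cannot be relied on; FullKey adds a matching #hash:

```ruby
# Hypothetical class that redefines #eql? but keeps the default #hash.
class HalfKey
  attr_reader :n

  def initialize(n)
    @n = n
  end

  def eql?(other)
    other.is_a?(HalfKey) && n == other.n
  end
end

# Same comparison, plus a #hash that agrees with #eql?.
class FullKey < HalfKey
  def hash
    n.hash
  end
end

half_a, half_b = HalfKey.new(1), HalfKey.new(1)
full_a, full_b = FullKey.new(1), FullKey.new(1)

p half_a.eql?(half_b)             # true
p half_a.hash == half_b.hash      # false: the contract is broken
p full_a.hash == full_b.hash      # true

table = { full_a => :found }
p table[full_b]                   # :found
p [full_a, full_b].uniq.size      # 1
```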
Marcin W. wrote:
Rein H. wrote:
On 2010-06-10 07:20:03 -0700, Mark A. said:
Mark A. wrote:
[tl;dr]
Sorry, guys, didn’t notice how I used eql instead of eql?
Btw, without #hash it won’t work anyways which I consider weird at the
very least.
#hash makes sense for Hash#[] and etc. #eql? makes more sense for
Array#&. I too find it odd that both are necessary.
If two objects are set to be eql?, their hash methods must also return
the same value. More details in The Ruby Programming Language book.
Thus, when you redefine eql?, the hash methods also should be redefined.
Well, it doesn’t say much in core api 
On 06/10/2010 05:27 PM, Rein H. wrote:
learn. But I am not willing to accept this statement as such.
Cheers
R.
You have been presented with one in this very thread. The OP wants
objects of his class to have the correct semantics for Array#& and
Hash#[], etc. The correct answer is to implement #hash and #eql?, just
as implementing <=> provides objects of his class with the correct
semantics for Array#sort.
See also
http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html
http://blog.rubybestpractices.com/posts/rklemme/019-Complete_Numeric_Class.html
Cheers
robert
On Thu, Jun 10, 2010 at 6:10 PM, Robert K. wrote:
http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html
http://blog.rubybestpractices.com/posts/rklemme/019-Complete_Numeric_Class.html
You define #eql? and #hash for your convenience. So good, so bad. My
question simply was: Show me why not redefining #hash and #eql? will
cause problems, because that was Wilson’s statement. I am still
waiting :(.
Cheers
R.
On Thu, Jun 10, 2010 at 5:30 PM, Rein H. wrote:
On 2010-06-10 06:59:40 -0700, Robert D. said:
You have been presented with one in this very thread. The OP wants objects
of his class to have the correct semantics for Array#& and Hash#[], etc. The
correct answer is to implement #hash and #eql?, just as implementing <=>
provides objects of his class with the correct semantics for Array#sort.
I guess you really do not know what I was talking about? Or do you
just repeat the same stuff over and over again in order to convince
me?
Overwriting #hash and #eql? breaks Hash! Why the heck should the OP’s
use case justify this?
And it does not answer my question: where would I want Hash to
behave according to the redefined #eql? and #hash? And BTW, I asked
Wilson, did I not?
Cheers
Robert
On Thu, Jun 10, 2010 at 6:48 PM, Mark A. wrote:
Robert D. wrote:
On Thu, Jun 10, 2010 at 5:30 PM, Rein H. wrote:
overwriting #hash and #eql? breaks Hash!
That’s not true, I think.
Judge for yourself
require "forwardable"

# Count the live instances of klass.
def count klass
  ObjectSpace.each_object( klass ).to_a.size
end

class N
  extend Forwardable
  attr_reader :n
  # N#hash delegates to the #hash of the wrapped value.
  def_delegators :n, :hash
  def eql? otha
    n == otha.n
  end
  private
  def initialize n
    @n = n
  end
end # class N

h = { N.new( 42 ) => true }
h[ N.new( 42 ) ] = 42   # eql? to the existing key, so its value is replaced
p h
GC.start
p count(N)
Cheers
R.
On 2010-06-10 22:52:14 -0700, Robert D. said:
ObjectSpace.each_object( klass ).to_a.size
@n = n
Cheers
R.
This breaks Hash? Quite the opposite!
This is precisely what is meant by “defining the semantics” of a class
for use by hashes and the very behavior you want when you define #eql?
and #hash in the first place!
You wouldn’t say that defining #<=> breaks Array#sort, so why would you
say that this “breaks Hash”? This doesn’t break Hash. If anything, it
fixes it when using N objects as keys!
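What happens to the hash in that example can be spelled out with a sketch. This is a simplified, hypothetical variant of class N (no Forwardable, public constructor), not the code as posted:

```ruby
# Simplified, hypothetical variant of class N from the example under discussion.
class N
  attr_reader :n

  def initialize(n)
    @n = n
  end

  def eql?(other)
    other.is_a?(N) && n == other.n
  end

  def hash
    n.hash
  end
end

first  = N.new(42)
second = N.new(42)

h = { first => true }
h[second] = 42   # second is eql? to first, so no new entry is added

p h.size                       # 1
p h[N.new(42)]                 # 42
p h.keys.first.equal?(first)   # true: the original key object is kept
```

Whether the discarded second key counts as "broken" or "fixed" is exactly the disagreement in this thread; the sketch only shows the behavior.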
Rein H.:
#hash makes sense for Hash#[] and etc. #eql? makes more
sense for Array#&. I too find it odd that both are necessary.
Both are necessary because #eql? says whether two objects are surely
the same, while #hash says whether they’re surely different – which,
perhaps counterintuitively, is not the same problem.
The difference is that in many, many cases it’s much faster to check
whether two objects are surely different (via a fast #hash function)
than whether they’re surely the same (#eql? can be quite slow).
The main difference between #eql? and #hash is that #hash can return the
same value for objects that are not #eql? (but if two objects are #eql?
then #hash must return the same value).
An untested and definitely not optimal (but hopefully simple) example follows.
Imagine that you want to implement a new immutable string class, one
which caches the string length (for performance reasons). Imagine also
that the vast majority of such strings you use are of different lengths,
and that you want to use them as Hash keys.
class ImmutableString
  def initialize string
    @string = string.dup.freeze
    @length = string.length
  end
end
Given the above assumptions, it might make sense for #hash to
return the @length, while #eql? makes the ‘proper’ comparison:
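class ImmutableString
  def hash
    @length
  end
  # Compare the contents; without this, == is Object#== (identity) and the
  # alias below would make #eql? an identity check too.
  def == other
    other.is_a?(ImmutableString) && @string == other.string
  end
  alias eql? ==
  protected
  attr_reader :string
end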
This way in the vast majority of cases, when your ImmutableStrings will
be considered for Hash keys, the check whether a given key exists will
be very quick; only when two objects #hash to the same value (i.e.,
when they’re not surely different) the #eql? is called to tell whether
they’re surely the same.
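The idea can be made observable with a self-contained sketch. It is a variant of the class above, not the code as posted: the eql_calls counter is a hypothetical addition for illustration, and #== is defined explicitly so that equality compares contents.

```ruby
class ImmutableString
  class << self
    attr_accessor :eql_calls   # hypothetical counter, for illustration only
  end
  self.eql_calls = 0

  attr_reader :string

  def initialize(string)
    @string = string.dup.freeze
    @length = string.length
  end

  # Cheap but collision-prone: all strings of one length share a hash value.
  def hash
    @length
  end

  def eql?(other)
    ImmutableString.eql_calls += 1
    other.is_a?(ImmutableString) && string == other.string
  end
  alias == eql?
end

table = {
  ImmutableString.new("a")   => 1,
  ImmutableString.new("bb")  => 2,
  ImmutableString.new("ccc") => 3
}
calls_before = ImmutableString.eql_calls

miss = table[ImmutableString.new("zz")]   # same length as "bb", other content
hit  = table[ImmutableString.new("bb")]
calls_after = ImmutableString.eql_calls

p miss                         # nil
p hit                          # 2
p calls_after > calls_before   # true: #eql? had to settle the collisions
```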
— Shot
On Jun 3, 4:29 pm, Anderson L. wrote:
How can I compare two objects and get true if some of their attributes are
equal?
include Comparable ?
Regards,
Dan
2010/6/11 Shot (Piotr S.):
whether two objects are surely different (via a fast #hash function)
than whether they’re surely the same (#eql? can be quite slow).
This is not necessarily true. Any reasonable implementation of #eql?
will bail out as soon as it sees a difference. On the contrary, you
always need to look at the complete state of an instance to calculate
#hash. I can easily construct an example where #eql? beats #hash:
14:40:54 Temp$ ruby19 eql-test.rb
same
0.110000 0.000000 0.110000 ( 0.098000)
0.093000 0.000000 0.093000 ( 0.099000)
0.157000 0.000000 0.157000 ( 0.151000)
different early
0.093000 0.000000 0.093000 ( 0.101000)
0.094000 0.000000 0.094000 ( 0.096000)
0.000000 0.000000 0.000000 ( 0.000000)
different late
0.109000 0.000000 0.109000 ( 0.105000)
0.094000 0.000000 0.094000 ( 0.098000)
0.156000 0.000000 0.156000 ( 0.149000)
14:40:56 Temp$ cat eql-test.rb
require 'benchmark'
a1 = Array.new 1_000_000
a2 = Array.new 1_000_000
puts "same"
puts Benchmark.measure { a1.hash }
puts Benchmark.measure { a2.hash }
puts Benchmark.measure { a1.eql? a2 }
a1[0] = 1
a2[0] = 2
puts "different early"
puts Benchmark.measure { a1.hash }
puts Benchmark.measure { a2.hash }
puts Benchmark.measure { a1.eql? a2 }
a2[0] = a1[0]
a2[999_999] = 1
puts "different late"
puts Benchmark.measure { a1.hash }
puts Benchmark.measure { a2.hash }
puts Benchmark.measure { a1.eql? a2 }
14:40:58 Temp$
Notice also how #eql? with equal arrays is not much slower than #hash.
and that you want to use them as Hash keys.
@length
Bad hash implementation. Why don’t you use String#hash?
be very quick; only when two objects #hash to the same value (i.e.,
when they’re not surely different) the #eql? is called to tell whether
they’re surely the same.
If the set of attributes needed for the specific comparison in this
thread is not the same as the set that we identify as keyish for class
User in general, one cannot use User#eql? and User#hash for quick set
intersection. That’s why I suggested using a Struct for the key fields
(Struct has proper #hash and #eql? built in).
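A minimal sketch of that suggestion, under assumptions: the User class, its attributes and the choice of name and email as key fields are all hypothetical. User itself gets no #eql? or #hash; the Struct carries the comparison:

```ruby
# Hypothetical User class with no #eql?/#hash of its own.
class User
  attr_reader :id, :name, :email

  def initialize(id, name, email)
    @id, @name, @email = id, name, email
  end
end

# Key fields chosen for this particular comparison (an assumption).
UserKey = Struct.new(:name, :email)

def key_of(user)
  UserKey.new(user.name, user.email)
end

left  = [User.new(1, "ann", "ann@example.com"),
         User.new(2, "bob", "bob@example.com")]
right = [User.new(9, "bob", "bob@example.com")]

# Intersect the keys, then map back to the users on the left side.
common_keys = left.map { |u| key_of(u) } & right.map { |u| key_of(u) }
common      = left.select { |u| common_keys.include?(key_of(u)) }

p common.map(&:name)   # ["bob"]
```

A different comparison can use a different Struct without touching User.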
Kind regards
robert