What’s the standard way of implementing #hash for value objects in Ruby?

luislavena · December 29, 2011, 3:11pm

Hi!

For

class A
def initialize(a, b, c)
@a, @b, @c = a, b, c
end
end

is it

def A.hash
self.class.hash ^ @a.hash ^ @b.hash ^ @c.hash
end

or

def A.hash
self.class.hash ^ [@a, @b, @c].hash
end

or is it something else?

No classes in the standard library use self.class.hash, but I think it
makes sense to use it.

Nikolai_W · December 29, 2011, 3:58pm

On 12/29/2011 08:10 AM, Nikolai W. wrote:

Hi!

What’s the standard way of implementing #hash for value objects in Ruby?

I don’t know how good this is, but if your objects have a unique string
representation, you could use example_object.to_s.hash. I’m sure there
must be more efficient solutions, but this solution has the advantage
that it’s pretty easy to implement and should reduce the possibility of
collisions if you use your objects as keys in a Hash.

-Jeremy

Nikolai_W · December 29, 2011, 3:59pm

I think you can’t access instance variables from a class method, so
both methods are kinda useless.

2011/12/29 Nikolai W. [email protected]:

Nikolai_W · December 29, 2011, 4:11pm

On Thu, Dec 29, 2011 at 15:52, Gunther D. [email protected]
wrote:

I think you can’t access instance variables from a class method, so
both methods are kinda useless.

Ugh, that should of course be

class A
def hash
…
end
end

Nikolai_W · December 30, 2011, 12:28am

On Thu, Dec 29, 2011 at 4:06 PM, Nikolai W. [email protected] wrote:

end
Easy way:

class A
def hash
[@a, @b, @c].hash
end
end

Easiest:

also gives you == and eql?

A = Struct.new :a, :b, :c

Kind regards

robert

Nikolai_W · December 30, 2011, 4:24am

Hi Nikolai,

Writing a good hash function is actually language neutral. Some
literature suggested the following form:

p = <a big prime number> # e.g. p = 31
hash = 1
for each object you will use in equal do
   hash = hash * p + object.hash
end

I am not sure if this fits your desire of a ruby-flavor hash
implementation, but I do use this consistently in all my Java
projects.

Nikolai_W · December 30, 2011, 12:48am

On Fri, Dec 30, 2011 at 00:26, Robert K.
[email protected] wrote:

Easy way:

I seem to have failed in communicating what I’m after. I wasn’t after
different ways of implementing #hash, I was after the golden standard
of #hash implementations for value objects. So, again, what’s the
standard way of implementing #hash for value objects in Ruby?

If there isn’t one, perhaps one should be agreed upon?

One suggestion would be the XOR of the hash values of the object’s
class and its instance variables.

Another is the XOR of the hash value of the object’s class and the
hash value of an Array containing its instance variables.

Nikolai_W · December 30, 2011, 6:25am

On Thu, Dec 29, 2011 at 5:47 PM, Nikolai W. [email protected] wrote:

I seem to have failed in communicating what Im after. I wasnt after
different ways of implementing #hash, I was after the golden standard
of #hash implementations for value objects. So, again, whats the
standard way of implementing #hash for value objects in Ruby?

If there isnt one, perhaps one should be agreed upon?

If there is one, I don’t know what it is, which implies there isn’t one.

I usually just delegate to one of my attributes’ hash methods (e.g. for
a
user, I might use its user’s name’s hash). I’m not sure what advantage
would be gained by establishing a standard.

Nikolai_W · December 30, 2011, 9:45am

On Dec 29, 2011, at 21:25 , Josh C. wrote:

If there is one, I don’t know what it is, which implies there isn’t one.

I usually just delegate to one of my attributes’ hash methods (e.g. for a
user, I might use its user’s name’s hash). I’m not sure what advantage
would be gained by establishing a standard.

You should be doing the same attributes you use in an equality test. I
usually throw all the attributes in an array and ask for it’s hash.

Nikolai_W · December 30, 2011, 10:43am

Maybe you will be interested in reading “Introduction to Algorithms”.
There is a whole chapter discussing how to implement hash tables
including criteria of good hash functions.

Nikolai_W · December 30, 2011, 10:09am

On Fri, Dec 30, 2011 at 06:25, Josh C. [email protected] wrote:

On Thu, Dec 29, 2011 at 5:47 PM, Nikolai W. [email protected] wrote:

So, again, what’s the standard way of implementing #hash for value objects in
Ruby?

If there is one, I don’t know what it is, which implies there isn’t one.

That’s a very bold statement to make, considering that you don’t seem
to know how #hash should be implemented:

I usually just delegate to one of my attributes’ hash methods (e.g. for a
user, I might use its user’s name’s hash). I’m not sure what advantage
would be gained by establishing a standard.

If there was a standard way, then no one would have to ask if there
was one. It would make implementing #hash, which you should/must do
if you implement #==, trivial, as there’s then only one way to do so.
It would make hashing conflicts less likely, as the correct set of
parameters would be used, whatever the object, and a uniform
distribution could be established across different classes of objects.

Perhaps Ruby’s internal hashing functions (rb_hash_start,
rb_hash_uint, rb_hash_end) could be exposed in some new object?

Note that these functions are currently not being used in a uniform
way internally. For example, Struct starts with the #hash of the
object’s class, while Array starts with #length.

Nikolai_W · December 30, 2011, 11:00am

On Fri, Dec 30, 2011 at 10:42, Yong Li [email protected] wrote:

Maybe you will be interested in reading “Introduction to Algorithms”.
There is a whole chapter discussing how to implement hash tables
including criteria of good hash functions.

What in what I’ve written in this thread suggests that I don’t know
anything about hash tables and hashing functions?

I have read said book, chapter 6 of The Art of Computer Programming,
and many other books and articles that cover this subject. I don’t
mention this to brag, but to point out the fact that this isn’t a
newbie asking a question on this mailing list.

Nikolai_W · December 30, 2011, 10:45am

On Fri, Dec 30, 2011 at 3:09 AM, Nikolai W. [email protected] wrote:

Not sure what gives this impression, while learning algorithms I’ve
implemented a couple different hash algorithms (granted the keys were
always strings).

I usually just delegate to one of my attributes’ hash methods (e.g. for a

user, I might use its user’s name’s hash). I’m not sure what advantage
would be gained by establishing a standard.

If there was a standard way, then no one would have to ask if there
was one.

That would change the answer, not the question.

It would make implementing #hash, which you should/must do

if you implement #==, trivial, as theres then only one way to do so.

Delegating to the most relevant attribute still seems more trivial.

It would make hashing conflicts less likely, as the correct set of

parameters would be used, whatever the object, and a uniform
distribution could be established across different classes of objects.

It still seems that delegating to an attribute like an id or a string
which
is likely unique, would yield sufficient uniqueness, and probably be
more
performant as it would require fewer calculations. You may be more
likely
to have a collision, but I’d expect that the likelihood of this is not
increased much, and the benefit saved on calculations and on complexity
of
implementation would more than make up for it.

I suppose one reason I take this view could be that the only viable
scenarios I can think of for making some arbitrary object into a hash
key
are for sets and Array#uniq. For me, these scenarios are exceedingly
rare,
and have always been trivially replaced with alternative keys.

Nikolai_W · December 30, 2011, 11:10am

On Fri, Dec 30, 2011 at 10:44, Josh C. [email protected] wrote:

On Fri, Dec 30, 2011 at 3:09 AM, Nikolai W. [email protected] wrote:

It would make implementing #hash, which you should/must do
if you implement #==, trivial, as there’s then only one way to do so.

Delegating to the most relevant attribute still seems more trivial.

Trivial – perhaps. Wrong – most definitely. That you don’t seem to
understand this is what tells me that you don’t understand how #hash
should be implemented.

I suppose one reason I take this view could be that the only viable
scenarios I can think of for making some arbitrary object into a hash key
are for sets and Array#uniq. For me, these scenarios are exceedingly rare,
and have always been trivially replaced with alternative keys.

How about having them as keys in an actual Hash (which is how Sets and
Array#uniq are currently implemented)?

Nikolai_W · December 30, 2011, 12:20pm

On Fri, Dec 30, 2011 at 4:09 AM, Nikolai W. [email protected] wrote:

should be implemented.

Feel free to explain why it is wrong.

I suppose one reason I take this view could be that the only viable
scenarios I can think of for making some arbitrary object into a hash key
are for sets and Array#uniq. For me, these scenarios are exceedingly
rare,
and have always been trivially replaced with alternative keys.

How about having them as keys in an actual Hash (which is how Sets and
Array#uniq are currently implemented)?

Which is why I mentioned it.

Nikolai_W · December 30, 2011, 3:03pm

On Fri, Dec 30, 2011 at 12:47 AM, Nikolai W. [email protected] wrote:

On Fri, Dec 30, 2011 at 00:26, Robert K. [email protected] wrote:

Easy way:

I seem to have failed in communicating what Im after. I wasnt after
different ways of implementing #hash, I was after the golden standard
of #hash implementations for value objects. So, again, whats the
standard way of implementing #hash for value objects in Ruby?

I have covered this a bit earlier:
http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html

Boils down to that the hash code should be derived from hash codes of
all fields used as key fields in determining equivalence.

If there isnt one, perhaps one should be agreed upon?

Why do you think a standard is needed? There are some well known
requirements for good hash codes and how they are calculated and since
most of the time Hash keys are from the same class whatever algorithm
chosen works.

One suggestion would be the XOR of the hash values of the objects
class and its instance variables.

Not good, because then the same value in different fields has
identical influence on the hash code. Consequently likelihood of
collisions increases and thus performance of Hash storage and
retrieval may decrease.

Another is the XOR of the hash value of the objects class and the
hash value of an Array containing its instance variables.

Better because that considers position.

But still, I don’t see the need. Note also that a proper Hash key
usually should be immutable because changing them causes all sorts of
trouble if not done carefully. And most of the time Hash keys are
String, Symbol, Fixnum and the like - which all do have their #hash
implementation already.

Kind regards

robert

Nikolai_W · December 30, 2011, 3:43pm

I found this recent discussion interesting in a somewhat related
manner, dealing with hash tables:
http://groups.google.com/group/nodejs/browse_thread/thread/d34ed2ec3526db5a

On Fri, Dec 30, 2011 at 9:02 AM, Robert K.

Nikolai_W · December 30, 2011, 3:45pm

On Fri, Dec 30, 2011 at 15:42, Zachary S. [email protected]
wrote:

I found this recent discussion interesting in a somewhat related
manner, dealing with hash tables:
http://groups.google.com/group/nodejs/browse_thread/thread/d34ed2ec3526db5a

This was the main reason for releasing Ruby 1.8.7p357.

Nikolai_W · December 30, 2011, 3:49pm

This was the main reason for releasing Ruby 1.8.7p357.

Oh, I hadn’t realized until I just checked ruby-lang.org

http://www.ruby-lang.org/en/news/2011/12/28/denial-of-service-attack-was-found-for-rubys-hash-algorithm/

Nikolai_W · December 30, 2011, 3:44pm

On Fri, Dec 30, 2011 at 15:02, Robert K.
[email protected] wrote:

On Fri, Dec 30, 2011 at 12:47 AM, Nikolai W. [email protected] wrote:

I seem to have failed in communicating what I’m after. I wasn’t after
different ways of implementing #hash, I was after the golden standard
of #hash implementations for value objects. So, again, what’s the
standard way of implementing #hash for value objects in Ruby?

I have covered this a bit earlier:
http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html

Boils down to that the hash code should be derived from hash codes of
all fields used as key fields in determining equivalence.

Precisely. And how do you get a single value from the hash values of
those fields?

One suggestion would be the XOR of the hash values of the object’s
class and its instance variables.

Not good, because then the same value in different fields has
identical influence on the hash code. Consequently likelihood of
collisions increases and thus performance of Hash storage and
retrieval may decrease.

This is exactly what you suggest in your article, with the addition
(or rather XOR) of the hash value of the object’s class.

But still, I don’t see the need. Note also that a proper Hash key
usually should be immutable because changing them causes all sorts of
trouble if not done carefully.

Hence the use of “value object” in my question.

And most of the time Hash keys are
String, Symbol, Fixnum and the like - which all do have their #hash
implementation already.

But how do you combine them? That’s what this whole thread has been
about. I realize now that I made quite a few assumptions about what I
thought readers would understand from my original question that may
not have been as obvious to them as they were to me. First, let me
apologize for this lack of clarity. Second, let me rephrase my
question and add some additional context and examples:

What algorithm should one employ in the calculation of the hash value
of an arbitrary value object?

As an example, what algorithm should one employ in the calculation of
the hash value of an immutable class A containing three immutable
instance variables @a, @b, @c that contain, respectively, a String, a
Fixnum, and a Symbol and that are all used in the calculation of #==?

The semi-standard solutions seem to be

class A
def hash
@a ^ @b ^ @c
end
end

and

class A
def hash
[@a, @b, @c].hash
end
end

These are the two main implementations in the Standard Library,
anyway. The first is also the solution proposed by Robert K. in

http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html

I would claim that the algorithm should take the class of the object
into account as well, both for consistency with #== (which should
check equality of the classes of the objects being compared) and for
added entropy.

Internally, Ruby (primarily) uses three C functions for the
calculation of combined hash values, namely rb_hash_start,
rb_hash_uint, and rb_hash_end. As an example, the hash value of a
Struct is calculated (in Ruby with these three functions wrapped in an
imaginary module C) as

class Struct
def hash
C.rb_hash_end(reduce(C.rb_hash_start(self.class.hash)){ |h, v|
C.rb_hash_uint(h, v.hash) })
end
end

Might it be useful to have Ruby expose a way to perform this
calculation from the Ruby realm so that other classes may employ this
algorithm?