Forum: Ruby String#hash changed in Ruby 1.9?

David Palm (Guest)
on 2009-05-04 16:24
(Received via mailing list)
Hi all,
in ruby 1.8.7:
david@trince ~$ ruby -e 'puts "abc".hash'
833038373
david@trince ~$ ruby -e 'puts "abc".hash'
833038373
david@trince ~$ ruby -e 'puts "abc".hash'
833038373

[always the same number]

in ruby 1.9.1:
david@trince ~$ ruby -e 'puts "abc".hash'
402929305
david@trince ~$ ruby -e 'puts "abc".hash'
-403532784
david@trince ~$ ruby -e 'puts "abc".hash'
-650364342

What happened? Is this intentional? Rationale? Any tips on how to
replace it?
Sebastian Hungerecker (Guest)
on 2009-05-04 16:41
(Received via mailing list)
On Monday, 04 May 2009 16:22:01, David Palm wrote:
> in ruby 1.9.1:
> david@trince ~$ ruby -e 'puts "abc".hash'
> 402929305
> david@trince ~$ ruby -e 'puts "abc".hash'
> -403532784
> david@trince ~$ ruby -e 'puts "abc".hash'
> -650364342
>
> What happened? Is this intentional?

1.9 uses MurmurHash (http://murmurhash.googlepages.com/) with a random
seed, which is generated once per application run.
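
To make the per-run seed visible, here is a minimal sketch. It assumes
Ruby 1.9 or later and that RbConfig.ruby points at a usable interpreter;
hash_in_fresh_process is a hypothetical helper name, not part of Ruby.

```ruby
require 'rbconfig'

# Hypothetical helper: compute str.hash in a brand-new Ruby process,
# so the value reflects that process's own random seed.
def hash_in_fresh_process(str)
  IO.popen([RbConfig.ruby, '-e', "puts #{str.inspect}.hash"], &:read).to_i
end

# Within one process the value is stable ...
same_process = Array.new(3) { 'abc'.hash }
puts same_process.uniq.size # prints 1

# ... but every new process picks its own seed, so these values
# will normally differ from each other and from the ones above.
puts Array.new(3) { hash_in_fresh_process('abc') }.inspect
```

So String#hash is fine as a key inside a running program, but it is not
meant to be written to disk and compared on a later run.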

> Any tips on how to replace it?

What does it hurt if the hash value of a string does not remain constant
between runs of the application?

HTH,
Sebastian
David Palm (Guest)
on 2009-05-04 16:49
(Received via mailing list)
>> Any tips on how to replace it?
>
> What does it hurt if the hash value of a string does not remain constant
> between runs of the application?
>

In my case it's pretty bad. I use it in a command line utility to cache
rake tasks. I create one cachefile for each directory, naming them using
the String#hash of the full path (Dir.pwd.hash). If the hash is
different the next time the program runs the cache lookup fails (and I
get a new cache file instead of the old one).

So, I don't need anything fancy, just an equivalent to Dir.pwd.hash that
stays consistent. Do I need to MD5 it? Feels like overkill. Why was this
changed in the first place?
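
For reference, the MD5 route is short. This is only a sketch:
cache_file_name and the 'rake_cache_' prefix are made-up names for
illustration. Digest::MD5 and Zlib.crc32 both ship with Ruby's standard
library and return the same value for the same input on every run.

```ruby
require 'digest/md5'
require 'zlib'

# Hypothetical helper: derive a cache file name from a directory path.
# The digest depends only on the bytes of the path, not on a per-run seed.
def cache_file_name(path, prefix = 'rake_cache_')
  "#{prefix}#{Digest::MD5.hexdigest(path)}"
end

puts cache_file_name(Dir.pwd)

# Lighter-weight alternative: a 32-bit checksum. It is just as stable
# between runs, but with only 32 bits two paths are more likely to collide.
puts Zlib.crc32(Dir.pwd)
```

Neither value is guaranteed to be collision-free, but a 128-bit digest
makes an accidental clash between two directory paths very unlikely.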
Robert Klemme (Guest)
on 2009-05-04 19:15
(Received via mailing list)
On 04.05.2009 16:48, David Palm wrote:
>>> Any tips on how to replace it?
>> What does it hurt if the hash value of a string does not remain constant
>> between runs of the application?
>
> In my case it's pretty bad. I use it in a command line utility to cache
> rake tasks. I create one cachefile for each directory, naming them using
> the String#hash of the full path (Dir.pwd.hash). If the hash is
> different the next time the program runs the cache lookup fails (and I
> get a new cache file instead of the old one).

Hm...  But you do admit that this is a bit abusive, do you?   Especially
since there are no guarantees that you won't have any collisions with a
hash value like the one returned by #hash.

How about storing your cache files with a fixed name in the original
directory?  Or have a file with metadata (mapping from path to cache
file name)?
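
A rough sketch of the metadata idea, under some assumptions: the index
is a single JSON file, and CacheIndex, cache_file_for and the 'cache_'
prefix are hypothetical names chosen for this example. Because each path
is stored explicitly, two different paths can never end up sharing a
cache file.

```ruby
require 'json'
require 'securerandom'

# Hypothetical index: maps a directory path to a cache file name and
# persists that mapping in one JSON file.
class CacheIndex
  def initialize(index_file)
    @index_file = index_file
    @map = File.exist?(index_file) ? JSON.parse(File.read(index_file)) : {}
  end

  # Return the cache file name for path, assigning and saving a new
  # name the first time the path is seen.
  def cache_file_for(path)
    return @map[path] if @map.key?(path)

    @map[path] = "cache_#{SecureRandom.hex(8)}"
    File.write(@index_file, JSON.pretty_generate(@map))
    @map[path]
  end
end
```

Usage would be something like CacheIndex.new(index_path).cache_file_for(Dir.pwd),
where index_path is wherever the tool keeps its own data. The lookup
returns the same name on every later run because it is read back from
the index file.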

> So, I don't need anything fancy, just an equivalent to Dir.pwd.hash that
> stays consistent. Do I need to MD5 it? Feels like overkill. Why was this
> changed in the first place?

That's an interesting question.  I'm curious as well.  Maybe the changes
are just a side effect of a new - supposedly better - hashing algorithm.

Kind regards

  robert
David Palm (Guest)
on 2009-05-04 22:58
(Received via mailing list)
On Tue, 5 May 2009 02:15:09 +0900, Robert Klemme wrote:
>
> Hm...  But you do admit that this is a bit abusive, do you?
> Especially since there are no guarantees that you won't have any
> collisions with a hash value like the one returned by #hash.

Oh yes, it's just a quick and convenient way of doing it. Dunno if I'd
call it "abusive", but it's sure not military grade programming...

> How about storing your cache files with a fixed name in the original
> directory?  Or have a file with metadata (mapping from path to cache
> file name)?

Fixed name won't work; most directories are scm tracked so it'd be a
mess to keep the cache files out of the way. One big(ish) cache file
might work. Maybe even a sqlite db. Have to run some benchmark on that.

>> that stays consistent. Do I need to MD5 it? Feels like overkill. Why
>> was this changed in the first place?
>
> That's an interesting question.  I'm curious as well.  Maybe the
> changes are just a side effect of a new - supposedly better - hashing
> algorithm.

The link Sebastian provided (http://murmurhash.googlepages.com/) was
interesting but not exhaustive, and I still don't know when/how/why the
behaviour was changed. Perhaps the mailing list archives will tell?