Hash value inconsistencies

ruby -e "puts 'test'.hash"

Should this output the same integer value on all platforms where Ruby
can run?

  • Windows
    ruby --version #=> ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-mswin32]
    ruby -e "puts 'test'.hash" #=> -914358341

  • Mac 10.5
    ruby --version #=> ruby 1.8.6 (2007-03-13 patchlevel 0) [i686-darwin8.10.3]
    ruby -e "puts 'test'.hash" #=> -914358341

  • Linux 2.6 Kernel
    ruby --version #=> ruby 1.8.6 (2007-06-07 patchlevel 36) [x86_64-linux]
    ruby -e "puts 'test'.hash" #=> 1233125307

    ruby --version #=> ruby 1.8.5 (2006-12-04 patchlevel 2) [x86_64-linux]
    ruby -e "puts 'test'.hash" #=> 1233125307

It appears not! So, any suggestions on generating an ID number for an
object that is unique yet consistent across different platforms? I’d
like to have some method that I could call on an object that would
return a reproducible value that would uniquely identify that object.

Thoughts?

TwP

Tim P. wrote:

ruby -e "puts 'test'.hash"

Should this output the same integer value on all platforms where Ruby
can run?

  • Windows
    ruby --version #=> ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-mswin32]
    ruby -e "puts 'test'.hash" #=> -914358341

I can’t speak to the larger problem you ask about, but I did verify that
this negative value (-914358341) is the output on Windows. Maybe we could
alter the test slightly to figure out what Ruby is doing.

Cheers.

Tim P. wrote:

Should this output the same integer value on all platforms where Ruby
can run?

Perhaps, but read on and you’ll see why you should never rely on it.

It appears not! So, any suggestions on generating an ID number for an
object that is unique yet consistent across different platforms? I’d
like to have some method that I could call on an object that would
return a reproducible value that would uniquely identify that object.

That’s not possible. There is more entropy in an arbitrary object than
can be represented in a Fixnum. Basic coding theory stuff. If it was
possible, then you could code all the data in all the databases in the
world into a single Fixnum :-).

If you want a fixed-length code that’s sufficiently likely to be unique
that you can be almost certain that you’ll never see a false duplicate,
you need to use a cryptographic hash function. I recommend SHA-256, but
you might survive with a weaker one like MD5 or SHA-1. They take a lot
more work to calculate than is justified for Ruby’s hash keys though!
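
A minimal sketch of that suggestion, assuming the object can be reduced to a canonical string of the fields that identify it. The `stable_id` helper and its field separator are made up for illustration; only `Digest::SHA256` comes from Ruby's standard library:

```ruby
require 'digest/sha2'

# Hypothetical helper: build a platform-independent ID from the fields
# that identify an object. Fields are converted with to_s and joined with
# a separator that is assumed not to occur in the data itself.
def stable_id(*fields)
  canonical = fields.map { |f| f.to_s }.join("\x1f")
  Digest::SHA256.hexdigest(canonical)
end

puts Digest::SHA256.hexdigest('test')
# => 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08

puts stable_id('widget', 42)
```

The digest depends only on the bytes fed in, so it should come out the same on Windows, Mac and Linux as long as the canonical string is built the same way.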

With these functions, the probability of a population containing a false
duplicate is approximately 50% when the population contains sqrt(2^N)
(that is, 2^(N/2)) distinct items, where N is the number of bits in the
checksum. For SHA-256, that means you need about 2^128 items before you
have a reasonable chance of a collision. All of the programs you’ll ever
write, running for your entire life, will only create a tiny fraction of
this many objects, so the chance of you ever seeing a collision is tiny.
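
To put rough numbers on this, here is a small sketch using the standard birthday approximation. The `collision_probability` name is just for illustration, and the formula assumes the hash output is uniformly distributed. At exactly 2^(N/2) items the approximation gives a little under 40%, which is the same ballpark as the 50% figure:

```ruby
# Approximate probability that at least two of n items share the same
# value of a uniformly distributed hash with the given number of bits.
# Uses the birthday approximation 1 - exp(-n(n-1) / (2 * 2^bits)).
def collision_probability(n, bits)
  1.0 - Math.exp(-(n * (n - 1)) / (2.0 * 2**bits))
end

puts collision_probability(2**16, 32)    # at sqrt(2^32) items: roughly 0.39
puts collision_probability(100, 30)      # 100 objects, 30-bit ID: about 4.6e-06
puts collision_probability(10**12, 256)  # prints 0.0 (below Float precision)
```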

That might sound risky still, but all of e-commerce is built on the
principle. If it’s good enough for that, it’s good enough for you :-)

Clifford H…

On Feb 5, 2008, at 4:04 PM, Clifford H. wrote:

platforms? I’d like to have some method that I could call on an
object that would return a reproducible value that would uniquely
identify that object.

That’s not possible. There is more entropy in an arbitrary object than
can be represented in a Fixnum. Basic coding theory stuff. If it was
possible, then you could code all the data in all the databases in the
world into a single Fixnum :-).

Darn information theory! I just need a fixnum. The number of objects
we are creating is pretty tiny – maybe 100.

I was quite surprised that the Ruby “hash” method is not consistent
across platforms. The solution is to roll my own hash function that
produces consistent results. Just wondering about the more general
questions regarding the built in hash function.
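
One way such a hand-rolled function could look is a 32-bit FNV-1a hash, sketched below. FNV-1a is swapped in here purely as an example of a simple, well-documented non-cryptographic hash; it is not what Ruby uses internally, and the method names are made up. The 30-bit fold assumes the goal is a value that stays a Fixnum on 32-bit Ruby 1.8:

```ruby
FNV_OFFSET_BASIS = 2166136261  # 32-bit FNV offset basis
FNV_PRIME        = 16777619    # 32-bit FNV prime

# 32-bit FNV-1a over the bytes of a string. Only integer arithmetic with
# explicit masking is used, so the result does not depend on word size.
def fnv1a_32(str)
  hash = FNV_OFFSET_BASIS
  str.each_byte do |byte|
    hash ^= byte
    hash = (hash * FNV_PRIME) & 0xffffffff
  end
  hash
end

# XOR-fold down to 30 bits so the value also fits in a Fixnum on
# 32-bit builds (assumption: a 30-bit ID space is enough).
def fnv1a_30(str)
  h = fnv1a_32(str)
  ((h >> 30) ^ h) & 0x3fffffff
end

puts fnv1a_32('a')  # => 3826002220 (0xe40c292c)
```

With only about 100 objects a 30-bit space leaves collisions unlikely, but it would still be worth checking for duplicates when the IDs are generated.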

Blessings,
TwP

2008/2/6, Tim P.:

Darn information theory! I just need a fixnum. The number of objects
we are creating is pretty tiny – maybe 100.

I was quite surprised that the Ruby “hash” method is not consistent
across platforms. The solution is to roll my own hash function that
produces consistent results.

A regular hash function is a bad candidate for a unique id anyway.
I’d rather use an MD5 digest or something like that. If your strings are
reasonably short you could just as well convert them to Fixnums, but then
again: why bother, and why not use the string directly?
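
For illustration, a sketch of the MD5 route using the standard `digest` library. The `short_id` helper is hypothetical, and truncating to 7 hex digits is an assumption made only so that the result stays a Fixnum on 32-bit Ruby 1.8:

```ruby
require 'digest/md5'

# Hypothetical helper: take the first 7 hex digits (28 bits) of the MD5
# digest as an integer. 28 bits stays within Fixnum range even on 32-bit
# Ruby 1.8, at the cost of a much higher collision risk than the full digest.
def short_id(str)
  Digest::MD5.hexdigest(str)[0, 7].to_i(16)
end

puts Digest::MD5.hexdigest('test')  # => 098f6bcd4621d373cade4e832627b4f6
puts short_id('test')               # => 10024636
```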

Just wondering about the more general
questions regarding the built in hash function.

There is no need for a hash function to be consistent across
platforms. Why should it be?

Kind regards

robert

Tim P. wrote:

On Feb 5, 2008, at 4:04 PM, Clifford H. wrote:

platforms? I’d like to have some method that I could call on an
object that would return a reproducible value that would uniquely
identify that object.

That’s not possible. There is more entropy in an arbitrary object than
can be represented in a Fixnum. Basic coding theory stuff. If it was
possible, then you could code all the data in all the databases in the
world into a single Fixnum :-).

Darn information theory! I just need a fixnum. The number of objects
we are creating is pretty tiny – maybe 100.

I was quite surprised that the Ruby “hash” method is not consistent
across platforms. The solution is to roll my own hash function that
produces consistent results. Just wondering about the more general
questions regarding the built in hash function.

What in particular are you going to hash? Under what circumstances do
you want to bomb out?

Tim P. wrote:

ruby -e "puts 'test'.hash"

Should this output the same integer value on all platforms where Ruby
can run?

  • Windows
    ruby --version #=> ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-mswin32]
    ruby -e "puts 'test'.hash" #=> -914358341

  • Mac 10.5
    ruby --version #=> ruby 1.8.6 (2007-03-13 patchlevel 0) [i686-darwin8.10.3]
    ruby -e "puts 'test'.hash" #=> -914358341

  • Linux 2.6 Kernel
    ruby --version #=> ruby 1.8.6 (2007-06-07 patchlevel 36) [x86_64-linux]
    ruby -e "puts 'test'.hash" #=> 1233125307

    ruby --version #=> ruby 1.8.5 (2006-12-04 patchlevel 2) [x86_64-linux]
    ruby -e "puts 'test'.hash" #=> 1233125307

It appears not! So, any suggestions on generating an ID number for an
object that is unique yet consistent across different platforms? I’d
like to have some method that I could call on an object that would
return a reproducible value that would uniquely identify that object.

Thoughts?

TwP

The problem is that some of your test machines are 64-bit and some are
32-bit. I ran the same tests on Macs running Snow Leopard (64-bit) and
Leopard (32-bit), and on Linux (64-bit) and Linux (32-bit), and the
results were consistent across OSes with the same word size.
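
To see which group a given machine falls into, a quick check along these lines should do (a sketch; the variable name is arbitrary). Note that newer Ruby versions also seed String#hash per process, so there the value changes from run to run even on the same machine:

```ruby
# Integer#size (Fixnum#size on older Rubies) reports the number of bytes
# in the machine representation of a small integer: 4 on 32-bit builds,
# 8 on 64-bit builds.
word_bytes = 1.size
puts "#{word_bytes * 8}-bit Ruby build"
```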
