Hash

rgreen · September 9, 2007, 8:26pm

What is the purpose of string hash? What would you use it for?
EXAMPLE: “This is a test.”.hash RETURNS -649841898. WHAT EXACTLY DOES
THIS REPRESENT? WHAT IS IT GOOD FOR? COMPARISONS?

rgreen · September 9, 2007, 8:35pm

Ron G. wrote:

What is the purpose of string hash? What would you use it for?
EXAMPLE: “This is a test.”.hash RETURNS -649841898. WHAT EXACTLY DOES
THIS REPRESENT? WHAT IS IT GOOD FOR? COMPARISONS?

Run “another test”.hash - you get a different number.

It’s good for hashes; look them up in any “data structures” textbook.
It’s a
unique number useful for rapidly accessing that string.

And please DON’T SCREAM!

rgreen · September 9, 2007, 8:49pm

On Sep 9, 2007, at 1:34 PM, “Phlip” [email protected] wrote:

Ron G. wrote:

What is the purpose of string hash? What would you use it for?
EXAMPLE: “This is a test.”.hash RETURNS -649841898. WHAT EXACTLY DOES
THIS REPRESENT? WHAT IS IT GOOD FOR? COMPARISONS?

Run “another test”.hash - you get a different number.

It’s good for hashes; look them up in any “data structures”
textbook. It’s a unique number useful for rapidly accessing that

It should be noted though that String#hash isn’t garaunteed to be
unique.

rgreen · September 9, 2007, 9:00pm

Marcel Molina Jr. wrote:

On Sep 9, 2007, at 1:34 PM, “Phlip” [email protected] wrote:

Ron G. wrote:

What is the purpose of string hash? What would you use it for?
EXAMPLE: “This is a test.”.hash RETURNS -649841898. WHAT EXACTLY DOES
THIS REPRESENT? WHAT IS IT GOOD FOR? COMPARISONS?

Run “another test”.hash - you get a different number.

It’s good for hashes; look them up in any “data structures”
textbook. It’s a unique number useful for rapidly accessing that

It should be noted though that String#hash isn’t garaunteed to be
unique.

Then,again I ask, what is it good for?

rgreen · September 9, 2007, 9:08pm

Ron G. wrote:

It should be noted though that String#hash isn’t garaunteed to be
unique.

Then,again I ask, what is it good for?

Then I answer again: A tutorial on data structures tells how to use
them.

(Maybe I shouldn’t post when I’m having a bad day, huh?

rgreen · September 9, 2007, 9:23pm

Peter C. wrote:

On 9/9/07, Ron G. [email protected] wrote:

Marcel Molina Jr. wrote:

It should be noted though that String#hash isn’t garaunteed to be
unique.

Then,again I ask, what is it good for?

It’s still useful as a hash. Marcel wasn’t wrong, but no fixed size
hash
is “guaranteed” to be unique as that’s absolutely impossible, per the
pigeonhole principle
(Pigeonhole principle - Wikipedia).
String#hash’s hash is of a far lower “quality” than that offered by,
say,
SHA-1 or SHA-2.

Regards,
Peter C.
http://www.rubyinside.com/

Peter,
If Its not guaranteed to be unique, then it can’t be used for identity.
Can you give me an example of how i would use string.hash?

rgreen · September 9, 2007, 9:32pm

Ron G. wrote:

THIS REPRESENT? WHAT IS IT GOOD FOR? COMPARISONS?
Then,again I ask, what is it good for?

It’s for internal use, it’s used in Hash to make accessing and finding
strings in hash faster

rgreen · September 9, 2007, 9:11pm

On 9/9/07, Ron G. [email protected] wrote:

Marcel Molina Jr. wrote:

It should be noted though that String#hash isn’t garaunteed to be
unique.

Then,again I ask, what is it good for?

It’s still useful as a hash. Marcel wasn’t wrong, but no fixed size
hash
is “guaranteed” to be unique as that’s absolutely impossible, per the
pigeonhole principle
(Pigeonhole principle - Wikipedia).
String#hash’s hash is of a far lower “quality” than that offered by,
say,
SHA-1 or SHA-2.

Regards,
Peter C.

rgreen · September 9, 2007, 9:36pm

On Sunday 09 September 2007 12:23:43 pm Ron G. wrote:

It’s still useful as a hash. Marcel wasn’t wrong, but no fixed size
http://www.rubyinside.com/

Peter,
If Its not guaranteed to be unique, then it can’t be used for identity.
Can you give me an example of how i would use string.hash?

Let’s put it this way. MD5 and SHA-* hashes aren’t guaranteed to be
unique either. There’s just many more cases where strings will share a
hash
with String#hash as opposed to something like MD5/SHA-*.

Hashes are useful for identify strings in hashtables. You use this every
time you say something like:

foo = {“bar” => “baz”}
foo[“bar”] # => “baz”

HTH,

rgreen · September 9, 2007, 9:48pm

Alex Y. wrote:

Ron G. wrote:

pigeonhole principle
If Its not guaranteed to be unique, then it can’t be used for identity.
Can you give me an example of how i would use string.hash?
In general, you wouldn’t use String#hash, although you might conceivably
want to override it. It’s there for Hash. From the documentation on
Object#hash:

“Generates a Fixnum hash value for this object. This function must have
the property that a.eql?(b) implies a.hash == b.hash. The hash value is
used by class Hash.”

Note the direction of implication: a == b => a.hash == b.hash, not
a.hash == b.hash => a == b.

I think I understand. In other words it’s not something I would use
directly. I just ran across it in Peter’s book and wanted to make sure I
understood. Thanks everybody. Sorry if my ignorance pissed you off
Philip.

rgreen · September 9, 2007, 9:38pm

Ron G. wrote:

pigeonhole principle
If Its not guaranteed to be unique, then it can’t be used for identity.
Can you give me an example of how i would use string.hash?
In general, you wouldn’t use String#hash, although you might conceivably
want to override it. It’s there for Hash. From the documentation on
Object#hash:

“Generates a Fixnum hash value for this object. This function must have
the property that a.eql?(b) implies a.hash == b.hash. The hash value is
used by class Hash.”

Note the direction of implication: a == b => a.hash == b.hash, not
a.hash == b.hash => a == b.

rgreen · September 9, 2007, 10:21pm

No hash function is guaranteed to be unique for all inputs. I don’t
know
what exactly you mean by “can’t be used for identity”.

The hash method on string (and any other object) is used by ruby when
storing that type of object in a hash table. Haven’t looked through the
ruby sources, but here’s how this sort of thing generally works and the
problem that we’re trying to solve.

We want a data structure that lets us retrieve some key/value pair in
(hopefully) constant time. This means that, no matter how many items of
data we’re storing, it will always take us (roughly) the same amount of
time
to retrieve an arbitrary item. (IE: O(1)) A simple way to do this is
to
have some hash function that breaks our inputs up into “buckets”. Then,
when we get a given input, we feed it into that function and look into
the
“bucket” that it gives us back.

An example of a very simple hash function for ASCII strings would be
something like this:

Sum the ASCII values of each character in the string, then take he
modulus
of that value and 10. This would give us ten “buckets” to look in.
Obviously, there will be a whole lot of overlap here, as we have
arbitrary
length strings as inputs to our hash function and only 10 possible
outputs
(so this is not a reversible function.)

We get around this problem by storing the values in each bucket in some
other data structure, say a linked list. We then get the bucket (using
the
hash function) and search through the list for our value.

A simple example using the hash function I described above (which is
not
ruby’s builtin string hash function.)… Both “a” and “k” would fall
into
the same bucket:

irb(main):021:0> “a”[0] % 10
=> 7
irb(main):022:0> “k”[0] % 10
=> 7

So if we wanted to store the keys “a” and “k” in our hash table, they
would
both live in bucket seven. Within bucket seven, they would have some
internal organization that lets us retrieve them once we know the bucket
(say a linked list). So if we want to find the value associated with
the
key “a” we would perform our hash function on “a” and get bucket 7. We
would then look through the list located in bucket 7 until we find the
value
a, and retrieve the value associated with that key.

Ruby’s hash tables would work in a similar way, though I don’t know what
sort of data structure they use internally to the buckets. (It could be
anything, of course, not just a linked list.)

So you probably would not use string.hash directly often (if ever), but
every time you do something like aHash = {“foo” => 42), it’s being used
by
ruby.

Again, any decent data structures book will explain this better than I
can,
if you’re truly interested.

MBL

rgreen · September 9, 2007, 9:58pm

From: “Ron G.” [email protected]

I think I understand. In other words it’s not something I would use
directly. I just ran across it in Peter’s book and wanted to make sure I
understood.

More info:

Regards,

Bill

rgreen · September 10, 2007, 12:07am

Michael Bevilacqua-Linn wrote:

MBL
I think I’ll try Data Structures and Algorithms (Addison-Wesley Series
in Computer Science and Information Pr)

rgreen · September 9, 2007, 11:06pm

Bill K. wrote:

From: “Ron G.” [email protected]

I think I understand. In other words it’s not something I would use
directly. I just ran across it in Peter’s book and wanted to make sure I
understood.

More info:

Hash function - Wikipedia
Hash table - Wikipedia

Regards,

Bill

Thanks Bill.

rgreen · September 10, 2007, 12:09am

Ron G. wrote:

Michael Bevilacqua-Linn wrote:

MBL
I think I’ll try Data Structures and Algorithms (Addison-Wesley Series
in Computer Science and Information Pr)

When I said it couldn’t be used for identity I meant if you can’t
guarantee uniqueness.How would you know if you retrieved the correct
data.

rgreen · September 10, 2007, 12:54am

Ron G. wrote:

When I said it couldn’t be used for identity I meant if you can’t
guarantee uniqueness.How would you know if you retrieved the correct
data.

You don’t need uniqueness. The hash did its job when you can almost
instantly chop billions of strings down to a short list of candidate
strings. After the hash collision, you trivially search the list for the
actual target. The point is to access the stored value, at the target
location, quickly!

This is how Google works, for example…

rgreen · September 10, 2007, 12:56am

Phlip wrote:

Ron G. wrote:

When I said it couldn’t be used for identity I meant if you can’t
guarantee uniqueness.How would you know if you retrieved the correct
data.

You don’t need uniqueness. The hash did its job when you can almost
instantly chop billions of strings down to a short list of candidate
strings. After the hash collision, you trivially search the list for the
actual target. The point is to access the stored value, at the target
location, quickly!

This is how Google works, for example…

Thank you.

rgreen · September 10, 2007, 1:51am

Ron G. pisze:

location, quickly!

This is how Google works, for example…

Thank you.

Here is an example:

a class whose instances have always the same #hash and #eql? which
always returns true

class Foo
def hash
puts “hash called”
0
end

 def eql? other
     puts "eql? called"
     true
 end

end

h = {}
f1 = Foo.new
f2 = Foo.new

given

h[f1] = :blah

here, f1.hash is used to locate the bucket to be inserted into, so it
will only output “hash called”.

h[f1]

here, the f1.hash is used to locate the bucket and then references will
be compared (f1 is identical to f1 so eql? won’t have to be called).

h[f2]

here, the f2.hash is used to locate the bucket, it will be found, but f2
is not identical to f1, so eql? method will have to be used (which in
turn returns true, so the objects are considered equal) and finally the
lookup will be successfull.

In a hash, when two different objects return the same hash value, it’s
called a collision. Equality operation here just atcs like a guard to
make sure we are dealing with right object.

lopex

rgreen · September 10, 2007, 3:38am

On Sep 9, 6:49 pm, Marcin Miel y ski [email protected] wrote:

strings. After the hash collision, you trivially search the list for the
always returns true
end
here, f1.hash is used to locate the bucket to be inserted into, so it
is not identical to f1, so eql? method will have to be used (which in
turn returns true, so the objects are considered equal) and finally the
lookup will be successfull.

In a hash, when two different objects return the same hash value, it’s
called a collision. Equality operation here just atcs like a guard to
make sure we are dealing with right object.

lopex

Excellent! Thanks! I don’t know if I’ve ever seen it explained so
clearly. That makes a lot more sense to me now. (And naturally,
disregard where I said Object#equal in my other post. Couldn’t
remember positively if it was “equal” or “eql?”. )

Anyways, good stuff. I actually need to do this myself soon enough…