Maximum length of Array#hash

jimb · October 7, 2010, 2:54pm

I have a large array and want to calculate its hash-code using
Array#hash.

i.e. ["a","b","c"].hash

I want to store this hash-code in a database table.

Can anyone tell me (or point me in the right direction to find out) the
maximum length (in characters) of such a hash-code?

On my development machine the hash-code is always small enough to fit
into an int(11) field, but I don’t know what factors influence its
calculation.

Thanks in advance.

jimb · October 7, 2010, 3:05pm

On Thu, Oct 7, 2010 at 2:54 PM, Jim B. wrote:

On my development machine the hash-code is always small enough to fit
into a int(11) field, but I don’t know what factors influence its
calculation.

Thanks in advance.

This:
http://ruby-doc.org/core/classes/Array.html#M002159

says that Array#hash returns a Fixnum, and this:

http://ruby-doc.org/core/classes/Fixnum.html#M001079

tells you the size in bytes of a Fixnum on your platform.
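To make the size question concrete, here is a minimal sketch of how one might inspect the value on a given machine. It assumes a reasonably current Ruby, where Fixnum has been folded into Integer (since 2.4) but `Integer#size` still reports the bytes of the machine representation. The variable names are only illustrative, and `max_digits` is a deliberately generous upper bound, not the exact Fixnum range.

```ruby
# Inspect the hash-code of a small array on this interpreter.
h = ["a", "b", "c"].hash

# Bytes in the machine representation of the integer (platform dependent).
bytes = h.size

# Characters needed to store this particular value as decimal text,
# including a leading "-" when the value is negative.
digits = h.to_s.length

# Upper bound on the decimal length of any signed integer of that width:
# the digits of 2**(bits - 1) plus one character for the sign.
max_digits = (2**(8 * bytes - 1)).to_s.length + 1

puts "bytes: #{bytes}, this value: #{digits} chars, worst case: #{max_digits} chars"
```

If the int(11) column is a MySQL INT, note that it holds a 4-byte signed value regardless of the display width in parentheses, so a hash-code from a 64-bit Ruby build would most likely need a BIGINT column or a text column instead.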

Jesus.

jimb · October 7, 2010, 4:22pm

2010/10/7 Jesús Gabriel y Galán:

http://ruby-doc.org/core/classes/Fixnum.html#M001079

Tells you the size in bytes of a Fixnum in your platform.

Having said that, what’s the point in storing a hash code in a
database table? Basically this is redundant information, and when
querying you would want to check the key fields anyway because
hash codes are far from unique. So the hash code is not helpful
during querying. Jim, what’s the point?

Kind regards

robert

jimb · October 7, 2010, 4:31pm

Having said that, what’s the point in storing a hash code in a
database table? Basically this is redundant information and when
querying you would want to check for the key fields anyway because
hash codes are by far not unique. So the hash code is not helpful
during querying. Jim, what’s the point?

Hi,

Thanks both for the answers.

I’m using it to prevent double data submission in my rails app.
When a user submits my form, I create an array of everything he/she has
submitted. Then I create a hash-code of this array and compare it with
the hash code of the last successful submission (which is stored in the
db table).
This effectively eliminates double data submission (i.e. a user pressing
submit multiple times or using the back button and resubmitting).
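The comparison described above could be sketched roughly as follows. This is only an illustration of the idea, not the actual Rails code: `LAST_SUBMISSION` is a hypothetical in-memory stand-in for the db column, and `duplicate_submission?` is a made-up helper name.

```ruby
# Hypothetical stand-in for the db table: last hash-code seen per user.
LAST_SUBMISSION = {}

# Returns true when the submitted fields hash to the same code as the
# user's previous successful submission; otherwise records the new code.
def duplicate_submission?(user_id, fields)
  code = fields.hash
  return true if LAST_SUBMISSION[user_id] == code

  LAST_SUBMISSION[user_id] = code
  false
end

first_try  = duplicate_submission?(1, ["Jim", "Hello"])   # new data
second_try = duplicate_submission?(1, ["Jim", "Hello"])   # same data again
other_data = duplicate_submission?(1, ["Jim", "Goodbye"]) # changed data
```

Within a single Ruby process the two identical arrays produce the same hash-code, so the second call is flagged. Whether the stored code is still comparable after the process restarts is a separate question.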

I looked high and low for an effective method to stop double data
submission and couldn’t find anything that worked well, thus I came up
with this idea.

If this is a backwards method of preventing double data submission and I
am missing something obvious, please do let me know.

Cheers,
Jim

jimb · October 7, 2010, 4:40pm

On 10/7/2010 8:04 AM, Jesús Gabriel y Galán wrote:

http://ruby-doc.org/core/classes/Fixnum.html#M001079

Tells you the size in bytes of a Fixnum in your platform.

While this answers your primary question, take care with storing a hash
generated by the hash method. Such methods are generally treated as a
black box, so you can’t be assured that different versions of Ruby, much
less different implementations, will return the same hash given the same
object. It would be reasonable to expect for instance that platforms
with different sized Fixnum implementations would have different hash
implementations that take advantage of optimizations targeted at each
platform. Storing these hashes in your DB may strongly bind your data
to a particular build of Ruby.
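One way to see this for yourself is to compute the same hash in two separate interpreter processes. The sketch below assumes Ruby 1.9.2 or later (for `RbConfig.ruby`) and that spawning a child interpreter is allowed. On those versions String#hash is seeded per process, so the two values will normally differ even on the same machine and the same build.

```ruby
require 'rbconfig'

# One-liner run in a fresh interpreter: print the array's hash-code.
script = 'print ["a", "b", "c"].hash'

# Run the same script twice, each time in its own Ruby process.
first  = IO.popen([RbConfig.ruby, '-e', script], &:read)
second = IO.popen([RbConfig.ruby, '-e', script], &:read)

puts "first run:  #{first}"
puts "second run: #{second}"
puts "same value? #{first == second}"
```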

Maybe this doesn’t matter for your needs, but if you want to have a
good, consistent hash of some data, it would be best to use a well
defined hashing algorithm over a well defined marshaling of the object
and its data:

require 'digest/md5'
# May be a big assumption that the YAML representation of an object
# is well defined, so take this with a grain of salt.
require 'yaml'

a = [1, 2, 3]
Digest::MD5.hexdigest(a.to_yaml)
# => "4088cb2d3462e59f1735319fa50747a0"
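As a side note on the original length question, an MD5 hex digest is always 32 characters, however large the array is. A small sketch of the same approach wrapped in a helper (the name `submission_digest` is made up for illustration, and the YAML caveat above still applies):

```ruby
require 'digest/md5'
require 'yaml'

# Hypothetical helper: digest of the YAML text of the submitted values.
def submission_digest(values)
  Digest::MD5.hexdigest(values.to_yaml)
end

small = submission_digest(["a", "b", "c"])
large = submission_digest((1..10_000).to_a)

# Both digests have the same fixed length, so a char(32) column is enough.
puts small.length
puts large.length
```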

Like Robert said though, I’m not sure why you would want to store this
kind of data in your DB in the first place.

-Jeremy

jimb · October 7, 2010, 4:53pm

While this answers your primary question, take care with storing a hash
generated by the hash method. Such methods are generally treated as a
black box, so you can’t be assured that different versions of Ruby, much
less different implementations, will return the same hash given the same
object.

Oh right, I didn’t realize. I was just looking at what methods I could
call on array and hash did exactly the job I wanted.

Maybe this doesn’t matter for your needs

Nah, at the moment everything is working very smoothly. It is a small
app and doesn’t need to scale or move server.

a = [1, 2, 3]
Digest::MD5.hexdigest(a.to_yaml)

# => "4088cb2d3462e59f1735319fa50747a0"

Thanks for that. I will have a look at using that instead.
Essentially all I want to do is create a hash-code out of an array.
If this is better practice and makes things less likely to break I will
gladly use it.

Like Robert said though, I’m not sure why you would want to store this
kind of data in your DB in the first place.

So that I can compare current user input with the last successful user
input. This way I can eliminate double data submission.

Is there a better way to tackle double data submission?

Thanks for your reply,
Jim

jimb · October 7, 2010, 4:56pm

On Thu, Oct 7, 2010 at 4:31 PM, Jim B. wrote:

Having said that, what’s the point in storing a hash code in a
database table? Basically this is redundant information and when
querying you would want to check for the key fields anyway because
hash codes are by far not unique. So the hash code is not helpful
during querying. Jim, what’s the point?

with this idea.

If this is a backwards method of preventing double data submission and I
am missing something obvious, please do let me know.

The RDBMS way to do it would be to define a unique index (or use the
primary key for that) in the table which simply prevents duplicate
insertions.

I am not a web developer but I am sure the discipline has invented
methods to avoid this already. My simple approach would be to add a
hidden field to the form with a string I generate (just a silly
example "#{rand 37}:#{Time.now.to_f}"). I would then store that
value in the table if necessary. You could as well use a submit
counter that you store in your session. Then you could detect
duplicate submission without going to the DB which is slower than a
check in memory.
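A rough sketch of the hidden-field idea, kept independent of any web framework. Everything here is an assumption for illustration: `SubmitGuard` is a made-up class, the Set stands in for whatever the session would hold, and `SecureRandom.hex` is swapped in for the rand/Time string above simply because it is less guessable.

```ruby
require 'securerandom'
require 'set'

# Hypothetical guard: hands out one-time tokens and accepts each only once.
class SubmitGuard
  def initialize
    @pending = Set.new # tokens issued but not yet submitted
  end

  # Call when rendering the form; put the result in a hidden field.
  def issue_token
    token = SecureRandom.hex(16)
    @pending << token
    token
  end

  # Call when the form is submitted. True only for the first submission
  # of an issued token; repeats and unknown tokens are rejected.
  def accept?(token)
    !@pending.delete?(token).nil?
  end
end

guard = SubmitGuard.new
token = guard.issue_token
first_submit  = guard.accept?(token) # fresh token
second_submit = guard.accept?(token) # same token again
```

Because the token is consumed on first use, a double click or a back-button resubmit of the same form carries a token that is no longer pending, and the check never has to look at the submitted data itself.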

I would certainly not rely on #hash because that is too fragile.

Kind regards

robert