Hash algorithm for nginx cache

Igor,

It looks like nginx now has the beginnings of caching which is exciting
as I am looking forward to comparing this with varnish.

I did notice in the docs that the key/filename is a md5 hash. Have you
looked at the recent MurmurHash/FNV-1a hashes?

They are at least 2x faster than md5 which would be useful in this
scenario.

See

for sample implementations.

–j

Superfast hash is an excellent hash too

http://www.azillionmonkeys.com/qed/hash.html

Marcus.

MurmurHash is 2x faster than Superfasthash.

Marcus C. wrote:

Superfast hash is an excellent hash too

Hash functions.

Marcus.

Cool, I’ll check it out. :slight_smile:

On Sat, Apr 18, 2009 at 08:27:11PM +0200, Joe Bofh wrote:

See
http://search.cpan.org/~tmaesaka/Digest-MurmurHash-0.10/lib/Digest/MurmurHash.pm#BENCHMARK
http://murmurhash.googlepages.com/

for sample implementations.

Thank you for information, but as I understand this hash produce 32- or
64-bit hash. I believe it should have more collisions as compared to
md5,
which is 128-bit hash.

Hey,

On Apr 21, 2009, at 17:46 , Joe Bofh wrote:

I guess if you put in an option to specify the hash, that would work
too. FNV and Murmur have been adopted on various open source
projects as
optional hashs because of the hashing performance.

For reference; libmemcached - another high performing project -
nowadays include most current hashing algorithms. Fnv1a and murmur are
two of them. See:
http://hg.tangent.org/libmemcached/file/048b32d9d1ae/docs/memcached_behavior.pod
(sorry for bad linkage, the site docs/manpage hasn’t been updated
for a while)

Two my knowledge, murmur/murmur2 is the best pick for a high
performance/low (but not zero) collision tradeoff.

Igor,

If you look at the docs, it is supposed to have excellent collision
resistance.
See MurmurHash, final version.: tanjent — LiveJournal

Given that cache objects are supposed to have a limited shelf life
(days, months), I would think that collision resistance is vs. hashing
performance is a decent tradeoff.

I guess if you put in an option to specify the hash, that would work
too. FNV and Murmur have been adopted on various open source projects as
optional hashs because of the hashing performance.

–J

Igor S. wrote:

On Sat, Apr 18, 2009 at 08:27:11PM +0200, Joe Bofh wrote:

See
Digest::MurmurHash - Perl XS interface to the MurmurHash algorithm - metacpan.org
http://murmurhash.googlepages.com/

for sample implementations.

Thank you for information, but as I understand this hash produce 32- or
64-bit hash. I believe it should have more collisions as compared to
md5,
which is 128-bit hash.

On Wed, Apr 22, 2009 at 01:22:52AM +0200, Joe Bofh wrote:

to produce a higher than expected number of collisions on keysets where
keys differ in only 2-3 bits - see
http://murmurhash.googlepages.com/statistics and the associated pages
for more info."

So it looks like collision-wise, it’s comparable.

I’m not expert in cryptography, but it seems to me that 128-bit MD5
should have though not 2^64 times less collisions, but at least 10^9
times
less than any 64-bit hash.

Igor,

Sure. But the question is how long will the entry stay in the cache? If
it’s a fairly finite amount of time, the possibility of a collision
drops dramatically.

I was suggesting murmurhash as it is significantly faster than md5/sha1
and for something like a object, the cryptographic requirements are
simply not there (i.e. the cached object does not need to be encrypted).

If there is concern over collisions, something like a bloom filter or a
cuckoo hash could be used. Cuckoo hashing - Wikipedia

–J

Igor S. wrote:

I’m not expert in cryptography, but it seems to me that 128-bit MD5
should have though not 2^64 times less collisions, but at least 10^9
times
less than any 64-bit hash.

FWIW, I got a follow up reply from Austin (the chap that created
murmurhash). He sez:

“MurmurHash2, MD5, Lookup3, and any cryptographic hash all have
statistically-correct collision behaviour - the average number of
collisions they create on a given keyset is the same as if the hash
function returned a totally random number. For comparison, FNV mixes
very poorly and tends to produce massively more collisions than expected
on “sparse” keys with only a few bits set, and SuperFastHash will tend
to produce a higher than expected number of collisions on keysets where
keys differ in only 2-3 bits - see
http://murmurhash.googlepages.com/statistics and the associated pages
for more info.”

So it looks like collision-wise, it’s comparable.

–J

Johan Bergström wrote:

Hey,

On Apr 21, 2009, at 17:46 , Joe Bofh wrote:

I guess if you put in an option to specify the hash, that would work
too. FNV and Murmur have been adopted on various open source
projects as
optional hashs because of the hashing performance.

For reference; libmemcached - another high performing project -
nowadays include most current hashing algorithms. Fnv1a and murmur are
two of them. See:
http://hg.tangent.org/libmemcached/file/048b32d9d1ae/docs/memcached_behavior.pod
(sorry for bad linkage, the site docs/manpage hasn’t been updated
for a while)

Two my knowledge, murmur/murmur2 is the best pick for a high
performance/low (but not zero) collision tradeoff.

On Fri, May 15, 2009 at 07:29:57AM +0200, Joe Bofh wrote:

Igor,

Sure. But the question is how long will the entry stay in the cache? If
it’s a fairly finite amount of time, the possibility of a collision
drops dramatically.

A cache entry may stay for years :slight_smile:

I was suggesting murmurhash as it is significantly faster than md5/sha1
and for something like a object, the cryptographic requirements are
simply not there (i.e. the cached object does not need to be encrypted).

As to speed, benchmarks usually test algorithms with preloaded cache,
but in real life code works with non cached data and memory latency
descreases difference.