Forum: NGINX hash algorithm for nginx cache

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
Jb j. (Guest)
on 2009-04-18 22:27
Igor,

It looks like nginx now has the beginnings of caching which is exciting
as I am looking forward to comparing this with varnish.

I did notice in the docs that the key/filename is a md5 hash. Have you
looked at the recent MurmurHash/FNV-1a hashes?

They are at least 2x faster than md5 which would be useful in this
scenario.

See
http://search.cpan.org/~tmaesaka/Digest-MurmurHash...
http://murmurhash.googlepages.com/

for sample implementations.

--j
Marcus C. (Guest)
on 2009-04-19 00:38
(Received via mailing list)
Superfast hash is an excellent hash too

http://www.azillionmonkeys.com/qed/hash.html

Marcus.
Jb j. (Guest)
on 2009-04-19 07:42
MurmurHash is 2x faster than Superfasthash.

Marcus C. wrote:
> Superfast hash is an excellent hash too
>
> http://www.azillionmonkeys.com/qed/hash.html
>
> Marcus.
Marcus C. (Guest)
on 2009-04-19 15:26
(Received via mailing list)
Cool, I'll check it out. :-)
Igor S. (Guest)
on 2009-04-21 11:27
(Received via mailing list)
On Sat, Apr 18, 2009 at 08:27:11PM +0200, Joe Bofh wrote:

>
> See
> 
http://search.cpan.org/~tmaesaka/Digest-MurmurHash...
> http://murmurhash.googlepages.com/
>
> for sample implementations.

Thank you for information, but as I understand this hash produce 32- or
64-bit hash. I believe it should have more collisions as compared to
md5,
which is 128-bit hash.
Jb j. (Guest)
on 2009-04-21 19:46
Igor,

If you look at the docs, it is supposed to have excellent collision
resistance.
See http://tanjent.livejournal.com/756623.html

Given that cache objects are supposed to have a limited shelf life
(days, months), I would think that collision resistance is vs. hashing
performance is a decent tradeoff.

I guess if you put in an option to specify the hash, that would work
too. FNV and Murmur have been adopted on various open source projects as
optional hashs because of the hashing performance.


--J

Igor S. wrote:
> On Sat, Apr 18, 2009 at 08:27:11PM +0200, Joe Bofh wrote:
>
>>
>> See
>> 
http://search.cpan.org/~tmaesaka/Digest-MurmurHash...
>> http://murmurhash.googlepages.com/
>>
>> for sample implementations.
>
> Thank you for information, but as I understand this hash produce 32- or
> 64-bit hash. I believe it should have more collisions as compared to
> md5,
> which is 128-bit hash.
Johan Bergström (Guest)
on 2009-04-21 20:32
(Received via mailing list)
Hey,

On Apr 21, 2009, at 17:46 , Joe Bofh wrote:

>
> I guess if you put in an option to specify the hash, that would work
> too. FNV and Murmur have been adopted on various open source
> projects as
> optional hashs because of the hashing performance.

For reference; libmemcached - another high performing project -
nowadays include most current hashing algorithms. Fnv1a and murmur are
two of them. See:
http://hg.tangent.org/libmemcached/file/048b32d9d1...
  (sorry for bad linkage, the site docs/manpage hasn't been updated
for a while)

Two my knowledge, murmur/murmur2 is the best pick for a high
performance/low (but not zero) collision tradeoff.
Jb j. (Guest)
on 2009-04-22 03:22
FWIW, I got a follow up reply from Austin (the chap that created
murmurhash). He sez:

"MurmurHash2, MD5, Lookup3, and any cryptographic hash all have
statistically-correct collision behaviour - the average number of
collisions they create on a given keyset is the same as if the hash
function returned a totally random number. For comparison, FNV mixes
very poorly and tends to produce massively more collisions than expected
on "sparse" keys with only a few bits set, and SuperFastHash will tend
to produce a higher than expected number of collisions on keysets where
keys differ in only 2-3 bits - see
http://murmurhash.googlepages.com/statistics and the associated pages
for more info."

So it looks like collision-wise, it's comparable.

--J


Johan Bergström wrote:
> Hey,
>
> On Apr 21, 2009, at 17:46 , Joe Bofh wrote:
>
>>
>> I guess if you put in an option to specify the hash, that would work
>> too. FNV and Murmur have been adopted on various open source
>> projects as
>> optional hashs because of the hashing performance.
>
> For reference; libmemcached - another high performing project -
> nowadays include most current hashing algorithms. Fnv1a and murmur are
> two of them. See:
> http://hg.tangent.org/libmemcached/file/048b32d9d1...
>   (sorry for bad linkage, the site docs/manpage hasn't been updated
> for a while)
>
> Two my knowledge, murmur/murmur2 is the best pick for a high
> performance/low (but not zero) collision tradeoff.
Igor S. (Guest)
on 2009-04-22 15:21
(Received via mailing list)
On Wed, Apr 22, 2009 at 01:22:52AM +0200, Joe Bofh wrote:

> to produce a higher than expected number of collisions on keysets where
> keys differ in only 2-3 bits - see
> http://murmurhash.googlepages.com/statistics and the associated pages
> for more info."
>
> So it looks like collision-wise, it's comparable.

I'm not expert in cryptography, but it seems to me that 128-bit MD5
should have though not 2^64 times less collisions, but at least 10^9
times
less than any 64-bit hash.
Jb j. (Guest)
on 2009-05-15 09:29
Igor,

Sure. But the question is how long will the entry stay in the cache? If
it's a fairly finite amount of time, the possibility of a collision
drops dramatically.

I was suggesting murmurhash as it is significantly faster than md5/sha1
and for something like a object, the cryptographic requirements are
simply not there (i.e. the cached object does not need to be encrypted).

If there is concern over collisions, something like a bloom filter or a
cuckoo hash could be used. http://en.wikipedia.org/wiki/Cuckoo_hashing

--J

Igor S. wrote:

>
> I'm not expert in cryptography, but it seems to me that 128-bit MD5
> should have though not 2^64 times less collisions, but at least 10^9
> times
> less than any 64-bit hash.
Igor S. (Guest)
on 2009-05-16 19:53
(Received via mailing list)
On Fri, May 15, 2009 at 07:29:57AM +0200, Joe Bofh wrote:

> Igor,
>
> Sure. But the question is how long will the entry stay in the cache? If
> it's a fairly finite amount of time, the possibility of a collision
> drops dramatically.

A cache entry may stay for years :)

> I was suggesting murmurhash as it is significantly faster than md5/sha1
> and for something like a object, the cryptographic requirements are
> simply not there (i.e. the cached object does not need to be encrypted).

As to speed, benchmarks usually test algorithms with preloaded cache,
but in real life code works with non cached data and memory latency
descreases difference.
This topic is locked and can not be replied to.