Consistent hashing for upstreams

Has anybody used ngx_http_upstream_consistent_hash with newer nginx
releases (>1.0)?

If so, is it possible to use ngx_http_upstream_consistent_hash with
HTTP-based back-ends, or does it only work with memcached back-ends? The
documentation isn’t at all clear.

We want to load-balance multiple static file servers behind nginx, and
choosing the upstream based on a consistent hash will drastically
increase the filesystem cache hit ratios of the back-end servers, while
preventing a “thundering herd” problem when one of them fails.
Basically, you get to use all of your RAM across the back ends for
cache, rather than holding the same objects in the caches of multiple
back-end servers.

I thought of trying to use map directives and regexes to pick a back
end, but handling weighting and failover makes this a complex and
therefore brittle approach. Could consistent hashing for upstreams be
achieved with embedded lua or perl?
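
For illustration, here is roughly what the map approach looks like
before weighting and failover enter the picture (upstream names are
hypothetical):

    # Crude sketch: bucket requests on the last character of the URI.
    # This is plain bucketing, not consistent hashing, and layering
    # weights or failover on top of it quickly becomes unmanageable.
    map $request_uri $static_backend {
        default        fileserver_a;
        "~*[0-7]$"     fileserver_a;
        "~*[89a-f]$"   fileserver_b;
    }

    server {
        listen 80;
        location / {
            # proxy_pass with a variable resolves against upstream{}
            # blocks named fileserver_a / fileserver_b defined elsewhere.
            proxy_pass http://$static_backend;
        }
    }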

I know HAproxy has consistent hashing features for this use case (as do
almost all commercial load balancers), but I would prefer not adding
layers to my stack if it can be avoided. We’re already using nginx and
proxy_cache and it has been rock-solid and stable for us.

rmalayter Wrote:

achieved with embedded lua or perl?
Here’s a simple approach on consistent hashing with embedded perl:
2124034’s gists · GitHub
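
The gist itself isn’t reproduced here, but a minimal sketch of that
kind of approach, assuming nginx built with the perl module and
hypothetical upstream names, could look like this:

    # Hash-ring sketch (not the actual gist): perl_set picks an
    # upstream group name per request by consistent-hashing the URI.
    # The ring is built once per worker when the sub is compiled.
    perl_set $consistent_upstream '
        use Digest::MD5 qw(md5_hex);

        # Hypothetical names; each must match an upstream{} block.
        my @backends = qw(static_a static_b static_c);

        # Place every backend at many points on a 32-bit circle; more
        # points per backend gives a smoother key distribution.
        my @ring;
        for my $b (@backends) {
            for my $i (0 .. 99) {
                push @ring, [ hex(substr(md5_hex("$b-$i"), 0, 8)), $b ];
            }
        }
        @ring = sort { $a->[0] <=> $b->[0] } @ring;

        sub {
            my $r = shift;
            my $h = hex(substr(md5_hex($r->uri), 0, 8));
            # First ring point clockwise from the key hash; wrap to
            # the start of the ring if the key falls past the last one.
            for my $p (@ring) {
                return $p->[1] if $p->[0] >= $h;
            }
            return $ring[0][1];
        }
    ';

A location would then just do proxy_pass http://$consistent_upstream;.
The point of the ring is that removing one backend only remaps the keys
that lived on its points, rather than reshuffling everything.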

rmalayter Wrote:

The only potential issues I foresee are:

  1. performance, as this perl will be called for 1000+ requests per
    second, and there are going to be potentially many upstream blocks.
    Maybe Digest::MurmurHash would help with performance instead of MD5
    (it’s supposedly 3x faster in Perl than Digest::MD5 while using far less
    state). A native hash ring implementation in C would obviously be far
    more performant.

A couple of microseconds per request isn’t something to worry about
here.
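
For what it’s worth, swapping the digest inside a handler like the one
sketched above would be a small change, assuming the CPAN
Digest::MurmurHash module mentioned in point 1:

    # Hypothetical swap: murmur_hash() returns a 32-bit integer
    # directly, so the hex/substr conversion step goes away.
    use Digest::MurmurHash qw(murmur_hash);
    my $h = murmur_hash($r->uri);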

  2. a single backup server is problematic, but that can be fixed by
    adding more backups to the upstream blocks I think, or doing an error
    location that hashes again to find a new upstream. Not sure if a server
    being down would cause it to fail inside all upstream blocks it appears
    in, though, which might mean some very slow responses when a server goes
    offline.

But at least it’s simple.
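
A sketch of that error-location idea from point 2, reusing the
hypothetical $consistent_upstream variable and upstream names from the
perl sketch above:

    # If the hashed-to backend fails, retry against another group.
    location / {
        proxy_pass http://$consistent_upstream;
        # Needed so error responses returned by the upstream (not just
        # connect failures) are handed to error_page as well.
        proxy_intercept_errors on;
        error_page 502 504 = @rehash;
    }

    location @rehash {
        # Could re-run the hash excluding the dead backend; this sketch
        # just falls back to a catch-all upstream group.
        proxy_pass http://static_fallback;
    }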

  3. Perl module is still marked as experimental, which scares me

Don’t be scared, it’s not really experimental. Build it with
relatively modern perl and you’ll be fine.

Alexandr G. Wrote:

Here’s a simple approach on consistent hashing
with embedded perl:
2124034’s gists · GitHub

Interesting. Clearly one could generate the upstream blocks via script.
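
A trivial generator might look like this (addresses and bucket count
are made up):

    #!/usr/bin/perl
    # Hypothetical generator: emit one upstream{} block per hash
    # bucket, with the next server in line as that bucket's backup.
    use strict;
    use warnings;

    my @servers = qw(10.0.0.1 10.0.0.2 10.0.0.3);
    for my $bucket (0 .. 15) {
        my $primary = $servers[$bucket % @servers];
        my $backup  = $servers[($bucket + 1) % @servers];
        print "upstream static_backend_$bucket {\n";
        print "    server $primary:80;\n";
        print "    server $backup:80 backup;\n";
        print "}\n\n";
    }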

The only potential issues I foresee are:
1) performance, as this perl will be called for 1000+ requests per
second, and there are going to be potentially many upstream blocks.
Maybe Digest::MurmurHash would help with performance instead of MD5
(it’s supposedly 3x faster in Perl than Digest::MD5 while using far less
state). A native hash ring implementation in C would obviously be far
more performant.
2) a single backup server is problematic, but that can be fixed by
adding more backups to the upstream blocks I think, or doing an error
location that hashes again to find a new upstream. Not sure if a server
being down would cause it to fail inside all upstream blocks it appears
in, though, which might mean some very slow responses when a server goes
offline.
3) Perl module is still marked as experimental, which scares me

I will give it a good long-term load test, though; it might just be
good enough!

Thanks!

RPM


Hi,

On Tue, Mar 20, 2012 at 12:54 AM, rmalayter [email protected]
wrote:

[…] preventing a “thundering herd” problem when one of them fails. […]
I would prefer not adding layers to my stack if it can be avoided. We’re
already using nginx and proxy_cache and it has been rock-solid and
stable for us.

Please try this fork:

BTW, this module will be officially supported by the Tengine team soon.

Regards,