Consistent hashing for upstreams

Has anybody used ngx_http_upstream_consistent_hash with newer nginx
releases (>1.0)?

If so, is it possible to use ngx_http_upstream_consistent_hash with
HTTP-based back-ends, or does it only work with memcached back-ends? The
documentation isn’t at all clear.

We want to load-balance multiple static file servers behind nginx, and
choosing the upstream based on a consistent hash will drastically
increase the filesystem cache hit ratios of the back-end servers, while
preventing a “thundering herd” problem when one of them fails.
Basically, you get to use all of your RAM across the back ends for
cache, rather than holding the same objects in the caches of multiple
back-end servers.

I thought of trying to use map directives and regexes to pick a back
end, but handling weighting and failover makes this a complex and
therefore brittle approach. Could consistent hashing for upstreams be
achieved with embedded lua or perl?
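
For illustration, here is roughly what the map approach looks like
before weighting and failover enter the picture (upstream names are
hypothetical):

    # Crude sketch: bucket requests on the last character of the URI.
    # This is plain bucketing, not consistent hashing, and layering
    # weights or failover on top of it quickly becomes unmanageable.
    map $request_uri $static_backend {
        default        fileserver_a;
        "~*[0-7]$"     fileserver_a;
        "~*[89a-f]$"   fileserver_b;
    }

    server {
        listen 80;
        location / {
            # proxy_pass with a variable resolves against upstream{}
            # blocks named fileserver_a / fileserver_b defined elsewhere.
            proxy_pass http://$static_backend;
        }
    }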

I know HAproxy has consistent hashing features for this use case (as do
almost all commercial load balancers), but I would prefer not adding
layers to my stack if it can be avoided. We’re already using nginx and
proxy_cache and it has been rock-solid and stable for us.

rmalayter Wrote:

achieved with embedded lua or perl?
Here’s a simple approach on consistent hashing with embedded perl:
2124034’s gists · GitHub
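
The gist itself isn’t reproduced here, but a minimal sketch of that
kind of approach, assuming nginx built with the perl module and
hypothetical upstream names, could look like this:

    # Hash-ring sketch (not the actual gist): perl_set picks an
    # upstream group name per request by consistent-hashing the URI.
    # The ring is built once per worker when the sub is compiled.
    perl_set $consistent_upstream '
        use Digest::MD5 qw(md5_hex);

        # Hypothetical names; each must match an upstream{} block.
        my @backends = qw(static_a static_b static_c);

        # Place every backend at many points on a 32-bit circle; more
        # points per backend gives a smoother key distribution.
        my @ring;
        for my $b (@backends) {
            for my $i (0 .. 99) {
                push @ring, [ hex(substr(md5_hex("$b-$i"), 0, 8)), $b ];
            }
        }
        @ring = sort { $a->[0] <=> $b->[0] } @ring;

        sub {
            my $r = shift;
            my $h = hex(substr(md5_hex($r->uri), 0, 8));
            # First ring point clockwise from the key hash; wrap to
            # the start of the ring if the key falls past the last one.
            for my $p (@ring) {
                return $p->[1] if $p->[0] >= $h;
            }
            return $ring[0][1];
        }
    ';

A location would then just do proxy_pass http://$consistent_upstream;.
The point of the ring is that removing one backend only remaps the keys
that lived on its points, rather than reshuffling everything.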

rmalayter Wrote:

The only potential issues I foresee are:

  1. performance, as this perl will be called for 1000+ requests per
    second, and there are going to be potentially many upstream blocks.
    Maybe Digest::MurmurHash would help with performance instead of MD5
    (it’s supposedly 3x faster in Perl than Digest::MD5 while using far less
    state). A native hash ring implementation in C would obviously be far
    more performant.

A couple of microseconds per request isn’t something to worry about
here.
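
For what it’s worth, swapping the digest inside a handler like the one
sketched above would be a small change, assuming the CPAN
Digest::MurmurHash module mentioned in point 1:

    # Hypothetical swap: murmur_hash() returns a 32-bit integer
    # directly, so the hex/substr conversion step goes away.
    use Digest::MurmurHash qw(murmur_hash);
    my $h = murmur_hash($r->uri);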

  2. a single backup server is problematic, but that can be fixed by
    adding more backups to the upstream blocks I think, or doing an error
    location that hashes again to find a new upstream. Not sure if a server
    being down would cause it to fail inside all upstream blocks it appears
    in, though, which might mean some very slow responses when a server goes
    offline.

But at least it’s simple.
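
A sketch of that error-location idea from point 2, reusing the
hypothetical $consistent_upstream variable and upstream names from the
perl sketch above:

    # If the hashed-to backend fails, retry against another group.
    location / {
        proxy_pass http://$consistent_upstream;
        # Needed so error responses returned by the upstream (not just
        # connect failures) are handed to error_page as well.
        proxy_intercept_errors on;
        error_page 502 504 = @rehash;
    }

    location @rehash {
        # Could re-run the hash excluding the dead backend; this sketch
        # just falls back to a catch-all upstream group.
        proxy_pass http://static_fallback;
    }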

  3. Perl module is still marked as experimental, which scares me

Don’t be scared, it’s not really experimental. Build it with
relatively modern perl and you’ll be fine.

Alexandr G. Wrote:

Here’s a simple approach on consistent hashing
with embedded perl:
2124034’s gists · GitHub

Interesting. Clearly one could generate the upstream blocks via script.
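
A trivial generator might look like this (addresses and bucket count
are made up):

    #!/usr/bin/perl
    # Hypothetical generator: emit one upstream{} block per hash
    # bucket, with the next server in line as that bucket's backup.
    use strict;
    use warnings;

    my @servers = qw(10.0.0.1 10.0.0.2 10.0.0.3);
    for my $bucket (0 .. 15) {
        my $primary = $servers[$bucket % @servers];
        my $backup  = $servers[($bucket + 1) % @servers];
        print "upstream static_backend_$bucket {\n";
        print "    server $primary:80;\n";
        print "    server $backup:80 backup;\n";
        print "}\n\n";
    }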

The only potential issues I foresee are:
1) performance, as this perl will be called for 1000+ requests per
second, and there are going to be potentially many upstream blocks.
Maybe Digest::MurmurHash would help with performance instead of MD5
(it’s supposedly 3x faster in Perl than Digest::MD5 while using far less
state). A native hash ring implementation in C would obviously be far
more performant.
2) a single backup server is problematic, but that can be fixed by
adding more backups to the upstream blocks I think, or doing an error
location that hashes again to find a new upstream. Not sure if a server
being down would cause it to fail inside all upstream blocks it appears
in, though, which might mean some very slow responses when a server goes
offline.
3) Perl module is still marked as experimental, which scares me

I will give it a good long-term load test, though; it might just be
good enough!

Thanks!

RPM


Hi,

On Tue, Mar 20, 2012 at 12:54 AM, rmalayter [email protected]
wrote:

[…] preventing a “thundering herd” problem when one of them fails. […]
I would prefer not adding layers to my stack if it can be avoided. We’re
already using nginx and proxy_cache and it has been rock-solid and
stable for us.

Please try this fork:

BTW, this module will be officially supported by the Tengine team soon.

Regards,