This is primarily aimed at Grzegorz N., the author of the “fair”
proxy balancer patch for Nginx, but I’m posting this here in case
others want to chip in.
I posted the following on Ezra Z.'s blog recently, in
conjunction with the announcement of the patch. Since I posted it,
we have been seeing some rather more extreme examples of non-uniform
request distribution, with some mongrels piling up lots and lots of
connections while others sit completely idle.
We have been running this patch on a live Rails site for a couple of
weeks. We switched from Lighttpd + FastCGI to Nginx + Mongrel for a
couple of technical reasons I won’t go into here. Generally
performance has been worse, but I have been unable to pin down what’s
wrong. From what I can see, the fair patch is not working
consistently. A large portion of the requests will go to a mongrel
which is already processing a request. Here is an output from “ps” on
one of our boxes:
Mongrel is running with a custom extension I have written that extends
the process title with status information. The three numbers are the
port, the number of concurrent requests, and the total number of requests
processed during the mongrel’s lifetime. What is apparent from this
output is that a bunch of the mongrels are generally not used. This
would not be a problem if several other mongrels were not being forced
to process multiple concurrent requests. Because of the giant Rails
lock, this means certain requests will be queued after other requests,
which impairs response time. (We have a lot of fairly slow requests,
in the 5-10-second range.)
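The proctitle extension mentioned above might look something like the
following minimal sketch. The class name, title format, and hook methods
are my own illustration, not the actual extension from this thread:

```ruby
# Hypothetical sketch of a Mongrel status proctitle. The names and the
# "port/concurrent/total" format are assumptions for illustration only.
class ProcTitleStatus
  def initialize(port)
    @port = port
    @concurrent = 0   # requests currently being processed
    @total = 0        # requests processed over the mongrel's lifetime
    update_title
  end

  def request_started
    @concurrent += 1
    @total += 1
    update_title
  end

  def request_finished
    @concurrent -= 1
    update_title
  end

  def title
    # e.g. "mongrel_rails [8000/2/1534]" -> port, concurrent, total
    "mongrel_rails [#{@port}/#{@concurrent}/#{@total}]"
  end

  private

  def update_title
    $0 = title  # rewrites the process name shown by "ps"
  end
end
```

Wrapping a mongrel's request handling with `request_started` /
`request_finished` would then make the counters visible in `ps` output.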
On Mon, Dec 03, 2007 at 02:10:44PM +0100, Alexander S. wrote:
This is primarily aimed at Grzegorz N., the author of the “fair”
proxy balancer patch for Nginx, but I’m posting this here in case
others want to chip in.
Well, here I am. Bullseye.
I posted the following on Ezra Z.'s blog recently, in
conjunction with the announcement of the patch. Since I posted it,
we have been seeing some rather more extreme examples of non-uniform
request distribution, with some mongrels piling up lots and lots of
connections while others sit completely idle.
The standard question – have you tried the latest snapshot? (though
it might not be any different, asking just in case). Also, as you
mention 5-10 second requests, please increase:
#define FS_TIME_SCALE_OFFSET 1000
in file src/http/modules/ngx_http_upstream_fair_module.c (line 407 in my
copy) to e.g. 20000. I’ll make it configurable without recompiling
nginx, too (or remove it altogether, if I find an elegant solution).
Requests running this long may confuse the module, which might just
result in the behaviour you’re seeing.
If increasing FS_TIME_SCALE_OFFSET does not help, could you please
compile nginx --with-debug and gather the debug_http data?
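For reference, gathering that debug data generally means building nginx
with `./configure --with-debug` and enabling debug-level logging in the
configuration, roughly along these lines (the log path is a placeholder):

```nginx
# nginx.conf fragment: debug-level error log.
# Only produces debug output if the binary was built with --with-debug.
error_log /var/log/nginx/error.log debug;
```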
The standard question – have you tried the latest snapshot? (though
it might not be any different, asking just in case). Also, as you mention
5-10 second requests, please increase:
I was using an older snapshot (there was no new snapshot at the time I
wrote my comment, I’m pretty sure).
The new version, in combination with FS_TIME_SCALE_OFFSET set to 60000
for good measure, seems to produce a more uniform distribution. That
helps a lot. Thanks!
Even so, sometimes the balancer seems to go into a state where it’s
not using all mongrels:
The first mongrel, as you can see, has 433 requests queued. This is
something that happened during the night.
Some of these requests time out; some of these requests are very
expensive legacy feeds that have never been optimized. Does the
balancer penalize upstreams that time out a lot, by any chance? Is
there a way to force the algorithm to ignore weighting entirely and
always schedule connections to the upstream with the fewest queued
requests?
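The unweighted policy I have in mind is plain least-connections
scheduling. As a sketch of the idea only (this is not the fair
balancer's actual code, and the backend addresses are made up):

```ruby
# Illustrative least-connections pick: always choose the upstream with
# the fewest queued requests, ignoring any weighting.
Upstream = Struct.new(:name, :queued)

def pick_least_queued(upstreams)
  upstreams.min_by(&:queued)
end

backends = [
  Upstream.new("127.0.0.1:8000", 433),  # an overloaded mongrel
  Upstream.new("127.0.0.1:8001", 0),
  Upstream.new("127.0.0.1:8002", 2),
]
pick_least_queued(backends)  # picks the idle 127.0.0.1:8001
```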
If increasing FS_TIME_SCALE_OFFSET does not help, could you please
compile nginx --with-debug and gather the debug_http data?
I very much like the way it recalculates where a request should go on
the fly via its “candidacy functions”. Of course I would love to see
something similar in nginx.
That would require a method of communication between upstream servers
(php-cgi, mongrels, whatever) and nginx, as upstreams should be able to
inform nginx about their state and the state of the machine they’re on.
Anyone interested in implementing something in this direction?
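One low-tech direction, purely as an idea: upstreams could report their
state back in a response header that the balancer could in principle
inspect. A Rack-style middleware sketch (the header name and the notion
of nginx consuming it are my assumptions, not an existing feature):

```ruby
# Hypothetical middleware: report this worker's in-flight request count
# to the balancer via a response header. "X-Upstream-Load" is a made-up
# header name; nothing in nginx reads it today.
class LoadReporter
  def initialize(app)
    @app = app
    @in_flight = 0
  end

  def call(env)
    @in_flight += 1
    status, headers, body = @app.call(env)
    headers["X-Upstream-Load"] = @in_flight.to_s
    [status, headers, body]
  ensure
    @in_flight -= 1  # runs even if the app raises
  end
end
```

A fuller design would also want machine-level state (load average,
memory), which is what makes the communication channel the hard part.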