Experiences using nginx to front-end apache

I thought I’d document our experiences trying to run nginx to
front-end a couple of apache servers. This is an experimental setup in
a lab to test maximum performance, so our results may not apply to
you.
The test setup and load are described at
http://blogs.sun.com/shanti/entry/olio_on_nehalem.
We can drive a steady 5000 connection load against each apache
instance using a keepalive timeout of 10 seconds. This results in over
4500 httpd processes and obviously consumes a huge amount of memory.

Our intention in using nginx was to see if we could cut down the number
of apache processes. We expected that the additional network latency
would not add much to the response time, while the system would get a
corresponding boost in performance from having to handle far fewer
processes, with a reduction in memory to boot.
But in reality, nginx uses HTTP 1.0 to connect to apache, causing the
connection to be closed after every request (which I’m sure all of you
know). This puts severe pressure on port availability, as closed ports
go into TIME_WAIT for a period of 60 seconds (the system default).
Tuning it down to 5 seconds, I could get 300 users to connect, but 500
failed. So I reduced the time_wait_interval further, down to 1 second.
This didn’t work either, as all the connections simply went into
CLOSE_WAIT and the clients were stuck without any ports to connect to,
so I also had to lower the smallest_anon_port. At this point, I could
get all 500 connections to stay up.
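Some back-of-the-envelope arithmetic shows why TIME_WAIT bites here, and why 5000 connections looks hopeless. The per-user request rate below is an assumption (the post doesn’t state one); the port range is the usual anonymous/ephemeral range:

```python
# Rough model: ports stuck in TIME_WAIT when nginx opens one HTTP/1.0
# backend connection per request. Assumed: 1 request/sec per user.
def ports_in_time_wait(users, reqs_per_sec, time_wait_secs):
    # Each proxied request opens and closes one backend connection,
    # which then occupies a local port for time_wait_secs.
    return users * reqs_per_sec * time_wait_secs

# Default anonymous port range 32768-65535 => 32768 usable ports.
anon_ports = 65535 - 32768 + 1

print(ports_in_time_wait(500, 1, 60))   # 30000 - already near the limit
print(ports_in_time_wait(500, 1, 5))    # 2500  - comfortable
print(ports_in_time_wait(5000, 1, 5))   # 25000 - near the limit again at scale
```

Even at a modest assumed request rate, 500 users with the default 60-second TIME_WAIT saturates the ephemeral port range, and scaling to 5000 users eats the headroom that the shorter interval bought.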

After all this struggle and tuning, I’m giving up. I just can’t
imagine how I will get this to scale to 5000 connections. I can
achieve the same reduction in apache processes if I simply turn off
keepalive or reduce it to a very low value (say 1 second). Essentially,
with keepalive off, apache will behave exactly the same way handling
clients directly as it does with nginx in front.
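For comparison, turning keepalive off (or down) in apache is a one-line change in httpd.conf; a sketch using the values discussed above:

```apache
# Stop dedicating a process to each idle client connection:
KeepAlive Off

# ...or keep persistent connections but reclaim processes quickly:
# KeepAlive On
# KeepAliveTimeout 1
```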

I can certainly see value in using nginx as a load balancer if one had
multiple small apache instances running. But for larger configurations,
which tend to use a more sophisticated load balancer, I’m not exactly
sure what the value is. I do understand that nginx does well serving
static content, and one can certainly use that by redirecting all
static requests to an nginx instance. My question is specifically
w.r.t. using nginx in front of apache.

Comments ?
Shanti

There is never time to do it right, but always time to do it over.
— Murphy’s law of computing

> know). This results in a severe pressure on port availability as
> closed ports go into time_wait for a period of 60 seconds (system
> default).
We experienced this ages back, but in relation to IMAP connections
rather than web ones. A number of IMAP clients are really bad at
disconnecting/reconnecting frequently, causing the same TIME_WAIT
problem.

Two solutions we used that both work:

  1. On linux, enable tcp_tw_reuse
  2. Bind to multiple IPs and distribute connections across them
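On Linux, that first knob is a sysctl; a sketch of how it would typically be set (behaviour varies somewhat by kernel version):

```shell
# Allow new outbound connections to reuse sockets still in TIME_WAIT
# (considered safe for outgoing connections when TCP timestamps are on).
sysctl -w net.ipv4.tcp_tw_reuse=1

# Make it persistent across reboots:
echo 'net.ipv4.tcp_tw_reuse = 1' >> /etc/sysctl.conf
```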

> achieve the same reduction in apache processes if I simply turn off
> keepalive or reduce it to a very low value (say 1 second). Essentially
> with keepalive off, apache will behave exactly the same way in
> handling clients directly as having nginx in front.

No. You’re assuming that your clients are fast and on a local network
and can push/pull data effectively “instantly”.

The reality is that clients are at some distance from your server, so
there’s time for them to send the requests (especially any large POST
ones), and to read back the results. That’s what nginx will hide,
because it will wait until the entire request is available before
connecting and pushing it to apache, and then it will slurp back the
entire result and close the apache connection, and then trickle it to
the client, freeing the apache process for another request. Even with
keepalive off, this is quite different to just having apache directly
accept the requests, and with keepalive on, it’s vastly different.

So the point of nginx as a proxy is that it handles the keepalives and
buffering in both directions, which reduces the number of actual apache
processes required to only those actually doing work; they never have
to handle client network latencies or keepalives.
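A minimal sketch of what that proxy setup looks like in nginx.conf (the backend address and buffer size are illustrative; proxy_buffering is on by default in nginx):

```nginx
http {
    upstream apache_backend {
        server 127.0.0.1:8080;            # illustrative apache address
    }
    server {
        listen 80;
        keepalive_timeout 10;             # nginx absorbs the client keepalives

        location / {
            proxy_pass http://apache_backend;
            proxy_buffering on;           # slurp the full response, then free apache
            client_body_buffer_size 128k; # buffer request bodies before proxying
        }
    }
}
```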

Rob