2048 worker_connections are not enough while connecting to upstream

I’m using nginx as a reverse proxy server in front of some Tomcat
instances, with pretty high traffic. Suddenly yesterday afternoon we
started getting the error in the message subject, one I’d never seen
before, and it’s occurred tens of thousands of times since. Now, I’ve
upped the worker_connections count to a much higher figure and the
problem has gone away for now. But I’m concerned that it might be
indicative of something else - some kind of DoS attack, maybe? What does
this signify and why might we suddenly start getting it? Thanks.

John

Hello!

On Tue, May 10, 2011 at 10:31:06AM +0100, John M. wrote:

> I’m using nginx as a reverse proxy server in front of some Tomcat
> instances, with pretty high traffic. Suddenly yesterday afternoon we
> started getting the error in the message subject, one I’d never seen
> before, and it’s occurred tens of thousands of times since. Now,
> I’ve upped the worker_connections count to a much higher figure and
> the problem has gone away for now. But I’m concerned that it might
> be indicative of something else - some kind of DoS attack, maybe?
> What does this signify and why might we suddenly start getting it?
> Thanks.

Without proper history graphs (use stub_status, Luke!) it’s hard
to say if you are under attack or just your normal load reached
the bar (or your backend response time degraded and this resulted
in more connections used, or your network uplink has problems and
this caused more connections…).
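
A minimal sketch of how such history could be collected, assuming the stub_status module is compiled in and exposed at a status URL. The URL below is hypothetical, and the sample text only mirrors the module's documented plain-text layout; the numbers in it are made up.

```python
import re
import time
import urllib.request

# Hypothetical location; use whatever URL your stub_status block is served at.
STATUS_URL = "http://127.0.0.1/nginx_status"

# Made-up numbers in the plain-text layout that stub_status produces.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""

def parse_stub_status(text):
    """Turn a stub_status response body into a dict of integer counters."""
    active = int(re.search(r"Active connections:\s*(\d+)", text).group(1))
    accepts, handled, requests = map(int, re.search(
        r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", text, re.M).groups())
    reading, writing, waiting = map(int, re.search(
        r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)",
        text).groups())
    return {"active": active, "accepts": accepts, "handled": handled,
            "requests": requests, "reading": reading, "writing": writing,
            "waiting": waiting}

def sample_once(url=STATUS_URL):
    """Fetch the status page once and return (unix_time, counters)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        body = resp.read().decode("ascii", "replace")
    return time.time(), parse_stub_status(body)

if __name__ == "__main__":
    print(parse_stub_status(SAMPLE))
```

Calling sample_once() from cron every minute and appending the result to a file is enough to tell a sudden spike from a slow climb towards the limit.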

On the other hand, 2048 is pretty low, and easily reached even
without keepalive. Usual production values for “high traffic”
sites are over 9000.
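
A rough back-of-the-envelope sketch of why the limit is easy to hit when proxying. It assumes each in-flight proxied request holds two connections (one to the client, one to the backend), both of which count against worker_connections, and it ignores idle keepalive connections. The request rate and response times are illustrative, not taken from this thread.

```python
import math

def estimate_connections(requests_per_sec, avg_response_sec,
                         conns_per_request=2):
    """Little's law: requests in flight = arrival rate * time in system.

    conns_per_request=2 assumes a proxied request, which holds a client
    connection and an upstream connection at the same time.
    """
    in_flight = requests_per_sec * avg_response_sec
    return math.ceil(in_flight * conns_per_request)

if __name__ == "__main__":
    # Illustrative numbers only: same traffic, slower backend.
    print(estimate_connections(1000, 0.5))
    print(estimate_connections(1000, 2.0))
```

With these assumed figures, a backend slowing from 0.5 s to 2 s per response quadruples the connections in use at the same request rate. The result is a total to compare against worker_processes times worker_connections, since the limit applies per worker.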

Please also note that using high worker_connections may also
require tuning of your OS to allow an appropriate number of file
descriptors to be used.
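
A small sanity check along these lines, as a sketch only. It assumes one file descriptor per connection plus a reserve for logs and listening sockets (the figure of 64 is a guess), and it reads the limit of the Python process itself, so it only says something about nginx when run as the same user in the same environment. nginx can also raise its own limit with the worker_rlimit_nofile directive.

```python
import resource

def fd_limit_ok(worker_connections, soft_limit=None, reserve=64):
    """Return True if the open-files limit covers worker_connections.

    soft_limit defaults to this process's RLIMIT_NOFILE soft limit.
    reserve is an assumed allowance for log files, listening sockets, etc.
    """
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return True  # no limit configured
    return soft_limit >= worker_connections + reserve

if __name__ == "__main__":
    for wc in (2048, 8192, 16384):
        print(wc, fd_limit_ok(wc))
```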

Maxim D.

On 10/05/11 11:19, Maxim D. wrote:

> > the problem has gone away for now. But I’m concerned that it might
> > be indicative of something else - some kind of DoS attack, maybe?
> > What does this signify and why might we suddenly start getting it?
> > Thanks.

> Without proper history graphs (use stub_status, Luke!) it’s hard
> to say if you are under attack or just your normal load reached
> the bar (or your backend response time degraded and this resulted
> in more connections used, or your network uplink has problems and
> this caused more connections…).

I’ll look into these.

> On the other hand, 2048 is pretty low, and easily reached even
> without keepalive. Usual production values for “high traffic”
> sites are over 9000.

OK, I may bump it up a bit higher than the 8192 I set it to today.

> Please also note that using high worker_connections may also
> require tuning of your OS to allow an appropriate number of file
> descriptors to be used.

I made this change a while back, so we should be OK with that.

Thanks for your extremely lucid help!

John