Forum: NGINX
Nginx capping at 320 requests per second

Posted by dpn (Guest)
on 2010-08-26 00:09
(Received via mailing list)
Hey, we run a website of fairly decent volume.. up to nearly 4m
pageviews a day.

At the moment we run a single machine with nginx and mysql and two
worker machines with memcached and tornado instances. The nginx server
is a reverse proxy to the workers and also serves static media.

The CPU load and memory usage on both of the worker boxes are well
within reasonable expectations.

What I am observing is that nginx gets to about 320 requests per second
and then requests start backing up, sometimes taking the server down.
See this image: http://dl.dropbox.com/u/367355/nginx.png

When the server doesn't go down, we see requests flatten out around the
320 mark, and the number of "waiting" requests and nginx's memory usage
spike considerably.

I've tried upping the number of workers in case all of them are blocking
for long enough to cause this cascading effect (the tornado db driver is
not async) but didn't really see an improvement by adding more. I've
also added lots of async memcached access to avoid hitting the db too
much.

I've included the configs below.. thanks for any help you may have!

[code]
user www-data;
worker_processes 4;
worker_rlimit_nofile 32768;


error_log  /dev/null crit;
pid        /var/run/nginx.pid;

events {
    worker_connections  8192;
    use epoll;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    access_log  /dev/null;

    sendfile        on;

    keepalive_timeout  0;
    tcp_nodelay        on;

    gzip  on;
    gzip_types text/css text/plain text/javascript application/x-javascript application/json;
    gzip_comp_level 5;
    gzip_disable     "msie6";

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

[/code]


[code]
upstream bar {
 server worker1:8888 max_fails=1 fail_timeout=10s;
 server worker2:8888 max_fails=1 fail_timeout=10s;
}


server { # simple reverse-proxy
    listen       80;
    server_name  bar.net;
    #access_log   logs/bar.access.log;
    access_log /dev/null;


    location /nginx_status {
            stub_status on;
            access_log off;
            allow 127.0.0.1;
            deny all;
    }

    location ^~ /static/ {
        root /home/foo/bar;
        if ($query_string) {
            expires max;
        }
    }

    # pass requests for dynamic content to tornado
    location / {
            proxy_pass_header Server;
            proxy_set_header Host $http_host;
            proxy_redirect off;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Scheme $scheme;
            # must match the upstream block name defined above
            proxy_pass      http://bar;
    }
    error_page 411 /411.html;
    location = /411.html {
         root  /home/foo/bar/static/error;
    }

    error_page 500 502 503 504  /500.html;
    location = /500.html {
         root  /home/foo/bar/static/error;
     }
  }

[/code]
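
The config above exposes stub_status at /nginx_status, which is where the requests-per-second and "waiting" figures come from. Below is a minimal Python sketch of turning two stub_status samples into a request rate. The sample numbers, the function names, and the polling URL are illustrative assumptions, not values from this thread; the text layout is the standard stub_status output.

```python
import re
import urllib.request
from collections import namedtuple

# Illustrative sample of stub_status output (numbers are made up).
SAMPLE = (
    "Active connections: 291 \n"
    "server accepts handled requests\n"
    " 16630948 16630948 31070465 \n"
    "Reading: 6 Writing: 179 Waiting: 106 \n"
)

StubStatus = namedtuple(
    "StubStatus", "active accepts handled requests reading writing waiting"
)

_ACTIVE = re.compile(r"Active connections:\s*(\d+)")
_COUNTERS = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", re.MULTILINE)
_STATES = re.compile(r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)")


def parse_stub_status(text):
    """Parse the plain-text stub_status page into a StubStatus tuple."""
    active = int(_ACTIVE.search(text).group(1))
    accepts, handled, requests = map(int, _COUNTERS.search(text).groups())
    reading, writing, waiting = map(int, _STATES.search(text).groups())
    return StubStatus(active, accepts, handled, requests, reading, writing, waiting)


def requests_per_second(earlier, later, interval_seconds):
    """Request rate between two samples taken interval_seconds apart."""
    return (later.requests - earlier.requests) / interval_seconds


def fetch_stub_status(url="http://127.0.0.1/nginx_status"):
    """Fetch and parse one sample. The URL is an assumption based on the
    location block above, which only allows 127.0.0.1."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_stub_status(resp.read().decode("ascii"))


if __name__ == "__main__":
    first = parse_stub_status(SAMPLE)
    # Pretend a second sample was taken 10 seconds later.
    second = first._replace(requests=first.requests + 3200)
    print(requests_per_second(first, second, 10))
```

Sampling on the nginx box itself (for example once every 10 seconds) gives a rate that can be compared directly with the graphs linked in the post.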

Posted at Nginx Forum: 
http://forum.nginx.org/read.php?2,123754,123754#msg-123754
Posted by dpn (Guest)
on 2010-08-26 00:10
(Received via mailing list)
I should note that I'm running 15 workers on each worker box... I just
cut them out of the upstream section to make the config shorter.

Posted at Nginx Forum: 
http://forum.nginx.org/read.php?2,123754,123755#msg-123755
Posted by Cliff Wells (Guest)
on 2010-08-26 01:01
(Received via mailing list)
Are you certain it's Nginx and not Tornado? You might try using

# issue warning if we block for over 200ms
tornado.ioloop.IOLoop.instance().set_blocking_log_threshold(0.2)

Also you don't mention how many Tornado backends you have.   If you
don't have at least one Tornado backend per Nginx worker, you are
probably wasting your time trying to tune Nginx.

As an aside, you might check out ngx_postgres or ngx_drizzle for async
db access from Tornado (lets you use Tornado's async httpclient).

Cliff

Posted by dpn (Guest)
on 2010-08-26 01:42
(Received via mailing list)
Sorry, it seems my posts have been moderated.. I'll wait it out until
someone spots it. :)

Posted at Nginx Forum: 
http://forum.nginx.org/read.php?2,123754,123774#msg-123774
Posted by dpn (Guest)
on 2010-08-26 01:45
(Received via mailing list)
Hey Cliff, thanks for the reply. I mentioned in the second post to this
thread that I have a total of 30 workers, 15 on each machine... there
are 4 CPUs on each machine.. the extra processes are to pick up any
slack from blocking DB access.

I have indeed used the IOLoop blocking debug... coming here really is a
last resort for me! The IOLoop debugging showed some areas I could
improve. Obviously the DB access is unavoidable, but there were also
some CPU-intensive spots, which I fixed. To avoid too many DB accesses
I'm using an async memcached driver. Now I'm in the situation where the
IOLoop debugging issues hardly any messages, the CPU usage is fairly
low, and I'm hardly touching the db!

That's why I'm here ... unfortunately I've covered the things you've
mentioned. :(

D

Posted at Nginx Forum: 
http://forum.nginx.org/read.php?2,123754,123772#msg-123772
Posted by dpn (Guest)
on 2010-08-31 08:19
(Received via mailing list)
Hello again,  we added another machine with 15 workers and are still
getting the 320rps cap from nginx:
http://dl.dropbox.com/u/367355/nginxday.png

I've posted my config earlier... I don't suppose anyone has a suggestion
about what might be causing the issue?

Posted at Nginx Forum: 
http://forum.nginx.org/read.php?2,123754,125474#msg-125474
Posted by dpn (Guest)
on 2010-09-01 02:22
(Received via mailing list)
Kevin, thanks for your reply.. I've turned off keepalive because the app
is a mobile app with very simple js and css. There is very little reason
to have keepalive. I'll try putting it up to test though. Cheers!

Posted at Nginx Forum: 
http://forum.nginx.org/read.php?2,123754,125845#msg-125845
Posted by Piotr Sikora (Guest)
on 2010-09-01 02:29
(Received via mailing list)
Hi,

> Kevin, thanks for your reply.. I've turned off keepalive because the app
> is a mobile app with very simple js and css. There is very little reason
> to have keepalive. I'll try putting it up to test though. Cheers!

That .js and .css are good enough reason to have keepalive turned on.

Regarding your original issue: how long does it take Tornado to
generate a single response? I'm asking because you said that you've got
30 blocking workers, which means that if a single response takes around
100ms, then you can handle only about 300 req/s.
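
The arithmetic here can be sketched in a few lines of Python. This is only a back-of-the-envelope model, assuming every worker handles exactly one request at a time and is never idle; the function names are made up for illustration.

```python
def max_requests_per_second(workers, seconds_per_response):
    """Upper bound on throughput when each worker serves one request
    at a time and each response takes seconds_per_response."""
    if seconds_per_response <= 0:
        raise ValueError("seconds_per_response must be positive")
    return workers / seconds_per_response


def implied_seconds_per_response(workers, observed_rps):
    """The same relation turned around: how long each worker would have
    to be busy per request for observed_rps to be a worker-bound cap."""
    if observed_rps <= 0:
        raise ValueError("observed_rps must be positive")
    return workers / observed_rps


if __name__ == "__main__":
    # 30 blocking workers at roughly 100 ms per response.
    print(max_requests_per_second(30, 0.100))
    # Busy time per request that a 320 req/s cap would imply
    # with 30 workers and with 45 workers.
    for workers in (30, 45):
        print(workers, implied_seconds_per_response(workers, 320))
```

Under this model the ceiling scales linearly with the number of workers, so going from 30 to 45 workers should raise a purely worker-bound cap by half. Earlier in the thread the cap stayed at 320 after a third machine was added, which is one reason to look beyond the worker count.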

Best regards,
Piotr Sikora < piotr.sikora@frickle.com >
Posted by dpn (Guest)
on 2010-09-01 03:09
(Received via mailing list)
Piotr,

Most workers aren't blocking... they only block when they hit the db,
and we are doing lots of caching. At its peak mysql is registering 140
requests per second. I've added more workers which has had no effect on
the capacity going through nginx.. so either the DB itself is causing
problems (unlikely since it is such a simple schema with no joins) or
something is up with the nginx config.

Thanks for giving this some thought!

David

Posted at Nginx Forum: 
http://forum.nginx.org/read.php?2,123754,125864#msg-125864
Posted by dpn (Guest)
on 2010-09-01 03:15
(Received via mailing list)
post to subscribe to the thread.. please disregard.

Posted at Nginx Forum: 
http://forum.nginx.org/read.php?2,123754,125870#msg-125870
Posted by dpn (Guest)
on 2010-09-06 16:39
(Received via mailing list)
OK, I turned keepalive back on (5 seconds) and we are cooking with
gas... getting up to 400 requests per second. Thanks everyone :)
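
For anyone who wants to see the per-connection overhead that keepalive removes, here is a small self-contained Python sketch. It is not a benchmark of nginx: a local standard-library HTTP server stands in for the real server, and all names (OkHandler, run_comparison, and so on) are made up for the example. It simply times the same number of requests with a new TCP connection per request versus one reused connection.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class OkHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 lets the stdlib server keep a connection open between requests.
    protocol_version = "HTTP/1.1"
    # Avoid Nagle-related stalls on small responses (compare tcp_nodelay on).
    disable_nagle_algorithm = True

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_new_connection_each_time(host, port, n):
    """One TCP connection per request, similar to keepalive_timeout 0."""
    ok = 0
    for _ in range(n):
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("GET", "/", headers={"Connection": "close"})
        resp = conn.getresponse()
        resp.read()
        if resp.status == 200:
            ok += 1
        conn.close()
    return ok


def fetch_reusing_connection(host, port, n):
    """All requests over a single persistent connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    ok = 0
    for _ in range(n):
        conn.request("GET", "/")
        resp = conn.getresponse()
        resp.read()
        if resp.status == 200:
            ok += 1
    conn.close()
    return ok


def run_comparison(n=200):
    """Start a throwaway local server and time both client strategies."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), OkHandler)
    host, port = server.server_address
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        results = {}
        for name, fetch in (
            ("new_connection", fetch_new_connection_each_time),
            ("keepalive", fetch_reusing_connection),
        ):
            start = time.perf_counter()
            ok = fetch(host, port, n)
            results[name] = {"ok": ok, "seconds": time.perf_counter() - start}
        return results
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for name, r in run_comparison().items():
        print(f"{name}: {r['ok']} responses in {r['seconds']:.3f}s")
```

On loopback the difference is small in absolute terms; over a mobile network, where each new connection costs at least one extra round trip, the saving per page (HTML plus js and css) is likely to be larger.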

Posted at Nginx Forum: 
http://forum.nginx.org/read.php?2,123754,126986#msg-126986
Posted by dpn (Guest)
on 2010-09-06 16:39
(Received via mailing list)
Another update... with keepalive 5 the server gets to 450 requests per
second then does the weird levelling off thing I've mentioned before.

http://dl.dropbox.com/u/367355/nginx450.png

Posted at Nginx Forum: 
http://forum.nginx.org/read.php?2,123754,127462#msg-127462