Nginx recycles workers too often

A couple of months ago I posted a topic because I thought my php-cgi
processes were recycling too often (you can view it here:
http://forum.nginx.org/read.php?2,45516). After trying to troubleshoot
this issue for months and not getting anywhere, I now think it’s nginx
that is stalling or recycling, not php-cgi. I tested this with a static
HTML page and kept refreshing it. Whenever my vBulletin forum took a
long time to load, the same thing happened with the HTML page. If I’m
watching the top command when this delay happens, both nginx and
php-cgi will disappear from the list, then reappear a few seconds
later.
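
To put some numbers on it, I’ve been timing requests in a loop like
this (a rough sketch; test.html is just a static file I dropped in the
docroot):

    # request a static file once per second and print the total time
    while true; do
        curl -s -o /dev/null -w '%{time_total}s\n' http://www.domain.com/test.html
        sleep 1
    done

That makes the normally sub-second responses and the occasional
multi-second stalls easy to see.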

When this happens, there are delays of anywhere from 3 to 5+ seconds
while browsing the sites, while the workers seem to reload. Normally,
nginx with php-cgi is extremely fast for me, so when you’re used to <1
second load times and it sometimes delays by 5+ seconds, it’s very
noticeable. I could understand if this happened once every 5 or 10
minutes, but it’s happening 2-3 times per minute, sometimes even more
than that. We average about 1,000 concurrent users at any given time,
usually anywhere from 800 to 1,200+. We are running a quad-core
processor with about 4GB of RAM on a dedicated server. We have nginx,
php-cgi and MySQL running on this machine. The PHP sites this server
runs are a WordPress blog (for news), a vBulletin forum, and a rarely
used MediaWiki.

I have php-cgi running with 8 children right now, which seems to be
fine. I’ve tried 16 and even 32 and it didn’t change anything.
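
For reference, I start php-cgi roughly like this (a sketch; the
spawn-fcgi wrapper, paths and user are examples, my actual init script
differs):

    # example only: -C sets the number of children, PHP_FCGI_MAX_REQUESTS
    # controls how many requests each child serves before being recycled
    PHP_FCGI_MAX_REQUESTS=1000 \
        spawn-fcgi -a 127.0.0.1 -p 8888 -C 8 -u nobody -f /usr/bin/php-cgi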

I usually leave my error_log set to crit only, but I did turn on debug
today. There’s nothing out of the ordinary going on in the logs
whenever this happens. It’ll show normal keepalive connections closed,
or connection to upstream closed by client, etc.
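
(For debug output to actually appear, nginx has to be built with
--with-debug; the switch itself is just the error_log level:)

    error_log logs/error.log debug;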

Any ideas are helpful. Here’s my nginx.conf file. I’ve included all of
my vhosts in the one file so it’s easier to read while I work on this
issue.

user nobody;
worker_processes 4;
error_log logs/error.log crit;
worker_rlimit_nofile 16384;

events {
    worker_connections 1024; # you might need to increase this setting for busy servers
    #use rtsig; # Linux kernels 2.6.x change to epoll
    use epoll; # Linux kernels 2.6.x change to epoll
    multi_accept off;
}

http {
    include mime.types;
    #default_type application/octet-stream;

    server_tokens off;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    client_max_body_size 50m; # more if you need to upload files larger than 50MB
    client_header_buffer_size 8k;

    keepalive_timeout 5;
    #keepalive_requests 0;

    gzip on;
    gzip_min_length 1100;
    gzip_buffers 16 8k;
    gzip_http_version 1.0;
    gzip_vary on;
    gzip_comp_level 1;
    gzip_proxied any;
    gzip_types text/plain text/html text/css application/x-javascript text/xml text/javascript;
    # Disable gzip for certain browsers.
    gzip_disable "MSIE [1-6].(?!.*SV1)";
    ignore_invalid_headers on;

    server_names_hash_max_size 4096;
    server_names_hash_bucket_size 128;

    limit_zone gulag $binary_remote_addr 5m;

    client_header_timeout 2m;
    client_body_timeout 2m;
    client_body_buffer_size 128k;
    send_timeout 2m;
    connection_pool_size 256;
    large_client_header_buffers 4 8k;
    request_pool_size 4k;
    output_buffers 1 32k;
    postpone_output 1460;

    fastcgi_buffers 64 4k;
    fastcgi_buffer_size 64k;
    fastcgi_read_timeout 240s;

    #################### www.domain.com ####################
    server {
        access_log off;
        limit_conn gulag 50;
        #error_log logs/vhost-error_log warn;
        listen 80 default;
        server_name www.domain.com domain.com;
        root /home/domain/public_html;
        index index.php index.html index.htm;
        #rewrite ^(.*)$ $scheme://www.domain.com$1 permanent;

        # pass the PHP scripts to FastCGI server
        location ~ \.php$ {
            fastcgi_pass 127.0.0.1:8888;
            #fastcgi_pass unix:/var/run/nginx-fcgi.sock;
            fastcgi_index index.php;
            fastcgi_param SCRIPT_FILENAME /home/domain/public_html$fastcgi_script_name;
            include /usr/local/nginx/conf/fastcgi_params;
            fastcgi_intercept_errors on;
        }

        location / {
            if (!-e $request_filename) {
                rewrite ^.*$ /index.php last;
            }
        }
    }

    #################### forum.domain.com ####################
    server {
        access_log off;
        limit_conn gulag 50;
        #error_log logs/vhost-error_log warn;
        listen 80;
        server_name forum.domain.com www.forum.domain.com;
        root /home/domain/public_html/forums;
        index index.php index.html index.htm;
        #rewrite ^(.*)$ $scheme://forum.domain.com$1 permanent;

        location /archive/ {
            if ($request_filename ~ ".html$") {
                rewrite ^/(.*)/f-([0-9]+).html$ /$1?f-$2.html last;
                rewrite ^/(.*)/t-([0-9]+).html$ /showthread.php?t=$2 permanent;
            }
        }

        # pass the PHP scripts to FastCGI server
        location ~ \.php$ {
            fastcgi_pass 127.0.0.1:8888;
            #fastcgi_pass unix:/var/run/nginx-fcgi.sock;
            fastcgi_index index.php;
            fastcgi_param SCRIPT_FILENAME /home/domain/public_html/forums$fastcgi_script_name;
            include /usr/local/nginx/conf/fastcgi_params;
            fastcgi_intercept_errors on;
        }
    }

    #################### wiki.domain.com ####################
    server {
        access_log off;
        limit_conn gulag 50;
        #error_log logs/vhost-error_log warn;
        listen 80;
        server_name wiki.domain.com www.wiki.domain.com;
        root /home/domain/public_html/wiki;
        index index.php index.html index.htm;
        #rewrite ^(.*)$ $scheme://wiki.domain.com$1 permanent;

        # pass the PHP scripts to FastCGI server
        location ~ \.php$ {
            fastcgi_pass 127.0.0.1:8888;
            #fastcgi_pass unix:/var/run/nginx-fcgi.sock;
            fastcgi_index index.php;
            fastcgi_param SCRIPT_FILENAME /home/domain/public_html/wiki$fastcgi_script_name;
            include /usr/local/nginx/conf/fastcgi_params;
            fastcgi_intercept_errors on;
        }
    }

}

Thanks in advance!


mindfrost82 Wrote:

I usually leave my error_log set to crit only, but I did turn on debug
today. There’s nothing out of the ordinary going on in the logs
whenever this happens. It’ll show normal keepalive connections closed,
or connection to upstream closed by client, etc.

Do you see “worker process exited on signal XX” in your error logs
(visible even on critical level)? If you do then please post your debug
logs from that child prior to its crash. If you don’t then blame PHP :P
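
Something like this will show them even with error_log at crit (the
log path assumes the default /usr/local/nginx prefix):

    # worker crashes are logged at a level visible even at 'crit'
    grep -i "exited on signal" /usr/local/nginx/logs/error.log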

Best regards,
Piotr S. < [email protected] >

Piotr S. Wrote:

Do you see “worker process exited on signal XX” in your error logs
(visible even on critical level)? If you do then please post your debug
logs from that child prior to its crash.
I do not see that in any of my logs. If I have it set to crit, nothing
ever shows in the log except when I start the process. The reason I
don’t think it’s PHP is because the delay happens when I refresh a
standard HTML page.

I’m completely lost as to what’s going on. If I’m viewing top, then
nginx and php-cgi disappear from the list, but that’s because their
CPU/RAM usage goes down. If I view the running processes from within my
control panel (WHM), nginx and php-cgi never actually go away. So it’s
almost like the requests are being held up somewhere for 5-10 seconds
and nginx’s resource usage goes down because it’s not actually getting
the requests. Any idea if this is possible and what the cause would be?


mindfrost82 Wrote:

So it’s almost like the requests are being held up somewhere for 5-10
seconds and nginx’s resource usage goes down because it’s not actually
getting the requests. Any idea if this is possible and what the cause
would be?

Check debug logs, they’ve got timestamps, so if there is a 5-10s gap
in the request processing then you should notice it. Following your
“stuck” request shouldn’t be hard.
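
A quick way to spot the gap, assuming GNU awk (for mktime) and the
standard “YYYY/MM/DD HH:MM:SS” timestamp at the start of each debug
line:

    gawk '{
        # parse the leading date and time fields into an epoch timestamp
        split($1, d, "/"); split($2, t, ":")
        ts = mktime(d[1] " " d[2] " " d[3] " " t[1] " " t[2] " " t[3])
        # flag any jump of 3 seconds or more between consecutive entries
        if (prev && ts - prev >= 3)
            print "gap of " (ts - prev) "s before line " NR ": " $0
        prev = ts
    }' logs/error.log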

Best regards,
Piotr S. < [email protected] >

On Tue, Mar 2, 2010 at 5:14 PM, mindfrost82 <[email protected]> wrote:

Apparently that’s not the issue either. The longest difference in my
debug logs is 2 seconds; otherwise there’s an entry for every second.
Any other ideas? I’m completely lost on this issue.

tcpdump is your friend. Try to capture all the traffic between nginx
and your PHP backend. That should reveal what exactly is going on.
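
For example (a sketch; port 8888 is taken from your config, the public
interface name is a guess):

    # FastCGI traffic between nginx and php-cgi (they talk over loopback)
    tcpdump -i lo -s 0 -w fastcgi.pcap port 8888
    # client-side HTTP traffic; replace eth0 with your actual interface
    tcpdump -i eth0 -s 0 -w http.pcap port 80

Open the .pcap files in Wireshark and see on which side of nginx the
5-10 second gaps fall.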


I believe you should tweak PHP_FCGI_MAX_REQUESTS. You can try 5000, for
instance. In order to get this working you need to kill all of the
running processes (stop nginx and php-cgi with their init.d scripts, or
whatever you called them, then killall nginx and killall php-cgi) and
start php-cgi and nginx again. I’m saying this because I just read your
other thread, where changing PHP_FCGI_MAX_REQUESTS did nothing on your
server.
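
Something along these lines (a sketch; the init script paths and the
spawn-fcgi invocation are examples, adjust to whatever your setup
uses):

    # stop everything and make sure nothing is left running
    /etc/init.d/nginx stop; killall nginx
    killall php-cgi
    # start php-cgi with the higher recycle limit, then nginx
    PHP_FCGI_MAX_REQUESTS=5000 \
        spawn-fcgi -a 127.0.0.1 -p 8888 -C 8 -u nobody -f /usr/bin/php-cgi
    /etc/init.d/nginx start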

I also run a WP blog and a vB forum on my site. I’m getting over 170k
uniques per day. I run mysqld on a separate server. I have 8 workers
and 15 children. I’m currently tweaking PHP_FCGI_MAX_REQUESTS; 1000
seems to be too low for me since php-cgi processes get restarted too
often.
