Worker processes in hung state after reload

dubstep · July 2, 2011, 2:01am

We use Nginx as a reverse proxy. After 2 or 3 reloads (kill -HUP), the
master’s single worker process remains “stuck”; the master spawns a
new worker, which often gets “stuck” as well.

The number of worker_processes we have configured is 1. After a reload,
there are 2 workers in the process list. After another reload, 3
workers, and so on. Usually all of these workers are in state R. When we
strace them, we see no system calls, but they are burning user CPU.
They are unable to serve requests. One time (seen in detail below), a
new worker was spawned and it was not in state R, but state S, and was
able to serve requests, until we issued another reload.

Of course, after a reload we should only ever have one worker process,
and it should be able to serve requests.
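
To follow the symptom across reloads, a small helper along these lines could summarise worker states. This is only a sketch: the function name `count_workers_by_state` is made up for illustration, and it assumes plain `ps aux` output (no forest view), where column 8 is the process state and the command starts at column 11.

```shell
#!/bin/sh
# Hypothetical helper (not part of nginx): summarise nginx worker states.
# Reads `ps aux`-style lines on stdin and prints "<state> <count>" for each
# state seen among nginx worker processes.
count_workers_by_state() {
    # Match on the command columns so that this awk process itself,
    # which also shows up in `ps aux`, is never counted.
    awk '$11 == "nginx:" && $12 == "worker" {
             state = substr($8, 1, 1)   # first letter of STAT: R, S, D, ...
             n[state]++
         }
         END { for (s in n) print s, n[s] }' | sort
}

# Usage on a live system (assumes procps `ps`):
#   ps aux | count_workers_by_state
```

With `worker_processes 1`, anything other than a single line with a count of 1 shortly after a reload would point at leftover workers.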

Details below, including our config. We can provide lsof, strace, debug
log of the reproduction case if needed.

#############

07/01 23:10[root@proxy ~]# cat /etc/redhat-release
CentOS release 5.6 (Final)

07/01 23:10[root@proxy ~]# uname -a
Linux proxy 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:39:36 EST 2008 i686
i686 i386 GNU/Linux

07/01 23:10[root@proxy ~]# nginx -V
nginx version: nginx/0.8.54
built by gcc 4.1.2 20080704 (Red Hat 4.1.2-48)
TLS SNI support disabled
configure arguments: --user=nginx --group=nginx
--prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx
--conf-path=/etc/nginx/nginx.conf
--error-log-path=/var/log/nginx/error.log
--http-log-path=/var/log/nginx/access.log
--http-client-body-temp-path=/var/lib/nginx/tmp/client_body
--http-proxy-temp-path=/var/lib/nginx/tmp/proxy
--http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi
--http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi
--http-scgi-temp-path=/var/lib/nginx/tmp/scgi
--pid-path=/var/run/nginx.pid --lock-path=/var/lock/subsys/nginx
--with-http_ssl_module --with-http_realip_module
--with-http_addition_module --with-http_xslt_module
--with-http_image_filter_module --with-http_sub_module
--with-http_gzip_static_module --with-http_random_index_module
--with-http_secure_link_module --with-http_degradation_module
--with-http_stub_status_module --with-http_perl_module --with-ipv6
--with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386
-mtune=generic -fasynchronous-unwind-tables' --with-cc-opt='-O2 -g -pipe
-Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
--param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic
-fasynchronous-unwind-tables'
--add-module=src/http/modules/nginx_syslog_patch
--add-module=src/http/modules/nginx_upstream_module
--add-module=src/http/modules/nginx_ajp_module

(Note above that we have a few third-party modules added for syslog and
upstream health checking.)

#############

From fresh start:

07/01 22:42[root@proxy ~]# /etc/init.d/nginx stop
Stopping nginx: [ OK ]
07/01 22:43[root@proxy ~]# ps wwwaxuf | grep nginx
root 23433 0.0 0.0 4884 704 pts/2 S+ 22:43 0:00 | \_ grep nginx

It’s not running. Now, start it:

07/01 22:43[root@proxy ~]# /etc/init.d/nginx start
Starting nginx: [ OK ]
07/01 22:43[root@proxy ~]# ps wwwaxuf | grep nginx | egrep -v 'grep|strace'
root 23463 0.0 0.0 13080 1564 ? Ss 22:43 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 23464 0.0 0.1 13224 2124 ? S 22:43 0:00 \_ nginx: worker process

07/01 22:44[root@proxy ~]# /etc/init.d/nginx reload
Reloading nginx: [ OK ]
07/01 22:45[root@proxy ~]# ps wwwaxuf | grep nginx | egrep -v 'grep|strace'
root 23463 0.0 0.2 13612 4580 ? Ss 22:43 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 23580 0.0 0.1 13744 2684 ? S 22:45 0:00 \_ nginx: worker process

07/01 22:45[root@proxy ~]# /etc/init.d/nginx reload
Reloading nginx: [ OK ]
07/01 22:46[root@proxy ~]# ps wwwaxuf | grep nginx | egrep -v 'grep|strace'
root 23463 0.0 0.2 13652 4600 ? Ss 22:43 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 23787 0.0 0.1 13652 2600 ? S 22:46 0:00 \_ nginx: worker process
07/01 22:46[root@proxy ~]# ps wwwaxuf | grep nginx | egrep -v 'grep|strace'
root 23463 0.0 0.2 13652 4600 ? Ss 22:43 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 23787 0.0 0.1 13652 2600 ? S 22:46 0:00 \_ nginx: worker process

07/01 22:46[root@proxy ~]# /etc/init.d/nginx reload
Reloading nginx: [ OK ]
07/01 22:46[root@proxy ~]# ps wwwaxuf | grep nginx | egrep -v 'grep|strace'
root 23463 0.0 0.2 13784 4616 ? Ss 22:43 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 23830 0.0 0.1 13784 2672 ? S 22:46 0:00 \_ nginx: worker process
07/01 22:46[root@proxy ~]# ps wwwaxuf | grep nginx | egrep -v 'grep|strace'
root 23463 0.0 0.2 13784 4616 ? Ss 22:43 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 23830 27.3 0.2 14336 4096 ? R 22:46 0:11 \_ nginx: worker process

07/01 22:47[root@proxy ~]# /etc/init.d/nginx reload
Reloading nginx: [ OK ]
07/01 22:47[root@proxy ~]# ps wwwaxuf | grep nginx | egrep -v 'grep|strace'
root 23463 0.0 0.2 13784 4616 ? Ss 22:43 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 23830 28.0 0.2 14336 4096 ? R 22:46 0:13 \_ nginx: worker process
nginx 23907 0.0 0.1 13784 2620 ? S 22:47 0:00 \_ nginx: worker process
07/01 22:47[root@proxy ~]# ps wwwaxuf | grep nginx | egrep -v 'grep|strace'
root 23463 0.0 0.2 13784 4616 ? Ss 22:43 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 23830 29.4 0.2 14336 4096 ? R 22:46 0:15 \_ nginx: worker process
nginx 23907 0.0 0.1 13784 2704 ? S 22:47 0:00 \_ nginx: worker process
07/01 22:47[root@proxy ~]# ps wwwaxuf | grep nginx | egrep -v 'grep|strace'
root 23463 0.0 0.2 13784 4616 ? Ss 22:43 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 23830 34.4 0.2 14336 4096 ? R 22:46 0:30 \_ nginx: worker process
nginx 23907 0.0 0.1 13784 2916 ? S 22:47 0:00 \_ nginx: worker process

07/01 22:49[root@proxy ~]# /etc/init.d/nginx reload
Reloading nginx: [ OK ]
07/01 22:49[root@proxy ~]# ps wwwaxuf | grep nginx | egrep -v 'grep|strace'
root 23463 0.0 0.2 13784 4616 ? Ss 22:43 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 23830 38.1 0.2 14336 4096 ? R 22:46 1:12 \_ nginx: worker process
nginx 24145 0.0 0.1 13784 2712 ? S 22:49 0:00 \_ nginx: worker process
07/01 22:49[root@proxy ~]# /etc/init.d/nginx reload
Reloading nginx: [ OK ]
07/01 22:50[root@proxy ~]# ps wwwaxuf | grep nginx | egrep -v 'grep|strace'
root 23463 0.0 0.2 13784 4616 ? Ss 22:43 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 23830 37.5 0.2 14336 4096 ? R 22:46 1:16 \_ nginx: worker process
nginx 24185 38.6 0.1 13784 3424 ? R 22:50 0:01 \_ nginx: worker process

07/01 22:50[root@proxy ~]# /etc/init.d/nginx reload
Reloading nginx: [ OK ]
07/01 22:50[root@proxy ~]# ps wwwaxuf | grep nginx | egrep -v 'grep|strace'
root 23463 0.0 0.2 13784 4620 ? Ss 22:43 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 23830 35.5 0.2 14336 4096 ? R 22:46 1:20 \_ nginx: worker process
nginx 24185 19.9 0.1 13784 3424 ? R 22:50 0:04 \_ nginx: worker process
nginx 24318 31.0 0.2 13924 3620 ? R 22:50 0:01 \_ nginx: worker process

#############

07/01 23:06[root@proxy ~]# egrep -v '^#|^$' /etc/nginx/nginx.conf
user nginx;
worker_processes 1;
syslog local2 nginx;
error_log syslog:warn|/var/log/nginx/error.log;
pid /var/run/nginx.pid;
events {
    worker_connections 1024;
}
http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log syslog:warn|/var/log/nginx/access.log main;
    sendfile on;
    keepalive_timeout 65;
    gzip on;
    server {
        listen 80;
        server_name _;
        location /nginx-status {
            stub_status on;
            access_log off;
            allow 10.0.0.0/8;
            allow 127.0.0.1;
            deny all;
        }
        location /upstream-status {
            check_status;
            access_log off;
            allow 10.0.0.0/8;
            allow 127.0.0.1;
            deny all;
        }
        error_page 404 /404.html;
        location = /404.html {
            root /usr/share/nginx/error;
        }
        error_page 403 /403.html;
        location = /403.html {
            root /usr/share/nginx/error;
        }
        error_page 500 502 504 /500.html;
        location = /500.html {
            root /usr/share/nginx/error;
        }
        error_page 503 /503.html;
        location = /503.html {
            root /usr/share/nginx/error;
        }
        set $global_ssl_redirect 'yes';
        if ($request_filename ~ "nginx-status") {
            set $global_ssl_redirect 'no';
        }
        if ($request_filename ~ "upstream-status") {
            set $global_ssl_redirect 'no';
        }
        if ($global_ssl_redirect ~* '^yes$') {
            rewrite ^ https://$host$request_uri? permanent;
            break;
        }
    }
    include upstream.conf;
    server {
        listen 443;
        server_name _;
        ssl on;
        ssl_certificate certs/certchain.crt;
        ssl_certificate_key certs/certchain.key;
        ssl_protocols SSLv3 TLSv1;
        ssl_ciphers HIGH;
        set_real_ip_from 10.0.0.0/8;
        real_ip_header X-Forwarded-For;
        add_header X-Forwarded-For $proxy_add_x_forwarded_for;
        add_header Cache-Control public;
        location / {
            proxy_pass http://appserver_http;
            proxy_connect_timeout 10s;
            proxy_next_upstream error timeout invalid_header http_500
                                http_503 http_502 http_504;
            proxy_set_header Host $host;
            if ($request_uri ~* ".(ico|css|js|gif|jpe?g|png)") {
                expires 365d;
                break;
            }
        }
        error_page 404 /404.html;
        location = /404.html {
            root /usr/share/nginx/error;
        }
        error_page 403 /403.html;
        location = /403.html {
            root /usr/share/nginx/error;
        }
        error_page 500 502 504 /500.html;
        location = /500.html {
            root /usr/share/nginx/error;
        }
        error_page 503 /503.html;
        location = /503.html {
            root /usr/share/nginx/error;
        }
    }
}

Posted at Nginx Forum:

adam · July 2, 2011, 3:46am

Could you attach gdb to one of the stuck workers and provide the
backtrace?
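
For reference, grabbing a backtrace from a spinning worker might look like the sketch below. The wrapper name `worker_backtrace` and the `DRY_RUN` switch are invented for this example; the gdb options used (`-p`, `-batch`, `-ex`) are standard ones, and the PID in the usage comment is taken from the listing earlier in the thread. If the binary has been stripped of debug symbols, the backtrace may show little more than addresses.

```shell
#!/bin/sh
# Hypothetical wrapper: print a backtrace of a running process with gdb.
# Set DRY_RUN=1 to print the gdb command instead of running it.
worker_backtrace() {
    pid=$1
    if [ -z "$pid" ]; then
        echo "usage: worker_backtrace PID" >&2
        return 2
    fi
    if [ -n "$DRY_RUN" ]; then
        echo "gdb -p $pid -batch -ex bt"
    else
        # -batch makes gdb exit after running its commands; "bt" prints
        # the call stack. Attaching usually needs root or the worker's user.
        gdb -p "$pid" -batch -ex bt
    fi
}

# Example: worker_backtrace 23830
```

Running it two or three times a few seconds apart and comparing the top frames can show whether the worker is looping in the same place.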


adam · July 2, 2011, 9:37am

Hello!

On Fri, Jul 01, 2011 at 08:01:11PM -0400, adam wrote:


> (Note above that we have a few third-party modules added for syslog and
> upstream health checking.)

Are you able to reproduce the problem without third party
modules/patches and with latest stable release (1.0.4)?

Maxim D.
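
One way to prepare such a test build is to reuse the configure arguments reported by `nginx -V` with the third-party `--add-module=...` options filtered out. The sketch below assumes the arguments are available as a single line of text, and the function name `strip_third_party_modules` is hypothetical. It only removes module options; patches applied directly to the source tree would still require a clean nginx source.

```shell
#!/bin/sh
# Hypothetical helper: drop every --add-module=... option from a configure
# argument string read on stdin, leaving the other options untouched.
strip_third_party_modules() {
    # 1) remove the options, 2) collapse runs of spaces, 3) trim the end
    sed -e 's/--add-module=[^ ]*//g' -e 's/  */ /g' -e 's/ *$//'
}

# Example with a shortened argument list from this thread:
#   echo "--user=nginx --with-http_ssl_module --add-module=src/http/modules/nginx_ajp_module" \
#       | strip_third_party_modules
#   prints: --user=nginx --with-http_ssl_module
```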

adam · July 2, 2011, 5:58pm

> Are you able to reproduce the problem without third party
> modules/patches and with latest stable release (1.0.4)?

Yes, that was going to be my first stop unless somebody knew of a config
fix right away. I will give that a try.

I had seen this in the changelog for version 1.0.0:
“Bugfix: a cache manager might hog CPU after reload.”

Is that what you’re thinking?


adam · July 8, 2011, 8:30am

Hi Adam,

That may be my fault. I’m not quite sure. Can you try the latest
development branch of nginx_upstream_check_module
(GitHub: yaoweibin/nginx_upstream_check_module, development branch)?

Thanks for your report.


adam · July 11, 2011, 7:57pm

> That may be my fault. I’m not quite sure. Can you try the latest
> development branch of nginx_upstream_check_module
> (GitHub: yaoweibin/nginx_upstream_check_module, development branch)?

Actually since I posted this question, we’ve been unable to reproduce
the condition, no matter how hard we try.

If we can reproduce the condition, I will try that branch. I still need
to redeploy with your check_status page patch.


adam · July 2, 2011, 6:19pm

Hello!

On Sat, Jul 02, 2011 at 11:58:18AM -0400, adam wrote:

> > Are you able to reproduce the problem without third party
> > modules/patches and with latest stable release (1.0.4)?
>
> Yes, that was going to be my first stop unless somebody knew of a config
> fix right away. I will give that a try.
>
> I had seen this in the changelog for version 1.0.0:
> “Bugfix: a cache manager might hog CPU after reload.”
>
> Is that what you’re thinking?

No, this particular problem can’t affect you as you don’t have
cache enabled. I mostly suspect third party modules/patches.

Maxim D.