Nginx worker process hangs, CPU load 100%

Hi,
I have run into a problem with nginx running as an HTTP reverse proxy server: a
worker process sometimes hangs with CPU usage at 100%, and it never
recovers until I kill the process. Below are the details:

system environment:
[root@host-22 ~]# lsb_release -a
LSB Version: :core-3.1-amd64:core-3.1-ia32:core-3.1-noarch:graphics-3.1-amd64:graphics-3.1-ia32:graphics-3.1-noarch
Distributor ID: CentOS
Description: CentOS release 5.5 (Final)
Release: 5.5
Codename: Final
[root@host-22 ~]#
[root@host-22 ~]# uname -a
Linux host-22 2.6.18-194.el5 #1 SMP Fri Apr 2 14:58:14 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

nginx version:
[root@host-22 ~]# /usr/local/nginx/sbin/nginx -V
nginx: nginx version: nginx/1.0.4
nginx: built by gcc 4.1.2 20080704 (Red Hat 4.1.2-46)
nginx: TLS SNI support disabled
nginx: configure arguments: --user=www --group=www --prefix=/usr/local/nginx --with-http_stub_status_module --with-http_ssl_module --with-openssl-opt=enable-tlsext --with-http_sub_module --with-cc-opt=-O2 --with-cpu-opt=opteron
[root@host-22 ~]#
(Also tested with 1.0.6 and 1.0.9; both have the same problem.)

nginx config (nginx runs as an HTTP reverse proxy server):
worker_processes 8;

events {
    use epoll;
    worker_connections 5120;
}

http {
    sendfile on;
    keepalive_timeout 15;

    upstream 2012_servers {
        server 10.0.7.5:80 max_fails=2 fail_timeout=30s;
        server 10.0.7.6:80 max_fails=2 fail_timeout=30s;
        server 10.0.7.7:80 max_fails=2 fail_timeout=30s;
        server 10.0.7.8:80 max_fails=2 fail_timeout=30s;
    }

    server {
        listen       80;
        server_name  test.2012.com;
        ...

        location / {
            include proxy.conf;
            proxy_pass http://2012_servers;
        }
    }
}

trouble:
[root@host-22 ~]# ps aux|grep -e CPU -e nginx
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 936 0.0 0.0 49328 7572 ? Ss Nov11 2:37 nginx: master process /usr/local/nginx/sbin/nginx
www 1130 99.9 0.0 55764 13472 ? R Nov11 2664:28 nginx: worker process
www 1216 99.9 0.0 53668 11092 ? R Nov11 2660:23 nginx: worker process
www 31057 0.0 0.0 50816 8820 ? S 19:40 0:00 nginx: worker process
www 31058 0.0 0.0 50816 8820 ? S 19:40 0:00 nginx: worker process
www 31059 0.0 0.0 50816 8820 ? S 19:40 0:00 nginx: worker process
www 31060 0.0 0.0 50816 8820 ? S 19:40 0:00 nginx: worker process
www 31061 0.0 0.0 50816 8820 ? S 19:40 0:00 nginx: worker process
www 31062 0.8 0.0 50816 8820 ? S 19:40 0:00 nginx: worker process
www 31063 0.1 0.0 50816 9012 ? S 19:40 0:00 nginx: worker process
www 31064 0.2 0.0 50816 8820 ? S 19:40 0:00 nginx: worker process

Two nginx worker processes (PIDs 1130 and 1216) are hanging. I couldn't find any
significant message in error.log or in strace output (strace -p 1130 / strace -p 1216).
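A worker stuck in a pure user-space loop makes no system calls, which would be consistent with strace staying silent while ps shows ~100% CPU. One way to check this is to sample the worker's user and kernel CPU time from /proc/<pid>/stat. The sketch below is a hypothetical helper (the names `parse_stat` and `sample` are made up for illustration); it assumes Linux and the /proc/<pid>/stat layout described in proc(5), where utime and stime are fields 14 and 15, measured in clock ticks.

```python
import os
import sys
import time
from typing import Tuple


def parse_stat(line: str) -> Tuple[str, int, int]:
    """Return (state, utime, stime) from one /proc/<pid>/stat line.

    Field 2 (comm) is wrapped in parentheses and may contain spaces,
    so the remaining fields are split only after the last ')'.
    """
    rest = line[line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state); field k lives at rest[k - 3].
    state = rest[0]
    utime = int(rest[14 - 3])  # user-mode CPU time, in clock ticks
    stime = int(rest[15 - 3])  # kernel-mode CPU time, in clock ticks
    return state, utime, stime


def sample(pid: int, interval: float = 1.0) -> Tuple[float, float]:
    """Share of one CPU spent in user and kernel mode over `interval` seconds."""
    hz = os.sysconf("SC_CLK_TCK")
    with open(f"/proc/{pid}/stat") as f:
        _, u1, s1 = parse_stat(f.read())
    time.sleep(interval)
    with open(f"/proc/{pid}/stat") as f:
        _, u2, s2 = parse_stat(f.read())
    return (u2 - u1) / hz / interval, (s2 - s1) / hz / interval


if __name__ == "__main__":
    # Usage: python3 cpusample.py 1130
    user, system = sample(int(sys.argv[1]))
    print(f"user {user:.0%}  system {system:.0%}")
```

A worker spinning in user space would be expected to show user time close to 100% and system time close to 0%, which points at a loop inside nginx itself rather than in the kernel.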

Grateful for any advice.

Thanks.

Posted at Nginx Forum:

Hello!

On Sun, Nov 13, 2011 at 07:06:02AM -0500, Long Wan wrote:

> I have run into a problem with nginx running as an HTTP reverse proxy server: a
> worker process sometimes hangs with CPU usage at 100%, and it never
> recovers until I kill the process. Below are the details:

[…]

> www 31057 0.0 0.0 50816 8820 ? S 19:40 0:00 nginx: worker process

[…]

> Two nginx worker processes (PIDs 1130 and 1216) are hanging. I couldn't find any
> significant message in error.log or in strace output (strace -p 1130 / strace -p 1216).

Please try attaching to a runaway process with gdb and check where
it loops, e.g.

gdb /path/to/nginx <pid>
bt
n
… (repeat 'n' several times to see the loop)

Maxim D.
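Maxim's recipe is interactive. If you want to capture the same information in one go (for example, to paste into a report), gdb's standard `-batch` and `-ex` options can run the commands non-interactively. The sketch below only builds and runs that command line. The script and the function name `gdb_batch_cmd` are hypothetical. It assumes gdb is installed and that you have permission to attach (typically root), and it uses the same `gdb <binary> <pid>` form used in this thread.

```python
import subprocess
import sys
from typing import List


def gdb_batch_cmd(binary: str, pid: int, steps: int = 8) -> List[str]:
    """Build a gdb command line: backtrace, `steps` x next, then detach."""
    cmd = ["gdb", "-batch", "-ex", "bt"]
    for _ in range(steps):
        cmd += ["-ex", "next"]
    # Detach explicitly so the worker is released when gdb exits.
    cmd += ["-ex", "detach", binary, str(pid)]
    return cmd


if __name__ == "__main__":
    # Usage (as root): python3 gdbloop.py /usr/local/nginx/sbin/nginx 1130
    cmd = gdb_batch_cmd(sys.argv[1], int(sys.argv[2]))
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    print(result.stdout)
```

If the repeated `next` output keeps cycling through the same few source lines, that cycle is the loop Maxim is asking about.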

Hello Maxim, thanks for your reply. I tried gdb as you told me, and it
reported the following:

[root@host-22 ~]# gdb /usr/local/nginx/sbin/nginx 1130
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/
Reading symbols from /usr/local/nginx/sbin/nginx...done.
Attaching to program: /usr/local/nginx/sbin/nginx, process 1130
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libpcre.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libpcre.so.0
Reading symbols from /lib64/libssl.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libssl.so.6
Reading symbols from /lib64/libcrypto.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypto.so.6
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /usr/lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libz.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib64/libgssapi_krb5.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libgssapi_krb5.so.2
Reading symbols from /usr/lib64/libkrb5.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libkrb5.so.3
Reading symbols from /lib64/libcom_err.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcom_err.so.2
Reading symbols from /usr/lib64/libk5crypto.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libk5crypto.so.3
Reading symbols from /usr/lib64/libkrb5support.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libkrb5support.so.0
Reading symbols from /lib64/libkeyutils.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libkeyutils.so.1
Reading symbols from /lib64/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libselinux.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libselinux.so.1
Reading symbols from /lib64/libsepol.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libsepol.so.1
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
ngx_http_upstream_get_round_robin_peer (pc=0x78166f0, data=) at src/http/ngx_http_upstream_round_robin.c:413
413 src/http/ngx_http_upstream_round_robin.c: No such file or directory.
in src/http/ngx_http_upstream_round_robin.c
(gdb) bt
#0 ngx_http_upstream_get_round_robin_peer (pc=0x78166f0, data=) at src/http/ngx_http_upstream_round_robin.c:413
#1 0x000000000041a8fc in ngx_event_connect_peer (pc=0x78166f0) at src/event/ngx_event_connect.c:24
#2 0x000000000043d1e8 in ngx_http_upstream_connect (r=0x7cf6310, u=0x78166e0) at src/http/ngx_http_upstream.c:1089
#3 0x000000000043ea3a in ngx_http_upstream_init_request (r=0x7cf6310) at src/http/ngx_http_upstream.c:628
#4 0x0000000000435185 in ngx_http_read_client_request_body (r=0x7cf6310, post_handler=0x43eec0 <ngx_http_upstream_init>) at src/http/ngx_http_request_body.c:153
#5 0x0000000000456b46 in ngx_http_proxy_handler (r=0x7cf6310) at src/http/modules/ngx_http_proxy_module.c:617
#6 0x000000000042b15c in ngx_http_core_content_phase (r=0x7cf6310, ph=0x7b5a4e0) at src/http/ngx_http_core_module.c:1339
#7 0x0000000000426817 in ngx_http_core_run_phases (r=0x7cf6310) at src/http/ngx_http_core_module.c:837
#8 0x000000000042f6d6 in ngx_http_process_request (r=0x7cf6310) at src/http/ngx_http_request.c:1650
#9 0x0000000000430314 in ngx_http_process_request_line (rev=0x7c65578) at src/http/ngx_http_request.c:893
#10 0x0000000000420a04 in ngx_epoll_process_events (cycle=, timer=, flags=) at src/event/modules/ngx_epoll_module.c:635
#11 0x0000000000419bad in ngx_process_events_and_timers (cycle=0x784e770) at src/event/ngx_event.c:245
#12 0x000000000041f528 in ngx_worker_process_cycle (cycle=0x784e770, data=) at src/os/unix/ngx_process_cycle.c:800
#13 0x000000000041dc89 in ngx_spawn_process (cycle=0x784e770, proc=0x41f470 <ngx_worker_process_cycle>, data=0x0, name=0x4652e1 "worker process", respawn=-4) at src/os/unix/ngx_process.c:196
#14 0x000000000041eb0b in ngx_start_worker_processes (cycle=0x784e770, n=8, type=-4) at src/os/unix/ngx_process_cycle.c:360
#15 0x000000000041fea8 in ngx_master_process_cycle (cycle=0x784e770) at src/os/unix/ngx_process_cycle.c:249
#16 0x0000000000406069 in main (argc=1, argv=) at src/core/nginx.c:405
(gdb) n

There is no output when I type 'n'. Should I recompile nginx with the
'--with-debug' configure option?


Hello!

On Sun, Nov 13, 2011 at 08:28:49PM -0500, Long Wan wrote:

> Hello Maxim, thanks for your reply. I tried gdb as you told me, and it
> reported the following:

[…]

> (gdb) bt
> #0 ngx_http_upstream_get_round_robin_peer (pc=0x78166f0, data=) at src/http/ngx_http_upstream_round_robin.c:413
> #1 0x000000000041a8fc in ngx_event_connect_peer (pc=0x78166f0) at src/event/ngx_event_connect.c:24
> #2 0x000000000043d1e8 in ngx_http_upstream_connect (r=0x7cf6310, u=0x78166e0) at src/http/ngx_http_upstream.c:1089
[…]
> (gdb) n

> There is no output when I type 'n'. Should I recompile nginx with the
> '--with-debug' configure option?

This looks very similar to this problem, fixed in 1.1.1/1.0.7:

    *) Bugfix: nginx hogged CPU if all servers in an upstream were marked
       as "down".

Are you sure you see the same problem in 1.0.9?

Maxim D.
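To illustrate the property behind that fix (not burning CPU when no server in an upstream is usable), here is a hypothetical weighted round-robin model. It is not the code in ngx_http_upstream_round_robin.c; the `Peer` class and the `pick`/`choose` functions are invented for illustration. The point of the sketch is that each selection is bounded by the number of peers, so an empty or all-"down" list yields no peer instead of a spin, and backup servers are consulted only when no primary server is eligible.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Peer:
    name: str
    weight: int = 1
    down: bool = False
    backup: bool = False
    current_weight: int = 0  # picks left for this peer in the current round


def pick(peers: List[Peer]) -> Optional[Peer]:
    """Weighted round-robin over usable peers; None if none is usable."""
    usable = [p for p in peers if not p.down and p.weight > 0]
    if not usable:
        return None  # report failure instead of looping forever
    if all(p.current_weight <= 0 for p in usable):
        for p in usable:  # every peer used up its share: start a new round
            p.current_weight = p.weight
    best = max(usable, key=lambda p: p.current_weight)
    best.current_weight -= 1
    return best


def choose(servers: List[Peer]) -> Optional[Peer]:
    """Try primary servers first; fall back to backup servers."""
    peer = pick([s for s in servers if not s.backup])
    if peer is None:
        peer = pick([s for s in servers if s.backup])
    return peer
```

With two peers of weight 2 and 1, successive calls return a, a, b and then start a new round. A caller would turn a `None` result into an error response immediately rather than retrying in a loop.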

Hello!

On Mon, Nov 14, 2011 at 09:37:29AM -0500, Long Wan wrote:

[…]

> I found I had made a mistake in nginx.conf: I include a virtual host
> configuration like this:
> upstream test_servers {
> #server 10.0.7.4:80 ;
> server 10.0.7.5:80 backup;
> #server 10.0.7.6:80 ;
> #server 10.0.7.7:80 ;
> }

[…]

> There was only one server in the upstream, and it was marked 'backup'. After
> some testing, I found this was the cause.

Yes, thank you for the report. This is a somewhat known issue; 'backup'
handling needs attention.

Maxim D.

Do you plan to fix this issue in the next release?


Hello Maxim, thanks for your help.

I reproduced the problem in nginx 1.0.9:

[root@host-22 ~]# /usr/local/nginx/sbin/nginx -V
nginx: nginx version: nginx/1.0.9
nginx: built by gcc 4.1.2 20080704 (Red Hat 4.1.2-51)
nginx: TLS SNI support disabled
nginx: configure arguments: --user=www --group=www --prefix=/usr/local/nginx --with-http_stub_status_module --with-http_ssl_module --with-openssl-opt=enable-tlsext --with-http_sub_module --with-cc-opt=-O2 --with-cpu-opt=opteron --add-module=…/ngx_cache_purge-1.4
[root@host-22 ~]#
[root@host-22 ~]#
[root@host-22 ~]# ps aux|grep -e CPU -e nginx
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 569 0.0 0.0 49016 6884 ? Ss 21:59 0:00 nginx: master process /usr/local/nginx/sbin/nginx
www 587 97.3 0.0 49380 7556 ? R 22:00 10:16 nginx: worker process
www 588 93.8 0.0 49380 7556 ? R 22:00 9:54 nginx: worker process
www 614 43.1 0.0 49412 7584 ? T 22:01 4:07 nginx: worker process
root 781 0.0 0.1 95984 19684 pts/0 S+ 22:06 0:00 gdb /usr/local/nginx/sbin/nginx 614
www 876 0.0 0.0 50504 8464 ? S 22:10 0:00 nginx: worker process
www 877 0.0 0.0 50504 8464 ? S 22:10 0:00 nginx: worker process
www 878 0.0 0.0 50504 8464 ? S 22:10 0:00 nginx: worker process
www 879 0.0 0.0 50504 8660 ? S 22:10 0:00 nginx: worker process
www 880 0.0 0.0 50504 8464 ? S 22:10 0:00 nginx: worker process
www 881 0.0 0.0 50504 8464 ? S 22:10 0:00 nginx: worker process
www 882 0.0 0.0 50504 8464 ? S 22:10 0:00 nginx: worker process
www 883 0.0 0.0 50504 8464 ? S 22:10 0:00 nginx: worker process
root 954 0.0 0.0 61168 788 pts/1 S+ 22:10 0:00 grep -e CPU -e nginx
[root@host-22 ~]#

[root@host-22 ~]# gdb /usr/local/nginx/sbin/nginx 614
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/
Reading symbols from /usr/local/nginx/sbin/nginx...done.
Attaching to program: /usr/local/nginx/sbin/nginx, process 614
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libpcre.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libpcre.so.0
Reading symbols from /lib64/libssl.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libssl.so.6
Reading symbols from /lib64/libcrypto.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypto.so.6
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /usr/lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libz.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib64/libgssapi_krb5.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libgssapi_krb5.so.2
Reading symbols from /usr/lib64/libkrb5.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libkrb5.so.3
Reading symbols from /lib64/libcom_err.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcom_err.so.2
Reading symbols from /usr/lib64/libk5crypto.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libk5crypto.so.3
Reading symbols from /usr/lib64/libkrb5support.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libkrb5support.so.0
Reading symbols from /lib64/libkeyutils.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libkeyutils.so.1
Reading symbols from /lib64/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libselinux.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libselinux.so.1
Reading symbols from /lib64/libsepol.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libsepol.so.1
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
ngx_http_upstream_get_peer (pc=0x54ae260, data=) at src/http/ngx_http_upstream_round_robin.c:632
632 if (reset++) {
(gdb) bt
#0 ngx_http_upstream_get_peer (pc=0x54ae260, data=) at src/http/ngx_http_upstream_round_robin.c:632
#1 ngx_http_upstream_get_round_robin_peer (pc=0x54ae260, data=) at src/http/ngx_http_upstream_round_robin.c:425
#2 0x000000000041a99c in ngx_event_connect_peer (pc=0x54ae960) at src/event/ngx_event_connect.c:24
#3 0x000000000043d5a8 in ngx_http_upstream_connect (r=0x54c3b30, u=0x54ae250) at src/http/ngx_http_upstream.c:1103
#4 0x000000000043ee0a in ngx_http_upstream_init_request (r=0x54c3b30) at src/http/ngx_http_upstream.c:631
#5 0x00000000004354a5 in ngx_http_read_client_request_body (r=0x54c3b30, post_handler=0x43f310 <ngx_http_upstream_init>) at src/http/ngx_http_request_body.c:154
#6 0x00000000004572d6 in ngx_http_proxy_handler (r=0x54c3b30) at src/http/modules/ngx_http_proxy_module.c:617
#7 0x000000000042b47c in ngx_http_core_content_phase (r=0x54c3b30, ph=0x583ffd8) at src/http/ngx_http_core_module.c:1365
#8 0x0000000000426967 in ngx_http_core_run_phases (r=0x54c3b30) at src/http/ngx_http_core_module.c:861
#9 0x000000000042fa66 in ngx_http_process_request (r=0x54c3b30) at src/http/ngx_http_request.c:1665
#10 0x00000000004306a4 in ngx_http_process_request_line (rev=0x5843fc0) at src/http/ngx_http_request.c:911
#11 0x0000000000419e86 in ngx_event_process_posted (cycle=, posted=0x68bd88) at src/event/ngx_event_posted.c:39
#12 0x000000000041f608 in ngx_worker_process_cycle (cycle=0x560f670, data=) at src/os/unix/ngx_process_cycle.c:801
#13 0x000000000041dd69 in ngx_spawn_process (cycle=0x560f670, proc=0x41f550 <ngx_worker_process_cycle>, data=0x0, name=0x466581 "worker process", respawn=-4) at src/os/unix/ngx_process.c:196
#14 0x000000000041ebeb in ngx_start_worker_processes (cycle=0x560f670, n=8, type=-4) at src/os/unix/ngx_process_cycle.c:360
#15 0x000000000041ff88 in ngx_master_process_cycle (cycle=0x560f670) at src/os/unix/ngx_process_cycle.c:249
#16 0x00000000004060d9 in main (argc=1, argv=) at src/core/nginx.c:405
(gdb) n
425 rrp->current = ngx_http_upstream_get_peer(rrp->peers);
(gdb) n
435 if (!(rrp->tried[n] & m)) {
(gdb) n
460 if (pc->tries == 0) {
(gdb) n
464 if (--i == 0) {
(gdb) n
425 rrp->current = ngx_http_upstream_get_peer(rrp->peers);
(gdb) n
435 if (!(rrp->tried[n] & m)) {
(gdb) n
460 if (pc->tries == 0) {
(gdb) n
464 if (--i == 0) {
(gdb) n
425 rrp->current = ngx_http_upstream_get_peer(rrp->peers);
(gdb) n
435 if (!(rrp->tried[n] & m)) {
(gdb) n
460 if (pc->tries == 0) {
(gdb) n
464 if (--i == 0) {
(gdb)
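The `n` output above cycles through lines 425, 435, 460 and 464 of ngx_http_upstream_round_robin.c. With a longer trace, spotting the repetition by eye gets tedious. The hypothetical helper below (not part of gdb; the names `step_lines` and `find_cycle` are made up) pulls the source line numbers out of pasted `next` output and returns the shortest block that repeats through the whole sequence. It assumes you paste only the stepping part, since a non-repeating prefix (such as the initial stop at line 632) would hide the period.

```python
import re
from typing import List, Optional


def step_lines(gdb_output: str) -> List[int]:
    """Source line numbers that gdb printed at the start of each output line."""
    return [int(n) for n in re.findall(r"^(\d+)\s", gdb_output, re.M)]


def find_cycle(seq: List[int]) -> Optional[List[int]]:
    """Shortest block that repeats through seq at least twice, else None."""
    for period in range(1, len(seq) // 2 + 1):
        if all(seq[i] == seq[i + period] for i in range(len(seq) - period)):
            return seq[:period]
    return None
```

For the trace above, `find_cycle(step_lines(trace))` would return `[425, 435, 460, 464]`, i.e. the worker never leaves the peer-selection loop in ngx_http_upstream_get_round_robin_peer.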

I found I had made a mistake in nginx.conf: I include a virtual host
configuration like this:
upstream test_servers {
#server 10.0.7.4:80 ;
server 10.0.7.5:80 backup;
#server 10.0.7.6:80 ;
#server 10.0.7.7:80 ;
}

server {
    listen       80;
    server_name  test.org ;

    access_log  /data1/logs/$host.access.log  main;

    location / {
            include proxy.conf;
            proxy_pass    http://test_servers;
    }


}

There was only one server in the upstream, and it was marked 'backup'. After
some testing, I found this was the cause.

But when I test the nginx.conf syntax with nginx -t, the result is
OK:

[root@host-22 ~]# /usr/local/nginx/sbin/nginx -t
nginx: the configuration file /usr/local/nginx/conf/nginx.conf syntax is ok
nginx: configuration file /usr/local/nginx/conf/nginx.conf test is successful
[root@host-22 ~]#

I think nginx should warn me in that situation, haha…
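Until nginx -t reports this case, a small external check could catch it. The sketch below is a hypothetical lint script, not an nginx feature (the name `upstreams_without_primary` is made up). It uses regular expressions, so it does not follow `include` directives and will not handle every valid nginx syntax. It strips `#` comments and flags any `upstream` block in which no `server` line is left without a `backup` or `down` parameter. You would run it on each included vhost file, such as the one that contained test_servers.

```python
import re
import sys
from typing import List

# Naive: upstream blocks do not nest, so the first '}' closes the block.
UPSTREAM_RE = re.compile(r"\bupstream\s+(\S+)\s*\{(.*?)\}", re.S)


def upstreams_without_primary(conf: str) -> List[str]:
    """Names of upstream blocks with no server that lacks backup/down."""
    conf = re.sub(r"#[^\n]*", "", conf)  # strip comments (ignores quoting)
    flagged = []
    for name, body in UPSTREAM_RE.findall(conf):
        servers = re.findall(r"^\s*server\s+([^;]*);", body, re.M)
        primary = [s for s in servers
                   if not re.search(r"\b(?:backup|down)\b", s)]
        if not primary:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Usage: python3 upstream_lint.py /usr/local/nginx/conf/nginx.conf other.conf
    for path in sys.argv[1:]:
        with open(path) as f:
            for name in upstreams_without_primary(f.read()):
                print(f"{path}: upstream '{name}' has no primary server")
```

On the test_servers block above it would report test_servers, while an upstream like 2012_servers from the first post passes.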
