After reload, nginx's CPU usage jumped to 100 percent (backtrace provided)

After a reload, all the worker processes, including newly forked ones, go
straight to 100% CPU:

  PID USER   PR NI  VIRT  RES  SHR S  %CPU %MEM   TIME+ COMMAND
17301 nobody 25  0 2670m 107m 1128 R  99.6  0.7 2:00.87 nginx
17306 nobody 25  0 2651m  86m  444 R  98.6  0.5 2:00.82 nginx (cache manager)
17303 nobody 25  0 2670m 106m  788 R  49.8  0.7 1:01.31 nginx
17302 nobody 25  0 2670m 106m  792 R  47.8  0.7 0:59.96 nginx
17305 nobody 25  0 2670m 106m  652 R  47.8  0.7 1:01.08 nginx
17304 nobody 25  0 2670m 106m  652 R 100.6  0.7 2:01.23 nginx
15759 nobody 25  0 2670m 197m  91m R  45.8  1.2 2:37.60 nginx
15761 nobody 25  0 2651m 172m  86m R  54.8  1.1 1:01.59 nginx (cache manager)
15760 nobody 25  0 2670m 197m  91m R  51.8  1.2 2:43.36 nginx
15756 nobody 25  0 2670m 197m  91m R  99.6  1.2 3:41.27 nginx
15757 nobody 25  0 2670m 197m  91m R  49.8  1.2 2:37.74 nginx
15758 nobody 25  0 2670m 197m  91m R  49.8  1.2 2:37.31 nginx

Using GDB, it seems the worker processes are stuck at this point:

(gdb) bt
#0  0x00000033ad4badd7 in sched_yield () from /lib64/libc.so.6
#1  0x000000000040e849 in ngx_shmtx_lock (mtx=0x2b3410964048) at src/core/ngx_shmtx.c:136
#2  0x0000000000444712 in ngx_http_file_cache_update (r=0xe50f3b0, tf=0x9e60270) at src/http/ngx_http_file_cache.c:831
#3  0x000000000043af8e in ngx_http_upstream_process_request (r=0xe50f3b0) at src/http/ngx_http_upstream.c:2657
#4  0x000000000043b08e in ngx_http_upstream_process_upstream (r=0xe50f3b0, u=0x9e5f1f8) at src/http/ngx_http_upstream.c:2599
#5  0x000000000043b192 in ngx_http_upstream_handler (ev=<value optimized out>) at src/http/ngx_http_upstream.c:903
#6  0x0000000000418eaa in ngx_event_process_posted (cycle=<value optimized out>, posted=0x7fff3f606520) at src/event/ngx_event_posted.c:39
#7  0x0000000000418d6f in ngx_process_events_and_timers (cycle=0xd2a7710) at src/event/ngx_event.c:272
#8  0x000000000041e800 in ngx_worker_process_cycle (cycle=0xd2a7710, data=<value optimized out>) at src/os/unix/ngx_process_cycle.c:800
#9  0x000000000041d137 in ngx_spawn_process (cycle=0xd2a7710, proc=0x41e738 <ngx_worker_process_cycle>, data=0x0, name=0x5652a1 "worker process", respawn=-4) at src/os/unix/ngx_process.c:196
#10 0x000000000041dea3 in ngx_start_worker_processes (cycle=0xd2a7710, n=5, type=-4) at src/os/unix/ngx_process_cycle.c:360
#11 0x000000000041f351 in ngx_master_process_cycle (cycle=0xd2a7710) at src/os/unix/ngx_process_cycle.c:249
#12 0x0000000000404dda in main (argc=7, argv=0x972556d) at src/core/nginx.c:405

Attaching to process 17301
(gdb) bt
#0  0x00000033ad4badd7 in sched_yield () from /lib64/libc.so.6
#1  0x000000000040e849 in ngx_shmtx_lock (mtx=0x2b3410964048) at src/core/ngx_shmtx.c:136
#2  0x00000000004442ce in ngx_http_file_cache_free (c=0xc036b50, tf=0x0) at src/http/ngx_http_file_cache.c:916
#3  0x0000000000444473 in ngx_http_file_cache_cleanup (data=<value optimized out>) at src/http/ngx_http_file_cache.c:978
#4  0x0000000000405924 in ngx_destroy_pool (pool=0xc035ac0) at src/core/ngx_palloc.c:54
#5  0x000000000042c0da in ngx_http_free_request (r=0xaee17b0, rc=0) at src/http/ngx_http_request.c:2996
#6  0x000000000042dd6d in ngx_http_set_keepalive (r=0xaee17b0) at src/http/ngx_http_request.c:2450
#7  ngx_http_finalize_connection (r=0xaee17b0) at src/http/ngx_http_request.c:2146
#8  0x000000000042e2d9 in ngx_http_finalize_request (r=0x2b3410964048, rc=0) at src/http/ngx_http_request.c:1879
#9  0x000000000042a5dc in ngx_http_core_content_phase (r=0xaee17b0, ph=0xeb273b0) at src/http/ngx_http_core_module.c:1339
#10 0x0000000000425d03 in ngx_http_core_run_phases (r=0xaee17b0) at src/http/ngx_http_core_module.c:837
#11 0x0000000000425e00 in ngx_http_handler (r=0x2b3410964048) at src/http/ngx_http_core_module.c:820
#12 0x000000000042ebfd in ngx_http_process_request (r=0xaee17b0) at src/http/ngx_http_request.c:1650
#13 0x000000000042f420 in ngx_http_process_request_headers (rev=<value optimized out>) at src/http/ngx_http_request.c:1093
#14 0x000000000042f922 in ngx_http_process_request_line (rev=0x2b33ce0c4690) at src/http/ngx_http_request.c:893
#15 0x000000000042cefe in ngx_http_init_request (rev=0x2b33ce0c4690) at src/http/ngx_http_request.c:518
#16 0x0000000000418eaa in ngx_event_process_posted (cycle=<value optimized out>, posted=0x0) at src/event/ngx_event_posted.c:39
#17 0x0000000000418d6f in ngx_process_events_and_timers (cycle=0x9b4da50) at src/event/ngx_event.c:272
#18 0x000000000041e800 in ngx_worker_process_cycle (cycle=0x9b4da50, data=<value optimized out>) at src/os/unix/ngx_process_cycle.c:800
#19 0x000000000041d137 in ngx_spawn_process (cycle=0x9b4da50, proc=0x41e738 <ngx_worker_process_cycle>, data=0x0, name=0x5652a1 "worker process", respawn=-4) at src/os/unix/ngx_process.c:196
#20 0x000000000041dea3 in ngx_start_worker_processes (cycle=0x9b4da50, n=5, type=-4) at src/os/unix/ngx_process_cycle.c:360
#21 0x000000000041f351 in ngx_master_process_cycle (cycle=0x9b4da50) at src/os/unix/ngx_process_cycle.c:249
#22 0x0000000000404dda in main (argc=7, argv=0x972556d) at src/core/nginx.c:405
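
The sched_yield() at the top of both worker traces is where nginx busy-waits for the shared-memory cache mutex. Roughly speaking, and this is only a simplified model of the behaviour rather than the literal 1.0.4 ngx_shmtx_lock (the function name and atomic primitives below are stand-ins), the lock is taken like this:

#include <sched.h>
#include <stdatomic.h>

/* Simplified model of ngx_shmtx_lock's behaviour: try an atomic
 * compare-and-swap, and on failure call sched_yield() and retry.
 * The process never blocks in the kernel, so a worker that can
 * never obtain the lock spins at 100% CPU, as in the top output. */
static void
shmtx_lock_model(atomic_long *lock, long pid)
{
    for ( ;; ) {
        long expected = 0;

        /* try to take the lock: 0 -> our pid */
        if (atomic_compare_exchange_weak(lock, &expected, pid)) {
            return;                     /* lock acquired */
        }

        sched_yield();                  /* yield the time slice and retry */
    }
}

Because nothing ever sleeps here, every process waiting on the cache lock shows up as a running ("R") process eating a full core.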

Also, we took a backtrace of the cache manager process:

Attaching to process 17306
ngx_http_file_cache_expire (data=<value optimized out>) at src/http/ngx_http_file_cache.c:1096
1096            if (fcn->count == 0) {
(gdb) bt
#0  ngx_http_file_cache_expire (data=<value optimized out>) at src/http/ngx_http_file_cache.c:1096
#1  ngx_http_file_cache_manager (data=<value optimized out>) at src/http/ngx_http_file_cache.c:1191
#2  0x000000000041e92a in ngx_cache_manager_process_handler (ev=0x7fff3f606670) at src/os/unix/ngx_process_cycle.c:1346
#3  0x0000000000418e15 in ngx_event_expire_timers () at src/event/ngx_event_timer.c:149
#4  0x0000000000418d44 in ngx_process_events_and_timers (cycle=0x9b4da50) at src/event/ngx_event.c:261
#5  0x000000000041f5f8 in ngx_cache_manager_process_cycle (cycle=0x9b4da50, data=<value optimized out>) at src/os/unix/ngx_process_cycle.c:1328
#6  0x000000000041d137 in ngx_spawn_process (cycle=0x9b4da50, proc=0x41f489 <ngx_cache_manager_process_cycle>, data=0x7e9040, name=0x56525f "cache manager process", respawn=-4) at src/os/unix/ngx_process.c:196
#7  0x000000000041dd8a in ngx_start_cache_manager_processes (cycle=0x9b4da50, respawn=1) at src/os/unix/ngx_process_cycle.c:398
#8  0x000000000041f35e in ngx_master_process_cycle (cycle=0x9b4da50) at src/os/unix/ngx_process_cycle.c:251
#9  0x0000000000404dda in main (argc=7, argv=0x972556d) at src/core/nginx.c:405

The nginx version is 1.0.4, built with the following configure arguments:

nginx: nginx version: nginx/1.0.4
nginx: built by gcc 4.1.2 20080704 (Red Hat 4.1.2-50)
nginx: TLS SNI support enabled
nginx: configure arguments: --prefix=/opt/app/nginx-urlhash
--without-select_module --without-poll_module --with-http_ssl_module
--with-http_realip_module --with-http_addition_module
--with-http_sub_module --with-http_flv_module
--with-http_gzip_static_module --with-http_stub_status_module
--with-pcre=/opt/tmp/pcre-8.10 --with-openssl=/opt/tmp/openssl-1.0.0a
--add-module=/opt/tmp/substitutions4nginx
--add-module=/opt/tmp/ngx_cache_purge-1.3

Hello!

On Tue, Jul 12, 2011 at 09:31:07PM +0800, 杨镭 wrote:

After a reload, all the worker processes, including newly forked ones, go
straight to 100% CPU:

[…]

Also, we took a backtrace of the cache manager process:

Attaching to process 17306
ngx_http_file_cache_expire (data=<value optimized out>) at src/http/ngx_http_file_cache.c:1096
1096            if (fcn->count == 0) {
(gdb) bt
#0  ngx_http_file_cache_expire (data=<value optimized out>) at src/http/ngx_http_file_cache.c:1096

Could you please show

p *fcn

?

I think I see the problem, but just to make sure.
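
(For reference, the value can be captured by attaching gdb to the spinning cache manager process and printing the node in the frame where fcn is in scope; the PID below is the one from the top output above, and the frame number may differ on your system:

gdb -p 17306
(gdb) frame 0
(gdb) p *fcn
)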

If you are able to reproduce the problem please try the following
patch:

diff --git a/src/http/ngx_http_file_cache.c b/src/http/ngx_http_file_cache.c
--- a/src/http/ngx_http_file_cache.c
+++ b/src/http/ngx_http_file_cache.c
@@ -1099,7 +1099,8 @@ ngx_http_file_cache_expire(ngx_http_file
         }
 
         if (fcn->deleting) {
-            continue;
+            wait = 1;
+            break;
         }
 
         p = ngx_hex_dump(key, (u_char *) &fcn->node.key,

Maxim D.

Hi,

we didn't take a core dump. If we can reproduce the problem, I'll send you
the p *fcn output.

Hi,

we reproduced the problem and below is the content of fcn:

(gdb) p *fcn
$1 = {node = {key = 2417939894599905030, left = 0x2b822d64f780, right = 0x2b822ada9a00,
    parent = 0x2b822bac7080, color = 0 '\000', data = 0 '\000'},
  queue = {prev = 0x2b822afd63a8, next = 0x2b822ad45040}, key = "\177\215\220\215h!?T",
  count = 1, uses = 1, valid_msec = 0, error = 0, exists = 1, updating = 0, deleting = 1,
  uniq = 7209021, expire = 1311055359, valid_sec = 0, body_start = 905, fs_size = 2}
(gdb)

So I guess we should apply the patch?

Hello!

On Tue, Jul 19, 2011 at 02:16:25PM +0800, 杨镭 wrote:

7209021, expire = 1311055359, valid_sec = 0,
body_start = 905, fs_size = 2}
(gdb)

So I guess we should apply the patch?

Yes, this is exactly the case I was thinking about.

With fcn->deleting == 1 the cache manager will loop waiting for the cache
item to actually be deleted, but that is never going to happen, as the
cache manager itself holds the cache lock.
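
To make the deadlock concrete, here is a minimal sketch of the expire loop around ngx_http_file_cache.c:1096. It is not the literal 1.0.4 source; the helper functions and types are made up for illustration, and only the control flow relevant to the bug is kept:

#include <stdbool.h>
#include <stddef.h>

/* Minimal model of the cache manager's expire loop.  The real loop runs
 * with the shared-memory cache mutex held and repeatedly takes the oldest
 * entry from the expiration queue; an entry marked "deleting" is being
 * removed by a worker process, which needs that same mutex to finish. */

typedef struct {
    unsigned  count;        /* references held by in-flight requests */
    unsigned  deleting;     /* a worker is currently deleting this entry */
} cache_node_t;

cache_node_t *oldest_entry(void);          /* hypothetical: head of the queue */
void          evict_entry(cache_node_t *); /* hypothetical: drop node and file */

/* Returns true when the manager should release the lock and retry later. */
bool
cache_expire_model(void)
{
    cache_node_t  *fcn;
    bool           wait = false;

    /* lock_cache_mutex();  held across the whole loop in real nginx */

    for ( ;; ) {
        fcn = oldest_entry();
        if (fcn == NULL) {
            break;
        }

        if (fcn->count == 0) {             /* the check at file_cache.c:1096 */
            evict_entry(fcn);
            continue;
        }

        if (fcn->deleting) {
            /* Before the patch this branch did "continue": the node stays
             * at the head of the queue until the worker finishes deleting
             * it, so the same entry is picked up again and again while the
             * manager still holds the lock the worker is waiting for, and
             * both sides spin at 100% CPU.  The patch backs off instead. */
            wait = true;
            break;
        }

        /* entry still referenced: handle or force expiration (omitted) */
    }

    /* unlock_cache_mutex(); */

    return wait;
}

With the patched wait/break behaviour the manager drops the lock, the worker can finish the deletion, and the entry disappears from the queue before the next pass.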

Maxim D.
