Error: deadlock happened in ngx_http_file_cache_expire()

{{{
proxy_cache_path /nginx/cache/three levels=1:1:2
keys_zone=three:1000m;
proxy_cache_valid 200 304 302 60m;
}}}

Nginx version: 1.0.1
nginx: configure arguments: --with-pcre --group=admin --user=admin
–prefix=/var/www/nginx --with-http_ssl_module --with-http_perl_module
–with-http_stub_status_module
–with-perl_modules_path=./src/http/modules/perl/
–add-module=…/ngx_cache_purge

The cache entries timeout is set as 60 minutes, on production
environment(for website hosting) service became unavailable after
running several hours. After dive into backtraces and source code, it’s
a deadlock issue on reload nginx(reload occurs every several minutes),
here is the bt information.

{{{
root 28779 0.0 0.1 1124788 6464 ? Ss 11:15 0:00 nginx:
master process /var/www/nginx//sbin/nginx
nobody 28781 48.8 0.2 1126424 8268 ? R 11:15 77:41 nginx:
worker process is shutting down
nobody 28784 47.2 0.0 1122576 3460 ? R 11:15 75:05 nginx:
cache manager process
nobody 28785 47.6 0.0 1122724 3496 ? R 11:15 75:45 nginx:
cache loader process
nobody 30156 50.8 0.2 1128800 9648 ? R 11:43 66:27 nginx:
worker process
nobody 30157 54.1 0.2 1128800 9648 ? R 11:43 70:46 nginx:
worker process
nobody 30158 54.9 0.2 1128800 9572 ? R 11:43 71:52 nginx:
worker process
nobody 30159 55.3 0.2 1128800 9648 ? R 11:43 72:22 nginx:
worker process
nobody 30160 8.5 26.3 1124788 1069964 ? R 11:43 11:12 nginx:
cache manager process
}}}

From the ps list Nginx is under reloading, when all worker process and a
cache manager process is spinlock to get the cache, and another cache
manager acquired the lock and fall into a while loop.

{{{
#0 ngx_http_file_cache_expire (cache=0x1ea7dc00) at
src/http/ngx_http_file_cache.c:1103
1103 if (fcn->deleting) {
(gdb) bt
#0 ngx_http_file_cache_expire (cache=0x1ea7dc00) at
src/http/ngx_http_file_cache.c:1103
#1 0x0000000000474e8b in ngx_http_file_cache_manager (data=0x1ea7dc00)
at src/http/ngx_http_file_cache.c:1193
#2 0x0000000000439e9d in ngx_cache_manager_process_handler
(ev=0x7fff5f5f3120) at src/os/unix/ngx_process_cycle.c:1346
#3 0x00000000004303f0 in ngx_event_expire_timers () at
src/event/ngx_event_timer.c:149
#4 0x000000000042e537 in ngx_process_events_and_timers
(cycle=0x1e9b5a70) at src/event/ngx_event.c:261
#5 0x0000000000439d6e in ngx_cache_manager_process_cycle
(cycle=0x1e9b5a70, data=0x6c4440) at
src/os/unix/ngx_process_cycle.c:1328
#6 0x00000000004361bd in ngx_spawn_process (cycle=0x1e9b5a70,
proc=0x439bdc <ngx_cache_manager_process_cycle>, data=0x6c4440,
name=0x4abc88 “cache manager process”,
respawn=-4) at src/os/unix/ngx_process.c:196
#7 0x00000000004382ca in ngx_start_cache_manager_processes
(cycle=0x1e9b5a70, respawn=1) at src/os/unix/ngx_process_cycle.c:398
#8 0x0000000000437dc9 in ngx_master_process_cycle (cycle=0x1e9b5a70) at
src/os/unix/ngx_process_cycle.c:251
#9 0x000000000040e36c in main (argc=1, argv=0x7fff5f5f36d8) at
src/core/nginx.c:405
(gdb) p fcn->deleting
$1 = 1
(gdb) p wait
$2 = -67
}}}

Source code
{{{
1075 for ( ;; ) {
1076
1077 if (ngx_queue_empty(&cache->sh->queue)) {
1078 wait = 10;
1079 break;
1080 }
1081
1082 q = ngx_queue_last(&cache->sh->queue);
1083
1084 fcn = ngx_queue_data(q, ngx_http_file_cache_node_t,
queue);
1085
1086 wait = fcn->expire - now;
1087
1088 if (wait > 0) {
1089 wait = wait > 10 ? 10 : wait;
1090 break;
1091 }

}}}

The for loop never exit if “wait” is a negative number and unfortunately
it becomes -67, I hope decrease the timeout to 10m would work around it,
and it failed, either.

Posted at Nginx Forum:

Hello!

On Fri, Nov 11, 2011 at 03:56:09AM -0500, zhenwei wrote:

Nginx version: 1.0.1

[…]

#0 ngx_http_file_cache_expire (cache=0x1ea7dc00) at
src/http/ngx_http_file_cache.c:1103
1103 if (fcn->deleting) {

[…]

Please upgreade, this was fixed in 1.0.5:

*) Bugfix: worker processes may got caught in an endless loop during
   reconfiguration, if a caching was used; the bug had appeared in
   0.8.48.

Maxim D.

Wonderful, thanks a lot!

Posted at Nginx Forum: