Nginx cache exceeded, cache manager busy

Hi,
We are using nginx/0.8.15 on our front web server, mainly for
caching and reverse proxying. Our cache is located on a /dev/shm disk in
order to get better performance. Recently we’ve found that the total
size of the cache path sometimes exceeds the “max_size” set in the
configuration, and the cache manager process is quite busy, using 100%
of one CPU.

We presume that the cache manager process is not able to delete the
expired files in time, because the read/write frequency on the cache
disk is very high in our production environment, so the total size of
the cache path sometimes exceeds the configured limit. Is that correct?
The relevant configuration is:

proxy_cache_path /dev/shm/nginx/cache/ levels=2:2 keys_zone=proxy-cache:154m max_size=4g inactive=72h;

Could someone tell us how to fix this problem? Thanks a lot.

Posted at Nginx Forum:

Hi, we’re experiencing exactly the same issue using the local filesystem
as the cache store. We had the same problem with nginx 0.7.67 and now
with 0.8.54. The cache grows beyond the specified boundaries and the
nginx process uses 100% of a single core.

I tried to strace the malfunctioning process, but all I got was tons of
gettimeofday calls:
gettimeofday({1300181425, 698292}, NULL) = 0
gettimeofday({1300181425, 698360}, NULL) = 0
gettimeofday({1300181425, 698410}, NULL) = 0
gettimeofday({1300181425, 698459}, NULL) = 0
gettimeofday({1300181425, 698508}, NULL) = 0

Are you using CentOS 5 64-bit? Can you check the nginx error log and see
if you have crashed workers, please?

Thanks.

nginx error log:

2011/03/12 09:09:09 [notice] 5979#0: signal 17 (SIGCHLD) received
2011/03/12 09:09:09 [alert] 5979#0: worker process 5988 exited on signal 11
2011/03/12 09:09:09 [notice] 5979#0: start worker process 29906
2011/03/12 09:09:09 [notice] 5979#0: signal 29 (SIGIO) received
*** glibc detected *** nginx: worker process: free(): invalid next size
(normal): 0x00000000006e13c0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x2b197d5f630f]
/lib64/libc.so.6(cfree+0x4b)[0x2b197d5f676b]
nginx: worker process[0x404ce4]
nginx: worker process[0x42701b]
nginx: worker process[0x427065]
nginx: worker process[0x42929f]
nginx: worker process[0x4346aa]
nginx: worker process[0x435056]
nginx: worker process[0x435130]
nginx: worker process[0x436c03]
nginx: worker process[0x435237]
nginx: worker process[0x41732e]
nginx: worker process[0x4171f0]
nginx: worker process[0x41c951]
nginx: worker process[0x41b2a7]
nginx: worker process[0x41d139]
nginx: worker process[0x40419a]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2b197d5a1994]
nginx: worker process[0x4029b9]
======= Memory map: ========
00400000-00468000 r-xp 00000000 ca:02 5210299
/usr/local/nginx/sbin/nginx
00668000-00675000 rw-p 00068000 ca:02 5210299
/usr/local/nginx/sbin/nginx
00675000-0074d000 rw-p 00675000 00:00 0
[heap]
2b197c9ad000-2b197c9c9000 r-xp 00000000 ca:02 4047069
/lib64/ld-2.5.so
2b197c9c9000-2b197c9ca000 rw-p 2b197c9c9000 00:00 0
2b197c9d1000-2b197c9ff000 rw-p 2b197c9d1000 00:00 0
2b197cbc8000-2b197cbc9000 r--p 0001b000 ca:02 4047069
/lib64/ld-2.5.so
2b197cbc9000-2b197cbca000 rw-p 0001c000 ca:02 4047069
/lib64/ld-2.5.so
2b197cbca000-2b197cbd3000 r-xp 00000000 ca:02 606255
/lib64/libcrypt-2.5.so
2b197cbd3000-2b197cdd2000 ---p 00009000 ca:02 606255
/lib64/libcrypt-2.5.so
2b197cdd2000-2b197cdd3000 r--p 00008000 ca:02 606255
/lib64/libcrypt-2.5.so
2b197cdd3000-2b197cdd4000 rw-p 00009000 ca:02 606255
/lib64/libcrypt-2.5.so
2b197cdd4000-2b197ce02000 rw-p 2b197cdd4000 00:00 0
2b197ce02000-2b197ce1d000 r-xp 00000000 ca:02 606375
/lib64/libpcre.so.0.0.1
2b197ce1d000-2b197d01d000 ---p 0001b000 ca:02 606375
/lib64/libpcre.so.0.0.1
2b197d01d000-2b197d01e000 rw-p 0001b000 ca:02 606375
/lib64/libpcre.so.0.0.1
2b197d01e000-2b197d14b000 r-xp 00000000 ca:02 4047961
/lib64/libcrypto.so.0.9.8e
2b197d14b000-2b197d34a000 ---p 0012d000 ca:02 4047961
/lib64/libcrypto.so.0.9.8e
2b197d34a000-2b197d36b000 rw-p 0012c000 ca:02 4047961
/lib64/libcrypto.so.0.9.8e
2b197d36b000-2b197d370000 rw-p 2b197d36b000 00:00 0
2b197d370000-2b197d384000 r-xp 00000000 ca:02 5033221
/usr/lib64/libz.so.1.2.3
2b197d384000-2b197d583000 ---p 00014000 ca:02 5033221
/usr/lib64/libz.so.1.2.3
2b197d583000-2b197d584000 rw-p 00013000 ca:02 5033221
/usr/lib64/libz.so.1.2.3
2b197d584000-2b197d6d2000 r-xp 00000000 ca:02 606251
/lib64/libc-2.5.so
2b197d6d2000-2b197d8d1000 ---p 0014e000 ca:02 606251
/lib64/libc-2.5.so
2b197d8d1000-2b197d8d5000 r--p 0014d000 ca:02 606251
/lib64/libc-2.5.so
2b197d8d5000-2b197d8d6000 rw-p 00151000 ca:02 606251
/lib64/libc-2.5.so
2b197d8d6000-2b197d8db000 rw-p 2b197d8d6000 00:00 0
2b197d8db000-2b197d8dd000 r-xp 00000000 ca:02 606257
/lib64/libdl-2.5.so
2b197d8dd000-2b197dadd000 ---p 00002000 ca:02 606257
/lib64/libdl-2.5.so
2b197dadd000-2b197dade000 r--p 00002000 ca:02 606257
/lib64/libdl-2.5.so
2b197dade000-2b197dadf000 rw-p 00003000 ca:02 606257
/lib64/libdl-2.5.so
2b197dadf000-2b197dae1000 rw-p 2b197dadf000 00:00 0
2b197dae1000-2b197daeb000 r-xp 00000000 ca:02 606383
/lib64/libnss_files-2.5.so
2b197daeb000-2b197dcea000 ---p 0000a000 ca:02 606383
/lib64/libnss_files-2.5.so
2b197dcea000-2b197dceb000 r--p 00009000 ca:02 606383
/lib64/libnss_files-2.5.so
2b197dceb000-2b197dcec000 rw-p 0000a000 ca:02 606383
/lib64/libnss_files-2.5.so
2b197dcec000-2b198a4ec000 rw-s 00000000 00:08 1855417355
/dev/zero (deleted)
2b198a4ec000-2b198acec000 rw-s 00000000 00:08 1855417356
/dev/zero (deleted)
2b198acec000-2b198aced000 rw-s 00000000 00:08 1855417359
/dev/zero (deleted)
2b198aced000-2b198acfa000 r-xp 00000000 ca:02 4047067
/lib64/libgcc_s-4.1.2-20080825.so.1
2b198acfa000-2b198aefa000 ---p 0000d000 ca:02 4047067
/lib64/libgcc_s-4.1.2-20080825.so.1
2b198aefa000-2b198aefb000 rw-p 0000d000 ca:02 4047067
/lib64/libgcc_s-4.1.2-20080825.so.1
7fff2e0e7000-7fff2e0fd000 rw-p 7fff2e0e7000 00:00 0
[stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0
[vdso]

Posted at Nginx Forum:

Our nginx -V outputs:

nginx: nginx version: nginx/1.0.0
nginx: built by gcc 4.1.2 20080704 (Red Hat 4.1.2-48)
nginx: TLS SNI support disabled
nginx: configure arguments:
--with-pcre=/home/ashtray/build_nginx/pcre-8.12
--add-module=/home/ashtray/build_nginx/ngx_cache_purge-1.2
--with-http_stub_status_module --with-http_ssl_module --with-debug
--prefix=/usr/local/nginx --with-http_realip_module

We have core dumps almost every day; this has been happening since
0.8.15. We tried disabling the cache_purge module (deleting cached files
outside of nginx with rm when needed), and the workers still dump core.

[xxx nginx]# ls -lhtr core/
total 69G
-rw------- 1 nobody nobody 888M Apr 26 07:23 core.32104
-rw------- 1 nobody nobody 1.1G Apr 26 09:36 core.32102
-rw------- 1 nobody nobody 1.3G Apr 26 20:38 core.32103
-rw------- 1 nobody nobody 1.3G Apr 28 22:13 core.4659
-rw------- 1 nobody nobody 1.3G Apr 28 23:53 core.9502
-rw------- 1 nobody nobody 1011M May 6 02:27 core.26681
-rw------- 1 nobody nobody 985M May 6 02:27 core.26688
-rw------- 1 nobody nobody 1.2G May 6 20:02 core.7113
-rw------- 1 nobody nobody 1.2G May 7 23:14 core.1774
-rw------- 1 nobody nobody 1.2G May 8 00:22 core.26687
-rw------- 1 nobody nobody 1.1G May 8 02:31 core.26683
-rw------- 1 nobody nobody 991M May 8 10:01 core.26346
-rw------- 1 nobody nobody 969M May 10 10:23 core.15038
-rw------- 1 nobody nobody 1.1G May 12 20:44 core.30863
-rw------- 1 nobody nobody 962M May 13 02:57 core.30860
-rw------- 1 nobody nobody 1.1G May 15 12:58 core.30861
-rw------- 1 nobody nobody 873M May 16 06:42 core.30858
-rw------- 1 nobody nobody 1.2G May 18 00:11 core.15061
-rw------- 1 nobody nobody 1.1G May 18 13:43 core.15065
-rw------- 1 nobody nobody 1.2G May 19 21:18 core.18918
-rw------- 1 nobody nobody 1.1G May 20 00:03 core.18917
-rw------- 1 nobody nobody 1.1G May 20 01:54 core.18921
-rw------- 1 nobody nobody 994M May 22 08:06 core.27758
-rw------- 1 nobody nobody 1.2G May 22 14:02 core.27754
-rw------- 1 nobody nobody 857M May 23 07:23 core.24189
-rw------- 1 nobody nobody 1.2G May 23 22:11 core.27760
-rw------- 1 nobody nobody 1.2G May 23 22:11 core.27753
-rw------- 1 nobody nobody 1.1G May 25 01:24 core.12213
-rw------- 1 nobody nobody 1.1G May 25 15:05 core.24518
-rw------- 1 nobody nobody 1.1G May 26 13:44 core.28284
-rw------- 1 nobody nobody 1.1G May 28 11:15 core.25958
-rw------- 1 nobody nobody 1.1G May 29 10:28 core.25988
-rw------- 1 nobody nobody 1.1G May 30 11:34 core.25989
-rw------- 1 nobody nobody 1.2G May 30 21:33 core.25310
-rw------- 1 nobody nobody 1.2G May 30 22:52 core.13874
-rw------- 1 nobody nobody 1.1G Jun 1 00:45 core.25148
-rw------- 1 nobody nobody 1.1G Jun 1 13:20 core.25152
-rw------- 1 nobody nobody 1.1G Jun 1 13:47 core.25149
-rw------- 1 nobody nobody 1.1G Jun 1 16:08 core.10719
-rw------- 1 nobody nobody 1.2G Jun 1 21:47 core.15534
-rw------- 1 nobody nobody 1.2G Jun 3 19:23 core.4725
-rw------- 1 nobody nobody 1.2G Jun 3 22:32 core.5872
-rw------- 1 nobody nobody 1.2G Jun 4 00:05 core.4718
-rw------- 1 nobody nobody 1.1G Jun 4 00:30 core.4723
-rw------- 1 nobody nobody 1.1G Jun 4 05:31 core.4719
-rw------- 1 nobody nobody 1.1G Jun 4 18:45 core.29156
-rw------- 1 nobody nobody 1.2G Jun 4 18:56 core.29157
-rw------- 1 nobody nobody 1.2G Jun 4 20:32 core.29160
-rw------- 1 nobody nobody 1.2G Jun 4 20:32 core.29158
-rw------- 1 nobody nobody 1.2G Jun 4 20:32 core.29155
-rw------- 1 nobody nobody 1.2G Jun 4 22:32 core.29996
-rw------- 1 nobody nobody 1.2G Jun 4 23:08 core.29989
-rw------- 1 nobody nobody 1.2G Jun 6 01:07 core.20367
-rw------- 1 nobody nobody 970M Jun 6 05:44 core.20366
-rw------- 1 nobody nobody 1.2G Jun 6 16:46 core.20369
-rw------- 1 nobody nobody 1.2G Jun 6 16:46 core.20368
-rw------- 1 nobody nobody 1.2G Jun 6 17:16 core.20365
-rw------- 1 nobody nobody 1.2G Jun 6 18:04 core.16095
-rw------- 1 nobody nobody 1.1G Jun 7 03:15 core.20363
-rw------- 1 nobody nobody 987M Jun 7 03:16 core.28162
-rw------- 1 nobody nobody 1.1G Jun 7 10:35 core.28252
-rw------- 1 nobody nobody 1.1G Jun 7 10:57 core.10354
-rw------- 1 nobody nobody 1.1G Jun 7 11:45 core.10352
-rw------- 1 nobody nobody 773M Jun 8 05:47 core.14587

Posted at Nginx Forum:

On Wed, Jun 08, 2011 at 08:51:53AM -0400, zls wrote:

We have core dumps almost every day; this has been happening since
0.8.15. We tried disabling the cache_purge module (deleting cached files
outside of nginx with rm when needed), and the workers still dump core.

Could you

  1. build nginx-1.0.4 without cache_purge;
  2. ensure that debug information in the nginx binary is not stripped
     (“file nginx”);
  3. send a backtrace of the coredump?

P.S. Removing a file from the cache with rm is safe.


Igor S.

We’ll try it. Thanks.

Posted at Nginx Forum:

Still got core files from a clean nginx build:

[xxx nginx]# ./sbin/nginx -V
nginx: nginx version: nginx/1.0.4
nginx: TLS SNI support disabled
nginx: configure arguments: --with-pcre=/search/zls/nginx/pcre-8.12
--with-http_stub_status_module --with-http_ssl_module --with-debug
--prefix=/usr/local/nginx --with-http_realip_module

-rw------- 1 nobody nobody 1.2G Jun 12 15:02 core.25256
-rw------- 1 nobody nobody 1.2G Jun 12 18:18 core.30101

[xxx nginx]# file sbin/nginx
sbin/nginx: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for
GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux
2.6.9, not stripped

bt1
(gdb) bt
#0 0x00000000004946a4 in ngx_http_proxy_add_x_forwarded_for_variable
(r=0x1037d250, v=0x214f80b8, data=0)
at src/http/modules/ngx_http_proxy_module.c:1441
#1 0x0000000000459ced in ngx_http_get_indexed_variable (r=0x1037d250,
index=32) at src/http/ngx_http_variables.c:406
#2 0x000000000045eebf in ngx_http_script_copy_var_len_code
(e=0x7fffedf73280) at src/http/ngx_http_script.c:816
#3 0x0000000000493108 in ngx_http_proxy_create_request (r=0x1037d250)
at src/http/modules/ngx_http_proxy_module.c:928
#4 0x0000000000461888 in ngx_http_upstream_init_request (r=0x1037d250)
at src/http/ngx_http_upstream.c:488
#5 0x0000000000461625 in ngx_http_upstream_init (r=0x1037d250) at
src/http/ngx_http_upstream.c:429
#6 0x0000000000458c34 in ngx_http_do_read_client_request_body
(r=0x1037d250) at src/http/ngx_http_request_body.c:378
#7 0x0000000000458774 in ngx_http_read_client_request_body_handler
(r=0x1037d250) at src/http/ngx_http_request_body.c:251
#8 0x000000000044d8ee in ngx_http_request_handler (ev=0x2ac299c70248)
at src/http/ngx_http_request.c:1800
#9 0x00000000004361f3 in ngx_epoll_process_events (cycle=0x10171bc0,
timer=11, flags=1)
at src/event/modules/ngx_epoll_module.c:635
#10 0x000000000042780c in ngx_process_events_and_timers
(cycle=0x10171bc0) at src/event/ngx_event.c:245
#11 0x0000000000434058 in ngx_worker_process_cycle (cycle=0x10171bc0,
data=0x0) at src/os/unix/ngx_process_cycle.c:800
#12 0x0000000000430e3d in ngx_spawn_process (cycle=0x10171bc0,
proc=0x433eac <ngx_worker_process_cycle>, data=0x0,
name=0x4c174e "worker process", respawn=-3) at
src/os/unix/ngx_process.c:196
#13 0x0000000000432f5e in ngx_start_worker_processes (cycle=0x10171bc0,
n=8, type=-3) at src/os/unix/ngx_process_cycle.c:360
#14 0x000000000043264a in ngx_master_process_cycle (cycle=0x10171bc0) at
src/os/unix/ngx_process_cycle.c:136
#15 0x0000000000405018 in main (argc=3, argv=0x7fffedf73c28) at
src/core/nginx.c:405
(gdb) p r->headers_in
$3 = {headers = {last = 0x1037d2c0, part = {elts = 0x214f8148, nelts =
10, next = 0x0}, size = 6073104097509113904,
nalloc = 3611940975614578772, pool = 0x440a0d4b4f203030}, host =
0x6e7553203a657461, connection = 0x6e754a203231202c,
if_modified_since = 0x3730203131303220, if_unmodified_since =
0x472031323a32303a, user_agent = 0x767265530a0d544d,
referer = 0x63617041203a7265, content_length = 0x6f502d580a0d6568,
content_type = 0x79422d6465726577,
range = 0x2e352f504850203a, if_range = 0x7078450a0d362e31,
transfer_encoding = 0x312d203a73657269, expect = 0x6e65746e6f430a0d,
accept_encoding = 0x6874676e654c2d74, via = 0x6e6f430a0d30203a,
authorization = 0x3a6e6f697463656e,
keep_alive = 0xa0d65736f6c6320, x_forwarded_for = 0x2d746e65746e6f43,
x_real_ip = 0x6574203a65707954, user = {
len = 967268366709388408, data = 0xa0d0a <Address 0xa0d0a out of
bounds>}, passwd = {len = 0, data = 0x0}, cookies = {
elts = 0x214f8508, nelts = 0, size = 8, nalloc = 2, pool =
0x214f7970}, server = {len = 24,
data = 0x214f855d
"xxxxxxx.xxxxx.xxxxu.comcontent-lengthconnectioncache-control"},
content_length_n = 728893,
keep_alive_n = -1, connection_type = 2, msie = 0, msie4 = 0, msie6 =
0, opera = 0, gecko = 0, chrome = 0, safari = 0,
konqueror = 0}
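
Side note on the headers_in dump above: read as little-endian bytes,
several of the “pointer” values are plain ASCII. The tiny standalone
program below (not nginx code; only the three constants are copied from
the dump, the helper name is made up) decodes host, connection and
if_modified_since:

#include <stdio.h>
#include <stdint.h>

/* Print a 64-bit value from the gdb dump as the ASCII bytes it contains
 * (x86-64 is little-endian, so the low byte comes first). */
static void as_ascii(const char *field, uint64_t v)
{
    char  s[9];
    int   i;

    for (i = 0; i < 8; i++) {
        s[i] = (char) ((v >> (8 * i)) & 0xff);
    }
    s[8] = '\0';

    printf("%-18s -> \"%s\"\n", field, s);
}

int main(void)
{
    as_ascii("host",              0x6e7553203a657461ULL);  /* "ate: Sun" */
    as_ascii("connection",        0x6e754a203231202cULL);  /* ", 12 Jun" */
    as_ascii("if_modified_since", 0x3730203131303220ULL);  /* " 2011 07" */
    return 0;
}

Back to back these spell “ate: Sun, 12 Jun 2011 07”, i.e. a fragment of a
Date: response header, so the structure no longer holds valid pointers
but response text.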

bt2

(gdb) bt
#0 0x0000003b13e7273e in free () from /lib64/libc.so.6
#1 0x0000000000407dc9 in ngx_destroy_pool (pool=0x10a18f80) at
src/core/ngx_palloc.c:63
#2 0x00000000004502f9 in ngx_http_close_connection (c=0x2ac29930e810)
at src/http/ngx_http_request.c:3047
#3 0x000000000044f7d4 in ngx_http_keepalive_handler
(rev=0x2ac299c0e910) at src/http/ngx_http_request.c:2693
#4 0x0000000000429ba0 in ngx_event_process_posted (cycle=0x105e24d0,
posted=0x6ee908) at src/event/ngx_event_posted.c:39
#5 0x0000000000427905 in ngx_process_events_and_timers
(cycle=0x105e24d0) at src/event/ngx_event.c:272
#6 0x0000000000434058 in ngx_worker_process_cycle (cycle=0x105e24d0,
data=0x0) at src/os/unix/ngx_process_cycle.c:800
#7 0x0000000000430e3d in ngx_spawn_process (cycle=0x105e24d0,
proc=0x433eac <ngx_worker_process_cycle>, data=0x0,
name=0x4c174e "worker process", respawn=-4) at
src/os/unix/ngx_process.c:196
#8 0x0000000000432f5e in ngx_start_worker_processes (cycle=0x105e24d0,
n=8, type=-4) at src/os/unix/ngx_process_cycle.c:360
#9 0x0000000000432b52 in ngx_master_process_cycle (cycle=0x105e24d0) at
src/os/unix/ngx_process_cycle.c:249
#10 0x0000000000405018 in main (argc=3, argv=0x7fffedf73c28) at
src/core/nginx.c:405

Posted at Nginx Forum:

We’ve found the problem and sent an email to “[email protected]”, but got no
answer.

The problem is in ngx_http_upstream_process_header. If
r->cache->header_start is bigger than u->conf->buffer_size,
u->buffer.last ends up bigger than u->buffer.end, and the response
header is written to memory outside the buffer. The overwritten memory
may still be valid for the process, so the core dump usually happens
some time later, and the backtrace points somewhere else.
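
As an illustration only (this is not nginx source; the struct, names and
sizes below are made up), the failing arithmetic can be reproduced in a
few lines: the upstream buffer is allocated with u->conf->buffer_size
bytes, and pos/last are then advanced by r->cache->header_start to
reserve room for the cache header, so a header_start of at least
buffer_size pushes last past end:

#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the relevant fields of ngx_buf_t. */
typedef struct {
    unsigned char  *start, *pos, *last, *end;
} buf_t;

int main(void)
{
    size_t  buffer_size  = 4096;   /* e.g. proxy_buffer_size 4k (assumed)     */
    size_t  header_start = 4500;   /* hypothetical oversized cache header/key */

    buf_t   b;

    b.start = malloc(buffer_size);
    b.pos   = b.start;
    b.last  = b.start;
    b.end   = b.start + buffer_size;

    /* Room for the cache header is reserved in front of the response header. */
    b.pos += header_start;
    b.last = b.pos;

    if (b.last > b.end) {
        printf("last is %zu bytes past end; reading the upstream response\n"
               "into the remaining \"space\" corrupts adjacent heap memory\n",
               (size_t) (b.last - b.end));
    }

    free(b.start);
    return 0;
}

With the assumed numbers, last ends up 404 bytes past end, and whatever
is then received from the upstream lands on adjacent heap memory, which
matches the delayed, misleading crashes shown above.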

A patch based on nginx-1.0.4:

diff -ruNp nginx-1.0.4/src/http/ngx_http_upstream.c nginx-1.0.4_zls/src/http/ngx_http_upstream.c
--- nginx-1.0.4/src/http/ngx_http_upstream.c 2011-06-01 16:02:34.000000000 +0800
+++ nginx-1.0.4_zls/src/http/ngx_http_upstream.c 2011-06-22 00:41:32.282201935 +0800
@@ -661,6 +661,16 @@ ngx_http_upstream_cache(ngx_http_request

     ngx_http_file_cache_create_key(r);

+    if (r->cache->header_start >= u->conf->buffer_size){
+        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0,
+                      "cache disabled because of too big cache_key (%uz vs %uz)",
+                      r->cache->header_start, u->conf->buffer_size
+                      );
+
+        r->cache = NULL;
+        return NGX_DECLINED;
+    }
+
     switch (ngx_http_test_predicates(r, u->conf->cache_bypass)) {

     case NGX_ERROR:
    

Posted at Nginx Forum: