Hi,
We are using nginx/0.8.15 on our front web server, mainly for
caching and reverse proxying. Our cache is located on a /dev/shm disk
in order to get better performance. Recently we have found that the
total size of the cache path sometimes exceeds the “max_size” set in
the configuration, and the “cache manager process” is quite busy,
using 100% of one CPU.
We presume that the cache manager process might not be able to
delete the expired files in time, because the read/write frequency on
the cache disk is very high in our production environment, so the total
size of the cache path sometimes exceeds the configured limit. Is that
correct? The relevant configuration is “proxy_cache_path
/dev/shm/nginx/cache/ levels=2:2 keys_zone=proxy-cache:154m max_size=4g
inactive=72h;”.
Could someone tell us how to fix this problem? Thanks a lot.
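As a rough illustration of the behaviour presumed above, here is a minimal toy simulation (not nginx source; all names and numbers are invented): when busy workers store new entries faster than the cache manager process can evict them, the total size of the cache path keeps exceeding max_size.

#include <stdio.h>

int main(void)
{
    long max_size         = 4096; /* configured max_size, arbitrary units           */
    long size             = 0;    /* current total size of the cache path           */
    long added_per_tick   = 40;   /* written by busy workers per interval           */
    long evicted_per_tick = 25;   /* what the cache manager can delete per interval */
    long peak             = 0;

    for (int tick = 0; tick < 1000; tick++) {
        size += added_per_tick;          /* workers cache new responses */

        if (size > max_size) {
            size -= evicted_per_tick;    /* manager evicts old entries, but only
                                            this much each interval */
        }

        if (size > peak) {
            peak = size;
        }
    }

    printf("max_size = %ld, peak size = %ld, overshoot = %ld\n",
           max_size, peak, peak - max_size);
    return 0;
}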
Hi, we’re experiencing exactly the same issue using the local filesystem
as the cache store. We had the same problem with nginx 0.7.67 and now with
0.8.54. The cache grows beyond the configured boundaries and the nginx
process uses 100% of a single core.
I tried to strace the malfunctioning process, but all I got is
tons of gettimeofday calls:
gettimeofday({1300181425, 698292}, NULL) = 0
gettimeofday({1300181425, 698360}, NULL) = 0
gettimeofday({1300181425, 698410}, NULL) = 0
gettimeofday({1300181425, 698459}, NULL) = 0
gettimeofday({1300181425, 698508}, NULL) = 0
…
Are you using CentOS 5 64-bit? Can you check the nginx error log and see
if you have any crashed workers, please?
nginx: nginx version: nginx/1.0.0
nginx: built by gcc 4.1.2 20080704 (Red Hat 4.1.2-48)
nginx: TLS SNI support disabled
nginx: configure arguments:
--with-pcre=/home/ashtray/build_nginx/pcre-8.12
--add-module=/home/ashtray/build_nginx/ngx_cache_purge-1.2
--with-http_stub_status_module --with-http_ssl_module --with-debug
--prefix=/usr/local/nginx --with-http_realip_module
We have core dumps almost every day; this has been happening since 0.8.15.
We tried disabling the cache_purge module (deleting the cached files
outside of nginx with rm when needed), and the workers still dump core.
[xxx nginx]# ls -lhtr core/
total 69G
-rw------- 1 nobody nobody 888M Apr 26 07:23 core.32104
-rw------- 1 nobody nobody 1.1G Apr 26 09:36 core.32102
-rw------- 1 nobody nobody 1.3G Apr 26 20:38 core.32103
-rw------- 1 nobody nobody 1.3G Apr 28 22:13 core.4659
-rw------- 1 nobody nobody 1.3G Apr 28 23:53 core.9502
-rw------- 1 nobody nobody 1011M May 6 02:27 core.26681
-rw------- 1 nobody nobody 985M May 6 02:27 core.26688
-rw------- 1 nobody nobody 1.2G May 6 20:02 core.7113
-rw------- 1 nobody nobody 1.2G May 7 23:14 core.1774
-rw------- 1 nobody nobody 1.2G May 8 00:22 core.26687
-rw------- 1 nobody nobody 1.1G May 8 02:31 core.26683
-rw------- 1 nobody nobody 991M May 8 10:01 core.26346
-rw------- 1 nobody nobody 969M May 10 10:23 core.15038
-rw------- 1 nobody nobody 1.1G May 12 20:44 core.30863
-rw------- 1 nobody nobody 962M May 13 02:57 core.30860
-rw------- 1 nobody nobody 1.1G May 15 12:58 core.30861
-rw------- 1 nobody nobody 873M May 16 06:42 core.30858
-rw------- 1 nobody nobody 1.2G May 18 00:11 core.15061
-rw------- 1 nobody nobody 1.1G May 18 13:43 core.15065
-rw------- 1 nobody nobody 1.2G May 19 21:18 core.18918
-rw------- 1 nobody nobody 1.1G May 20 00:03 core.18917
-rw------- 1 nobody nobody 1.1G May 20 01:54 core.18921
-rw------- 1 nobody nobody 994M May 22 08:06 core.27758
-rw------- 1 nobody nobody 1.2G May 22 14:02 core.27754
-rw------- 1 nobody nobody 857M May 23 07:23 core.24189
-rw------- 1 nobody nobody 1.2G May 23 22:11 core.27760
-rw------- 1 nobody nobody 1.2G May 23 22:11 core.27753
-rw------- 1 nobody nobody 1.1G May 25 01:24 core.12213
-rw------- 1 nobody nobody 1.1G May 25 15:05 core.24518
-rw------- 1 nobody nobody 1.1G May 26 13:44 core.28284
-rw------- 1 nobody nobody 1.1G May 28 11:15 core.25958
-rw------- 1 nobody nobody 1.1G May 29 10:28 core.25988
-rw------- 1 nobody nobody 1.1G May 30 11:34 core.25989
-rw------- 1 nobody nobody 1.2G May 30 21:33 core.25310
-rw------- 1 nobody nobody 1.2G May 30 22:52 core.13874
-rw------- 1 nobody nobody 1.1G Jun 1 00:45 core.25148
-rw------- 1 nobody nobody 1.1G Jun 1 13:20 core.25152
-rw------- 1 nobody nobody 1.1G Jun 1 13:47 core.25149
-rw------- 1 nobody nobody 1.1G Jun 1 16:08 core.10719
-rw------- 1 nobody nobody 1.2G Jun 1 21:47 core.15534
-rw------- 1 nobody nobody 1.2G Jun 3 19:23 core.4725
-rw------- 1 nobody nobody 1.2G Jun 3 22:32 core.5872
-rw------- 1 nobody nobody 1.2G Jun 4 00:05 core.4718
-rw------- 1 nobody nobody 1.1G Jun 4 00:30 core.4723
-rw------- 1 nobody nobody 1.1G Jun 4 05:31 core.4719
-rw------- 1 nobody nobody 1.1G Jun 4 18:45 core.29156
-rw------- 1 nobody nobody 1.2G Jun 4 18:56 core.29157
-rw------- 1 nobody nobody 1.2G Jun 4 20:32 core.29160
-rw------- 1 nobody nobody 1.2G Jun 4 20:32 core.29158
-rw------- 1 nobody nobody 1.2G Jun 4 20:32 core.29155
-rw------- 1 nobody nobody 1.2G Jun 4 22:32 core.29996
-rw------- 1 nobody nobody 1.2G Jun 4 23:08 core.29989
-rw------- 1 nobody nobody 1.2G Jun 6 01:07 core.20367
-rw------- 1 nobody nobody 970M Jun 6 05:44 core.20366
-rw------- 1 nobody nobody 1.2G Jun 6 16:46 core.20369
-rw------- 1 nobody nobody 1.2G Jun 6 16:46 core.20368
-rw------- 1 nobody nobody 1.2G Jun 6 17:16 core.20365
-rw------- 1 nobody nobody 1.2G Jun 6 18:04 core.16095
-rw------- 1 nobody nobody 1.1G Jun 7 03:15 core.20363
-rw------- 1 nobody nobody 987M Jun 7 03:16 core.28162
-rw------- 1 nobody nobody 1.1G Jun 7 10:35 core.28252
-rw------- 1 nobody nobody 1.1G Jun 7 10:57 core.10354
-rw------- 1 nobody nobody 1.1G Jun 7 11:45 core.10352
-rw------- 1 nobody nobody 773M Jun 8 05:47 core.14587
On Wed, Jun 08, 2011 at 08:51:53AM -0400, zls wrote:
We have core dumps almost every day; this has been happening since 0.8.15.
We tried disabling the cache_purge module (deleting the cached files
outside of nginx with rm when needed), and the workers still dump core.
Could you:
build nginx-1.0.4 without cache_purge;
ensure that the debug information in the nginx binary is not stripped
(check with “file nginx”)?
-rw------- 1 nobody nobody 1.2G Jun 12 15:02 core.25256
-rw------- 1 nobody nobody 1.2G Jun 12 18:18 core.30101
[xxx nginx]# file sbin/nginx
sbin/nginx: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for
GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux
2.6.9, not stripped
bt1
(gdb) bt
#0  0x00000000004946a4 in ngx_http_proxy_add_x_forwarded_for_variable (r=0x1037d250, v=0x214f80b8, data=0) at src/http/modules/ngx_http_proxy_module.c:1441
#1  0x0000000000459ced in ngx_http_get_indexed_variable (r=0x1037d250, index=32) at src/http/ngx_http_variables.c:406
#2  0x000000000045eebf in ngx_http_script_copy_var_len_code (e=0x7fffedf73280) at src/http/ngx_http_script.c:816
#3  0x0000000000493108 in ngx_http_proxy_create_request (r=0x1037d250) at src/http/modules/ngx_http_proxy_module.c:928
#4  0x0000000000461888 in ngx_http_upstream_init_request (r=0x1037d250) at src/http/ngx_http_upstream.c:488
#5  0x0000000000461625 in ngx_http_upstream_init (r=0x1037d250) at src/http/ngx_http_upstream.c:429
#6  0x0000000000458c34 in ngx_http_do_read_client_request_body (r=0x1037d250) at src/http/ngx_http_request_body.c:378
#7  0x0000000000458774 in ngx_http_read_client_request_body_handler (r=0x1037d250) at src/http/ngx_http_request_body.c:251
#8  0x000000000044d8ee in ngx_http_request_handler (ev=0x2ac299c70248) at src/http/ngx_http_request.c:1800
#9  0x00000000004361f3 in ngx_epoll_process_events (cycle=0x10171bc0, timer=11, flags=1) at src/event/modules/ngx_epoll_module.c:635
#10 0x000000000042780c in ngx_process_events_and_timers (cycle=0x10171bc0) at src/event/ngx_event.c:245
#11 0x0000000000434058 in ngx_worker_process_cycle (cycle=0x10171bc0, data=0x0) at src/os/unix/ngx_process_cycle.c:800
#12 0x0000000000430e3d in ngx_spawn_process (cycle=0x10171bc0, proc=0x433eac <ngx_worker_process_cycle>, data=0x0, name=0x4c174e "worker process", respawn=-3) at src/os/unix/ngx_process.c:196
#13 0x0000000000432f5e in ngx_start_worker_processes (cycle=0x10171bc0, n=8, type=-3) at src/os/unix/ngx_process_cycle.c:360
#14 0x000000000043264a in ngx_master_process_cycle (cycle=0x10171bc0) at src/os/unix/ngx_process_cycle.c:136
#15 0x0000000000405018 in main (argc=3, argv=0x7fffedf73c28) at src/core/nginx.c:405
(gdb) p r->headers_in
$3 = {headers = {last = 0x1037d2c0, part = {elts = 0x214f8148, nelts =
10, next = 0x0}, size = 6073104097509113904,
nalloc = 3611940975614578772, pool = 0x440a0d4b4f203030}, host =
0x6e7553203a657461, connection = 0x6e754a203231202c,
if_modified_since = 0x3730203131303220, if_unmodified_since =
0x472031323a32303a, user_agent = 0x767265530a0d544d,
referer = 0x63617041203a7265, content_length = 0x6f502d580a0d6568,
content_type = 0x79422d6465726577,
range = 0x2e352f504850203a, if_range = 0x7078450a0d362e31,
transfer_encoding = 0x312d203a73657269, expect = 0x6e65746e6f430a0d,
accept_encoding = 0x6874676e654c2d74, via = 0x6e6f430a0d30203a,
authorization = 0x3a6e6f697463656e,
keep_alive = 0xa0d65736f6c6320, x_forwarded_for = 0x2d746e65746e6f43,
x_real_ip = 0x6574203a65707954, user = {
len = 967268366709388408, data = 0xa0d0a <Address 0xa0d0a out of
bounds>}, passwd = {len = 0, data = 0x0}, cookies = {
elts = 0x214f8508, nelts = 0, size = 8, nalloc = 2, pool =
0x214f7970}, server = {len = 24,
data = 0x214f855d "xxxxxxx.xxxxx.xxxxu.comcontent-lengthconnectioncache-control"},
content_length_n = 728893,
keep_alive_n = -1, connection_type = 2, msie = 0, msie4 = 0, msie6 =
0, opera = 0, gecko = 0, chrome = 0, safari = 0,
konqueror = 0}
bt2
(gdb) bt
#0  0x0000003b13e7273e in free () from /lib64/libc.so.6
#1  0x0000000000407dc9 in ngx_destroy_pool (pool=0x10a18f80) at src/core/ngx_palloc.c:63
#2  0x00000000004502f9 in ngx_http_close_connection (c=0x2ac29930e810) at src/http/ngx_http_request.c:3047
#3  0x000000000044f7d4 in ngx_http_keepalive_handler (rev=0x2ac299c0e910) at src/http/ngx_http_request.c:2693
#4  0x0000000000429ba0 in ngx_event_process_posted (cycle=0x105e24d0, posted=0x6ee908) at src/event/ngx_event_posted.c:39
#5  0x0000000000427905 in ngx_process_events_and_timers (cycle=0x105e24d0) at src/event/ngx_event.c:272
#6  0x0000000000434058 in ngx_worker_process_cycle (cycle=0x105e24d0, data=0x0) at src/os/unix/ngx_process_cycle.c:800
#7  0x0000000000430e3d in ngx_spawn_process (cycle=0x105e24d0, proc=0x433eac <ngx_worker_process_cycle>, data=0x0, name=0x4c174e "worker process", respawn=-4) at src/os/unix/ngx_process.c:196
#8  0x0000000000432f5e in ngx_start_worker_processes (cycle=0x105e24d0, n=8, type=-4) at src/os/unix/ngx_process_cycle.c:360
#9  0x0000000000432b52 in ngx_master_process_cycle (cycle=0x105e24d0) at src/os/unix/ngx_process_cycle.c:249
#10 0x0000000000405018 in main (argc=3, argv=0x7fffedf73c28) at src/core/nginx.c:405
We’ve found the problem and sent an email to “[email protected]”, but got no
answer.
It’s caused in ngx_http_upstream_process_header. If
r->cache->header_start is bigger than u->conf->buffer_size,
u->buffer.last ends up bigger than u->buffer.end, and the response
header is written to memory outside this buffer. The overwritten memory
may still be valid for the process, so the core dump usually happens
some time later, and the backtrace points somewhere else.
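For illustration, here is a small self-contained sketch of the pointer arithmetic involved (hypothetical types and values, not the nginx source): once the cache header offset pushes buffer.last past buffer.end, the remaining room (end - last) becomes negative, and when treated as an unsigned length it lets the upstream response be written far outside the allocated buffer.

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

typedef struct {
    unsigned char *start;
    unsigned char *pos;
    unsigned char *last;
    unsigned char *end;
} buf_t;                               /* stand-in for the real buffer type */

int main(void)
{
    size_t buffer_size  = 4096;        /* hypothetical proxy_buffer_size            */
    size_t header_start = 8192;        /* hypothetical header_start from cache file */

    buf_t b;
    b.start = malloc(buffer_size);
    b.pos   = b.start;
    b.last  = b.start;
    b.end   = b.start + buffer_size;

    /* for a cacheable request, room is reserved for the cache header
     * before the upstream response header is read */
    b.pos  += header_start;
    b.last  = b.pos;

    /* the upstream read is asked to store (end - last) bytes at last;
     * with last > end this difference is negative, and as a size_t it
     * becomes a huge value, so data lands outside the 4096-byte buffer */
    ptrdiff_t room = b.end - b.last;
    printf("end - last = %td, as size_t = %zu\n", room, (size_t) room);

    free(b.start);
    return 0;
}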