Nginx workers stuck in state D

Hi community,

We are using Nginx as a static HTTP server to cache and serve small static
files. The average file size is about 10k to 200k, and the nginx server can
process about 10,000 requests per second. It runs well, but sometimes most of
the nginx workers go into state D (uninterruptible sleep), and there is no way
to kill them short of restarting the system.
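
For anyone who wants to check for the same symptom, here is a minimal sketch that lists tasks currently in state D by reading /proc/<pid>/stat. The function names (parse_stat_line, find_d_state) are made up for this example and are not part of nginx or any tool mentioned in this thread; it assumes a Linux /proc filesystem.

```python
import os


def parse_stat_line(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split around the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left].strip())
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def find_d_state(proc_root="/proc"):
    """List (pid, comm) for every task found in uninterruptible sleep."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the file was unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

Calling find_d_state() a few times a second apart and comparing the results helps tell a worker that is only briefly waiting on disk I/O from one that never leaves state D.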

Here’s the call stack for kernel/user space:

With AIO enabled:

-----Kernel Space Call Stack-----
0xffffffff8111f700 : sync_page+0x0/0x50 [kernel]
0xffffffff8111f75e : sync_page_killable+0xe/0x40 [kernel]
0xffffffff81529e7a : __wait_on_bit_lock+0x5a/0xc0 [kernel]
0xffffffff8111f667 : __lock_page_killable+0x67/0x70 [kernel]
0xffffffff81121394 : generic_file_aio_read+0x4b4/0x700 [kernel]
0xffffffff811d58d4 : aio_rw_vect_retry+0x84/0x200 [kernel]
0xffffffff811d7294 : aio_run_iocb+0x64/0x170 [kernel]
0xffffffff811d86c1 : do_io_submit+0x291/0x920 [kernel]
0xffffffff811d8d60 : sys_io_submit+0x10/0x20 [kernel]
0xffffffff8100b288 : tracesys+0xd9/0xde [kernel]
-----User Space Call Stack-----
0x3c362e50c9 : syscall+0x19/0x40 [/lib64/libc-2.12.so]
0x4d2232 : ngx_linux_sendfile_chain+0xc2a/0xc2c [/opt/soft/nginx/sbin/nginx]
0x4d24ea : ngx_file_aio_read+0x2b6/0x528 [/opt/soft/nginx/sbin/nginx]
0x515247 : ngx_http_file_cache_open+0xbef/0x1437 [/opt/soft/nginx/sbin/nginx]
0x514df6 : ngx_http_file_cache_open+0x79e/0x1437 [/opt/soft/nginx/sbin/nginx]
0x514abb : ngx_http_file_cache_open+0x463/0x1437 [/opt/soft/nginx/sbin/nginx]
0x5033d0 : ngx_http_upstream_init+0xbc5/0x7eb4 [/opt/soft/nginx/sbin/nginx]
0x502916 : ngx_http_upstream_init+0x10b/0x7eb4 [/opt/soft/nginx/sbin/nginx]
0x5028ad : ngx_http_upstream_init+0xa2/0x7eb4 [/opt/soft/nginx/sbin/nginx]
0x4f8313 : ngx_http_read_client_request_body+0x117/0xd1c [/opt/soft/nginx/sbin/nginx]
0x5351da : ngx_http_ssi_map_uri_to_path+0x1aff0/0x3a5e0 [/opt/soft/nginx/sbin/nginx]
0x4df55b : ngx_http_core_content_phase+0x41/0x1c9 [/opt/soft/nginx/sbin/nginx]
0x4de484 : ngx_http_core_run_phases+0x87/0xc2 [/opt/soft/nginx/sbin/nginx]
0x4de3fb : ngx_http_handler+0x1c3/0x1c5 [/opt/soft/nginx/sbin/nginx]
0x4ec468 : ngx_http_process_request+0x304/0xa98 [/opt/soft/nginx/sbin/nginx]
0x4eaed8 : ngx_http_process_request_uri+0x95e/0x1876 [/opt/soft/nginx/sbin/nginx]
0x4ea414 : ngx_http_ssl_servername+0x6d5/0x83b [/opt/soft/nginx/sbin/nginx]
0x4e94ea : ngx_http_init_connection+0x785/0x78f [/opt/soft/nginx/sbin/nginx]
0x4d11b3 : ngx_os_specific_status+0xe55/0x12aa [/opt/soft/nginx/sbin/nginx]
0x4c3e5a : ngx_process_events_and_timers+0xd6/0x165 [/opt/soft/nginx/sbin/nginx]

Without AIO enabled:

-----Kernel Space Call Stack-----
0xffffffff8111f700 : sync_page+0x0/0x50 [kernel]
0xffffffff8111f75e : sync_page_killable+0xe/0x40 [kernel]
0xffffffff81529e7a : __wait_on_bit_lock+0x5a/0xc0 [kernel]
0xffffffff8111f667 : __lock_page_killable+0x67/0x70 [kernel]
0xffffffff81121394 : generic_file_aio_read+0x4b4/0x700 [kernel]
0xffffffff81188c8a : do_sync_read+0xfa/0x140 [kernel]
0xffffffff81189645 : vfs_read+0xb5/0x1a0 [kernel]
0xffffffff81189972 : sys_pread64+0x82/0xa0 [kernel]
0xffffffff8100b288 : tracesys+0xd9/0xde [kernel]
-----User Space Call Stack-----
0x3c36e0f043 : __pread_nocancel+0xa/0x67 [/lib64/libpthread-2.12.so]
0x4c9ab1 : ngx_read_file+0x35/0xb9 [/opt/soft/nginx/sbin/nginx]
0x5152e9 : ngx_http_file_cache_open+0xc91/0x1437 [/opt/soft/nginx/sbin/nginx]
0x514df6 : ngx_http_file_cache_open+0x79e/0x1437 [/opt/soft/nginx/sbin/nginx]
0x514abb : ngx_http_file_cache_open+0x463/0x1437 [/opt/soft/nginx/sbin/nginx]
0x5033d0 : ngx_http_upstream_init+0xbc5/0x7eb4 [/opt/soft/nginx/sbin/nginx]
0x502916 : ngx_http_upstream_init+0x10b/0x7eb4 [/opt/soft/nginx/sbin/nginx]
0x5028ad : ngx_http_upstream_init+0xa2/0x7eb4 [/opt/soft/nginx/sbin/nginx]
0x4f8313 : ngx_http_read_client_request_body+0x117/0xd1c [/opt/soft/nginx/sbin/nginx]
0x5351da : ngx_http_ssi_map_uri_to_path+0x1aff0/0x3a5e0 [/opt/soft/nginx/sbin/nginx]
0x4df55b : ngx_http_core_content_phase+0x41/0x1c9 [/opt/soft/nginx/sbin/nginx]
0x4de484 : ngx_http_core_run_phases+0x87/0xc2 [/opt/soft/nginx/sbin/nginx]
0x4de3fb : ngx_http_handler+0x1c3/0x1c5 [/opt/soft/nginx/sbin/nginx]
0x4ec468 : ngx_http_process_request+0x304/0xa98 [/opt/soft/nginx/sbin/nginx]
0x4eaed8 : ngx_http_process_request_uri+0x95e/0x1876 [/opt/soft/nginx/sbin/nginx]
0x4ea414 : ngx_http_ssl_servername+0x6d5/0x83b [/opt/soft/nginx/sbin/nginx]
0x4e94ea : ngx_http_init_connection+0x785/0x78f [/opt/soft/nginx/sbin/nginx]
0x4d11b3 : ngx_os_specific_status+0xe55/0x12aa [/opt/soft/nginx/sbin/nginx]
0x4c3e5a : ngx_process_events_and_timers+0xd6/0x165 [/opt/soft/nginx/sbin/nginx]
0x4cf1a3 : ngx_single_process_cycle+0x1053/0x2114 [/opt/soft/nginx/sbin/nginx]

Has anyone seen this issue (workers stuck in state D) before? How can I
resolve it?
It does not seem to be a kernel issue.

I collected some basic statistics over a 3-second window:
Kernel function: call count
sync_page: 91
sync_buffer: 2
generic_file_aio_read: 3240
sys_read: 3240
sys_write: 698
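
To put these counts in perspective, here is a small sketch that converts them to per-second rates and to the share of reads that coincided with a sync_page call. It assumes each sync_page call corresponds to one read waiting on a locked page, which is only a rough approximation; the variable names are made up for this example.

```python
WINDOW_SECONDS = 3

# Call counts from the 3-second sample above.
calls = {
    "sync_page": 91,
    "sync_buffer": 2,
    "generic_file_aio_read": 3240,
    "sys_read": 3240,
    "sys_write": 698,
}


def per_second(counts, window):
    """Convert raw call counts into calls per second."""
    return {name: n / window for name, n in counts.items()}


rates = per_second(calls, WINDOW_SECONDS)

# Rough share of reads that had to wait on a locked page (assumption:
# one sync_page call per waiting read).
waiting_share = calls["sync_page"] / calls["generic_file_aio_read"]

print(f"reads per second: {rates['sys_read']:.0f}")    # 1080
print(f"share of reads that waited: {waiting_share:.1%}")  # 2.8%
```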

Thanks
Qiang

Hello!

On Fri, Sep 26, 2014 at 03:45:40PM +0800, Zhang Qiang wrote:

> -----Kernel Space Call Stack-----
> -----User Space Call Stack-----
> 0x3c362e50c9 : syscall+0x19/0x40 [/lib64/libc-2.12.so]
> 0x4d2232 : ngx_linux_sendfile_chain+0xc2a/0xc2c [/opt/soft/nginx/sbin/nginx]
> 0x4d24ea : ngx_file_aio_read+0x2b6/0x528 [/opt/soft/nginx/sbin/nginx]
> 0x515247 : ngx_http_file_cache_open+0xbef/0x1437 [/opt/soft/nginx/sbin/nginx]
> 0x514df6 : ngx_http_file_cache_open+0x79e/0x1437 [/opt/soft/nginx/sbin/nginx]
> 0x514abb : ngx_http_file_cache_open+0x463/0x1437 [/opt/soft/nginx/sbin/nginx]

Just a side note: the symbols shown in the stack look very wrong,
likely because of the optimizations used.

[…]

> Has anyone seen this issue (workers stuck in state D) before? How can I
> resolve it?
> It does not seem to be a kernel issue.

What makes you think so? It looks like a kernel issue from here:
“unkillable” processes just can’t happen because of userland code
unless there is a problem in the kernel.
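
One way to see where in the kernel a stuck worker is blocked, without extra tracing tools, is to read /proc/<pid>/stack. The sketch below is only an illustration: the helper names (frame_functions, read_kernel_stack) are made up, reading that file normally requires root, and it assumes the kernel exposes /proc/<pid>/stack with lines of the form "[<address>] function+0xoff/0xsize".

```python
import re

# Matches one frame line such as:
#   [<ffffffff8111f75e>] sync_page_killable+0xe/0x40
FRAME_RE = re.compile(
    r"^\[<[0-9a-fA-Fx]+>\]\s+([^\s+]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def frame_functions(stack_text):
    """Return the kernel function names found in /proc/<pid>/stack text."""
    funcs = []
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            funcs.append(match.group(1))
    return funcs


def read_kernel_stack(pid, proc_root="/proc"):
    """Return the raw kernel stack of a task, or None if it is unreadable."""
    try:
        with open(f"{proc_root}/{pid}/stack") as f:
            return f.read()
    except OSError:
        return None
```

If the top frames of every stuck worker are functions such as sync_page_killable and __lock_page_killable, as in the traces quoted in this thread, the workers are waiting inside the kernel for a page lock, which points at the kernel or the storage layer underneath it.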


Maxim D.
http://nginx.org/