D state when high load on same static file

Hi guys,

I have some file hosting servers running nginx, serving static files
that range from 500 MB to 6 GB in size.
The servers use Lustre 1.8 as the cluster filesystem.
The files are stored on RAID 6 arrays with a 512 KB stripe size.

Under normal load, nginx works very well.

_ Disk I/O:
Device:      tps   MB_read/s   MB_wrtn/s   MB_read   MB_wrtn
sdc       727.00      136.23        5.75       136         5
sdd       627.00      124.36        0.01       124         0

_ Outbound traffic:
iface              Rx              Tx           Total
=====================================================
eth2:      480.00 b/s     551.30 Mb/s     551.30 Mb/s
eth3:      480.00 b/s     481.72 Mb/s     481.72 Mb/s
eth4:     136.47 Mb/s     525.45 Mb/s     661.92 Mb/s
eth5:      480.00 b/s     497.82 Mb/s     497.82 Mb/s
bond0:    136.47 Mb/s       2.01 Gb/s       2.14 Gb/s

_ Number of files being served:

lsof -u nginx -n | grep storagefile | wc -l

3982

But in a hot situation (when a new popular file, for example an adult
DVD, has just been uploaded), more than 800 clients download that file
at the same time, and the nginx processes go into D state.
Eventually nearly all processes are in D state! This makes downloads
very slow :(

root  25821 0.0 0.0  33032    584 ? Ss Dec20  0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 25823 0.8 1.5 158384 126396 ? S  Dec20 23:53 nginx: worker process
nginx 25824 0.7 1.8 182276 150420 ? D  Dec20 21:58 nginx: worker process
nginx 25825 0.7 2.1 207584 175728 ? D  Dec20 22:01 nginx: worker process
nginx 25826 0.8 1.8 186052 154196 ? D  Dec20 23:32 nginx: worker process
nginx 25827 0.7 1.9 191448 159464 ? D  Dec20 23:03 nginx: worker process
nginx 25828 0.8 1.6 166044 134188 ? D  Dec20 24:56 nginx: worker process
nginx 25829 0.7 1.3 139308 107452 ? S  Dec20 23:00 nginx: worker process
nginx 25830 0.7 1.7 176652 144796 ? D  Dec20 21:08 nginx: worker process
nginx 25832 0.7 1.2 136648 104788 ? D  Dec20 20:25 nginx: worker process
nginx 25833 0.8 1.7 178948 146964 ? D  Dec20 23:27 nginx: worker process
nginx 25834 0.7 2.0 195828 163968 ? D  Dec20 21:45 nginx: worker process
nginx 25835 0.8 1.6 166200 134344 ? S  Dec20 23:30 nginx: worker process
nginx 25836 0.8 1.3 144624 112640 ? D  Dec20 23:50 nginx: worker process
nginx 25837 0.7 1.3 143644 111784 ? D  Dec20 22:02 nginx: worker process
nginx 25838 0.7 1.3 141912 110056 ? D  Dec20 21:17 nginx: worker process
nginx 25839 0.6 1.4 150580 118724 ? S  Dec20 20:12 nginx: worker process
nginx 25840 0.8 1.5 158916 126928 ? D  Dec20 23:48 nginx: worker process
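
For anyone who wants to track this over time instead of eyeballing the listing, here is a minimal Python sketch that tallies worker states from `ps aux` text. The function name `count_worker_states` is made up for illustration, and it assumes the standard `ps aux` column order with each process on a single unwrapped line.

```python
from collections import Counter


def count_worker_states(ps_text: str) -> Counter:
    """Tally the first letter of the STAT column for nginx worker lines.

    Assumes `ps aux` layout:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND...
    """
    states = Counter()
    for line in ps_text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and anything too short to be a worker line.
        if len(fields) < 12 or not fields[1].isdigit():
            continue
        # The command of a worker starts with "nginx: worker".
        if fields[10] == "nginx:" and fields[11] == "worker":
            # STAT may carry modifiers (e.g. "Ss"); keep only the base state.
            states[fields[7][0]] += 1
    return states
```

Fed the listing above, this should report 13 workers in D and 4 in S. Running it against fresh `ps aux` output every few seconds would show whether the D count climbs as more clients pile onto the hot file.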

I have tried tuning the number of worker processes many times, but it
didn’t work!
How can I fix this? I thought that when a lot of clients access the same
file, it is supposed to get better because of caching?!

Here is the nginx config:

############
worker_processes 48;
worker_rlimit_nofile 800000;
events
{
    worker_connections 51200;
    use epoll;
}
http
{
    sendfile off;
    directio 1m;
    output_buffers 1 512k;

    tcp_nopush      off;
    tcp_nodelay     on;

    keepalive_timeout  5;

nginx -V

nginx: nginx version: nginx/1.0.0
nginx: built by gcc 4.1.2 20080704 (Red Hat 4.1.2-50)

uname -a

Linux OST 2.6.18-194.17.1.el5_lustre.1.8.5 #1 SMP Mon Nov 15 15:48:43 MST 20

If you need more info, tell me.

Posted at Nginx Forum:

Guys!!
Can anybody help me overcome this?

Have you seen this link?

Oh yes, I’ve read it before, but in my situation the traffic already
reaches 2 Gbps and may go higher.
It only gets bad when a lot of clients access the same file.

And what is your point?

Atrus@

Hello!

On Wed, Dec 21, 2011 at 11:22:58PM -0500, atrus wrote:

I have some file hosting servers running nginx, serving static files
that range from 500 MB to 6 GB in size.
The servers use Lustre 1.8 as the cluster filesystem.
The files are stored on RAID 6 arrays with a 512 KB stripe size.

Under normal load, nginx works very well.

[…]

But in a hot situation (when a new popular file, for example an adult
DVD, has just been uploaded), more than 800 clients download that file
at the same time, and the nginx processes go into D state.
Eventually nearly all processes are in D state! This makes downloads
very slow :(

[…]

I have tried tuning the number of worker processes many times, but it
didn’t work!
How can I fix this? I thought that when a lot of clients access the same
file, it is supposed to get better because of caching?!

No, the filesystem cache isn’t supposed to help here, because you
explicitly requested directio:

[…]

    directio        1m;

[…]

You may try removing the “directio” directive to see if it helps,
but with a large working set it may make things worse.

Also it’s a good idea to try using AIO, see
http://nginx.org/en/docs/http/ngx_http_core_module.html#aio

Maxim D.

I have tried sendfile, AIO, and directio with the same scenario, the
same buffer size, and so on.
directio seems to be the best: when I switch to AIO or sendfile, the
output bandwidth drops to about 200 Mbps, only about 10% of what I get
with directio.

Is there any tweak that relates to this problem?

Thanks all guys.
Atrus@

Hey Atrus,

I thought that article was relevant and was just checking if you read
it.
It looked like a very similar scenario to yours.

And what is your point ?

You are using Linux 2.6.18, which is old. You should upgrade to a recent
kernel.

According to the nginx documentation, AIO doesn’t even work on 2.6.18:

http://nginx.org/en/docs/http/ngx_http_core_module.html#aio

"On Linux, AIO is usable starting from kernel version 2.6.22; plus,
it is also necessary to enable directio, otherwise reading will be blocking"
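
To make that check concrete, here is a rough Python sketch that compares a `uname -r` style release string against the 2.6.22 minimum quoted above. The helper names `kernel_version` and `meets_aio_minimum` are invented for illustration. Keep in mind that distribution kernels sometimes backport features, so a version comparison like this is only a first sanity check, not proof either way.

```python
import re

# Minimum kernel version taken from the nginx documentation quote above.
MIN_AIO_KERNEL = (2, 6, 22)


def kernel_version(release: str) -> tuple:
    """Extract (major, minor, patch) from a release string such as
    '2.6.18-194.17.1.el5_lustre.1.8.5'. A missing patch level counts as 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def meets_aio_minimum(release: str, minimum: tuple = MIN_AIO_KERNEL) -> bool:
    """Return True if the release is at or above the documented minimum."""
    return kernel_version(release) >= minimum
```

For the kernel in the original post, `meets_aio_minimum("2.6.18-194.17.1.el5_lustre.1.8.5")` evaluates to False, which matches the point being made here.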