Re: All workers in 'D' state using sendfile

Hi,

I am facing exactly the same issue as Drew described.

Is there any working solution for tuning nginx for higher throughput?

Or how should I deal with nginx processes sleeping in the D state?

I can post my server specs and nginx.conf if needed, but I would first like
to ask Drew whether he found a properly working solution.

Regards

Hello!

On Sun, Jun 09, 2013 at 06:17:10AM +0430, Host DL wrote:

I am facing exactly the same issue as Drew described.

Is there any working solution for tuning nginx for higher throughput?

Or how should I deal with nginx processes sleeping in the D state?

See this reply for basic tuning suggestions:

http://mailman.nginx.org/pipermail/nginx/2012-May/033761.html


Maxim D.
http://nginx.org/en/donation.html

Hello Maxim,

Thanks for your response, and sorry: I am new to the mailing list and my
first message may not have been very clear.

I have already read all the posts in this conversation and tested all of the
tuning options.

My box has 8x 2TB enterprise SATA drives in RAID10 plus 64 GB of RAM, running
CentOS 5.9 x86_64 / kernel 2.6.18-348.6.1.el5.

nginx.conf:

worker_priority -10;
worker_processes 64;
worker_rlimit_nofile 20000;

events {
    worker_connections 2048;
    use epoll;
    worker_aio_requests 128;
}

http {
    sendfile off;
    tcp_nopush on;
    tcp_nodelay on;
    aio on;
    directio 2m;
    #directio_alignment 4k;
    output_buffers 1 1m;

    keepalive_timeout 15;

    ......

}

During peak time, connections reach 14-15K in total with more than 1 Gbit/s
of outgoing throughput.
Please note that the server was stable with about 12K connections at peak
time and about 1-1.1 Gbit/s of throughput, but after adding another VH with
about 2-3K connections it seems the server is unable to handle the requests
properly at peak time.
The throughput is expected to exceed the previous ~1.1 Gbit/s rate, but it
doesn't; it doesn't even reach 1 Gbit/s, although the connections are now
more numerous and larger.

During every peak period the load average climbs to the number of nginx
workers (64 with my current config) and stays there until the end of the
peak. All processes are in the D state and, interestingly, memory is not
being used fully: it peaks at about 30-40 GB, with about 20-30% I/O wait.

Connections are not processed quickly: they sit in the connecting and
request-sent/waiting states for a few seconds, and then the data transfer
starts at a very slow rate.

I have already tried both sendfile and AIO, and AIO seems to handle the
connections better.
I also played with output_buffers and increased both the number and the size
of the buffers, but it had no noticeable effect; the throughput even got
lower.

The interesting thing is that at the same peak time the read transfer rate
from the RAID to /dev/null is more than 500 Mbit/s using scp/rsync.
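For reference, a similar sequential read check can be done with dd (the path
below is just a placeholder):

    dd if=/raid/path/to/large-file of=/dev/null bs=1M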

Sorry for the long post and for my English.
Any suggestion would be greatly appreciated.

Respect,
Moozi

============================================

Dear Maxim,

Thanks for your response.
I don't think it will help much, since all of my files are considerably
larger than 2 MB.

Regarding the AIO problem on Linux, do you think using AIO + sendfile
together on FreeBSD would perform better in my case?

Respect

============================================

Hello!

On Sun, Jun 09, 2013 at 07:24:49PM +0430, Host DL wrote:

The throughput is expected to exceed the previous ~1.1 Gbit/s rate, but it
doesn't; it doesn't even reach 1 Gbit/s, although the connections are now
more numerous and larger.

During every peak period the load average climbs to the number of nginx
workers (64 with my current config) and stays there until the end of the
peak. All processes are in the D state and, interestingly, memory is not
being used fully: it peaks at about 30-40 GB, with about 20-30% I/O wait.

The main problem with AIO on Linux is that it requires directio to
actually work asynchronously. I would assume you've just reached
a critical number of synchronous requests to the disks due to
"directio 2m" in your config (and that's why you see all workers
in the D state). Try tuning directio to a lower value to see if it
helps.
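For example, something along these lines (the value is only an illustration;
tune it for your file sizes):

    aio on;
    directio 512k;    # lower threshold, so more of the large reads go through AIO + O_DIRECT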

Note well: memory is not used for filesystem cache with directio,
so there is no surprise it’s not being used fully.


Maxim D.
http://nginx.org/en/donation.html

On Monday 10 June 2013 15:08:45 Host DL wrote:

Dear Maxim,

Thanks for your response.
I don't think it will help much, since all of my files are considerably
larger than 2 MB.
[…]

How large? And how big is the entire dataset? Perhaps, quite the opposite,
you should increase the directio value for better utilization of the page
cache.
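For instance (the value below is purely illustrative):

    aio on;
    directio 16m;    # files below this size are read normally and can stay in the page cache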

wbr, Valentin V. Bartenev



Thank you for the clarification on this issue.

The last modification that actually worked was reducing the number of workers.
I tried 64, 24 and 12 workers, and fewer workers gave better performance
during peak time.

I can't figure out why, but it works like a charm.

PS: Dual E5-2620 CPUs are used, which gives 24 HT cores and 12 physical cores.
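Roughly the relevant lines I'm testing now (just a sketch of the current
setup):

    worker_processes 12;         # one worker per physical core
    worker_rlimit_nofile 20000;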

Hello!

On Mon, Jun 10, 2013 at 03:38:45PM +0430, Host DL wrote:

Dear Maxim,

Thanks for your response.
I don't think it will help much, since all of my files are considerably
larger than 2 MB.

The “D” state of nginx workers has only one explanation: blocking
operations on disks. When serving static files with nginx on
Linux, this basically means one of the following:

  1. opening / stat()'ing files
  2. blocking aio reads due to no directio on unaligned reads
  3. blocking aio reads due to no directio on small files

Within nginx, you may reduce the possibility of (3) by using the directio
directive with a smaller value. Both (1) and (2) are more or less
unavoidable, but they aren't likely to happen, at least with proper OS
tuning.

If in doubt, try tracing where the worker processes are blocked. As a
very first step, examine the ps(1) output for the wait channel column
(wchan).
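For example, something like:

    ps -eo pid,stat,wchan:30,comm | grep nginx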

Regarding the AIO problem on Linux, do you think using AIO + sendfile
together on FreeBSD would perform better in my case?

Yes.


Maxim D.
http://nginx.org/en/donation.html