Nginx workers in "D" status in top

I’m seeing a number of workers periodically entering the “D” status in
top (uninterruptible sleep). Normally this means the process is blocking
on disk IO. However, I am using nginx 0.7.62 (the default package) on
Ubuntu 9.10, and I believe asynchronous IO should be enabled.
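
For what it's worth, I am spotting them with a rough ps one-liner
(the field widths are arbitrary):

  ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/' | grep nginx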

We are using proxy_cache, so there is some reading from disk in our
configuration (not just reverse proxy).
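
For context, the caching part is roughly along these lines (the zone
name and paths here are placeholders, not the real values):

  proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=app_cache:10m
                   max_size=1g inactive=60m;

  location / {
      proxy_pass  http://backend;
      proxy_cache app_cache;
  }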

Is it normal to have nginx workers block on the disk even on an
asynchronous IO-capable system?

How can I check if nginx is actually using async IO?
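
The only thing I can think of is attaching strace to a worker and
looking for the Linux AIO syscalls, e.g.:

  strace -p <worker_pid> 2>&1 | grep -E 'io_submit|io_getevents'

but I am not sure that is conclusive.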

My configuration is large, but I will post fragments if necessary. I
am using 4 worker processes on a single-CPU system (recently upped to
10 because of this issue).

Thanks,


RPM

I had this problem before; somehow nginx was blocking on access to an
NFS partition that was saturated. Check whether you have any link
whatsoever with a saturated disk, and pay special attention to NFS
partitions.
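
iostat usually makes a saturated device obvious (assuming sysstat is
installed), and nfsstat shows the client-side NFS counters:

  iostat -x 2     # watch %util and await on the device behind the cache
  nfsstat -c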

Joaquin Cuenca Abela

On Tue, Jul 27, 2010 at 12:22 PM, Joaquin Cuenca Abela
[email protected] wrote:

I had this problem before; somehow nginx was blocking on access to an
NFS partition that was saturated. Check whether you have any link
whatsoever with a saturated disk, and pay special attention to NFS
partitions.

No NFS involved at all, but I suppose it is possible the disk is being
saturated (even though it is actually a very fast SAN volume). I will
have to check on that, but if the IO is asynchronous the nginx workers
should not get into the “uninterruptible sleep” state.

That said, I do get frequent large requests and responses being
spooled to temporary files on disk. Those should all be handled by the
filesystem cache, but I wonder whether nginx uses async IO for those
operations.
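
If the temp files turn out to be the culprit, I may try keeping
proxied responses in memory, something like this (just an idea, not
tested here):

  proxy_buffers            8 64k;
  proxy_max_temp_file_size 0;    # never spool proxied responses to disk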

I also suppose it could be access logging… this is a very busy
system (handling > 100 requests per second). Does the access log not
use async IO?
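
I see access_log has a buffer= parameter; maybe buffering the log
writes would help, if that parameter exists in 0.7 (untested guess):

  access_log /var/log/nginx/access.log combined buffer=32k;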


RPM

On Tue, Jul 27, 2010 at 11:54:31AM -0500, Ryan M. wrote:

How can I check if nginx is actually using async IO?

My configuration is large, but I will post fragments if necessary. I
am using 4 worker processes on a single-CPU system (recently upped to
10 because of this issue).

nginx supports file AIO only in 0.8.11+, and the file AIO is functional
on FreeBSD only. On Linux, AIO is supported by nginx only on kernel
2.6.22+ (although CentOS 5.5 has backported the required AIO features).
Anyway, on Linux AIO works only if the file offset and size are aligned
to a disk block size (usually 512 bytes) and the data cannot be cached
in the OS VM cache (Linux AIO requires DIRECTIO, which bypasses the OS
VM cache). I believe the cause of such a strange AIO implementation is
that AIO in Linux was developed mainly for databases by Oracle and IBM.
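
If you move to 0.8.11+ on a suitable kernel, the directives involved
are roughly these (the sizes are only an example):

  aio            on;
  directio       512;    # O_DIRECT for files of this size or larger
  output_buffers 1 128k;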


Igor S.
http://sysoev.ru/en/

On Tue, Jul 27, 2010 at 1:45 PM, Igor S. [email protected] wrote:

nginx supports file AIO only in 0.8.11+, and the file AIO is functional
on FreeBSD only. On Linux, AIO is supported by nginx only on kernel
2.6.22+ (although CentOS 5.5 has backported the required AIO features).
Anyway, on Linux AIO works only if the file offset and size are aligned
to a disk block size (usually 512 bytes) and the data cannot be cached
in the OS VM cache (Linux AIO requires DIRECTIO, which bypasses the OS
VM cache). I believe the cause of such a strange AIO implementation is
that AIO in Linux was developed mainly for databases by Oracle and IBM.

Thank you for the detailed explanation. I suppose I am somewhat
shocked by the state of AIO on Linux, but then again most server
applications are likely using blocking IO with threads, as the
programming model is more straightforward.

I often see 3 or more nginx workers in “uninterruptible sleep” state
at the same time, even if only for a few ms. I presume this means any
other connections being handled by those workers are also blocked,
even if they are proxy-only connections that don’t hit the disk. We
noticed extended response times from our application at peak periods,
even though the observed CPU load on the back-end would actually dip
and then spike. So I think nginx is effectively “queuing” requests
behind other requests whose large responses get spooled to disk.

Do you foresee an issue with running 10 or more workers per CPU core?
Is there any upper bound on the number of nginx workers after which
inter-process communication overhead starts to become problematic?


RPM