Nginx 1.0.2 AIO problem with open_file_cache

Hello everybody. I have recently run into a problem with AIO support and need some help or advice on it.

With AIO on, everything was OK. But when I enabled open_file_cache, nginx frequently closed the client connection directly, without sending back any data.

My platform is Linux 2.6.30, CentOS 5.4.

The nginx server that has this issue serves static files as a backend. Another nginx acts as a proxy in front of it. The backend nginx config is like this:

aio on;
directio 512;
output_buffers 1 128k;

location /files {
    internal;
    alias /;

    open_file_cache          max=50000 inactive=20s;
    open_file_cache_valid    30s;
    open_file_cache_min_uses 1;
    open_file_cache_errors   on;
}

With this config, the frontend proxy server logs errors like this:

2011/09/03 17:01:19 [error] 12498#0: *78679759 upstream prematurely closed connection while reading response header from upstream, client: 221.193.212.77, server: , request: "GET … HTTP/1.1", upstream: "http://**/file/871586e763bf4417bf8d475858deee2b/260110/0b47fc7f4d84e4bcb01a6d6d1fcacc94-3581749", host: …

And in the tcpdump output I saw that the connection was closed very quickly by the backend nginx:

22:02:22.892327 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: S 1788188296:1788188296(0) win 5840 <mss 1460,sackOK,timestamp 666482608 0,nop,wscale 7>
22:02:22.895683 IP 172.16.228.63.irdmi > 172.16.113.32.58271: S 350810413:350810413(0) ack 1788188297 win 5792 <mss 1460,sackOK,timestamp 1948226601 66<
22:02:22.895695 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: . ack 1 win 46 <nop,nop,timestamp 666482612 1948226601>
22:02:22.895703 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: P 1:181(180) ack 1 win 46 <nop,nop,timestamp 666482612 1948226601>
22:02:22.897680 IP 172.16.228.63.irdmi > 172.16.113.32.58271: . ack 181 win 54 <nop,nop,timestamp 1948226603 666482612>
22:02:22.898348 IP 172.16.228.63.irdmi > 172.16.113.32.58271: F 1:1(0) ack 181 win 54 <nop,nop,timestamp 1948226603 666482612>
22:02:22.898473 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: F 181:181(0) ack 2 win 46 <nop,nop,timestamp 666482615 1948226603>
22:02:22.899804 IP 172.16.228.63.irdmi > 172.16.113.32.58271: . ack 182 win 54 <nop,nop,timestamp 1948226605 666482615>

On the backend nginx, strace shows that io_getevents got an error -22 (EINVAL):

[pid 5275] recvfrom(48, 0x7fffe4400ae7, 1, 2, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
[pid 5275] epoll_wait(6, {{EPOLLIN|EPOLLOUT, {u32=2445415233, u64=139850875540289}}}, 512, 5672) = 1
[pid 5275] recvfrom(201, "\1\6\0\1\0\337\1\0Status: 301 Moved Perman"..., 4096, 0, NULL, NULL) = 256
[pid 5275] close(201) = 0
[pid 5275] stat("//data1/871586e763bf4417bf8d475858deee2b/g268315/252/995f20f76d0d3871fdb5d71bbc92956c-27769564", {st_mode=S_IFREG|0600, st_size=122507, ...
[pid 5275] io_submit(139850942226432, 1, {{0xf43718, 0, 0, 0, 40}}) = 1
[pid 5275] epoll_wait(6, {{EPOLLIN, {u32=7173248, u64=7173248}}}, 512, 5671) = 1
[pid 5275] read(7, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 5275] io_getevents(139850942226432, 1, 64, {{0xf43718, 0xf436d8, -22, 0}}, {0, 0}) = 1
[pid 5275] write(24, "172.16.164.30 - - 172.16.228.63:"..., 338) = 338
[pid 5275] close(48)
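
Here -22 is -EINVAL: the kernel rejected the submitted read. One common way to get exactly this pattern (io_submit() succeeds while io_getevents() reports EINVAL in the completion event) is an O_DIRECT read whose offset, length, or buffer is not properly aligned. Below is a minimal libaio sketch, with a hypothetical test path that is not taken from this thread, which reproduces that pattern on typical kernels; on some kernels io_submit() itself returns the error instead.

/* Minimal libaio sketch; compile with: gcc aio_einval.c -laio
   The test path is hypothetical and must be on a filesystem that supports O_DIRECT. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    const char *path = "/var/tmp/aio_einval_test";   /* hypothetical test file */
    char data[8192];

    /* Create a small test file with ordinary buffered I/O. */
    memset(data, 'x', sizeof(data));
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0 || write(fd, data, sizeof(data)) < 0) { perror("create"); return 1; }
    close(fd);

    /* Reopen it with O_DIRECT, as nginx does for files covered by "directio". */
    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open O_DIRECT"); return 1; }

    io_context_t ctx = 0;
    int rc = io_setup(1, &ctx);
    if (rc < 0) { fprintf(stderr, "io_setup: %s\n", strerror(-rc)); return 1; }

    void *buf;
    if (posix_memalign(&buf, 512, 4096) != 0) { fprintf(stderr, "posix_memalign failed\n"); return 1; }

    struct iocb cb;
    struct iocb *cbs[1] = { &cb };

    /* O_DIRECT requires offset, length and buffer to be block-aligned;
       offset 1 violates that, so the kernel rejects the read. */
    io_prep_pread(&cb, fd, buf, 4096, 1);

    rc = io_submit(ctx, 1, cbs);
    if (rc < 0) { fprintf(stderr, "io_submit: %s\n", strerror(-rc)); return 1; }

    struct io_event ev;
    if (io_getevents(ctx, 1, 1, &ev, NULL) == 1) {
        printf("io_getevents result: %ld\n", (long) ev.res);  /* prints -22 (EINVAL) */
    }

    io_destroy(ctx);
    close(fd);
    free(buf);
    unlink(path);
    return 0;
}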

And access.log shows that the response length is ZERO:

[03/Sep/2011:06:28:48 +0800] "GET /file/871586e763bf4417bf8d475858deee2b/260284/ed31933859d27e33b8398deaea1d2ade-3969549 HTTP/1.0" 206 0 0.001 "-" "-" "-" Range:bytes=27459105-

After I removed those open_file_cache directives, no such error occurred.

And the shorter the inactive time is, the more often I get such errors.

Does anybody have an idea about avoiding or fixing this?

Hello!

On Sat, Sep 03, 2011 at 05:46:39PM +0800, dr-dr xp wrote:

> Does anybody have an idea about avoiding or fixing this?

Could you please provide a debug log? See the Debugging page of the nginx documentation for more information.

Additionally, some more info about the requested file might be helpful: how big is it? Was it changed on the backend during the requests? If yes, was it changed atomically?
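
"Changed atomically" here means that the file is replaced in a single step, typically by writing the new content to a temporary file on the same filesystem and then rename()ing it over the old name, so a reader (such as nginx serving the file) never sees a partially written file. A minimal sketch, with hypothetical paths:

/* Atomic file replacement via write-to-temp + rename(). Paths are hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    const char *final_path = "/data1/example/file.bin";      /* hypothetical target */
    const char *tmp_path   = "/data1/example/file.bin.tmp";  /* same filesystem as the target */
    const char data[] = "new file contents\n";

    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (write(fd, data, sizeof(data) - 1) != (ssize_t)(sizeof(data) - 1)) {
        perror("write"); return 1;
    }
    if (fsync(fd) < 0) { perror("fsync"); return 1; }  /* flush data before the switch */
    close(fd);

    /* rename() is atomic on the same filesystem: the name flips in one step,
       so concurrent readers see either the old file or the complete new one. */
    if (rename(tmp_path, final_path) < 0) { perror("rename"); return 1; }

    return 0;
}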

Maxim D.

Hello!

On Sat, Sep 03, 2011 at 11:06:24PM +0800, dr-dr xp wrote:

> Access: 2011-09-03 22:31:22.000000000 +0800
>
> In my cluster there are also some other similar nodes running with the same open_file_cache parameters but without aio enabled (sendfile is used). They have no such issue.

Please try the attached patch.

Maxim D.


Hello!

On Tue, Sep 06, 2011 at 12:16:19AM +0800, dr-dr xp wrote:

> Thank you very much Dounin. With the patch it works fine now.

Thanks for testing. The patch was committed and will be included in the next release.

Maxim D.
