2 questions on send_timeout (maybe bugs)

hi all,

I ran into two questions (possibly bugs) while using nginx 1.0.0.

Question 1.

What happens if the downstream connection breaks before all the data
has been sent to the client?

  1. nginx gets the client’s request
  2. it sets the handler: r->read_event_handler = ngx_http_block_reading
  3. it sends data to the client and blocks (c->send_chain() in the
    write filter returns non-NULL and non-error), then goes back to
    epoll_wait()
  4. HERE, the downstream connection breaks
  5. epoll detects the break, so epoll_wait() returns in
    ngx_epoll_process_events(), where revents == EPOLLIN | EPOLLERR |
    EPOLLHUP. So we call rev->handler(), which calls
    r->read_event_handler, which does nothing (see the sketch after
    this list). AS A RESULT, the downstream break is not detected by
    nginx !!!
  6. Then we have to wait for send_timeout…
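
For reference, r->read_event_handler at this point is
ngx_http_block_reading(), which looks roughly like this (paraphrased
from src/http/ngx_http_request.c in nginx 1.0.0, not the literal
source). It never reads from the socket, so an error or hangup that
arrives together with EPOLLIN is silently swallowed:

void
ngx_http_block_reading(ngx_http_request_t *r)
{
    ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                   "http reading blocked");

    /* with level-triggered event methods the read event must be
     * removed, or epoll_wait() would keep reporting it */
    if ((ngx_event_flags & NGX_USE_LEVEL_EVENT)
        && r->connection->read->active)
    {
        if (ngx_del_event(r->connection->read, NGX_READ_EVENT, 0)
            != NGX_OK)
        {
            ngx_http_close_request(r, 0);
        }
    }

    /* nothing else happens: the EPOLLERR/EPOLLHUP that came with
     * the read event is never examined */
}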

This situation does not occur every time: if the downstream breaks
while nginx is in the middle of sending data, the break is detected.
But I can make it happen every time when:

  1. the client and server are far away from each other (e.g. not on
    the same local network)
  2. the memc module is used

Question 2.

I set ‘send_timeout’ to 20s, but nginx actually times out after 40s.

  1. in ngx_http_upstream_process_non_buffered_request()
    (src/http/ngx_http_upstream.c, line 2448), a timer is added;
  2. after that timer expires, ngx_http_set_write_handler()
    (src/http/ngx_http_request.c, line 2179) adds another timer.

So the send_timeout is effectively doubled.
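
Condensed, the two call sites look something like this (the
identifiers follow nginx 1.0.0, but this is a sketch of the sequence
described above, not the literal source):

/* 1. the response is not fully sent, so the non-buffered upstream
 *    code arms send_timeout on the client connection, in
 *    ngx_http_upstream_process_non_buffered_request() */
ngx_add_timer(downstream->write, clcf->send_timeout);

/* 2. when that timer fires, the request falls through to the http
 *    core, and ngx_http_set_write_handler() arms send_timeout on
 *    the same write event once more */
ngx_add_timer(r->connection->write, clcf->send_timeout);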

Thanks
Wu

Hello!

On Wed, May 18, 2011 at 05:15:00PM +0800, Wu Bingzheng wrote:

> the downstream break is not detected by nginx !!!
Here the write event handler will be called. While the standard
ngx_http_writer() is in use, it will call the output filter chain and
detect the error. Otherwise it’s up to the active handler to handle
this correctly, e.g. the standard upstream modules (proxy/fastcgi/…)
may either terminate the request immediately or wait for the full
upstream response, depending on various settings
(proxy_ignore_client_abort, proxy_cache/proxy_store enabled).
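
For illustration, proxy_ignore_client_abort is set per location; a
minimal example (hypothetical config, “backend” is a placeholder
upstream name):

location / {
    proxy_pass http://backend;       # placeholder upstream
    proxy_ignore_client_abort on;    # keep processing the upstream
                                     # response after the client aborts
}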

> Question 2.
> So the send_timeout is effectively doubled.
The timeout is re-set on each send operation and is expected to
trigger when nothing happens during send_timeout.

Maxim D.

Hi Maxim,

Thanks for your answer, but I am still confused.

At 2011-05-18 21:35:07, “Maxim D.” [email protected] wrote:

>> Question 1.
>> 5. epoll detects the break, so epoll_wait() returns in […]
>
> […] may either terminate the request immediately or wait for the
> full upstream response, depending on various settings
> (proxy_ignore_client_abort, proxy_cache/proxy_store enabled).

You said the write handler will be called. But if a connection
breaks, the revents returned by epoll_wait() is EPOLLIN | EPOLLERR |
EPOLLHUP, with no EPOLLOUT. So I think the write handler will NOT be
called.
The FACT is, on my server there are many, many access log entries
whose time-taken is 40.001s. (send_timeout is 20s, and this is an
online music server, so many people pick another song before the
current one finishes, which means the downstream breaks before nginx
has sent all the data.)

I added 2 debug log statements to ngx_epoll_process_events() and
copied part of the debug log below. In this log, fd 8 is the
downstream and fd 9 is the upstream; only the downstream matters
here, so please ignore fd 9.
You can see that at line 4 nginx writes 496 bytes to the downstream,
and the write filter returns non-NULL (line 5), which means the
downstream is delayed. At line 20, epoll_wait() returns with
revents=25, i.e. EPOLLIN | EPOLLERR | EPOLLHUP (0x01 | 0x08 | 0x10),
which means fd 8 (the downstream) has broken. But only the read
handler is called (line 21), and NO write handler is called.
So nginx does not detect the downstream break; it waits for
send_timeout and only wakes up at lines 32 and 35.

 0 ......
 1 2011/05/19 10:00:07 [debug] 26851#0: *1 write new buf t:0 f:0 0000000000000000, pos 00000000005AFB70, size: 4096 file: 0, size: 0
 2 2011/05/19 10:00:07 [debug] 26851#0: *1 http write filter: l:0 f:1 s:4096
 3 2011/05/19 10:00:07 [debug] 26851#0: *1 http write filter limit 0
 4 2011/05/19 10:00:07 [debug] 26851#0: *1 writev: 496
 5 2011/05/19 10:00:07 [debug] 26851#0: *1 http write filter 00000000005D91C0
 6 2011/05/19 10:00:07 [debug] 26851#0: *1 copy filter: -2 "/data/music/885720/今天.mp3?xcode=efb9e826d9161812ab9483faf71dddbb"
 7 2011/05/19 10:00:07 [debug] 26851#0: *1 event timer add: 8: 20000:1305770427453
 8 2011/05/19 10:00:07 [debug] 26851#0: *1 event timer del: 9: 1305770467453
 9 2011/05/19 10:00:07 [debug] 26851#0: *1 epoll call write handler. fd: 9, revents: 5
10 2011/05/19 10:00:07 [debug] 26851#0: *1 http upstream request: "/data/music/885720/今天.mp3?xcode=efb9e826d9161812ab9483faf71dddbb"
11 2011/05/19 10:00:07 [debug] 26851#0: *1 http upstream dummy handler
12 2011/05/19 10:00:07 [debug] 26851#0: *1 epoll_wait: fd 9, revents: 5
13 2011/05/19 10:00:07 [debug] 26851#0: *1 epoll call read handler. fd: 9, revents: 5
14 2011/05/19 10:00:07 [debug] 26851#0: *1 http upstream request: "/data/music/885720/今天.mp3?xcode=efb9e826d9161812ab9483faf71dddbb"
15 2011/05/19 10:00:07 [debug] 26851#0: *1 http upstream process non buffered upstream
16 2011/05/19 10:00:07 [debug] 26851#0: *1 event timer: 8, old: 1305770427453, new: 1305770427453
17 2011/05/19 10:00:07 [debug] 26851#0: *1 epoll call write handler. fd: 9, revents: 5
18 2011/05/19 10:00:07 [debug] 26851#0: *1 http upstream request: "/data/music/885720/今天.mp3?xcode=efb9e826d9161812ab9483faf71dddbb"
19 2011/05/19 10:00:07 [debug] 26851#0: *1 http upstream dummy handler
20 2011/05/19 10:00:07 [debug] 26851#0: *1 epoll_wait: fd 8, revents: 25
21 2011/05/19 10:00:07 [debug] 26851#0: *1 epoll call read handler. fd: 8, revents: 25
22 2011/05/19 10:00:07 [debug] 26851#0: *1 http run request: "/data/music/885720/今天.mp3?xcode=efb9e826d9161812ab9483faf71dddbb"
23 2011/05/19 10:00:07 [debug] 26851#0: *1 http reading blocked
24 2011/05/19 10:00:07 [debug] 26851#0: *1 epoll_wait: fd 9, revents: 5
25 2011/05/19 10:00:07 [debug] 26851#0: *1 epoll call read handler. fd: 9, revents: 5
26 2011/05/19 10:00:07 [debug] 26851#0: *1 http upstream request: "/data/music/885720/今天.mp3?xcode=efb9e826d9161812ab9483faf71dddbb"
27 2011/05/19 10:00:07 [debug] 26851#0: *1 http upstream process non buffered upstream
28 2011/05/19 10:00:07 [debug] 26851#0: *1 event timer: 8, old: 1305770427453, new: 1305770427654
29 2011/05/19 10:00:07 [debug] 26851#0: *1 epoll call write handler. fd: 9, revents: 5
30 2011/05/19 10:00:07 [debug] 26851#0: *1 http upstream request: "/data/music/885720/今天.mp3?xcode=efb9e826d9161812ab9483faf71dddbb"
31 2011/05/19 10:00:07 [debug] 26851#0: *1 http upstream dummy handler
32 2011/05/19 10:00:27 [debug] 26851#0: *1 event timer del: 8: 1305770427453
33 2011/05/19 10:00:27 [debug] 26851#0: *1 http run request: "/data/music/885720/今天.mp3?xcode=efb9e826d9161812ab9483faf71dddbb"
34 2011/05/19 10:00:27 [debug] 26851#0: *1 http upstream process non buffered downstream
35 2011/05/19 10:00:27 [info] 26851#0: *1 client timed out (110: Connection timed out) while sending to client, client: 61.135.169.175, server: ziyouziyuan.baidu.com, request: "GET /data/music/885720/%E4%BB%8A%E5%A4%A9.mp3?xcode=efb9e826d9161812ab9483faf71dddbb HTTP/1.0", upstream: "memcached://172.18.254.50:11200"
36 2011/05/19 10:00:27 [debug] 26851#0: *1 finalize http upstream request: 0
37 2011/05/19 10:00:27 [debug] 26851#0: *1 finalize http memcached request

>> Question 2.
>> So the send_timeout is effectively doubled.
>
> The timeout is re-set on each send operation and is expected to
> trigger when nothing happens during send_timeout.

So you think it is right and expected that I set ‘send_timeout’ to
20s but nginx actually times out in 40s?

Maxim D.


Thanks again for responding.
Best Regards,
Wu

Hello!

On Thu, May 19, 2011 at 10:39:43AM +0800, Wu Bingzheng wrote:

>> the downstream break is not detected by nginx !!!
>
> You said the write handler will be called. But if a connection
> breaks, the revents returned by epoll_wait() is EPOLLIN | EPOLLERR |
> EPOLLHUP, with no EPOLLOUT. So I think the write handler will NOT
> be called.

Ah, ok, it looks like I was wrong here. The write handler will only
be called if there is no EPOLLIN flag in revents, and this is
probably wrong, assuming EPOLLHUP follows the same semantics as
POLLHUP [1], i.e.

    POLLHUP
    … This event and POLLOUT are mutually-exclusive; a stream can
    never be writable if a hangup has occurred. …
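
Given that, a hangup means EPOLLOUT will never be reported for the
connection, so the event loop has to propagate the error itself. A
minimal sketch of such handling (illustrative only; later nginx
versions treat EPOLLERR/EPOLLHUP along these lines, but this is not
the 1.0.0 code):

uint32_t revents = event_list[i].events;

if (revents & (EPOLLERR|EPOLLHUP)) {
    /* a hung-up stream can never become writable again, so report
     * both read and write readiness to make sure at least one of
     * the registered handlers observes the broken connection */
    revents |= EPOLLIN|EPOLLOUT;
}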

Could you please provide detailed instructions to reproduce the
problem, preferably with standard modules involved?

[1] poll

> The FACT is, on my server there are many, many access log entries
> whose time-taken is 40.001s. (send_timeout is 20s, and this is an
> online music server, so many people pick another song before the
> current one finishes, which means the downstream breaks before
> nginx has sent all the data.)
>
> I added 2 debug log statements to ngx_epoll_process_events() and
> copied part of the debug log here.

Just a side note: mostly the same information can be seen in the
global debug log.

[…]

>> The timeout is re-set on each send operation and is expected to
>> trigger when nothing happens during send_timeout.
>
> So you think it is right and expected that I set ‘send_timeout’ to
> 20s but nginx actually times out in 40s?

It should time out 20s after the last send. Whether a request time
of 40s is wrong or not depends on many factors, but the code you
pointed out looks fine.

Note also that in the debug log you provided, nginx correctly times
out 20s after the last send.

Maxim D.

Hi Maxim,

I am so sorry that I made a big mistake on question 1.
nginx didn’t detect the downstream break because I use
“proxy_ignore_client_abort on;”.
So nginx works correctly. I am sorry about it.

For question 2, you said:

> Note also that in the debug log you provided, nginx correctly times
> out 20s after the last send.

That’s because I set send_timeout to 10s; I am sorry I did not make
that clear.

You can reproduce the doubled timeout easily:

[server] Use nginx 1.0.0 without any add-on modules. In nginx.conf:

server {
    listen       5821;
    proxy_buffering off;   # must be in non-buffered mode
    send_timeout 10;

    location / {
        proxy_pass http://sourceserver;
    }
}

[client] You can run this simple Python script:

#!/usr/bin/python
# Python 2: read only the first 100 bytes of the response, then
# stall, so nginx blocks on sending the rest and send_timeout applies
import socket
import time

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("127.0.0.1", 5821))
s.send("GET /data/something HTTP/1.0\r\n\r\n")
buf = s.recv(100)    # read only the beginning of the response
print buf
time.sleep(100)      # stop reading; nginx is now blocked on send

Then, in access.log, you will find that the request finishes in 20s,
twice the configured send_timeout.

Best Regards,
Wu

Hello!

On Fri, May 20, 2011 at 11:58:20AM +0800, Wu Bingzheng wrote:

Hi Maxim,

> I am so sorry that I made a big mistake on question 1.
> nginx didn’t detect the downstream break because I use
> “proxy_ignore_client_abort on;”.
> So nginx works correctly. I am sorry about it.

I believe the problem may still appear in normal usage once the
upstream connection terminates; the EPOLLHUP handling in the epoll
event module doesn’t look correct to me. This should be investigated
further.

> For question 2, you said:
>
>> Note also that in the debug log you provided, nginx correctly
>> times out 20s after the last send.
>
> That’s because I set send_timeout to 10s; I am sorry I did not
> make that clear.
>
> You can reproduce the doubled timeout easily:

Ok, I was able to reproduce this and see the problem. Indeed, when
the upstream module detects a client timeout it doesn’t finalize the
request properly, and as a result the http core thinks it should
continue sending the data in its buffers and hence waits for another
timeout.

A quick fix for this particular problem would be something like:

diff --git a/src/http/ngx_http_upstream.c b/src/http/ngx_http_upstream.c
--- a/src/http/ngx_http_upstream.c
+++ b/src/http/ngx_http_upstream.c
@@ -2320,7 +2320,7 @@ ngx_http_upstream_process_non_buffered_d
     if (wev->timedout) {
         c->timedout = 1;
         ngx_connection_error(c, NGX_ETIMEDOUT, "client timed out");
-        ngx_http_upstream_finalize_request(r, u, 0);
+        ngx_http_upstream_finalize_request(r, u, NGX_HTTP_REQUEST_TIME_OUT);
         return;
     }
 
@@ -3013,6 +3013,7 @@ ngx_http_upstream_finalize_request(ngx_h
 #endif
 
     if (u->header_sent
+        && rc != NGX_HTTP_REQUEST_TIME_OUT
         && (rc == NGX_ERROR || rc >= NGX_HTTP_SPECIAL_RESPONSE))
     {
         rc = 0;
Though the upstream module certainly requires a larger cleanup of
its error handling. This is not the first time I’ve been bitten by
upstream not finalizing requests properly.

Maxim D.