Sendfile EAGAIN support broken?

I’ve been trying to figure out a problem we are seeing in the Passenger
module whereby the full POST data isn’t passed to the underlying
application.

After much digging I believe this may actually be a problem with the
http_upstream / sendfile support in the core nginx code.

The following log snippet shows a trace of this happening:-

2010/12/24 18:04:08 [debug] 27169#0: *21 connected
2010/12/24 18:04:08 [debug] 27169#0: *21 http upstream connect: 0
2010/12/24 18:04:08 [debug] 27169#0: *21 http upstream send request
2010/12/24 18:04:08 [debug] 27169#0: *21 chain writer buf fl:0 s:2405
2010/12/24 18:04:08 [debug] 27169#0: *21 chain writer buf fl:0 s:15939
2010/12/24 18:04:08 [debug] 27169#0: *21 chain writer buf fl:1 s:23658449
2010/12/24 18:04:08 [debug] 27169#0: *21 chain writer in: 0000000801885F78
2010/12/24 18:04:08 [debug] 27169#0: *21 sendfile() sent only 18344 bytes (35: Resource temporarily unavailable)
2010/12/24 18:04:08 [debug] 27169#0: *21 sendfile: -1, 1, 0, @0 18344:23658449
2010/12/24 18:04:08 [debug] 27169#0: *21 chain writer out: 0000000801885F98
2010/12/24 18:04:08 [debug] 27169#0: *21 event timer add: 23: 600000:1293214448446
2010/12/24 18:04:08 [debug] 27169#0: *21 kevent set event: 23: ft:-2 fl:0025
2010/12/24 18:04:08 [debug] 27169#0: *21 http run request: "/uploadfile"
2010/12/24 18:04:08 [debug] 27169#0: *21 http upstream check client, write event:1, "/uploadfile"
2010/12/24 18:06:08 [debug] 27169#0: *21 http run request: "/uploadfile"

The last line is where the 2-minute request timeout triggered.

It seems that even though EAGAIN is returned, there is never an attempt
to send the remaining data.
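
For reference, this is roughly how I’d expect a partial sendfile() to be
handled on FreeBSD (just a sketch for the sake of discussion, not the
actual nginx code): on EAGAIN the kernel reports via the sbytes argument
how much it did manage to send, and the caller is supposed to remember
that offset and call sendfile() again once the socket becomes writable.

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <errno.h>

/* Sketch only: send "left" bytes of file_fd starting at *offset to sock_fd,
 * coping with partial sends the way FreeBSD sendfile(2) reports them. */
static int
send_file_range(int file_fd, int sock_fd, off_t *offset, size_t left)
{
    while (left > 0) {
        off_t sent = 0;

        if (sendfile(file_fd, sock_fd, *offset, left, NULL, &sent, 0) == -1) {
            if (errno == EAGAIN) {
                /* partial send: remember progress and wait for the socket
                 * to become writable before calling sendfile() again */
                *offset += sent;
                left -= (size_t) sent;
                return 1;
            }
            return -1;              /* real error */
        }

        *offset += sent;
        left -= (size_t) sent;
    }

    return 0;                       /* everything was sent */
}

The “call sendfile() again” step is the part I can’t see happening in the
trace above.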

Any ideas?

In case it’s helpful, there’s some additional info on the Passenger list
here:-
http://groups.google.com/group/phusion-passenger/t/51e2e1d124c16d5e?hl=en

Hi,

> I’ve been trying to figure out a problem we are seeing in the Passenger
> module whereby the full POST data isn’t passed to the underlying
> application.

Passenger is horribly broken; can you try to reproduce this using the HTTP
reverse proxy ("proxy_pass") instead?

Best regards,
Piotr S. < [email protected] >

Hello!

On Fri, Dec 24, 2010 at 05:01:39PM -0500, steveh wrote:

> 2010/12/24 18:04:08 [debug] 27169#0: *21 http upstream send request
> 2010/12/24 18:04:08 [debug] 27169#0: *21 chain writer out: 0000000801885F98

Sendfile returned EAGAIN - that’s somewhat normal, though the exact
number of bytes sent looks suspicious: 2405 + 15939 = 18344, i.e. exactly
the two memory buffers and not a single byte from the file.

> 2010/12/24 18:04:08 [debug] 27169#0: *21 event timer add: 23: 600000:1293214448446

The timeout on sending to the upstream
(proxy_send_timeout/fastcgi_send_timeout/…) was set; it’s 10 minutes.

> 2010/12/24 18:04:08 [debug] 27169#0: *21 kevent set event: 23: ft:-2 fl:0025

The write event was enabled and is expected to be triggered by the OS
once writing becomes possible again on the connection in question.
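
For reference, in that kevent line ft:-2 is EVFILT_WRITE and fl:0025 is
EV_ADD|EV_ENABLE|EV_CLEAR, i.e. something along these lines (a sketch
only, not the actual nginx code):

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

/* Register an edge-triggered write notification for fd (23 in the log
 * above); "data" is whatever the event loop should get back in udata. */
static int
arm_write_event(int kq, int fd, void *data)
{
    struct kevent kev;

    EV_SET(&kev, fd, EVFILT_WRITE, EV_ADD | EV_ENABLE | EV_CLEAR, 0, 0, data);

    /* register only; events are collected by the main kevent() loop */
    return kevent(kq, &kev, 1, NULL, 0, NULL);
}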

> 2010/12/24 18:04:08 [debug] 27169#0: *21 http run request: "/uploadfile"
> 2010/12/24 18:04:08 [debug] 27169#0: *21 http upstream check client, write event:1, "/uploadfile"
> 2010/12/24 18:06:08 [debug] 27169#0: *21 http run request: "/uploadfile"
>
> The last line is where the 2-minute request timeout triggered.

By "2-minute request timeout" do you mean something in your client?
The only timeout nginx will maintain in this state is the upstream
send_timeout (just set to 10 minutes).

> It seems that even though EAGAIN is returned, there is never an attempt
> to send the remaining data.
>
> Any ideas?

Looks like an OS problem. So the questions are: which OS, and which
filesystem is used for client_body_temp_path?

Assuming the OS is FreeBSD: make sure the filesystem isn’t tmpfs and/or
zfs; both have had known problems with sendfile(). These problems are
believed to be fixed, but I doubt you’re running a recent enough
9-current or 8-stable.

Taking Passenger out of the picture for further debugging may be
a good idea, too.

Maxim D.

Hello!

On Wed, Jan 05, 2011 at 08:01:32PM -0500, steveh wrote:

[…]

> Going through the code I couldn’t figure out how any retry would be
> triggered / attempted.

This line in your debug log indicates that the write event was
successfully registered:

2010/12/24 18:04:08 [debug] 27169#0: *21 kevent set event: 23: ft:-2 fl:0025

It should be triggered via the event loop once write() becomes
possible on the connection in question.
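
Roughly, the dispatch side looks like this (again just a sketch, not the
real nginx event loop; "resume_send" is only a placeholder for the handler
that still has request body left to push):

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

/* Collect pending events and hand EVFILT_WRITE notifications back to the
 * code that registered them, so it can retry the pending sendfile(). */
static void
run_event_loop_once(int kq, void (*resume_send)(void *data))
{
    struct kevent events[64];
    int           i, n;

    n = kevent(kq, NULL, 0, events, 64, NULL);

    for (i = 0; i < n; i++) {
        if (events[i].filter == EVFILT_WRITE) {
            resume_send(events[i].udata);
        }
    }
}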

> Quite happy to throw some debugging to try and
> identify the problem if someone can point me in the right direction.
>
> Don’t think it’s possible to take Passenger out of the equation, as that’s
> the only way I’ve been able to trigger this thus far.

The only suspicious thing in the debug log you’ve provided is the strange
number of bytes sent in the initial sendfile() call - i.e. only the first
two memory buffers, but not a single byte from the file. And nothing
else happens, as if not a single byte were read by the other side.

Though a quick look through the Passenger nginx module suggests it uses
unix sockets, and this somewhat explains the above - as unix
sockets do watermark autotuning.
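
If you want to see how much the kernel will queue on that upstream socket
before sendfile() starts returning EAGAIN, something like this quick
sketch (with fd being the descriptor from your log, 23) shows the current
send buffer size:

#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>

/* Print the send buffer size of an already-connected socket descriptor. */
static void
print_sndbuf(int fd)
{
    int       sndbuf;
    socklen_t len = sizeof(sndbuf);

    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len) == 0) {
        printf("fd %d send buffer: %d bytes\n", fd, sndbuf);
    }
}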

So the most likely reason is the "other end" (i.e. your application
and/or Passenger) not reading data from the socket.

There is not a single sign which suggests there is something
wrong in nginx.

Maxim D.

Sorry for the slow response; I expected to be auto-subscribed to my own
thread but wasn’t.

Anyway, to answer some of the questions:-
proxy_read_timeout 120 is what’s triggering the timeout.

The OS is FreeBSD running 8-stable from just before Christmas, so a
recent 8.2-PRERELEASE, which should have any needed OS fixes in. The FS is
ZFS, and while it does have some performance penalties for sendfile (dual
buffering and no zero copy), it otherwise works fine for streaming at
least, so I don’t think that’s the source atm.

Going through the code I couldn’t figure out how any retry would be
triggered / attempted. I’m quite happy to throw some debugging at this to
try and identify the problem if someone can point me in the right direction.

I don’t think it’s possible to take Passenger out of the equation, as that’s
the only way I’ve been able to trigger this thus far.

As far as I can tell Passenger is reading / has read all the data and is
sitting there waiting for more, as it knows the total size. That’s what
alerted us to the issue in the first place, as it complains about a
corrupt POST because it’s short.

What would you expect to see when a write-space event is triggered?
Another kevent line? If so, any ideas what?
