Nginx worker stuck, potential mod_zip bug

I’m running an older version of nginx (0.7.67) with mod_zip 1.1.6. I
believe we have found a rare bug; I’m trying to figure out whether it
is in mod_zip or in nginx, and whether upgrading nginx would resolve
it.

The symptom is a worker process getting “stuck” at 100% CPU, leaving
all connections in CLOSE_WAIT, and servicing no requests. It appears
that the trigger for this is downloading an archive from mod_zip, but
we have never been able to reproduce it, only observe it in
production.

I was finally able to catch one of the workers and get a backtrace from
gdb:

$ cat backtrace.log | addr2line -e /usr/lib/debug/usr/sbin/nginx -f
ngx_http_postpone_filter
/build/buildd/nginx-0.7.67/src/http/ngx_http_postpone_filter_module.c:125
ngx_http_ssi_body_filter
/build/buildd/nginx-0.7.67/src/http/modules/ngx_http_ssi_filter_module.c:430
ngx_http_charset_body_filter
/build/buildd/nginx-0.7.67/src/http/modules/ngx_http_charset_filter_module.c:643
ngx_http_zip_body_filter
/build/buildd/nginx-0.7.67/modules/mod_zip-1.1.6/ngx_http_zip_module.c:336
ngx_output_chain
/build/buildd/nginx-0.7.67/src/core/ngx_output_chain.c:58
ngx_http_copy_filter
/build/buildd/nginx-0.7.67/src/http/ngx_http_copy_filter_module.c:114
ngx_http_range_body_filter
/build/buildd/nginx-0.7.67/src/http/modules/ngx_http_range_filter_module.c:549
ngx_http_output_filter
/build/buildd/nginx-0.7.67/src/http/ngx_http_core_module.c:1716
ngx_event_pipe_write_to_downstream
/build/buildd/nginx-0.7.67/src/event/ngx_event_pipe.c:627
ngx_http_upstream_process_upstream
/build/buildd/nginx-0.7.67/src/http/ngx_http_upstream.c:2509
ngx_http_upstream_handler
/build/buildd/nginx-0.7.67/src/http/ngx_http_upstream.c:844
ngx_event_process_posted
/build/buildd/nginx-0.7.67/src/event/ngx_event_posted.c:30
ngx_worker_process_cycle
/build/buildd/nginx-0.7.67/src/os/unix/ngx_process_cycle.c:793
ngx_spawn_process
/build/buildd/nginx-0.7.67/src/os/unix/ngx_process.c:201
ngx_reap_children
/build/buildd/nginx-0.7.67/src/os/unix/ngx_process_cycle.c:612
main
/build/buildd/nginx-0.7.67/src/core/nginx.c:396
??
??:0
_start
??:0

I also had debug logging enabled (the log is incredibly large, ~150GB
uncompressed) and it is completely filled with the following little
loop:

2012/02/17 11:58:45 [debug] 10150#0: *1888 mod_zip: entering subrequest body filter
2012/02/17 11:58:45 [debug] 10150#0: *1888 http postpone filter "/s3/bucket/key" 0000000000000000
2012/02/17 11:58:45 [debug] 10150#0: *1888 copy filter: 0 "/s3/bucket/key"
2012/02/17 11:58:45 [debug] 10150#0: *1888 pipe write busy: 8192
2012/02/17 11:58:45 [debug] 10150#0: *1888 pipe write: out:0000000000000000, f:1
2012/02/17 11:58:45 [debug] 10150#0: *1888 http output filter "/s3/bucket/key"
2012/02/17 11:58:45 [debug] 10150#0: *1888 copy filter: "/s3/bucket/key"

Am I right in assuming this is a bug in mod_zip? It looks like a
buffer is never drained to the client.

Hello!

On Fri, Feb 17, 2012 at 04:03:46PM -0800, W. Andrew Loe III wrote:

I’m running an older version of nginx (0.7.67) with mod_zip 1.1.6. I
believe we have found a rare bug; I’m trying to figure out whether it
is in mod_zip or in nginx, and whether upgrading nginx would resolve
it.

The symptom is a worker process getting “stuck” at 100% CPU, leaving
all connections in CLOSE_WAIT, and servicing no requests. It appears
that the trigger for this is downloading an archive from mod_zip, but
we have never been able to reproduce it, only observe it in
production.

[…]

2012/02/17 11:58:45 [debug] 10150#0: *1888 pipe write busy: 8192
2012/02/17 11:58:45 [debug] 10150#0: *1888 pipe write: out:0000000000000000, f:1
2012/02/17 11:58:45 [debug] 10150#0: *1888 http output filter "/s3/bucket/key"
2012/02/17 11:58:45 [debug] 10150#0: *1888 copy filter: "/s3/bucket/key"
2012/02/17 11:58:45 [debug] 10150#0: *1888 mod_zip: entering subrequest body filter
2012/02/17 11:58:45 [debug] 10150#0: *1888 http postpone filter "/s3/bucket/key" 0000000000000000

[…]

Am I right in assuming this is a bug in mod_zip? It looks like a
buffer is never drained to the client.

A quick look through the mod_zip sources suggests it doesn’t do
anything for this request (a subrequest). I would rather think you’ve
hit something like this problem:

http://trac.nginx.org/nginx/changeset/4136/nginx

Try upgrading to 1.1.4+/1.0.7+ to see if it helps.

Maxim D.