Nginx with custom modules crashes in gzip crc32()

I have nginx with my custom module that rewrites content under certain
conditions. Without it everything works fine, but after enabling it
nginx starts to crash approximately every 2 hours (> 100 req./sec). The
core dump shows that it crashes in the gzip module:

Core was generated by `nginx: worker process '.
Program terminated with signal 11, Segmentation fault.
#0 0x000000343ea0286d in crc32 () from /usr/lib64/libz.so.1
(gdb) bt
#0 0x000000343ea0286d in crc32 () from /usr/lib64/libz.so.1
#1 0x0000000000450f3f in ngx_http_gzip_filter_add_data (r=0x601f380,
in=0x29ac280) at src/http/modules/ngx_http_gzip_filter_module.c:708
#2 ngx_http_gzip_body_filter (r=0x601f380, in=0x29ac280) at
src/http/modules/ngx_http_gzip_filter_module.c:394
#3 0x0000000000451ac5 in ngx_http_postpone_filter (r=0x601f380,
in=0x29ac280) at src/http/ngx_http_postpone_filter_module.c:82
#4 0x00000000004521b1 in ngx_http_ssi_body_filter (r=0x343ea0c9c0,
in=0x7906442d) at src/http/modules/ngx_http_ssi_filter_module.c:392
#5 0x00000000004564b5 in ngx_http_charset_body_filter (r=0x343ea0c9c0,
in=0x61c0ff8) at src/http/modules/ngx_http_charset_filter_module.c:552
#6 0x0000000000457a5c in ngx_http_sub_body_filter (r=0x343ea0c9c0,
in=0x29ac280) at src/http/modules/ngx_http_sub_filter_module.c:188
#7 0x0000000000470e8f in ngx_http_af_filter (r=0x601f380, in=0x29ac280)
at /usr/src/redhat/SOURCES/af-headers/ngx_af_headers_module.c:768
#8 0x00000000004793e6 in clweb_c_body_filter (r=0x601f380,
in=0x29ac280) at
/usr/src/redhat/SOURCES/content-parser-module/ngx_mod_content_parser.c:510
#9 0x0000000000479d28 in ngx_http_gunzip_body_filter (r=0x601f380,
in=0x29ac280) at
/usr/src/redhat/SOURCES/gunzip/ngx_http_gunzip_filter_module.c:323
#10 0x000000000047ca5d in ngx_subr_body_filter (r=0x601f380,
in=0x1f5f830) at
/usr/src/redhat/SOURCES/ngx_subr_module/ngx_subr_module.c:219
#11 0x000000000047d46d in ngx_subr_body_filter (r=0x601f380,
in=0x1f5f830) at
/usr/src/redhat/SOURCES/ngx_subr_all_module/ngx_subr_all_module.c:219
#12 0x000000000040ba99 in ngx_output_chain (ctx=0x5e958c0, in=0x61c0ff8)
at src/core/ngx_output_chain.c:65
#13 0x000000000043bbf7 in ngx_http_copy_filter (r=0x601f380,
in=0x1f5f830) at src/http/ngx_http_copy_filter_module.c:141
#14 0x000000000044bdd1 in ngx_http_range_body_filter (r=0x343ea0c9c0,
in=0x61c0ff8) at src/http/modules/ngx_http_range_filter_module.c:551
#15 0x000000000042ed92 in ngx_http_output_filter (r=0x601f380,
in=0x1f5f830) at src/http/ngx_http_core_module.c:1868
#16 0x00000000004463fa in ngx_http_upstream_process_non_buffered_request
(r=0x601f380, do_write=<value optimized out>) at
src/http/ngx_http_upstream.c:2381
#17 0x00000000004468fc in
ngx_http_upstream_process_non_buffered_upstream (r=0x601f380,
u=0x6096858) at src/http/ngx_http_upstream.c:2352
#18 0x00000000004457f6 in ngx_http_upstream_handler (ev=0x148e568) at
src/http/ngx_http_upstream.c:917
#19 0x00000000004269be in ngx_epoll_process_events (cycle=0x796480,
timer=<value optimized out>, flags=<value optimized out>) at
src/event/modules/ngx_epoll_module.c:635
#20 0x000000000041e3c8 in ngx_process_events_and_timers (cycle=0x796480)
at src/event/ngx_event.c:245
#21 0x0000000000425203 in ngx_worker_process_cycle (cycle=0x796480,
data=<value optimized out>) at src/os/unix/ngx_process_cycle.c:800
#22 0x0000000000423967 in ngx_spawn_process (cycle=0x796480,
proc=0x425118 <ngx_worker_process_cycle>, data=0x0, name=0x4d01ce
"worker process", respawn=-3) at src/os/unix/ngx_process.c:196
#23 0x000000000042478c in ngx_start_worker_processes (cycle=0x796480,
n=12, type=-3) at src/os/unix/ngx_process_cycle.c:360
#24 0x0000000000425964 in ngx_master_process_cycle (cycle=0x796480) at
src/os/unix/ngx_process_cycle.c:136
#25 0x0000000000408dba in main (argc=22, argv=0x795060) at
src/core/nginx.c:405

dmesg says:
nginx[20500]: segfault at 61c1000 ip 000000343ea0286d sp
00007fff348eec78 error 4 in libz.so.1.2.3[343ea00000+14000]

so libz is trying to read something at a bad address.
This is definitely not an nginx bug, but nginx is not simple to debug,
so I would be happy for any advice on what could cause it to crash.
Maybe it's a stack problem?

Server info: CentOS Linux 5.6, amd64, nginx 1.0.6, zlib-devel-1.2.3-3

On Mon, Oct 17, 2011 at 12:27:19AM -0400, artemg wrote:

It seems your module erroneously overwrites the memory area that
corresponds to the ctx->zstream.next_in or ctx->zstream.avail_in
values:

    ctx->crc32 = crc32(ctx->crc32, ctx->zstream.next_in,
                       ctx->zstream.avail_in);
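
crc32() here reads exactly ctx->zstream.avail_in bytes starting at
ctx->zstream.next_in, so if either value is corrupted, zlib walks past
the end of the buffer. A minimal standalone sketch of that failure mode
(plain zlib, not nginx source):

    #include <zlib.h>
    #include <string.h>

    int
    main(void)
    {
        unsigned char  buf[16];
        z_stream       zs;
        uLong          crc;

        memset(buf, 'x', sizeof(buf));
        memset(&zs, 0, sizeof(z_stream));

        zs.next_in = buf;
        zs.avail_in = sizeof(buf);

        crc = crc32(0L, Z_NULL, 0);                 /* initial value */
        crc = crc32(crc, zs.next_in, zs.avail_in);  /* fine: 16 bytes */

        /* but if another module has overwritten avail_in, e.g.
         * zs.avail_in = 1000000; the same call reads far past buf
         * and segfaults inside crc32(), as in the backtrace above */

        return (int) crc;
    }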


Igor S.

Thanks for the answer, but it's difficult to understand how that can
happen. I even compared the output of a response modified by my module
with an unmodified one - they are the same, the only difference being
that the data in the chains is filled in a different way. I.e. the
input data lengths are 100-200-100 bytes (.last - .pos), and after my
content-rewriting module they can be 0-0-400, or even a 0-0-0 chain,
with the data added in the next chain. I am doing buffering to match
some patterns. Because of this I see in the error log:

[alert] 31974#0: *98657 zero size buf in writer t:1 r:1 f:0
0000000001714FE0 0000000001F8D290-0000000001F8D290 0000000000000000 0-0
while sending to client

Is it OK to have zero-size bufs, or do I need to modify the chain so
that they are not passed forward to the other modules?

agentzh, thanks for the answer, it seems the problem really is the
zero-size bufs. I changed the code to insert spaces if a buffer is
empty, and everything started to work with gzip disabled (before that
nothing worked even with gzip off), and I think there will be no
crashes in gzip now when it is enabled. Next I will create my own
chains to pass to the downstream filters.
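
Something like the following is what I have in mind - just a sketch
(the filter name is mine; the next-filter chaining is the standard
nginx filter-module pattern): relink only the bufs worth passing on,
skipping zero-size non-special ones:

    #include <ngx_config.h>
    #include <ngx_core.h>
    #include <ngx_http.h>

    static ngx_http_output_body_filter_pt  ngx_http_next_body_filter;

    static ngx_int_t
    ngx_my_body_filter(ngx_http_request_t *r, ngx_chain_t *in)
    {
        ngx_chain_t  *cl, *tl, *out, **ll;

        out = NULL;
        ll = &out;

        for (cl = in; cl; cl = cl->next) {

            /* zero-size and not a flush/sync/last_buf marker:
             * exactly what triggers "zero size buf in writer" */
            if (ngx_buf_size(cl->buf) == 0 && !ngx_buf_special(cl->buf)) {
                continue;
            }

            tl = ngx_alloc_chain_link(r->pool);
            if (tl == NULL) {
                return NGX_ERROR;
            }

            tl->buf = cl->buf;
            *ll = tl;
            ll = &tl->next;
        }

        *ll = NULL;

        if (out == NULL) {
            return NGX_OK;   /* nothing left to pass downstream */
        }

        return ngx_http_next_body_filter(r, out);
    }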

I didn't try valgrind because it usually consumes a lot more CPU, and
that is unacceptable on the staging machine (there would be high CPU
load, or I would have to decrease the number of requests and then wait
longer for a crash). But thanks for reminding me about it.

On Wed, Oct 19, 2011 at 8:20 AM, artemg [email protected] wrote:

Passing zero-size non-special bufs to the downstream output filters is
surely BAD. You have to fix it :)

Also, using tools like valgrind's memcheck to find memory issues in
your modules is highly recommended and can often save you a huge number
of hours of debugging :)
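
For example, something along these lines keeps the overhead down to a
single foreground worker (a sketch; the nginx binary and config paths
are placeholders):

    # nginx.conf: one foreground process, so memcheck sees
    # the whole request path
    daemon off;
    master_process off;

    # then run that single process under memcheck:
    valgrind --tool=memcheck --log-file=/tmp/valgrind.log \
        /usr/sbin/nginx -c /etc/nginx/nginx.conf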

Regards,
-agentzh

On Thu, Oct 20, 2011 at 7:17 PM, artemg [email protected] wrote:

By the way, passing a zero-size last_buf is OK, as I understand? What
do you mean by "non-special bufs"?

A buf with last_buf set is "special". Check out the ngx_buf_special
macro definition in nginx core's src/core/ngx_buf.h:

#define ngx_buf_special(b)                                                   \
    ((b->flush || b->last_buf || b->sync)                                    \
     && !ngx_buf_in_memory(b) && !b->in_file)
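
So a body filter usually distinguishes the two cases roughly like this
(a sketch; the helper name is made up, and the usual nginx module
headers are assumed):

    static ngx_int_t
    ngx_my_check_chain(ngx_chain_t *in)
    {
        ngx_chain_t  *cl;

        for (cl = in; cl; cl = cl->next) {

            if (ngx_buf_special(cl->buf)) {
                /* a flush/sync/last_buf marker that carries no data:
                 * fine to pass downstream, zero size and all */
                continue;
            }

            if (ngx_buf_size(cl->buf) == 0) {
                /* zero-size and not special: the bad case that
                 * triggers the "zero size buf in writer" alert */
                return NGX_ERROR;
            }
        }

        return NGX_OK;
    }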

Regards,
-agentzh
