Segfault in NGINX - testcase

Hello,

NGINX will segfault, if you delete the root node in an rbtree with
two elements. Testcase:
http://doppelbauer.name/nginx-testcase.c

A patch is attached below.

Thanks a lot
Markus Doppelbauer

*** a/core/ngx_rbtree.c 2010-03-19 21:07:14.342916081 +0100
— b/core/ngx_rbtree.c 2010-03-19 21:07:56.642187263 +0100
*************** ngx_rbtree_delete(ngx_thread_volatile ng
*** 187,193 ****
if (subst == *root) {
root = temp;
ngx_rbt_black(temp);
!
/
DEBUG stuff */
node->left = NULL;
node->right = NULL;
— 187,194 ----
if (subst == *root) {
root = temp;
ngx_rbt_black(temp);
! temp->parent = NULL; /

parent of root node should be NULL /
!
/
DEBUG stuff */
node->left = NULL;
node->right = NULL;

Posted at Nginx Forum:
http://forum.nginx.org/read.php?2,65693,65693#msg-65693

Hello!

On Fri, Mar 19, 2010 at 04:19:43PM -0400, double wrote:

Hello,

NGINX will segfault, if you delete the root node in an rbtree with
two elements. Testcase:
http://doppelbauer.name/nginx-testcase.c

A patch is attached below.

From what I see it’s your function ngx_rbtree_next() which
segfaults, not nginx. It tries to identify root node via checking
it’s parent against NULL - which is not correct way to do it, you
should instead compare node pointer against tree root pointer.

If you do still think that nginx is affected - please provide
another test case. The one which uses nginx, not code borrowed
from it, is preffered - see http://mdounin.ru/hg/nginx-tests/ for
some framework and test samples.

Maxim D.

Hello,

The root node of the nginx-rbtree is always NULL, because
“ngx_rbtree_insert()” provides that feature. Only if the rbtree
has 2 elements and you delete the root node, then “parent” of the
root-node points to the deleted element. Why not fixing it?

Thank you very much
Marcus

The corresponding code:

void ngx_rbtree_insert(ngx_thread_volatile ngx_rbtree_t *tree,
ngx_rbtree_node_t *node)
{
[…]
if (*root == sentinel) {
node->parent = NULL;
node->left = sentinel;
node->right = sentinel;
ngx_rbt_black(node);
*root = node;

Posted at Nginx Forum:
http://forum.nginx.org/read.php?2,65693,65847#msg-65847

Hi Maxim,

We want the same.
Thanks for pointing out.

Marcus

Posted at Nginx Forum:
http://forum.nginx.org/read.php?2,65693,66020#msg-66020

Hello!

On Sat, Mar 20, 2010 at 04:43:56AM -0400, double wrote:

Hello,

The root node of the nginx-rbtree is always NULL, because
“ngx_rbtree_insert()” provides that feature. Only if the rbtree
has 2 elements and you delete the root node, then “parent” of the
root-node points to the deleted element. Why not fixing it?

As you already pointed out, there is at least one place where
root->parent becomes non-NULL (and I’m not sure it’s the only
place where it happens). And this doesn’t cause any harm as nginx
doesn’t assume it should be NULL.

While I tend to think that it’s good idea to keep it NULL at least
with NGX_DEBUG defined (to simplify debugging) - there is no bug
here. The bug is in your tree traversal code which tries to use
assumption that root->parent == NULL. And even if your patch will
be applied (it’s up to Igor anyway) - your tree traversal code
should be fixed if you are planning to use it somewhere in
production.

Maxim D.

Hi,

With a standard (i.e. no options) installation of 0.8.34 on my Linux
machine I get a segfault if the resolver named in the conf file is a
loopback address/IP, but the resolver does not exist. Other
non-existing resolvers don’t cause a problem (they just hang, and
probably will time out), only loopback ones.

The debug log is :

2010/03/21 04:07:30 [debug] 4287#0: *3 http cl:-1 max:1048576
2010/03/21 04:07:30 [debug] 4287#0: *3 generic phase: 2
2010/03/21 04:07:30 [debug] 4287#0: *3 post rewrite phase: 3
2010/03/21 04:07:30 [debug] 4287#0: *3 generic phase: 4
2010/03/21 04:07:30 [debug] 4287#0: *3 generic phase: 5
2010/03/21 04:07:30 [debug] 4287#0: *3 access phase: 6
2010/03/21 04:07:30 [debug] 4287#0: *3 access phase: 7
2010/03/21 04:07:30 [debug] 4287#0: *3 post access phase: 8
2010/03/21 04:07:30 [debug] 4287#0: *3 http script var: “http”
2010/03/21 04:07:30 [debug] 4287#0: *3 http script copy: “://”
2010/03/21 04:07:30 [debug] 4287#0: *3 http script var: “google.com
2010/03/21 04:07:30 [debug] 4287#0: *3 http script var: “/”
2010/03/21 04:07:30 [debug] 4287#0: *3 http init upstream, client timer:
0
2010/03/21 04:07:30 [debug] 4287#0: *3 epoll add event: fd:3 op:3
ev:80000005
2010/03/21 04:07:30 [debug] 4287#0: *3 posix_memalign:
0000000001A44DD0:4096 @16
2010/03/21 04:07:30 [debug] 4287#0: *3 http script copy: "Host: "
2010/03/21 04:07:30 [debug] 4287#0: *3 http script var: “google.com
2010/03/21 04:07:30 [debug] 4287#0: *3 http script copy: "
"
2010/03/21 04:07:30 [debug] 4287#0: *3 http script copy: "Connection:
close
"
2010/03/21 04:07:30 [debug] 4287#0: *3 http proxy header: “User-Agent:
curl/7.19.5 (x86_64-pc-linux-gnu) libcurl/7.19.5 OpenSSL/0.9.8g
zlib/1.2.3.3 libidn/1.15”
2010/03/21 04:07:30 [debug] 4287#0: *3 http proxy header: “Accept: /
2010/03/21 04:07:30 [debug] 4287#0: *3 http proxy header:
“Proxy-Connection: Keep-Alive”
2010/03/21 04:07:30 [debug] 4287#0: *3 http proxy header:
"GET / HTTP/1.0
Host: google.com
Connection: close
User-Agent: curl/7.19.5 (x86_64-pc-linux-gnu) libcurl/7.19.5
OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.15
Accept: /
Proxy-Connection: Keep-Alive

"
2010/03/21 04:07:30 [debug] 4287#0: *3 http cleanup add:
0000000001A44E60
2010/03/21 04:07:30 [debug] 4287#0: *3 http finalize request: -4, “/?”
a:1, c:2
2010/03/21 04:07:30 [debug] 4287#0: *3 http request count:2 blk:0
2010/03/21 04:07:30 [debug] 4287#0: *3 http run request: “/?”
2010/03/21 04:07:30 [debug] 4287#0: *3 http upstream check client, write
event:1, “/”
2010/03/21 04:07:30 [debug] 4287#0: *3 http upstream recv(): -1 (11:
Resource temporarily unavailable)
2010/03/21 04:07:30 [alert] 4275#0: worker process 4287 exited on signal
11

I’ve not got time to look into and write a patch for it right now, but
will do if no-one gets around to it at some point.

Thanks,

Marcus.

Hello!

On Sun, Mar 21, 2010 at 04:21:38AM +0200, Marcus C. wrote:

Hi,

With a standard (i.e. no options) installation of 0.8.34 on my Linux
machine I get a segfault if the resolver named in the conf file is a
loopback address/IP, but the resolver does not exist. Other
non-existing resolvers don’t cause a problem (they just hang, and
probably will time out), only loopback ones.

The debug log is :

[…]

2010/03/21 04:07:30 [debug] 4287#0: *3 http cleanup add: 0000000001A44E60
2010/03/21 04:07:30 [debug] 4287#0: *3 http finalize request: -4, “/?” a:1, c:2

Could you please provide full debug log (i.e. switched on at global
level, not
http/server/location)? E.g. between the above lines should be some
valuable
resolving information which isn’t logged in request context but in
global one
instead.

It should look like this:


2010/03/25 04:53:49 [debug] 86639#0: *1 http cleanup add: 0811F8F4
2010/03/25 04:53:49 [debug] 86639#0: malloc: 08126580:68
2010/03/25 04:53:49 [debug] 86639#0: resolve: “google.com
2010/03/25 04:53:49 [debug] 86639#0: malloc: 0810D580:60
2010/03/25 04:53:49 [debug] 86639#0: malloc: 08133460:10
2010/03/25 04:53:49 [debug] 86639#0: malloc: 0810AD00:28
2010/03/25 04:53:49 [debug] 86639#0: resolve: “google.com” 12007
2010/03/25 04:53:49 [debug] 86639#0: UDP socket 11
2010/03/25 04:53:49 [debug] 86639#0: connect to 127.0.0.1:53, fd:11 #2
2010/03/25 04:53:49 [debug] 86639#0: kevent set event: 11: ft:-1 fl:0025
2010/03/25 04:53:49 [debug] 86639#0: send: fd:11 28 of 28
2010/03/25 04:53:49 [debug] 86639#0: malloc: 0810D640:60
2010/03/25 04:53:49 [debug] 86639#0: event timer add: -1:
30000:2466707655
2010/03/25 04:53:49 [debug] 86639#0: event timer add: -1:
5000:2466682655
2010/03/25 04:53:49 [debug] 86639#0: *1 http finalize request: -4, “/?”
a:1, c:2

but will do if no-one gets around to it at some point.
Config and backtrace should be helpfull too. Unfortunately I’m not able
to
reproduce the problem.

Maxim D.

Just a quick note, if someone needs (for an addon) to traverse an
rbtree,
here is the clean way:

static ngx_inline ngx_rbtree_node_t *
ngx_rbtree_next(ngx_rbtree_t *rbtree, ngx_rbtree_node_t *node)
{
ngx_rbtree_node_t *sentinel;
sentinel = rbtree->sentinel;
if( node->right != sentinel )
{
node = node->right;
while( node->left != sentinel )
node = node->left;
}
else
{
ngx_rbtree_node_t *root, *temp;
root = rbtree->root;
temp = node;
while( temp != root && node == temp->parent->right )
temp = node = temp->parent;
if( temp == root )
return sentinel;
if( node->right != temp->parent )
node = temp->parent;
}
return node;
}

Posted at Nginx Forum:
http://forum.nginx.org/read.php?2,65693,67781#msg-67781

Sry, this is better:

static ngx_inline ngx_rbtree_node_t *
ngx_rbtree_next(ngx_rbtree_t *rbtree, ngx_rbtree_node_t *node)
{
if( node->right != rbtree->sentinel )
{
node = node->right;
while( node->left != rbtree->sentinel )
node = node->left;
}
else
{
ngx_rbtree_node_t *root;
root = rbtree->root;
while( node != root && node == node->parent->right )
node = node->parent;
if( node == root )
return rbtree->sentinel;
node = node->parent;
}
return node;
}

Posted at Nginx Forum:
http://forum.nginx.org/read.php?2,65693,67808#msg-67808

Hi,

Not only loopback resolver address, but any addresses without UDP:53
port listened can trigger this segfault. I’ve confirmed the problem on
my Linux box. Here’s the backtrace of nginx-0.8.49 when segfault
happens:

#0 0x08059acd in ngx_log_error_core (level=4, log=0x810c16c, err=111,
fmt=0x80e92cd “recv() failed”) at src/core/ngx_log.c:93
#1 0x08062442 in ngx_connection_error (c=0x812c538, err=111,
text=0x80e92cd “recv() failed”) at src/core/ngx_connection.c:1015
#2 0x0806ef06 in ngx_udp_unix_recv (c=0x812c538, buf=0xbfffdcfc
“m\220”, size=4096) at src/os/unix/ngx_udp_recv.c:99
#3 0x080689b1 in ngx_resolver_read_response (rev=0x81424f8) at
src/core/ngx_resolver.c:952
#4 0x08073035 in ngx_epoll_process_events (cycle=0x810d528, timer=5000,
flags=) at src/event/modules/ngx_epoll_module.c:642
#5 0x0806baa2 in ngx_process_events_and_timers (cycle=0x810d528) at
src/event/ngx_event.c:245
#6 0x08070edc in ngx_single_process_cycle (cycle=0x810d528) at
src/os/unix/ngx_process_cycle.c:306
#7 0x08059576 in main (argc=1, argv=0xbfffef74) at src/core/nginx.c:398

The bug exists in all versions later than nginx-0.8.36 but not in
nginx-0.7.x versions.

This problem is due to the incorrectly copy of cycle->new_log when
ngx_resolver_create() initializing udp_connection->log. Because
cycle->new_log only gets initialized in ngx_init_cycle() after all
configurations have been processed, the ngx_resolver_create() will be
called BEFORE cycle->new_log has anything meaningful. And because it
COPIED cycle->new_log, udp_connection->log will always be invalid even
cycle->new_log does get initialized properly later. When resolver is
used to resolve names and failed to connect the specified nameserver,
nginx tries to log a timeout error using the invalid log structure
stored in udp_connection, and then boom…

IMHO, by changing the definition of ngx_udp_connection_t to make log
field a pointer instead of a ngx_log_t structure (also with a bunch of
related reference modifications, of course), this bug will gone without
pains.

The attached patch should fix the bug.

This forum is not affiliated to the Ruby language, Ruby on Rails framework, nor any Ruby applications discussed here.

| Privacy Policy | Terms of Service | Remote Ruby Jobs