Crash (double free or corruption) on trying to proxy to localhost

I’ve currently got a pool of systems running nginx proxying to a pool of systems running apache. I’m trying to move to running nginx locally on the apache hosts instead of on separate systems, to avoid the extra network hop, make more efficient use of resources, and enable some future development (ideally including migrating our application from apache to nginx via fastcgi). We’ve currently got some significant architecture built up around apache, so converting right now is uncomfortable.

Ideally, I’d like nginx to just serve from localhost, but fail over to the rest of the pool when localhost is unavailable, so in my upstream I have every server except for localhost set as ‘backup’. Otherwise, I’m running the same apache and nginx configurations together on a single system as are used in the rest of the two pools. This works exactly as expected, except that I get a few crashes of nginx workers every minute. This only happens when proxying to the local system. If I proxy anywhere else, it works fine, and other proxies can serve from this system without trouble. I see the same behaviour on other hosts when I build them the same way, so it’s not an error with the host. I see this crash on 0.7.65, 0.8.54, and 1.0.0, running on Ubuntu 10.04 LTS. I see this crash whether I’m connecting to 127.0.0.1 or the host’s local IP. I see this crash whether I’m listening on *:80 or :80. I see this crash whether I’m connecting to :80 or running apache on a different port and connecting to :81. I see this crash whether I’m running Ubuntu’s “nginx-light” configuration or their “nginx-full” configuration. I see no errors logged from apache.

  1. I’d really love to make this work, so if there’s anything else I can try, or any additional debugging information I can give, I’d appreciate it.
  2. Nginx has been very useful to me so far, so I thought you’d appreciate a bug report.

I’ve posted a problem description on GitHub, along with a section of a debug log, my nginx.conf (slightly edited: flattened includes and a stripped IP), a gdb backtrace, and some additional information I was asked for when looking for help on IRC. This is everything I’ve been able to come up with that sounds plausibly relevant.

Any help?

(Thanks to MTechnology and kolbyjack for helping me troubleshoot this on IRC.)

Apparently these crashes don’t happen with the upstreamfair module, as far as I’ve been able to tell so far. That’s probably good enough as a workaround for me for now, and may be a clue when investigating this crash.

Hello!

On Sun, May 01, 2011 at 05:54:17PM -0700, Stephen Weeks wrote:


Could you please provide:

  1. nginx -V output

  2. Full debug log for the ‘*60’ connection (the one which triggered the abort in glibc); the one you provided contains only the last part of the connection in question. Running grep -F ' 27772#0: *60 ' on the original debug log should produce something usable.

Maxim D.

Sure! I’ve added it to the post on GitHub. I’ve slightly edited the log out of paranoia (replacing a customer ID and auth key).

I can confirm now that I ran this proxy with the upstreamfair module overnight and it didn’t crash at all. I can’t get it to prefer localhost, though, as it doesn’t support ‘backup’ as a server attribute and doesn’t seem to really use the weights, so that’s suboptimal.

Anything else I can add to help troubleshoot this? Anything you’d like from the core dump?

I should be able to try this out tonight (>12 hours from now). I’ll let you know how it works. Thanks for looking into this for me.

Hello!

On Mon, May 02, 2011 at 04:45:13PM -0700, Stephen Weeks wrote:


OK, thank you, it looks like I see the problem.

Allocation for the “tried” flags doesn’t take into account the number of backup servers, and if there are more backup servers than normal ones (and the backup servers are in fact used), this may cause memory corruption.

Please try the attached patch.

Maxim D.

nginx mailing list
nginx Info Page