Mongrel hangs in production. gdb stack included

I am getting hanging Mongrels daily under light volume.

Mongrel Web Server 1.1.4
Apache 2.2 w/mod_proxy
RedHat EL4 2.6.9-5.ELsmp #1 SMP Wed Jan 5 19:30:39 EST 2005 i686 i686
i386 GNU/Linux
ruby 1.8.5 (2006-08-25) [i686-linux]
fastthread (1.0.1)
Mysql 5.0.45

The mongrels are hanging with 0% CPU, and their database connections are
still reported as open by mysql. Attaching gdb to the hung processes
yields similar results each time (two traces included below). I have no
idea what the next steps are in figuring out what is going on, short of
upgrading to ruby 1.8.6!

Help and pointers are much appreciated.

(gdb) whe
#0 0x0058b7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x0065ec93 in __read_nocancel () from /lib/tls/libc.so.6
#2 0x04ccc27f in vio_read (vio=0xfffffe00, buf=0xfffffe00 <Address
0xfffffe00 out of bounds>, size=-512) at viosocket.c:44
#3 0x04ccc356 in vio_read_buff (vio=0x9088ae0,
buf=0xb5d471df "e Travel Plaza, located on I-90 eastbound between
Interchanges 48A (Pembroke) and 48 (Batavia), at milepost 397. The
renovation projects scheduled at Ulster and Pembroke travel plazas, in
addition to "..., size=4412) at viosocket.c:95
#4 0x04ccd8f6 in my_real_read (net=0x913eb10, complen=0xbfed5e98) at
net.c:804
#5 0x04ccdc30 in my_net_read (net=0x913eb10) at net.c:978
#6 0x04cc6b78 in net_safe_read (mysql=0x913eb10) at client.c:596
#7 0x04cc792f in cli_read_rows (mysql=0x913eb10,
mysql_fields=0xad6e1a0, fields=59) at client.c:1343
#8 0x04cc9881 in mysql_store_result (mysql=0x913eb10) at client.c:2697
#9 0x04ca299f in store_result (obj=3076071240) at mysql.c:677
#10 0x0805bcb4 in rb_call0 (klass=3076091220, recv=3076071240,
id=27825, oid=4294966784, argc=1, argv=0xbfed6160, body=0xb7596458,
flags=0) at eval.c:5665
#11 0x0805c017 in rb_call (klass=3076091220, recv=3076071240,
mid=27825, argc=1, argv=0xbfed6160, scope=0) at eval.c:6048
#12 0x08058127 in rb_eval (self=3076059900, n=0x113c) at ruby.h:654
#13 0x0805d99b in rb_yield_0 (val=6, self=3076059900, klass=0,
flags=0, avalue=0) at eval.c:4987
#14 0x08058475 in rb_eval (self=3076059900, n=0x113c) at eval.c:3248
#15 0x08058952 in rb_eval (self=3076059900, n=0x113c) at eval.c:3624
#16 0x0805d99b in rb_yield_0 (val=6, self=3076059900, klass=0,
flags=0, avalue=0) at eval.c:4987
#17 0x08058475 in rb_eval (self=3082341360, n=0x113c) at eval.c:3248
#18 0x0805b9ce in rb_call0 (klass=3082341260, recv=3082341360,
id=53385, oid=4294966784, argc=0, argv=0x0, body=0xb7b931f8, flags=0)
at eval.c:5954
#19 0x0805c017 in rb_call (klass=3082341260, recv=3082341360,
mid=53385, argc=0, argv=0x0, scope=0) at eval.c:6048
#20 0x08058127 in rb_eval (self=3082341360, n=0x113c) at ruby.h:654

(gdb) whe
#0 0x0058b7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x0065ec93 in __read_nocancel () from /lib/tls/libc.so.6
#2 0x01d6d27f in vio_read (vio=0xfffffe00, buf=0xfffffe00 <Address
0xfffffe00 out of bounds>, size=-512) at viosocket.c:44
#3 0x01d6d356 in vio_read_buff (vio=0x93e7c30,
buf=0xb6f00db4 "~\202\215^-?\213\201?QJ\237\203\201\230D?\232V
\201M8\233\224?\201\021-o\217[\200?#?\211\236\200?\203??0\213\021wC?h
\211\236j\234?\215\210F]Ì£\232\207\006Q\b\236\220\205?Dv\231t\204?
8\220\224 \203?-\214\216\236\202?$\026\210?\201?\203Y?u\221?v???\217?
jS??\215?]\206??\213?P?\235?\212\023DS\230?\210@8\210\223d\206o-?\215?
\204?$R\210z\202?\203\017??\230\236v??\026\226", size=503704)
at viosocket.c:95
#4 0x01d6e8f6 in my_real_read (net=0x93e71a8, complen=0xbff9efa8) at
net.c:804
#5 0x01d6ec30 in my_net_read (net=0x93e71a8) at net.c:978
#6 0x01d67b78 in net_safe_read (mysql=0x93e71a8) at client.c:596
#7 0x01d6892f in cli_read_rows (mysql=0x93e71a8,
mysql_fields=0xaec8f00, fields=51) at client.c:1343
#8 0x01d6a881 in mysql_store_result (mysql=0x93e71a8) at client.c:2697
#9 0x01d4399f in store_result (obj=3071130920) at mysql.c:677
#10 0x0805bcb4 in rb_call0 (klass=3071150400, recv=3071130920,
id=27825, oid=4294966784, argc=1, argv=0xbff9f270, body=0xb70e0058,
flags=0) at eval.c:5665
#11 0x0805c017 in rb_call (klass=3071150400, recv=3071130920,
mid=27825, argc=1, argv=0xbff9f270, scope=0) at eval.c:6048
#12 0x08058127 in rb_eval (self=3071126200, n=0x7af98) at ruby.h:654

On Tue, Apr 29, 2008 at 8:31 PM, Erik M. [email protected] wrote:

> I am getting hanging Mongrels daily under light volume.
>
> #8 0x04cc9881 in mysql_store_result (mysql=0x913eb10) at client.c:2697
>
> #8 0x01d6a881 in mysql_store_result (mysql=0x93e71a8) at client.c:2697

Whatever the problem, it relates to your app’s db connection.

Kirk H.

What do
mysql> show innodb status\G
mysql> show processlist;

show during the hang? It may be a deadlock in the db. We had similar
problems with large after_save hooks that would deadlock while updating
multiple tables.
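
If you can’t be at a console when it happens, a small watcher script can
snapshot both commands so there’s output to look at afterwards. A rough
sketch using the mysql gem directly; the host, credentials, and log path
below are placeholders:

  #!/usr/bin/env ruby
  # Rough sketch: snapshot SHOW PROCESSLIST and SHOW INNODB STATUS once a
  # minute so a hang leaves something to inspect afterwards.
  # Host, credentials, and the log path are placeholders.
  require 'rubygems'
  require 'mysql'

  db = Mysql.real_connect('localhost', 'monitor_user', 'secret')

  loop do
    File.open('/var/log/mysql_snapshots.log', 'a') do |log|
      log.puts "==== #{Time.now} ===="
      db.query('SHOW PROCESSLIST').each { |row| log.puts row.join(' | ') }
      # The status text is the last column of the single row returned.
      db.query('SHOW INNODB STATUS').each { |row| log.puts row.last }
    end
    sleep 60
  end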

Cheers

Dave

Are you rotating your log files?

Might want to google “mongrel log rotate hang” as I believe there have
been some issues brought up here in the past.

It usually happens at night, so I haven’t had a chance to run show
innodb status yet. I will say that once, when it happened, show
processlist still showed all of the expected connections.

I have no large hooks. Just a call to MyObject.find(:all, :include
=> :parent) that generates an SQL statement approximately 2,000-3,000
characters long. My max_allowed_packet is set to 16MB.

Erik

Also, are you 100% certain you are using the native C mysql-ruby
bindings and not the pure Ruby ones built into Rails?
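
One rough way to check from script/console, relying on the convention
that the C extension reports Mysql::VERSION as an Integer while the pure
Ruby driver reports a version String; treat it as a heuristic:

  # Heuristic check from script/console: the C mysql-ruby extension defines
  # Mysql::VERSION as an Integer (e.g. 20717), while the pure Ruby driver
  # bundled with Rails defines it as a String like "4.0-ruby-x.y.z".
  ActiveRecord::Base.connection   # force the adapter to load its driver
  puts Mysql::VERSION.inspect
  puts(Mysql::VERSION.is_a?(Integer) ? "native C bindings" : "pure Ruby driver")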

-Ezra


No log rotation through ruby.

I caught it just about an hour ago and show innodb status looked about
as normal as a site under low volume could. No deadlocks.

How can I be 100% certain, beyond having the gem installed and having
the stack from gdb include:

#8 0x04cc9881 in mysql_store_result (mysql=0x913eb10) at client.c:2697
#8 0x01d6a881 in mysql_store_result (mysql=0x93e71a8) at client.c:2697

Thanks

Erik

What’s your global mysql interactive_timeout set to? This is the
timeout after which mysql will close any inactive connections. The
default is 28800 (8 hours). Try setting it to 24 hours or 1 week.
It’s probable that your app is hung waiting on a mysql connection
that the server has already closed.
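
A quick way to see what the server is actually using for the app’s
connection, from script/console (note that wait_timeout is the variable
that applies to non-interactive clients like a Rails app, so it’s worth
checking alongside interactive_timeout):

  # List the server-side idle timeouts that apply to this connection.
  # wait_timeout governs non-interactive clients such as a Rails app;
  # interactive_timeout only affects interactive sessions.
  conn = ActiveRecord::Base.connection
  conn.select_all("SHOW VARIABLES LIKE '%timeout%'").each do |row|
    puts "#{row['Variable_name']} = #{row['Value']}"
  end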

This is not really related to mongrel anymore, but…

The stack trace points to the C version of the mysql-ruby library
(mysql.c -> store_result). It looks like it’s hanging in
mysql_store_result, inside the libmysql C library provided (or
compiled) with your mysql client installation. This points to either a
badly compiled libmysql (unlikely) or some other issue with how your
database is configured.

As mentioned, use show engine innodb status\G; it’ll show you the
latest detected deadlock even after it has happened. If you’ve got one
in your app, you’ll see it right away. This only applies if your tables
are InnoDB (mysql defaults to MyISAM).

Also, if you’re doing MyObject.find(:all, ...) without conditions and
a huge :include, and you’re using MyISAM tables, you’ll be taking huge
table locks (one for my_objects and one for parents) every time you
run that query, which will block any other reads or writes, slowing
down your app and potentially causing deadlocks.

First, you should try to refactor your code so it doesn’t need to pull
in a huge dataset and take lots of locks (a rough paging sketch follows
the list below). If you’re crunched and you really do need to pull it
all in, switching to InnoDB should help if you’re not there already.
Also:

  • make sure that the :parent association has an index set on the join
    column
  • do the read on a replicated slave
  • and/or, run "SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED"
    before the query (set it back to "REPEATABLE READ" after you’re done).
    This will prevent your select from locking tables when reading, but
    will potentially give you stale data. I think it only works with
    InnoDB, but I’m not sure.
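
On the refactoring point above, a rough sketch of paging through the
result set instead of loading it all in one shot; the names and batch
size are illustrative, and it assumes an integer primary key on
my_objects:

  # Rough sketch: walk my_objects in fixed-size batches keyed off the
  # primary key instead of one find(:all, :include => :parent) that drags
  # the whole table (and its locks) along at once.
  batch_size = 500
  last_id = 0
  loop do
    batch = MyObject.find(:all,
                          :conditions => ["my_objects.id > ?", last_id],
                          :order      => "my_objects.id ASC",
                          :limit      => batch_size,
                          :include    => :parent)
    break if batch.empty?
    batch.each do |obj|
      # work with obj here (e.g. build the feed entry)
    end
    last_id = batch.last.id
  end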

Thanks for your response. I agree that it is not a Mongrel issue. All
of our tables are InnoDB. The site is an “enterprisey” site, so it
doesn’t get a lot of traffic in real terms. However, there are RSS
feeds that pull from it all day and night, and Google gives it a lot of
love, so the connections are never really at risk of timing out. When
the issue has occurred, show processlist has listed all of the
connections that I expect to have.

Thanks for the tips on isolation level queries. The initial exception
that leads to the hanging Mongrel occurs on a method that generates a
custom RSS feed. It mostly happens at night, but I’m not sure that
there is any relation. Anyway, I have a band-aid in place with a
brutally mean Monit configuration, and I will continue to troubleshoot.
Thanks!

Erik

Actually, you mention RSS feeds, and that reminded me of something.
I’ve noticed that my mongrels die at night sometimes as well, and it’s
during something similar to feed creation, but it’s not the app that’s
causing the issue.

We’re running a mysql db optimize around that time, and it ends up
locking a bunch of the tables, which in turn causes the mongrels to
hang until monit eventually restarts them.

So if it happens mainly at night, maybe there is some maintenance
going on at the same time.