I'm running a mongrel cluster behind Apache + mod_proxy. Several times a day, one Mongrel will spike to 99-101% CPU usage and freeze there. The standard mongrel_rails restart command won't affect it, nor will it respond to anything short of a kill -6. Memory usage remains low when this happens.

There doesn't seem to be a pattern behind it; sometimes it'll happen several times in quick succession, other times it'll go for hours without a problem. Usage is pretty light, as the site is pre-launch and only in use by a few testers. It's not tied to any specific action like file uploads, nor to any particular URL. Even when I hit it with a heavy stress test via apache bench, I am unable to reproduce the problem reliably.

So far I've tried the following:
- reinstalling the native C MySQL gem (v2.7);
- checking open file descriptors (lsof generally reports about 60 files under the hung process);
- strace and gdb (no glaring errors jump out at me, but I haven't used these tools before, so I may be missing something);
- setting SetEnv force-proxy-request-1.0 1 and SetEnv proxy-nokeepalive 1;
- setting ActiveRecord::Base.verification_timeout to a lower value;
- leaving my shoes by the door for the server gnomes to fill with candy.

Nothing seems to have worked.

My environment is:
- Mongrel 1.1.5
- MongrelCluster 1.0.5
- Apache 2.2.3, using mod_proxy
- Ruby 1.8.6
- RHEL 5.1

Has anyone seen this behavior before? Does anyone have any other debugging tips that might be useful? At this point I'm pretty lost, and not at all sure whether the problem is in my application or in the server stack.

Thanks for any insight you might have.

j
on 2008-06-19 01:31
on 2008-06-19 01:40
On Jun 18, 2008, at 4:30 PM, Josh French wrote:
> when I hit it with a heavy stress test via apache bench I am unable to
> to fill with candy. Nothing seems to have worked.
> tips that might be useful? At this point I'm pretty lost, and not sure
> at all if the problem is in my application or in the server stack.
> Thanks for any insight you might have.

Next time this happens, get the PID of the errant process and run this on it:

$ strace -p <PID>

And make a pastie of some of the output. Try to see what it's stuck doing.

Cheers-

- Ezra Zygmuntowicz
-- Founder & Software Architect
-- ezra@engineyard.com
-- EngineYard.com
on 2008-06-19 04:03
On Wed, 2008-06-18 at 16:30 -0700, Josh French wrote:
> I'm running a mongrel cluster behind Apache + mod_proxy. Several times
> a day, one Mongrel will spike to 99-101% CPU usage and freeze there.
> The standard mongrel_rails restart command won't affect it, nor will
> it respond to anything short of a kill -6.

Do what Ezra says and use strace, but in the meantime, you can use "god" or "monit" to monitor the process and restart it when this happens. Certainly better to track down the root cause if possible, though...

Yours,
Tom
on 2008-06-19 04:34
Thanks for the tips. I've been using god, but when one of the Mongrels gets into this unresponsive state, a standard mongrel_rails cluster::restart on the port in question fails to restart the process. I have to shell in and issue a kill with -6 or -9. I can't rely on a restart condition based on CPU usage, as usage remains within normal bounds right up until it jumps to 100% and jams.

I've also run strace on the process. While I'm not exactly sure how to interpret it, I'm not seeing any obvious errors (system calls returning -1, for instance). Naturally this is one of the times when it runs fine for hours, but I'll pastie what I find when it happens next. Any general pointers on what sort of things I should be looking for in there?

Thanks,
Josh
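Since cluster::restart cannot stop a wedged Mongrel, a monitor has to fall back on signals. Below is a minimal sketch of that escalation in current Ruby (not the 1.8.6 used in this thread), assuming the monitor runs as root or as the same user as the Mongrel: send SIGTERM, wait a grace period, then send SIGKILL. The helper names `process_alive?`, `wait_for_exit` and `stop_wedged_process` and the 5-second grace period are hypothetical; Josh used kill -6 (SIGABRT) by hand, which could be swapped in for TERM.

```ruby
# Hypothetical helpers for a monitor script: ask a wedged process to exit,
# then escalate to SIGKILL if it ignores the first signal.

# True if a process with this pid exists (signal 0 only checks, never kills).
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # it exists but belongs to another user
end

# Polls until the pid disappears or `timeout` seconds pass.
def wait_for_exit(pid, timeout, poll)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    return true unless process_alive?(pid)
    sleep poll
  end
  !process_alive?(pid)
end

# Returns :already_gone, :term, :kill or :still_running.
def stop_wedged_process(pid, grace: 5.0, poll: 0.1)
  begin
    Process.kill("TERM", pid)
  rescue Errno::ESRCH
    return :already_gone
  end
  return :term if wait_for_exit(pid, grace, poll)

  begin
    Process.kill("KILL", pid)
  rescue Errno::ESRCH
    return :term # exited just after the grace period ran out
  end
  wait_for_exit(pid, grace, poll) ? :kill : :still_running
end
```

For example, `stop_wedged_process(12345)` reports which signal finally worked; `:still_running` would mean even SIGKILL has not taken effect yet, which is rare for a process spinning in user space.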
on 2008-06-19 04:57
On Jun 18, 2008, at 7:34 PM, Josh French wrote:
> returning -1 for instance.) Naturally this is one of the times when it
> runs fine for hours, but I'll pastie what I find when it happens next.
> Any general pointers on what sort of things I should be looking for in
> there?
>
> Thanks,
> Josh

Generally, if you catch a process that is spinning at 100% CPU, it will be stuck in some kind of loop, so catching it in action is important. Lots of times it will be looping and blocking on the database, or some C extension has gone off into the weeds to die. So seeing what strings it might be writing to any files or sockets can sometimes help trace it down to a section of code.

Cheers-

- Ezra
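Ezra's advice amounts to looking for what the stuck process does over and over. A rough sketch in Ruby, assuming the trace was saved with something like `strace -p <PID> -o trace.log`: it counts system calls by name and collects the string arguments passed to `write`, `sendto` and `send`, so a repeated SQL query or response fragment stands out. The `summarize_strace` name and the expected line format (one call per line, optionally prefixed with `[pid N]` as `strace -f` does) are assumptions; "resumed" and signal lines are skipped. strace truncates strings to 32 characters by default (`-s 256` shows more), and its own `-c` flag prints a per-syscall count summary on detach, without the strings. If the trace is nearly empty while the CPU sits at 100%, the loop is probably in user space (Ruby code or a C extension) rather than in I/O.

```ruby
# One traced call per line, e.g.  write(6, "SELECT 1", 8) = 8
# strace -f prefixes lines with "[pid 1234] ".
SYSCALL_LINE = /\A(?:\[pid\s+\d+\]\s+)?(\w+)\(/

# First double-quoted argument, honouring backslash escapes inside it.
QUOTED_ARG = /"((?:[^"\\]|\\.)*)"/

OUTPUT_CALLS = %w[write sendto send].freeze

# Returns [counts, written]: a Hash of syscall name => number of calls, and
# the strings handed to output calls, in order. (Hypothetical helper.)
def summarize_strace(lines)
  counts = Hash.new(0)
  written = []
  lines.each do |line|
    match = SYSCALL_LINE.match(line)
    next unless match # signal notes, "<... resumed>" lines, blank lines

    name = match[1]
    counts[name] += 1
    next unless OUTPUT_CALLS.include?(name)

    arg = line[QUOTED_ARG, 1]
    written << arg if arg
  end
  [counts, written]
end
```

A typical call would be `summarize_strace(File.readlines("trace.log"))`, then sorting the counts to see which calls dominate.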
on 2008-06-19 12:55
Josh,
Make God (or whatever monitoring you set up) run "strace -p <PID>" _when_ the CPU% spike hits, and have the output mailed to you or logged.
--
Aníbal Rojas
http://hasmanydevelopers.com
http://rubycorner.com
http://anibal.rojas.com.ve
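Aníbal's suggestion could also be done in a small standalone Ruby script rather than a god condition. A minimal sketch, assuming Linux (it reads `/proc/<pid>/stat`), the coreutils `timeout` command, and permission to attach strace to the Mongrel: it samples the process's CPU time every few seconds and, once usage has stayed above a threshold for several consecutive samples, captures ten seconds of strace output to a file. Sampling the change in CPU ticks matters because the %CPU that `ps` shows on Linux is averaged over the process's whole lifetime and lags a sudden spike. `cpu_ticks`, `SpikeDetector`, `strace_command`, `watch`, the thresholds and the /tmp path are all hypothetical; mailing the log is left to the monitor.

```ruby
require "etc"

# Sum of utime + stime (fields 14 and 15 of /proc/<pid>/stat), in clock ticks.
def cpu_ticks(stat_line)
  # Field 2 is the command name in parentheses and may contain spaces or ")",
  # so only split what follows the last ")".
  fields = stat_line[(stat_line.rindex(")") + 1)..-1].split
  # fields[0] is field 3 (state), so utime/stime are at indexes 11 and 12.
  fields[11].to_i + fields[12].to_i
end

# Reports a spike only after `samples` consecutive readings at or above
# `threshold`, so a single busy request does not trigger a capture.
class SpikeDetector
  def initialize(threshold:, samples:)
    @threshold = threshold
    @samples = samples
    @streak = 0
  end

  def spiking?(cpu_percent)
    @streak = cpu_percent >= @threshold ? @streak + 1 : 0
    @streak >= @samples
  end
end

# Attach strace for a fixed time; `timeout` sends SIGTERM when time is up,
# on which strace normally detaches and leaves the traced process running.
def strace_command(pid, seconds, outfile)
  ["timeout", seconds.to_s, "strace", "-p", pid.to_s, "-o", outfile]
end

# Polls the process and returns the path of the captured trace once a
# sustained spike is seen. All defaults here are arbitrary examples.
def watch(pid, threshold: 90.0, samples: 5, interval: 2.0, capture_seconds: 10)
  detector = SpikeDetector.new(threshold: threshold, samples: samples)
  hz = Etc.sysconf(Etc::SC_CLK_TCK)
  stat_path = "/proc/#{pid}/stat"
  last = cpu_ticks(File.read(stat_path))
  loop do
    sleep interval
    now = cpu_ticks(File.read(stat_path))
    # Share of one CPU used during the last interval (approximate,
    # since sleep is not exact).
    percent = (now - last).to_f / hz / interval * 100
    last = now
    next unless detector.spiking?(percent)

    outfile = "/tmp/strace-#{pid}-#{Time.now.to_i}.log"
    system(*strace_command(pid, capture_seconds, outfile))
    return outfile
  end
end
```

For example, `watch(12345)` blocks until a sustained spike is seen and returns the log path. Requiring several consecutive samples is a deliberate trade-off against Josh's observation that CPU looks normal right up until the jam.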
on 2008-06-19 16:35
Thanks, guys; that was enough to point me in the right direction. (A regexp was getting stuck on some gnarly markup.) For future Googlers, I also found this post helpful:
http://weblog.jamisbuck.org/2006/9/22/inspecting-a...

Thanks for your help!
Josh
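Josh never posted the regexp, so what follows is only a hypothetical illustration of how a pattern can pin a CPU on malformed markup. A nested quantifier such as `(\w+\s*)*` lets a backtracking engine try exponentially many ways to split the same run of characters before giving up on input that never matches; a single negated character class cannot do that. `remove_tags` is a made-up helper name (Rails ships its own `strip_tags`), and a regexp is still only a rough way to handle HTML. On Ruby 3.2 and later, `Regexp.timeout` can additionally cap how long any one match may run, raising `Regexp::TimeoutError`.

```ruby
# HYPOTHETICAL patterns: the regexp from this thread was never posted.

# Risky: (\w+\s*)* can split a long run of word characters in exponentially
# many ways, so on an unclosed tag such as "<" + "a" * 30 a backtracking
# engine may try them all before failing. Shown for contrast; not used below.
RISKY_TAG = /<(\w+\s*)*>/

# Safer: a single negated character class matches each character in only
# one way, so a failed attempt from one starting "<" costs at most one pass
# over the rest of the input.
SAFE_TAG = /<[^>]*>/

# Hypothetical helper name (Rails has its own strip_tags).
def remove_tags(html)
  html.gsub(SAFE_TAG, "")
end

# Ruby 3.2+ only: abort any single regexp match that runs longer than 1 second.
Regexp.timeout = 1.0 if Regexp.respond_to?(:timeout=)
```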
on 2009-10-06 05:35
Ezra, I'm having the same issue, and this is the closest forum listing I see. Mongrel_rails runs up to 100% CPU after 10-12 hours. Restarting the processes with monit solves it for a while, but it's not the real fix.

A pastie of the stack trace: http://pastie.org/643236

Any help appreciated.

Ric

Ezra Zygmuntowicz wrote:
> On Jun 18, 2008, at 4:30 PM, Josh French wrote:
>
>> when I hit it with a heavy stress test via apache bench I am unable to
>> to fill with candy. Nothing seems to have worked.
>> tips that might be useful? At this point I'm pretty lost, and not sure
>> at all if the problem is in my application or in the server stack.
>> Thanks for any insight you might have.
>
> Next time this happens get the PID of the errant process and run this
> one it:
>
> $ strace -p <PID>
>
> And make a pastie of some of the output. Try to see what it's stuck
> on doing.
>
> Cheers-
>
>
> - Ezra Zygmuntowicz
> -- Founder & Software Architect
> -- ezra@engineyard.com
> -- EngineYard.com
on 2010-09-24 16:42
Ric For wrote:
> Ezra, having same issue and this is closet forum listing I see.
> Mongrel_rails running up to 100% CPU after 10-12 hours. Restarting the
> processes with monit solves for awhile but not the real fix.
>
> A pastie of the stack trace:
> http://pastie.org/643236

Same stack trace here. Ric? Anyone?