Forum: Rails deployment Mongrel hanging with high CPU usage

Posted by Josh French (jfrench)
on 2008-06-19 01:31
(Received via mailing list)
I'm running a mongrel cluster behind Apache + mod_proxy. Several times
a day, one Mongrel will spike to 99-101% CPU usage and freeze there.
The standard mongrel_rails restart command won't affect it, nor will
it respond to anything short of a kill -6. Memory usage remains low
when this happens. There doesn't seem to be a pattern behind it;
sometimes it'll happen several times in quick succession, other times
it'll go for hours without a problem. Usage is pretty light, as the
site is pre-launch and just in use by a few testers. It's not tied to
any specific action like file uploads, nor any particular URL. Even
when I hit it with a heavy stress test via apache bench I am unable to
duplicate the problem reliably.

So far I've tried the following: reinstalling the native C MySQL gem
(v2.7); checking open file descriptors (lsof generally reports about
60 files under the hung process); strace and gdb (no glaring errors
jump out at me, but I haven't used these tools in the past so I may be
missing something); setting SetEnv force-proxy-request-1.0 1 and
proxy-nokeepalive 1; setting ActiveRecord::Base.verification_timeout to a
lower setting; and leaving my shoes by the door for the server gnomes
to fill with candy. Nothing seems to have worked.

My environment is:

Mongrel 1.1.5
MongrelCluster 1.0.5
Apache 2.2.3, using mod_proxy
Ruby 1.8.6
RHEL 5.1

Has anyone seen this behavior before? Does anyone have any other debug
tips that might be useful? At this point I'm pretty lost, and not sure
at all if the problem is in my application or in the server stack.
Thanks for any insight you might have.

j
Posted by Ezra Zygmuntowicz (Guest)
on 2008-06-19 01:40
(Received via mailing list)
On Jun 18, 2008, at 4:30 PM, Josh French wrote:

> when I hit it with a heavy stress test via apache bench I am unable to
> to fill with candy. Nothing seems to have worked.
> tips that might be useful? At this point I'm pretty lost, and not sure
> at all if the problem is in my application or in the server stack.
> Thanks for any insight you might have.

  Next time this happens get the PID of the errant process and run this
on it:

$ strace -p <PID>

  And make a pastie of some of the output. Try to see what it's stuck
on doing.

Cheers-


- Ezra Zygmuntowicz
-- Founder & Software Architect
-- ezra@engineyard.com
-- EngineYard.com
Posted by Tom Copeland (Guest)
on 2008-06-19 04:03
(Received via mailing list)
On Wed, 2008-06-18 at 16:30 -0700, Josh French wrote:
> I'm running a mongrel cluster behind Apache + mod_proxy. Several times
> a day, one Mongrel will spike to 99-101% CPU usage and freeze there.
> The standard mongrel_rails restart command won't affect it, nor will
> it respond to anything short of a kill -6. 

Do what Ezra says and use strace, but in the meantime, you can use "god"
or "monit" to monitor the process and restart it when this happens.
Certainly better to track down the root cause if possible, though...

Yours,

Tom
Posted by Josh French (jfrench)
on 2008-06-19 04:34
(Received via mailing list)
Thanks for the tips. I've been using god, but when one of the Mongrels
gets into this unresponsive state a standard mongrel_rails
cluster::restart on the port in question fails to restart the process.
I have to shell in and issue a kill with -6 or -9.  I can't rely on a
restart condition on CPU usage, as usage remains within normal bounds
right up until it jumps to 100% and jams.
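
The manual escalation described above (try a polite signal first, then -6, then -9) could be scripted roughly like this. It is only a sketch: the helper names process_alive? and stop_with_escalation, the signal order, and the 5-second grace period are assumptions for illustration, not anything mongrel_rails or god provides.

```ruby
# True while a process with this PID exists. Signal 0 only probes;
# nothing is actually delivered to the process.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # exists, but belongs to another user
end

# Send each signal in turn, waiting up to `grace` seconds after each one.
# ABRT is signal 6 and KILL is signal 9.
# Returns true once the process is gone, false if it survived everything.
def stop_with_escalation(pid, signals = %w[TERM ABRT KILL], grace = 5)
  signals.each do |sig|
    begin
      Process.kill(sig, pid)
    rescue Errno::ESRCH
      return true # already gone
    end
    deadline = Time.now + grace
    while Time.now < deadline
      return true unless process_alive?(pid)
      sleep 0.1
    end
  end
  !process_alive?(pid)
end
```

Calling stop_with_escalation(pid) from a cron job or a monitoring hook would at least save the manual shell session, though it does nothing about the root cause.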

I've also run strace on the process. While I'm not exactly sure how to
interpret it, I'm not seeing any obvious errors (system calls
returning -1 for instance.) Naturally this is one of the times when it
runs fine for hours, but I'll pastie what I find when it happens next.
Any general pointers on what sort of things I should be looking for in
there?

Thanks,
Josh
Posted by Ezra Zygmuntowicz (Guest)
on 2008-06-19 04:57
(Received via mailing list)
On Jun 18, 2008, at 7:34 PM, Josh French wrote:

> returning -1 for instance.) Naturally this is one of the times when it
> runs fine for hours, but I'll pastie what I find when it happens next.
> Any general pointers on what sort of things I should be looking for in
> there?
>
> Thanks,
> Josh


  Generally if you catch a process that is spinning at 100% cpu it will
be stuck in some kind of loop, so catching it in action is important.
Lots of times it will be looping and blocking on the database or some
C ext gone off into the weeds to die. So seeing what strings it might
be writing to any files or sockets can help trace it down to a section
of code sometimes.
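
One more way to see which section of Ruby code a process is in is to have it dump its own backtraces on a signal. The following is a rough sketch, not a Mongrel feature: install_backtrace_dumper is a made-up helper, the signal is whatever the caller picks (it should be one the server does not already handle), and Thread#backtrace needs Ruby 1.9 or later. It also only helps if the interpreter still gets a chance to run signal handlers; a process stuck deep inside a C extension may not respond.

```ruby
# Install a handler that prints every thread's Ruby backtrace when the
# process receives `signal`. Output goes to `io` ($stderr by default).
def install_backtrace_dumper(signal, io = $stderr)
  Signal.trap(signal) do
    Thread.list.each do |thread|
      io.puts "Thread #{thread.object_id}:"
      io.puts((thread.backtrace || ["(no backtrace)"]).join("\n"))
    end
  end
end
```

With this called once at boot, sending the chosen signal to the process writes the current backtraces to the log without stopping it.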

Cheers-
- Ezra
Posted by Aníbal Rojas (Guest)
on 2008-06-19 12:55
(Received via mailing list)
Josh,

    Have god (or whatever monitoring you set up) run $ strace -p
<PID> _when_ the CPU% spike hits and have the output mailed to you, or logged.
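
If the monitoring tool can't do this directly, a small standalone watcher is one option. This is a minimal sketch for Linux only. The helper names (cpu_ticks, cpu_percent, capture_strace), the 100 ticks-per-second default, and the 10-second capture window are all assumptions; check `getconf CLK_TCK` on your box for the real tick rate. It samples /proc/<pid>/stat twice rather than relying on ps, since the %CPU figure from ps is averaged over the life of the process.

```ruby
# Sum of user and system CPU ticks from one line of /proc/<pid>/stat.
def cpu_ticks(stat_line)
  # The command name (field 2) may contain spaces or parentheses,
  # so split only what follows the last ')'.
  fields = stat_line[(stat_line.rindex(")") + 2)..-1].split
  fields[11].to_i + fields[12].to_i # utime and stime (fields 14 and 15)
end

# Percent of one CPU used by `pid` over a short sampling interval.
# hz is the kernel clock tick rate (assumed 100 here).
def cpu_percent(pid, interval = 1.0, hz = 100)
  before = cpu_ticks(File.read("/proc/#{pid}/stat"))
  sleep interval
  after = cpu_ticks(File.read("/proc/#{pid}/stat"))
  (after - before) * 100.0 / (hz * interval)
end

# Attach strace for a few seconds and write the system calls to out_path.
def capture_strace(pid, out_path, seconds = 10)
  tracer = spawn("strace", "-p", pid.to_s, "-o", out_path)
  sleep seconds
  Process.kill("INT", tracer) # strace detaches when interrupted
  Process.wait(tracer)
end
```

A loop that calls cpu_percent(pid) every few seconds and runs capture_strace(pid, some_path) once the value stays above, say, 95 would then leave a trace file behind to mail or paste.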

--
Aníbal Rojas
http://hasmanydevelopers.com
http://rubycorner.com
http://anibal.rojas.com.ve
Posted by Josh French (jfrench)
on 2008-06-19 16:35
(Received via mailing list)
Thanks guys -- that was enough to point me in the right direction. (A
regexp was getting stuck on some gnarly markup.) For future Googlers,
I also found this post helpful: 
http://weblog.jamisbuck.org/2006/9/22/inspecting-a...
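
For anyone wondering how a regexp can pin a CPU: the usual cause is nested quantifiers that backtrack heavily on input that almost matches. The patterns below are a textbook illustration only, not the regexp from my app, and time_match is a throwaway helper. Newer Ruby versions have mitigations that may make the nested pattern finish quickly, so no particular timings are claimed here.

```ruby
NESTED = /\A(a+)+\z/ # nested quantifiers: many ways to split a run of "a"s
FLAT   = /\Aa+\z/    # accepts the same strings with a single quantifier

# Returns [matched?, seconds taken] for one match attempt.
def time_match(pattern, text)
  started = Time.now
  matched = !(pattern =~ text).nil?
  [matched, Time.now - started]
end

[16, 18, 20, 22].each do |n|
  near_miss = "a" * n + "!" # fails only at the very last character
  _, nested_seconds = time_match(NESTED, near_miss)
  _, flat_seconds   = time_match(FLAT, near_miss)
  puts format("n=%d nested=%.4fs flat=%.4fs", n, nested_seconds, flat_seconds)
end
```

On an engine without mitigations the nested timings grow quickly as n increases, while the flat pattern stays fast; rewriting the pattern to remove the ambiguity is the usual fix.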

Thanks for your help!

Josh
Posted by Ric For (ato)
on 2009-10-06 05:35
Ezra, I'm having the same issue, and this is the closest forum listing I can find.
Mongrel_rails runs up to 100% CPU after 10-12 hours. Restarting the
processes with monit solves it for a while, but that's not the real fix.

A pastie of the stack trace:

http://pastie.org/643236

Any help appreciated.

Ric





Posted by Henno Täht (henno)
on 2010-09-24 16:42
Ric For wrote:
> Ezra, I'm having the same issue, and this is the closest forum listing I can find.
> Mongrel_rails runs up to 100% CPU after 10-12 hours. Restarting the
> processes with monit solves it for a while, but that's not the real fix.
> 
> A pastie of the stack trace:
> http://pastie.org/643236

Same stack trace here. Ric? Anyone?