I’m running a Mongrel cluster behind Apache + mod_proxy. Several times
a day, one Mongrel will spike to 99-101% CPU and stay pegged there.
The standard mongrel_rails restart command has no effect on it, and the
process won’t respond to anything short of a kill -6 (SIGABRT). Memory
usage remains low
when this happens. There doesn’t seem to be a pattern behind it;
sometimes it’ll happen several times in quick succession, other times
it’ll go for hours without a problem. Usage is pretty light, as the
site is pre-launch and just in use by a few testers. It’s not tied to
any specific action like file uploads, nor to any particular URL. Even
under a heavy ApacheBench (ab) stress test I’m unable to reproduce the
problem reliably.
So far I’ve tried the following:

- Reinstalling the native C MySQL gem (v2.7)
- Checking open file descriptors (lsof generally reports about 60 open
  files under the hung process)
- strace and gdb (no glaring errors jump out at me, but I haven’t used
  these tools much before, so I may be missing something)
- Setting "SetEnv force-proxy-request-1.0 1" and "SetEnv
  proxy-nokeepalive 1" in the Apache config
- Lowering ActiveRecord::Base.verification_timeout
- Leaving my shoes by the door for the server gnomes to fill with candy

Nothing has worked.
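For what it’s worth, my strace/gdb poking has been along these lines. This is a hypothetical helper script, not something from my actual setup; the rb_backtrace() trick assumes MRI 1.8 with debug symbols available, and the PID in the usage comment is made up:

```shell
#!/bin/sh
# diagnose_mongrel.sh (hypothetical helper) -- snapshot what a hung
# Mongrel is doing without killing it. Usage: ./diagnose_mongrel.sh <pid>

diagnose() {
  pid="$1"

  # Sample system calls for a few seconds. A worker spinning in pure
  # Ruby makes almost no syscalls; one stuck on MySQL blocks in
  # read()/select() on the database socket.
  strace -f -tt -p "$pid" -o "/tmp/mongrel-$pid.strace" &
  strace_pid=$!
  sleep 5
  kill "$strace_pid" 2>/dev/null

  # C-level backtrace; gdb detaches afterwards and leaves the process
  # running. With MRI 1.8 you can reportedly also try:
  #   gdb -p "$pid" -batch -ex 'call (void)rb_backtrace()' -ex 'detach'
  # which prints the Ruby-level backtrace to the process's stderr/log.
  gdb -p "$pid" -batch -ex 'thread apply all bt' -ex 'detach' \
    > "/tmp/mongrel-$pid.gdb" 2>&1
}

# diagnose 12345   # run against the PID of the pegged Mongrel
```

Running it while the process is pegged, then diffing a couple of snapshots taken a few seconds apart, at least tells me whether it’s stuck in the same place or genuinely looping.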
My environment is:
Apache 2.2.3, using mod_proxy
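For reference, the proxy setup (including the two SetEnv workarounds mentioned above) lives in a fragment roughly like this; the balancer name, ports, and ServerName are placeholders, and it assumes mod_proxy_balancer is loaded:

```apache
<VirtualHost *:80>
  # ServerName and BalancerMember ports are placeholders
  ServerName example.com

  # Workarounds for mod_proxy + Mongrel keepalive problems
  SetEnv force-proxy-request-1.0 1
  SetEnv proxy-nokeepalive 1

  <Proxy balancer://mongrel_cluster>
    BalancerMember http://127.0.0.1:8000
    BalancerMember http://127.0.0.1:8001
  </Proxy>

  ProxyPass / balancer://mongrel_cluster/
  ProxyPassReverse / balancer://mongrel_cluster/
</VirtualHost>
```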
Has anyone seen this behavior before? Does anyone have other debugging
tips that might be useful? At this point I’m pretty lost, and not at all
sure whether the problem is in my application or in the server stack.
Thanks for any insight you might have.