My Rails app has been growing in LOC and everything was running fine
until one day (one or two weeks ago) when I pushed an update to my server:
after a random period of time, my Ruby processes eat 100% of the CPU
and the app becomes unresponsive. The problem is that I am unable to
tell which update started causing trouble.
$ netstat -anp shows connections not being properly closed between my
rails process and postgresql database, the rails app certainly is
I have not yet been able to identify the source of the problem, even after:
- reinstalling on a fresh operating system (Debian Lenny)
- switching from connecting to postgresql through remote tcp to local
- updating nginx
- updating Rails and other gems
- updating plugins, and removing some that are not so useful
- moving from Thin instances to Nginx+Passenger
- removing suspicious and most recent lines of code that could be the
Everything works fine on my dev machine. On the production server, after
a random amount of time, it suddenly goes crazy. It’s terribly painful
to hunt down and I don’t see any new potential areas to investigate.
Recently I have been seeing a new error message from time to time, which
disappears on the next request:
A copy of XX has been removed from the module tree but is still active!
Could that be related to some memory leak that will eventually lock a
rails process at 100% cpu after some time?
Has anyone had trouble like this? Does anyone have an idea where
the problem could come from? How should I tackle it?
Because it’s random, I can make a modification, be happy after 6 hours
thinking that it all works, and then 10 minutes later it fails…
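
To put numbers on the unclosed connections I see in netstat, something like the following could be run periodically and logged. It is only a rough sketch: the helper name connection_states is made up for illustration, and it assumes the Rails processes talk to PostgreSQL over TCP on the default port 5432 and that netstat prints its usual Linux column layout.

```ruby
# Count TCP connection states for one port from `netstat -anp` output.
# Assumption: PostgreSQL listens on its default port, 5432.
PG_PORT = 5432

# Hypothetical helper: takes the raw netstat text and returns a Hash
# mapping each TCP state name to the number of matching connections.
def connection_states(netstat_output, port = PG_PORT)
  counts = Hash.new(0)
  netstat_output.each_line do |line|
    fields = line.split
    # TCP rows: proto recv-q send-q local-addr foreign-addr state pid/program
    next unless fields.size >= 6 && fields[0].start_with?("tcp")
    local, foreign, state = fields[3], fields[4], fields[5]
    # Keep only rows where either end of the connection is the database port.
    next unless [local, foreign].any? { |addr| addr.end_with?(":#{port}") }
    counts[state] += 1
  end
  counts
end
```

Calling it as connection_states(`netstat -anp`) from cron every minute and appending the result to a log would show whether a state such as CLOSE_WAIT keeps growing before a process goes to 100% CPU.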