Mongrel using way more memory on production than staging. Any ideas why?

On 10/3/07, Zed A. Shaw [email protected] wrote:

Apart from that, I’ve got no idea. Last time I dealt with this crap with the
horrible Ruby GC implementation, the entire Ruby world took out
torches and chased me down the street screaming that I was ruining
their party by exposing how crap the code is.

Hey now. I was one of those torch carriers, but all I was worried
about was making sure we were pointing in the right direction in
identifying the real memory leak instead of just vilifying poor,
innocent Mutex. In the end, it was the influence of that dastardly
Array#shift that had turned Mutex into the problem-causing bad boy.

What’s really appalling is how long it took after that time before
Array#shift was actually fixed in a Ruby release. It should have been
fixed in 1.8.5.

I suspect that the leak in the gethostbyname code is a similar sort of
sloppiness that had been overlooked (and probably still exists in
1.9).

Chris, you asked for suggestions on how to track down memory leaks.
What Zed said. In addition, you can manually write code to check
ObjectSpace object counts. If you suspect that the problem is
actually at the Ruby or C/C++ extension level, you can also use a tool
like valgrind to analyze the running code and see if you can pinpoint
anything that is actually a problem.
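
To make the ObjectSpace idea concrete, here’s a minimal sketch (the
method name is just illustrative) that counts live objects per class.
Call it periodically, say from a debug-only action, and watch which
classes keep growing between requests:

    def dump_object_counts
      GC.start  # collect first, so only genuinely live objects get counted
      counts = Hash.new(0)
      ObjectSpace.each_object { |obj| counts[obj.class] += 1 }
      counts.sort_by { |klass, n| -n }.first(20).each do |klass, n|
        puts "#{klass}: #{n}"
      end
    end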

Kirk H.

The problem isn’t the 120M itself, but that it seems to keep climbing
until instability ensues…

That’s possibly a leak, but I’m still a bit confused as to why the
64-bit server is using so much more… and whether I should maybe run
the mongrel_cluster instances on a 32-bit kernel (assuming that’s
possible).

You could try googling for a Ruby memory profiler,
and possibly (if desperate) running ruby under valgrind.
GL!
-Roger

On Wed, 3 Oct 2007 20:52:20 +0200, Chris T. wrote:

That’s possibly a leak, but I’m still a bit confused as to why the
64-bit server is using so much more… and whether I should maybe run
the mongrel_cluster instances on a 32-bit kernel (assuming that’s
possible).

On the Java side of the world, I have an app that consumes 40% more
memory running on 64-bit than on 32-bit. This is under 64-bit Linux. Our
JVMs average around 5G, so 32-bit kernels aren’t really an option.
(32-bit Java has a 2G per-JVM cap.)
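
On the Ruby side, a cheap sanity check of which word size a given build
uses is Fixnum#size, which returns the bytes in a machine word; since
Ruby’s object slots are built out of pointer-sized fields, noticeably
higher memory use on a 64-bit build is expected:

    ruby -e 'puts 1.size'   # => 4 on a 32-bit build, 8 on 64-bit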

Corey

Kirk H. wrote:

[snip]

Thanks for all the suggestions. Will try them out, and perhaps try some
direct comparisons of 1.8.5 and 1.8.6 to see how the Array#shift problem
is affecting things.
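
For anyone curious, a rough sketch of that comparison (assuming the
leak shows up as shifted elements surviving GC, which is how it was
described at the time):

    # Churn an array with push/shift, then see how many of the
    # temporary strings survive a GC. On a build where Array#shift
    # retains references, most survive; on a fixed build the count
    # should drop back to near zero.
    def live_strings
      GC.start
      n = 0
      ObjectSpace.each_object(String) { n += 1 }
      n
    end

    baseline = live_strings
    queue = []
    50_000.times { queue.push("x" * 100) }
    50_000.times { queue.shift }
    puts "strings still live after churn: #{live_strings - baseline}"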

Anybody got any thoughts re the idea of running the app servers under a
32-bit kernel? It may seem naive, but it would seem to improve the
memory profile.