Mongrel using way more memory on production than staging. Any ideas why?

I’ve been trying to track down the culprit of erratic behaviour and
crashes on my production server (which is split into a number of Xen
instances), so I set up a staging server so that I could really try to
get to the bottom of it.

The staging server (also split with Xen) is set up pretty much
identically as far as mongrel_cluster is concerned: the production box
has two servers/instances running mongrel_cluster, each with 4 mongrels
per cluster, versus one server on the staging box with 2 mongrels per
cluster. Otherwise they are the same: both Debian Etch, same gems, same
Ruby (1.8.5), same Mongrel (1.0.1), same mongrel_cluster (1.0.2). Oh,
and the data on the staging one is a clone of the production one.

On the production box, pretty much immediately (and I’m talking about
within one or two requests), the mongrels climb up to about 150-160MB.
On the staging server, even when I’m hammering it with a benchmarking
suite (I’ve tried httperf, and bench and crawl), the mongrels sit
comfortably at about 60MB each.

Here’s some stats from top:

Production server:

top - 08:38:43 up 22 days, 13:21, 1 user, load average: 0.24, 0.23, 0.15
Tasks: 33 total, 3 running, 30 sleeping, 0 stopped, 0 zombie
Cpu(s): 16.6%us, 24.5%sy, 0.0%ni, 51.0%id, 0.0%wa, 0.0%hi, 3.6%si, 4.3%st
Mem: 524288k total, 384528k used, 139760k free, 1576k buffers
Swap: 262136k total, 1352k used, 260784k free, 19644k cached

and from ps aux

USER      PID %CPU %MEM    VSZ    RSS TTY STAT START  TIME COMMAND
mongrel 20387  2.6 18.5 148780  97228 ?   Sl   08:36  0:11 /usr/bin/ruby1.8 /usr/bin/mongrel_rail
mongrel 20390  2.8 19.4 153396 101940 ?   Sl   08:36  0:13 /usr/bin/ruby1.8 /usr/bin/mongrel_rail
mongrel 20393  4.1 19.6 154356 102936 ?   Sl   08:36  0:18 /usr/bin/ruby1.8 /usr/bin/mongrel_rail
mongrel 20396  4.7 18.8 150124  98576 ?   Rl   08:36  0:21 /usr/bin/ruby1.8 /usr/bin/mongrel_rail

And here’s the corresponding staging server ones:

top - 10:00:55 up 15:22, 1 user, load average: 0.00, 0.01, 0.15
Tasks: 29 total, 2 running, 27 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 1.0%st
Mem: 262336k total, 203064k used, 59272k free, 1156k buffers
Swap: 262136k total, 4k used, 262132k free, 58636k cached

USER     PID %CPU %MEM   VSZ   RSS TTY STAT START  TIME COMMAND
mongrel 3617  7.2 16.8 58808 44296 ?   Sl   09:28  2:28 /usr/bin/ruby1.8 /usr/bin/mongrel_rail
mongrel 3620  7.1 16.7 58424 43912 ?   Sl   09:28  2:26 /usr/bin/ruby1.8 /usr/bin/mongrel_rail

Anybody got any suggestions, or at least thoughts on tracking down the
issues?

Thanks,
Chris
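One way to put numbers on the growth described above is to sample each
mongrel’s RSS at intervals and log it, so the climb can be correlated
with traffic. A minimal Ruby sketch, assuming (as in the ps listing
above) that the relevant processes have mongrel_rail somewhere in their
command line:

  # rss_watch.rb: print the RSS of every mongrel process once a minute.
  # Relies on the Linux procps `ps`; adjust the pattern if your command
  # lines look different from the listing above.
  loop do
    stamp = Time.now.strftime("%H:%M:%S")
    `ps -eo pid,rss,command`.split("\n").grep(/mongrel_rail/).each do |line|
      pid, rss = line.split[0, 2]
      puts "#{stamp} pid=#{pid} rss=#{rss} kB"
    end
    sleep 60
  end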

On Oct 2, 2007, at 11:08 AM, Chris T wrote:

The staging server (also split with Xen) is set up pretty much
identically

On the production box, pretty much immediately (and I’m talking about
within one or two requests), the mongrels climb up to about 150-160MB.
On the staging server, even when I’m hammering it with a benchmarking
suite (have tried with httperf, and bench and crawl) the mongrels sit
comfortably at about 60MB each.

Same box, hardware+kernel wise (32 vs 64 bit)? Libraries built the
same way?

JS

The staging server (also split with Xen) is set up pretty much
identically

On the production box, pretty much immediately (and I’m talking about
within one or two requests), the mongrels climb up to about 150-160MB.
On the staging server, even when I’m hammering it with a benchmarking
suite (have tried with httperf, and bench and crawl) the mongrels sit
comfortably at about 60MB each.

Your issue may be related to the amount of data in your production
database. In our codebase we’re constantly being hit by AR calls that
bring the whole table into memory, then filter out the bits they don’t
want Rails-side.

In development we don’t notice the bloat or delay, but under a
production dataset the effect was pronounced.

Cheers

Dave
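For anyone unsure what that pattern looks like, here’s a hedged
illustration (the model and column names are invented, not taken from
Chris’s app). The first version instantiates every row and filters in
Ruby, so memory scales with the production table; the second pushes the
filter into SQL so only matching rows become objects:

  # Anti-pattern: load the whole table, then discard most of it in Ruby.
  recent = Article.find(:all).select { |a| a.published_at > 1.week.ago }

  # Same result, but the database does the filtering.
  recent = Article.find(:all,
                        :conditions => ["published_at > ?", 1.week.ago])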

Dave C. wrote:


Your issue may be related to the amount of data in your production
database. In our codebase we’re constantly being hit by AR calls that
bring the whole table into memory, then filter out the bits they don’t
want Rails-side.

In development we don’t notice the bloat or delay, but under a
production dataset the effect was pronounced.

Cheers

Dave

The staging server is using a copy of the production database, so that
can’t be it (though I thought it might be before setting up the staging
server).

Johan Sørensen wrote:


Same box, hardware+kernel wise (32 vs 64 bit)? Libraries built the
same way?

JS

No, the production box is an Athlon64 X2 3800, the staging one is i386
(actually a Pentium 4 I had lying around). However, the Ruby libraries
were built the same way:

sudo apt-get install ruby1.8 libzlib-ruby rdoc irb ruby1.8-dev
libopenssl-ruby1.8

On production:
#ruby -v
ruby 1.8.5 (2006-08-25) [x86_64-linux]

On staging:
#ruby -v
ruby 1.8.5 (2006-08-25) [i486-linux]

On 10/2/07, Chris T [email protected] wrote:

On the staging server, even when I’m hammering it with a benchmarking […]
(actually a Pentium 4 I had lying around). However the ruby libraries […]
#ruby -v
ruby 1.8.5 (2006-08-25) [i486-linux]

I have not personally encountered this issue; however, I also do not use
apt-get to install anything in the Ruby application stack. I compile
everything by hand. I’d recommend trying that next (compile-install Ruby
and RubyGems) on the production server.

Most production servers I work with are also 64 bit Linux boxes and as I
said, I compile everything and haven’t seen this issue.

I hope that this helps,

~Wayne

Chris T wrote:

On staging:
#ruby -v
ruby 1.8.5 (2006-08-25) [i486-linux]

This explains it well: pointers are 8 bytes long on 64-bit while only 4
bytes long on 32-bit. Each Ruby object is essentially a pointer (plus
some other magic encoded in the byte alignment and misc other data). A
Ruby process is bound to be heavier on 64-bit.

You can use a 64-bit DB server and keep the application servers 32-bit;
this should give some benefits while trading away some uniformity.
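A quick way to see the word-size difference Dee describes from inside
Ruby itself: paste the following into irb on each box. On the 32-bit
staging machine it should print 4 and i486-linux, on the 64-bit
production machine 8 and x86_64-linux:

  puts 1.size          # bytes in the machine word backing a Fixnum
  puts RUBY_PLATFORM   # e.g. i486-linux vs x86_64-linux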

Dee Z. wrote:


This explains it well: pointers are 8 bytes long on 64-bit while only 4
bytes long on 32-bit. Each Ruby object is essentially a pointer (plus
some other magic encoded in the byte alignment and misc other data). A
Ruby process is bound to be heavier on 64-bit.

You can use a 64-bit DB server and keep the application servers 32-bit;
this should give some benefits while trading away some uniformity.

OK, I think I get this. How do I ensure the app servers are 32-bit? Do
I need to use a 32-bit kernel when creating the instances?

Wayne E. Seguin wrote:


I have not personally encountered this issue; however, I also do not use
apt-get to install anything in the Ruby application stack. I compile
everything by hand. I’d recommend trying that next (compile-install Ruby
and RubyGems) on the production server.

Most production servers I work with are also 64 bit Linux boxes and as I
said, I compile everything and haven’t seen this issue.

I hope that this helps,

~Wayne

I may try that: set up a test instance the same as the other ones, but
with Ruby compiled by hand. I did it once before and seem to remember it
wasn’t too bad.

I really doubt the pointers are giving you 120% more memory usage.
Overhead from switching from 32-bit to 64-bit addressing is usually
about 15%.

Evan

On 10/2/07, Chris T [email protected] wrote:

On the production box, pretty much immediately (and I’m talking about
within one or two requests), the mongrels climb up to about 150-160MB.
On the staging server, even when I’m hammering it with a benchmarking
suite (have tried with httperf, and bench and crawl) the mongrels sit
comfortably at about 60MB each.

It is the difference between x86_64 and i386, mostly.

Some of it may also be that you are not hitting Mongrel hard enough in
staging. An overloaded Mongrel with the default --num-procs setting of
1024 should eventually allocate enough memory for 1024 request handler
threads, which takes even a skeleton Rails app above the 60 MB level.


Alexey V.
CruiseControl.rb [http://cruisecontrolrb.thoughtworks.com]
RubyWorks [http://rubyworks.thoughtworks.com]

This is something like what I use, modify to your own designs:

mkdir ~/src

package=ruby

cd ~/src && curl -O ftp://ftp.ruby-lang.org/pub/ruby/stable-snapshot.tar.gz

tar zxf stable-snapshot.tar.gz && cd ruby && \
  ./configure --prefix=/usr/local --disable-pthread --with-readline-dir=/usr/local

make && sudo make install && cd ~/src
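After a source build like that, it’s worth a quick sanity check that the
mongrels are actually picking up the new interpreter rather than the
packaged one. A small Ruby check, assuming the build went to /usr/local
as in the configure line above:

  # Run with the interpreter the mongrels use, e.g.:
  #   /usr/local/bin/ruby check_ruby.rb
  require 'rbconfig'
  puts RUBY_VERSION, RUBY_RELEASE_DATE, RUBY_PLATFORM
  puts Config::CONFIG['prefix']   # should say /usr/local for the new build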

Wayne E. Seguin wrote:

This is something like what I use, modify to your own designs:


OK. Will try a few things out tomorrow and report back.

On 10/2/07, Evan W. [email protected] wrote:

I really doubt the pointers are giving you 120% more memory usage.
Overhead from switching from 32-bit to 64-bit addressing is usually
about 15%.

Usually, yes. And yet the RSS of a Mongrel process running a skeleton
Rails app is ~20 MB on i386 and ~32 MB on x86_64.


Alexey V.
CruiseControl.rb [http://cruisecontrolrb.thoughtworks.com]
RubyWorks [http://rubyworks.thoughtworks.com]

On Wed, 3 Oct 2007 15:12:01 +0200
Chris T. [email protected] wrote:

Update on what I’ve tried/found out so far.

http://spreadsheets.google.com/pub?key=pHDcT1cywdrovBLlL7QV8Ow

What’s funny is I’ve been struggling with JRuby R. applications
running under any application server requiring 2G (yes, gigabytes) of
RAM just to function. Many times we’d have to do complete restarts
about 1 or 2 times a week.

So it’s strange to hear someone say that 120M is too much. :)


Zed A. Shaw


Update on what I’ve tried/found out so far.

Set up a testbed Xen server instance on the same box (AMD64 X2) as my
mongrel clusters. It’s not quite a direct comparison, as I compiled Ruby
from source, whereas the staging server’s Ruby was installed from
binaries via apt-get.

I then ran a standalone Mongrel in the same production environment with
the same data on both servers, and threw various requests at it via
curl. The results are here:

http://spreadsheets.google.com/pub?key=pHDcT1cywdrovBLlL7QV8Ow

but briefly:

– on startup (before any requests have been made), the testbed (64-bit)
server uses 87% more memory than the staging one (78MB vs 42MB)

– on the first request, both increase memory usage by 17%, and then
both gradually climb by varying percentages, though the testbed’s mem
usage is always 80-95% more than the 32-bit staging server.

– memory usage increases inconsistently, sometimes staying the same
when the requests are repeated, sometimes increasing. I’m wondering if
this could be a memory leak to do with caching (which might explain why
it wasn’t showing up when I was httperf’ing the staging server).

There’s obviously more work to be done here, but all comments and
suggestions are welcome. This is new territory for me, so I may be
missing something completely obvious.

Cheers

Chris
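One way to narrow down the inconsistent per-request growth described
above is to log the process RSS after every request and watch the
deltas. A rough sketch as a Rails after_filter (the filter name and log
format are made up for illustration; VmRSS is read from /proc, so this
is Linux-only):

  # app/controllers/application.rb (Rails 1.x)
  class ApplicationController < ActionController::Base
    after_filter :log_rss

    @@last_rss = nil

    private

    def log_rss
      rss = File.read("/proc/#{Process.pid}/status")[/VmRSS:\s*(\d+)/, 1].to_i
      delta = @@last_rss ? rss - @@last_rss : 0
      @@last_rss = rss
      logger.info("RSS #{rss} kB (#{'%+d' % delta} kB) after #{request.path}")
    end
  end

Grepping the resulting log for large positive deltas should point at the
actions (or cached fragments) responsible.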

On 10/3/07, Chris T. [email protected] wrote:

– on startup (before any requests have been made), the testbed (64-bit)
server uses 87% more memory than the staging one (78MB vs 42MB)

This appears to be in the reasonable ballpark of what to expect, based
on my testing with a bunch of different frameworks.

– on the first request, both increase memory usage by 17%, and then
both gradually climb by varying percentages, though the testbed’s mem
usage is always 80-95% more than the 32-bit staging server.

The initial bump is probably a factor of Rails and of your
application’s object initializations.

An ongoing memory increase, though, is caused either by your
application intentionally caching things in RAM, or by something
leaking. RAM usage should stay pretty stable.

– memory usage increases inconsistently, sometimes staying the same
when the requests are repeated, sometimes increasing. I’m wondering if
this could be a memory leak to do with caching (which might explain why
it wasn’t showing up when I was httperf’ing the staging server).

It could be. It could be that something your app is using is leaking,
or that you are hitting a memory leak in Ruby. As of 1.8.6, the
Array#shift bug is fixed, but there are probably others. I demonstrated
a leak a few months ago in one of the getby methods (gethostbyname,
IIRC), and I have no doubt that if one took the time to look into this
intently, other leaky libs could be found. Do you use RMagick?
Historically, people have had a lot of memory leak issues with it.

There’s obviously more work to be done here, but all comments and
suggestions are welcome. This is new territory for me, so I may be
missing something completely obvious.

Tracking down memory leaks in Ruby is a labor intensive process. Good
luck.

Kirk H.

Zed A. Shaw wrote:


What’s funny is I’ve been struggling with JRuby R. applications
running under any application server requiring 2G (yes, gigabytes) of
RAM just to function. Many times we’d have to do complete restarts
about 1 or 2 times a week.

So it’s strange to hear someone say that 120M is too much. :)


Zed A. Shaw

The problem isn’t 120M, but that it seems to keep climbing until much
instability ensues…

That’s possibly a leak, but I’m still a bit confused as to why the
64-bit server is using so much more… and whether I should maybe run
the mongrel_cluster instances on a 32-bit kernel (assuming that’s
possible).

Kirk H. wrote:

On 10/3/07, Chris T. [email protected] wrote:

– on startup (before any requests have been made), the testbed (64-bit)
server uses 87% more memory than the staging one (78MB vs 42MB)

This appears to be in the reasonable ballpark of what to expect, based
on my testing with a bunch of different frameworks.

So just to clarify, the 80%+ is not uncommon for 64-bit vs 32-bit. Are
there any benefits (as far as Mongrel is concerned) to being on 64-bit,
or (if it’s possible) is it worth running the app servers on 32-bit
kernels?

– on the first request, both increase memory usage by 17%, and then
both gradually climb by varying percentages, though the testbed’s mem
usage is always 80-95% more than the 32-bit staging server.

The initial bump is probably a factor of Rails and of your
application’s object initializations.

That’s what I figured.

An ongoing memory increase, though, is caused either by your
application intentionally caching things in RAM, or by something
leaking. RAM usage should stay pretty stable.

The caching is all done using fragment caching (on disk – haven’t yet
investigated memcached).

– memory usage increases inconsistently, sometimes staying the same
when the requests are repeated, sometimes increasing. I’m wondering if
this could be a memory leak to do with caching (which might explain why
it wasn’t showing up when I was httperf’ing the staging server).

It could be. It could be that something your app is using is leaking,
or that you are hitting a memory leak in Ruby. As of 1.8.6, the
Array#shift bug is fixed, but there are probably others. I demonstrated
a leak a few months ago in one of the getby methods (gethostbyname,
IIRC), and I have no doubt that if one took the time to look into this
intently, other leaky libs could be found. Do you use RMagick?
Historically, people have had a lot of memory leak issues with it.

I’d read about the Array#shift bug and had removed it everywhere but
one place (must do that), but doing a search of the app I see that it’s
used elsewhere, including in a number of plugins and Rails itself.

I ditched RMagick some time ago due to concerns about it and now use
ImageScience.

There’s obviously more work to be done here, but all comments and
suggestions are welcome. This is new territory for me, so I may be
missing something completely obvious.

Tracking down memory leaks in Ruby is a labor intensive process. Good
luck.

Any suggestions for best practice/howtos/etc?

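One low-tech technique that needs no extra gems: dump a histogram of
live objects per class with ObjectSpace before and after exercising the
app, and see which classes only ever grow. A rough sketch (how you hook
it in, via script/console, a signal handler, or a hidden action, is up
to you):

  def object_histogram
    counts = Hash.new(0)
    ObjectSpace.each_object { |obj| counts[obj.class] += 1 }
    counts
  end

  before = object_histogram
  # ... hit the app with a batch of requests here ...
  GC.start
  after = object_histogram

  (after.keys | before.keys).sort_by { |k| after[k] - before[k] }.each do |klass|
    diff = after[klass] - before[klass]
    puts "#{klass}: +#{diff}" if diff > 0
  end

Classes whose counts climb steadily across repeated runs are the ones
worth auditing (BleakHouse, mentioned below, automates a fancier version
of the same idea).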

On Wed, 3 Oct 2007 20:52:20 +0200
Chris T. [email protected] wrote:

The problem isn’t 120M, but that it seems to keep climbing until much
instability ensues…

That’s possibly a leak, but I’m still a bit confused as to why the
64-bit server is using so much more… and whether I should maybe run
the mongrel_cluster instances on a 32-bit kernel (assuming that’s
possible).

I haven’t tracked all the different things you’ve done, but have you
tried:

  1. Running with mongrel_rails -B and looking in the log/mongrel_debug/*
     files? Specifically objects.log.
  2. Trying BleakHouse?
  3. Running it on JRuby? See if you still have the leak there. If
     you’ve got a leak under JRuby then it’s your code, dude.

Apart from that, I’ve got no idea. Last time I dealt with this crap
with the horrible Ruby GC implementation, the entire Ruby world took out
torches and chased me down the street screaming that I was ruining their
party by exposing how crap the code is.

But hey, that’s just me.


Zed A. Shaw