Forum: Mongrel garbage collection patch

Posted by Roger Pack (rogerdpack)
on 2008-02-06 03:46
Would it make sense to add the option for mongrel to disable GC during a
request and enable it after?  Or even run it after?
Just thinking out loud.
Thanks for Mongrel.
-Roger
Posted by Zed A. Shaw (Guest)
on 2008-02-06 07:08
(Received via mailing list)
On Wed, 6 Feb 2008 03:46:34 +0100
Roger Pack <lists@ruby-forum.com> wrote:

> Would it make sense to add the option for mongrel to disable GC during a
> request and enable it after?  Or even run it after?
> Just thinking out loud.
> Thanks for Mongrel.

Not sure; it probably won't get you far, since you'd end up paying for
it later anyway.  I'd say try it for one or two actions, or as a couple
of mongrel handlers, and see if it helps you.  You could also do it
filter-style, which would let you do this GC hackery for just one request.
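A minimal sketch of the filter-style idea, assuming the work for one action can be wrapped in a block. `without_gc` is a hypothetical helper, not part of Mongrel or Rails, and hooking it into a Rails filter or a Mongrel handler is left out. `GC.disable` returns whether the GC was already off, so the helper restores the previous state instead of blindly re-enabling it.

```ruby
# Hypothetical helper (not a Mongrel or Rails API): run one request's work
# with the GC switched off, then restore the previous GC state.
# With collect_after = true it also forces a collection once the work is
# done, i.e. the garbage is paid for after the request instead of during it.
def without_gc(collect_after = false)
  was_disabled = GC.disable   # true if the GC was already disabled
  yield
ensure
  unless was_disabled
    GC.enable
    GC.start if collect_after
  end
end
```

Usage would look like `without_gc(true) { render_expensive_page }`, where `render_expensive_page` stands in for the action. Either way the garbage is still collected eventually, which is exactly the "paying for it later" caveat.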

--
Zed A. Shaw
- Hate: http://savingtheinternetwithhate.com/
- Good: http://www.zedshaw.com/
- Evil: http://yearofevil.com/
Posted by Luis Lavena (luislavena)
on 2008-02-06 07:12
(Received via mailing list)
On Feb 6, 2008 12:46 AM, Roger Pack <lists@ruby-forum.com> wrote:
> Would it make sense to add the option for mongrel to disable GC during a
> request and enable it after?  Or even run it after?
> Just thinking out loud.

Are you talking about mongrel or mongrel_rails?

The thing is, forcing garbage collection on every request will hit the
performance of the server (and hit it hard).

Also, it's not Mongrel's job if the framework behind it (in this case,
Rails) generates so many objects that they cross the 8MB limit set by
the VM (which is usually what triggers garbage collection).

There was a patch submitted to ruby-core a few days ago for
Benchmark.realtime that reduces memory allocation and increases
performance. The problem was that Rails uses it extensively, everywhere,
and several times *per request*, eating memory and reducing
performance...

Just my comments :-)

--
Luis Lavena
Multimedia systems
-
A common mistake that people make when trying to design
something completely foolproof is to underestimate
the ingenuity of complete fools.
Douglas Adams
Posted by Dave Cheney (Guest)
on 2008-02-06 07:12
(Received via mailing list)
We currently handle GC in an after_filter :cleanup for controllers that
use a lot of RAM *ahem* ImageMagick.

Cheers

Dave
Posted by Roger Pack (rogerdpack)
on 2008-02-06 07:35
> Not sure; it probably won't get you far, since you'd end up paying for
> it later anyway.  I'd say try it for one or two actions, or as a couple
> of mongrel handlers, and see if it helps you.  You could also do it
> filter-style, which would let you do this GC hackery for just one request.

I believe the two things that have been done to combat this "sore spot"
for Rails (that it sometimes GCs more than once per request [1]) are:
1) patching gc.c so that it collects less frequently (i.e. setting the
collection threshold to every 40MB instead of 8MB).  Kind of hard to do
using extensions only :)
2) having Rails GC only every X requests (FastCGI does this, I think).
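A minimal sketch of option 2, assuming a single-threaded server and a request handler object that responds to `call(request)`. `PeriodicGC` and its `interval` parameter are hypothetical names, not a Mongrel or Rails API. The GC is kept off while requests run, and one full collection is forced after every `interval`-th request.

```ruby
# Hypothetical wrapper (not a Mongrel or Rails API): the GC is switched off
# while each request runs, and one full collection is forced after every
# `interval`-th request, once that request has finished.
class PeriodicGC
  attr_reader :collections

  def initialize(app, interval = 10)
    @app = app              # anything that responds to #call(request)
    @interval = interval
    @served = 0
    @collections = 0
  end

  def call(request)
    GC.disable
    @app.call(request)      # this value is what #call returns
  ensure
    @served += 1
    collect! if (@served % @interval).zero?
  end

  private

  # Re-enable the GC and run a full collection right away.
  def collect!
    GC.enable
    GC.start
    @collections += 1
  end
end
```

For example, `PeriodicGC.new(->(req) { "handled #{req}" }, 5)` would collect after every fifth request. `GC.disable` is process-wide, so with requests running concurrently in threads the counter would need a mutex and a policy for when collecting is safe.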

It might make a difference in performance.
Thoughts?


[1] http://blog.pluron.com/2008/01/ruby-on-rails-i.html
Posted by Evan Weaver (eweaver)
on 2008-02-06 09:23
(Received via mailing list)
Disabling GC around the requests would guarantee that the size of the
Mongrel process will balloon to the size of all objects combined in
the most heavyweight request. Remember that the Ruby heap never
returns space to the OS. As soon as you re-enable the GC, the entire
Ruby heap (now 4-5x bigger than it normally would be) will get paged
back in to physical RAM.

As I understand it, the point of disabling the GC is to allow part of
the heap to swap out. This has no benefit if you enable the GC
after: you have to disable the GC, Kernel.fork, run the request, let
the child process die, and then re-enable the GC in the parent.
This gets you some marginal COW benefit at the cost of having to page
out lots of useless pages while the request is running.
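A rough sketch of that disable/fork/die/re-enable sequence, assuming a Unix-like platform where `fork` is available (it is not on Windows) and a request whose result can be passed back as a string. `run_in_forked_child_without_gc` is a hypothetical name; a real Mongrel integration would also have to hand the client socket to the child, which is omitted here.

```ruby
# Hypothetical helper: disable the GC, fork, run the work in the child,
# let the child exit, then restore the GC in the parent.
# The child's garbage is never collected; it simply disappears when the
# child process exits.
def run_in_forked_child_without_gc
  was_disabled = GC.disable          # the child inherits the disabled GC
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(yield.to_s)         # send the result back to the parent
    writer.close
    exit!(0)                         # skip the parent's at_exit handlers
  end
  writer.close
  result = reader.read               # returns once the child closes the pipe
  reader.close
  Process.wait(pid)                  # reap the dead child
  result
ensure
  GC.enable unless was_disabled
end
```

The parent reads before it waits so that a large result cannot fill the pipe and deadlock both processes.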

I'm doubtful that there is much benefit but it deserves some testing.

Incidentally the Ruby heap (and the GC) should have nothing to do with
ImageMagick. Extensions that use malloc() are a totally different
scenario.

Evan
Posted by Roger Pack (rogerdpack)
on 2008-02-06 16:46
> Disabling GC around the requests would guarantee that the size of the
> Mongrel process will balloon to the size of all objects combined in
> the most heavyweight request.

Definitely a trade-off between RAM and CPU.  I would say that those who
are CPU-bound rather than RAM-bound would be most interested in it.


>  Remember that the Ruby heap never
> returns space to the OS.
(Well, it does if you're extremely lucky and a heap chunk happens to be
entirely freed of its Ruby objects, but since heap chunks are allocated
in larger and larger blocks, this is unlikely and, as you noted,
probably close to 'never'.)
> As soon as you re-enable the GC, the entire
> Ruby heap (now 4-5x bigger than it normally would be) will get paged
> back in to physical RAM.
Right.  No savings in terms of RAM being swapped out (except with the
method you suggested below).  I believe the main advantage of GC'ing
'only every so often' or 'once per request' would be not that you use
less RAM, but that you keep all that (bloated) RAM in memory and
traverse it far less frequently.  So it "might" save CPU at the expense
of RAM.  The overhead of a mongrel + rails setup is (for me) around
40MB.  So basically, every time 8MB has been allocated, Ruby traverses
the whole ~48MB heap, which brings it back down to 40MB.  So for it to
allocate 50MB of memory (however many requests that is), it will
traverse ~5*50MB = 250MB.  If you let it GC only after ~40MB have been
allocated, it traverses ~100MB once (and brings it back down to 40).  As
you noted, it does use more RAM, and none of that RAM can healthily
reside in swap.  Is that significant, at least for those with lots of
RAM?  I don't know.
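The estimate above can be written as a simplified model, assuming the live heap drops back to a fixed baseline after every collection, a collection fires each time `threshold_mb` of new memory has been allocated, and each collection walks the baseline plus the garbage accumulated since the previous one. `traversed_mb` is a hypothetical helper, and the real Ruby 1.8 GC triggers are more involved than a single byte counter.

```ruby
# Simplified model (assumptions, not measurements):
# - a collection runs every `threshold_mb` of new allocation
# - each collection walks the live baseline plus the accumulated garbage
def traversed_mb(allocated_mb, threshold_mb, baseline_mb = 40)
  collections = allocated_mb / threshold_mb   # integer division
  collections * (baseline_mb + threshold_mb)
end

puts traversed_mb(40, 8)    # → 240 (5 collections of ~48MB)
puts traversed_mb(40, 40)   # → 80  (1 collection of ~80MB)
```

The absolute totals shift with the assumptions, but the model keeps the same shape as the rough figures in the post: a higher threshold means far fewer, larger traversals.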

> As I understand it, the point of disabling the GC is to allow part of
> the heap to swap out. This has no benefit if you enable the GC
> after: you have to disable the GC, Kernel.fork, run the request, let
> the child process die, and then re-enable the GC in the parent.
That might be quite useful RAM-wise; it's like 'ignoring' the garbage
generated by a request, and it deserves consideration.  Nice.

> I'm doubtful that there is much benefit but it deserves some testing.
> 
> Incidentally the Ruby heap (and the GC) should have nothing to do with
> ImageMagick. Extensions that use malloc() are a totally different
> scenario.
IIRC, when garbage collection runs it also asks extensions to clean
themselves up, though I'll admit I never saw or understood how this is
accomplished within gc.c.  It's possible that it just calls 'cleanup' on
the extensions' (now unreferenced) Ruby-allocated objects, which are
linked to their own malloc'ed objects and know how to free them.

> Evan

Thanks Evan.
-Roger
Posted by Evan Weaver (eweaver)
on 2008-02-07 00:13
(Received via mailing list)
Some VSZ/RSS sizes for the front page of a large application:

                   VSZ      RSS
Fresh caches, minimal AR usage, startup:
  GC enabled:   120200    66732
  GC disabled:  293492   239900

Fresh caches, one request:
  GC enabled:   127744    73360
  GC disabled:  345296   289616

All stale caches, one request:
  GC enabled:   528536   472840
  GC disabled: 1203112  1147276

This is actually not nearly as bad for GC.disabled as I expected.
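To put "not nearly as bad" into numbers, the RSS columns above can be compared directly. This only divides the GC-disabled figure by the GC-enabled one for each scenario, using the figures copied from the measurements; nothing new is measured.

```ruby
# RSS figures from the measurements above: [GC enabled, GC disabled]
RSS = {
  "fresh caches, startup"     => [66_732, 239_900],
  "fresh caches, one request" => [73_360, 289_616],
  "stale caches, one request" => [472_840, 1_147_276],
}

RSS.each do |scenario, (enabled, disabled)|
  puts format("%-27s %.1fx", scenario, disabled.fdiv(enabled))
end
# prints ratios of 3.6x, 3.9x and 2.4x
```

So RSS grew by roughly 2.4x to 3.9x with the GC off, somewhat below the 4-5x guessed earlier in the thread.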

Evan
Posted by Roger Pack (rogerdpack)
on 2008-02-07 08:57
> Fresh caches, one request.
> GC.enabled: 127744 73360
> GC.disabled: 345296 289616
> This is actually not nearly as bad for GC.disabled as I expected.

How about time-wise?
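One way to get a first answer on the time side, assuming a synthetic allocation-heavy workload is an acceptable stand-in for a real request. `fake_request` and `time_with_gc` are hypothetical helpers, and real numbers would need real requests. It uses `Benchmark.realtime` from Ruby's standard library to time the same work with the GC on and off.

```ruby
require 'benchmark'

# Hypothetical stand-in for a request: builds lots of short-lived strings.
def fake_request
  Array.new(20_000) { |i| "row #{i}" }.size
end

# Time `runs` fake requests with the GC either enabled or disabled,
# then switch the GC back on no matter what happened.
def time_with_gc(enabled, runs = 20)
  GC.start                         # start each measurement from a clean heap
  enabled ? GC.enable : GC.disable
  Benchmark.realtime { runs.times { fake_request } }
ensure
  GC.enable
end

on  = time_with_gc(true)
off = time_with_gc(false)
puts format("GC on: %.3fs  GC off: %.3fs", on, off)
```

Since the GC-off run leaves its garbage behind, a fairer comparison would also time the next `GC.start`, which is the "paying for it later" cost mentioned at the top of the thread.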
Posted by Roger Pack (rogerdpack)
on 2008-12-22 09:42
Evan Weaver wrote:
> Disabling GC around the requests would guarantee that the size of the
> Mongrel process will balloon to the size of all objects combined in
> the most heavyweight request. Remember that the Ruby heap never
> returns space to the OS. As soon as you re-enable the GC, the entire
> Ruby heap (now 4-5x bigger than it normally would be) will get paged
> back in to physical RAM.
> 

One thing that could be done is to force a GC every "x" requests.  That
way a GC is guaranteed to run between requests [at least with
single-threaded Rails], when the stack is at its shallowest, so the GC
will not pick up ghost references left on the stack.  But hopefully we
won't have to worry about that soon :)
-=R