Mongrel garbage collection

Sorry for the re-post, but I’m new to the mailing list and wanted to bring back up an old topic I saw in the archives.

http://rubyforge.org/pipermail/mongrel-users/2008-February/004991.html

I think a patch to delay garbage collection and run it later is pretty important for high performance web applications. I do understand the trade-offs of explicit vs. implicit garbage collection, and would much prefer to off-load my garbage collection until a later point (when users are not waiting for a request).

I agree, based on the previous points, that this could very well be rails-specific, but isn’t this a feature that would benefit all of the frameworks that use mongrel?

This could easily be added as a configuration option to run the GC after every N requests, or to let the GC behave as normal and run when needed (the default, of course, allowing the GC to run whenever it deems necessary). The explicit collection would happen after processing a request, but before listening for any new requests.

  • scott

On Fri, Mar 21, 2008 at 12:12 PM, Scott W. [email protected]
wrote:

Sorry for the re-post, but I’m new to the mailing list and wanted to bring back up an old topic I saw in the archives.

http://rubyforge.org/pipermail/mongrel-users/2008-February/004991.html

I think a patch to delay garbage collection and run it later is pretty important for high performance web applications. I do understand the…

In the vast majority of cases you are going to do a worse job of
determining when and how often to run the GC than even MRI Ruby’s
simple algorithms. MRI garbage collection stops the world – nothing
else happens while the GC runs – so when talking about overall
throughput on an application, you don’t want it to run any more than
necessary.

I don’t use Rails, but in the past I have experimented with this quite
a lot under IOWA, and in my normal applications (i.e. not using
RMagick) I could never come up with an algorithm of self-managed
GC.disable/GC.enable/GC.start that gave the same overall level of
throughput that I got by letting Ruby start the GC according to its
own algorithms. That experience makes me skeptical of that approach
in the general case, though there are occasional specific cases where
it can be useful.
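
For anyone who hasn’t tried it, the disable/enable/start “dance” being described looks roughly like the sketch below. This is only a self-contained illustration: handle_request and all of the numbers are made-up stand-ins, not IOWA, Rails, or Mongrel code.

REQUESTS_PER_GC = 20   # arbitrary tuning knob for this sketch

def handle_request
  # simulate the throwaway object creation a real request would do
  10_000.times.map { |i| "object #{i}" }
end

served = 0
GC.disable                  # keep Ruby from collecting in the middle of a "request"
100.times do
  handle_request
  served += 1
  next unless served >= REQUESTS_PER_GC

  GC.enable
  GC.start                  # pay the whole collection cost between requests
  GC.disable
  served = 0
end
GC.enable                   # never leave the GC permanently disabled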

Kirk H.

On Fri, Mar 21, 2008 at 11:49 AM, Kirk H. [email protected]
wrote:

GC.disable/GC.enable/GC.start that gave the same overall level of
throughput that I got by letting Ruby start the GC according to its
own algorithms. That experience makes me skeptical of that approach
in the general case, though there are occasional specific cases where
it can be useful.

Kirk H.

I understand that the GC is quite knowledgeable about when to run garbage collection when examining the heap. But the GC doesn’t know anything about my application or its state. The fact that when the GC runs everything stops is why I’d prefer to limit when the GC will run. I’d rather it run outside of serving a web request rather than when it’s right in the middle of serving requests.

I know that the ideal situation is to not need to run the GC, but the reality is that I’m using various gems and plugins, and not all are well behaved and free of memory leaks. Rails itself may also have regular leaks from time to time, and I’d prefer to have my application be consistently slow rather than randomly (and unexpectedly) slow. The alternative is to terminate your application after N requests and never run the GC, which I’m not a fan of.

  • scott

You’ll likely either end up using more RAM than you otherwise would
have in between GC calls, resulting in bigger processes

This is definitely true. Keep in mind that the in-struct mark phase means that the entire process has to lurch out of swap whenever the GC runs. Since the process is now much bigger, and the pages idled longer and are more likely to be swapped out, that can be a pretty brutal hit.

Evan

On Fri, Mar 21, 2008 at 1:23 PM, Scott W. [email protected]
wrote:

I understand that the GC is quite knowledgeable about when to run garbage collection when examining the heap. But the GC doesn’t know anything about my application or its state. The fact that when the GC runs everything stops is why I’d prefer to limit when the GC will run. I’d rather it run outside of serving a web request rather than when it’s right in the middle of serving requests.

It doesn’t matter, if one is looking at overall throughput. And how long do your GC runs take? If you have a GC invocation that is noticeable on a single request, your processes must be gigantic, which would suggest to me that there’s a more fundamental problem with the app.

I know that the ideal situation is to not need to run the GC, but the
reality is that I’m using various gems and plugins and not all are well
behaved and free of memory leaks. Rails itself may also have regular leaks

No, it’s impractical to never run the GC. The ideal situation, at
least where execution performance and throughput on a high performance
app is concerned, is to just intelligently reduce how often it needs
to run by paying attention to your object creation. In particular,
pay attention to the throwaway object creation.

from time to time, and I’d prefer to have my application be consistently slow rather than randomly (and unexpectedly) slow. The alternative is to terminate your application after N requests and never run the GC, which I’m not a fan of.

If your goal is to deal with memory leaks, then you really need to
define what that means in a GC’d language like Ruby.
To me, a leak is something that consumes memory in a way that eludes
the GC’s ability to track it and reuse it. The fundamental nature of
that sort of thing is that the GC can’t help you with it.

If by leaks, you mean code that just creates a lot of objects that the
GC needs to clean up, then those aren’t leaks. It may be inefficient
code, but it’s not a memory leak.

And in the end, while disabling GC over the course of a request may
result in processing that one request more quickly than it would have
been processed otherwise, the disable/enable dance is going to cost
you something.

You’ll likely either end up using more RAM than you otherwise would
have in between GC calls, resulting in bigger processes, or you end up
calling GC more often than you otherwise would have, reducing your
high performance app’s throughput.

And for the general cases, that’s not an advantageous situation.

To be more specific, if excessive RAM usage and GC costs that are noticeable to the user during requests are a common thing for Rails apps, and the reason for that is bad code in Rails and not just bad user code, then the Rails folks should be the targets of a conversation on the matter. Mongrel itself, though, does not need to be, and should not be, playing manual memory management games on behalf of a web framework.

Kirk H.

At 01:19 PM 3/21/2008, [email protected] wrote:

I understand that the GC is quite knowledgeable about when to run

It doesn’t matter, if one is looking at overall throughput.

Hi Kirk,

One thought on this - would it be possible to schedule GC to run just
after all the html has been rendered to the client from Rails, but
while leaving open the connection (so that mongrel is blocked on
Rails)?

If so, it seems like if one were using something like nginx fair proxy, then the mongrel would be running its garbage collection AFTER the client got all its html but BEFORE any new requests were sent to it.

In a fully loaded server it wouldn’t matter at all, but most environments have a little headroom at least, so that nginx fair proxy would just route around the mongrel that is still running a GC at the end of its Rails loop.

So total throughput for a given (non-max) volume of requests might be
unaffected since nothing would ever pile up behind a rails process that
has slowed down to run GC (and the client will be happy since they got
all their html before the GC started).

I have no idea if this is meaningful, but I’ve been playing with some
performance tests against mongrel + nginx fair proxy and it occurs to
me that this might be relevant…

Best,

Steve

On 22/03/2008, at 8:19 AM, Steve M. wrote:

If so, it seems like if one were using something like nginx fair proxy, then the mongrel would be running its garbage collection AFTER the client got all its html but BEFORE any new requests were sent to it.

In a fully loaded server it wouldn’t matter at all, but most environments have a little headroom at least, so that nginx fair proxy would just route around the mongrel that is still running a GC at the end of its Rails loop.

That would only be true if you set the connect timeout on the backend
to 1 second AND your GC pass took longer than 1 second.

The alternative is to terminate your application after N requests and never run the GC, which I’m not a fan of.

WSGI (Python) can do that, and it’s a pretty nice alternative to having Monit kill a leaky app that may have a bunch of requests queued up (Mongrel soft shutdown notwithstanding).

Evan

On Fri, Mar 21, 2008 at 1:19 PM, Kirk H. [email protected] wrote:

…when it’s right in the middle of serving requests.

It doesn’t matter, if one is looking at overall throughput. And how long do your GC runs take? If you have a GC invocation that is noticeable on a single request, your processes must be gigantic, which would suggest to me that there’s a more fundamental problem with the app.

Right now, my processes aren’t gigantic… I’m preparing for a ‘worst case’ scenario when I have extremely large processes or memory usage. This can easily happen in specific applications such as an image server (using image magick) or parsing/creating large xml payloads (a large REST server). For those applications, I may have a large amount of memory used for each request, which will increase until the GC is run.

There may be perfectly good reasons to have intermediate object creation (good encapsulation, usage of another library/gem you can’t modify, large operations that you need to keep atomic). While ideally you’d fix the memory usage problem, this doesn’t solve all cases.

define what that means in a GC’d language like Ruby.
To me, a leak is something that consumes memory in a way that eludes
the GC’s ability to track it and reuse it. The fundamental nature of
that sort of thing is that the GC can’t help you with it.

Yes, for Ruby (and other GC’d languages), it’s much harder to leak memory such that the GC can never clean it up - but it does (and has) happened. This case I’m less concerned about, as a leak of this magnitude should be considered a bug and fixed.

If by leaks, you mean code that just creates a lot of objects that the
GC needs to clean up, then those aren’t leaks. It may be inefficient
code, but it’s not a memory leak.

Inefficient it may be - but it might be just optimizing for a different problem. For example, take ActiveRecord’s association cache and its query cache. If you’re doing a large number of queries on each page load, ActiveRecord is still going to cache them for each request - this is far better than further round trips to the database, but may lead to a large amount of memory consumed per request.

And in the end, while disabling GC over the course of a request may
result in processing that one request more quickly than it would have
been processed otherwise, the disable/enable dance is going to cost
you something.

Agreed. But again, I’d rather it be a constant cost outside of processing a request than a variable cost inside of processing a request.

You’ll likely either end up using more RAM than you otherwise would
have in between GC calls, resulting in bigger processes, or you end up
calling GC more often than you otherwise would have, reducing your
high performance app’s throughput.

And for the general cases, that’s not an advantageous situation.

This can vary from application to application - all the more reason to make this a configurable option (and not the default).

Kirk H.

I still disagree on this point - I doubt that Rails is the only web framework that would benefit from being able to control when the GC is run. This is going to be a common problem across frameworks whenever web applications are consuming then releasing large amounts of memory - I’d say it can be a pretty common use case for certain types of web applications.

  • scott

On Sat, Mar 22, 2008 at 3:39 AM, Dave C. [email protected] wrote:

end of its Rails loop.

That would only be true if you set the connect timeout on the backend to 1
second AND your GC pass took longer than 1 second.

Yes, but the worst case here is that another request gets delayed before processing. Still potentially better (IMHO) than dealing with this delay when processing a request.

  • scott

If you plan on regularly killing your application (for whatever reason),
then this is a pretty good option. This is a pretty common practice for
apache modules and fastcgi applications as a hold-over from dealing with
older leaky C apps.

I’d personally prefer for my Ruby web apps to re-run the GC rather than pay the startup/shutdown/parse-configs/connect-to-external-resources costs, but that’s because they are far less likely to leak memory that the GC can’t catch or to get into an unstable state.

  • scott

On Mon, Mar 24, 2008 at 9:21 AM, Scott W. [email protected]
wrote:

Right now, my processes aren’t gigantic… I’m preparing for a ‘worst case’ scenario when I have extremely large processes or memory usage. This can easily happen in specific applications such as an image server (using image magick) or parsing/creating large xml payloads (a large REST server). For those applications, I may have a large amount of memory used for each request, which will increase until the GC is run.

(nod) image magick is a well known bad citizen. Either don’t use
it at all, or use it in an external process from your web app
processes.
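
For what it’s worth, “an external process” can be as simple as shelling out to ImageMagick’s convert command instead of loading RMagick into the web app process, so whatever memory the conversion uses goes back to the OS when the child exits. A rough sketch; the file names and geometry here are purely illustrative:

# Resize an image by running ImageMagick's `convert` in a child process.
def resize_image(src, dest, geometry = "640x480")
  ok = system("convert", src, "-resize", geometry, dest)
  raise "convert failed for #{src}" unless ok
  dest
end

# resize_image("uploads/photo.jpg", "public/thumbs/photo.jpg", "100x100")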

And if, for whatever reason, you must use it inside of your web app
process, and your use case really does create processes so enormous
that you can perceive a response lag from a manual GC.start inside of
your request processing, then create a custom Rails handler that does
it. You can trivially alter it to do whatever GC.foo actions you
desire. The code is simple and easy to follow, so just make your own
Mongrel::Rails::RailsHandlerWithParanoidGCManagement.
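
A sketch of what such a handler might look like, assuming the stock handler lives at Mongrel::Rails::RailsHandler and exposes a process(request, response) method like Mongrel’s other handlers; the class name is the joke name from above and the every-50-requests figure is an arbitrary example:

require 'mongrel'
require 'mongrel/rails'

class RailsHandlerWithParanoidGCManagement < Mongrel::Rails::RailsHandler
  REQUESTS_PER_GC = 50            # arbitrary example value

  def initialize(*args)
    super
    @served = 0
  end

  def process(request, response)
    super                         # normal Rails dispatch happens first
    @served += 1
    if @served >= REQUESTS_PER_GC
      @served = 0
      GC.start                    # collect between requests, not during one
    end
  end
end

# A production version would likely guard @served with a mutex, since
# Mongrel hands requests to handlers from multiple threads.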

There may be perfectly good reasons to have intermediate object creation (good encapsulation, usage of another library/gem you can’t modify, large operations that you need to keep atomic). While ideally you’d fix the memory usage problem, this doesn’t solve all cases.

Obviously. It’s easy and convenient to ignore the issue, and often
the issue doesn’t matter for a given piece of code. But if memory
usage or execution speed becomes an issue for one’s code, going back
and taking a look at the throwaway object creation, and addressing it,
can net considerable improvements.

Yes, for Ruby (and other GC’d languages), it’s much harder to leak memory
such that the GC can never clean it up - but it does (and has) happened.
This case I’m less concerned about as a leak of this magnitude should be
considered a bug and fixed.

Oh, I know. That’s why I brought it up, though. You were talking
about memory leaks, so I wanted to make a distinction. Real leaks,
like the Array#shift bug, or leaky continuations, or badly behaved
Ruby extensions, aren’t affected by GC manipulations.

Inefficient it may be - but it might be just optimizing for a different problem. For example, take ActiveRecord’s association cache and its query cache. If you’re doing a large number of queries on each page load, ActiveRecord is still going to cache them for each request - this is far better than further round trips to the database, but may lead to a large amount of memory consumed per request.

Sure. And if it’s optimizing for a different problem, then that’s fine, so long as the optimization isn’t creating a worse problem than the issue it’s trying to address.

But that’s also largely irrelevant, I think. I just did a quick test.
I created a program that creates 10 million objects. It has a
footprint of about a gigabyte of RAM usage. It takes Ruby 0.5 seconds
to walk that 10 million objects on my server.
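
Kirk doesn’t post the test itself, but something along these lines reproduces the idea; the object count matches the description above, and the timing will of course vary by machine and Ruby version:

require 'benchmark'

# Hold ten million small objects so a full mark phase has to walk them all.
objects = Array.new(10_000_000) { |i| "string #{i}" }

puts Benchmark.realtime { GC.start }   # seconds for one full GC pass
puts objects.size                      # keep the array alive past the GC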

If you have a web app that has processes anywhere near that large, you have bigger problems to deal with. And if you have a more reasonably large, million object app, then on my server the GC cost would be 0.05 seconds. Given the typical speed of Rails apps, an occasional 0.05 second delay is going to be unnoticeable.

Agreed. But again, I’d rather it be a constant cost outside of processing a
request than a variable cost inside of processing a request.

You’re worrying about something that just isn’t a problem in the vast,
vast majority of cases. Again, testing on my server, even with a very
simple, very fast piece of code creating objects, it takes almost 20x
as long to create the objects as to GC them.

This can vary from application to application - all the more reason to make
this a configurable option (and not the default).

It’s still my position that it’s not Mongrel’s job to be implementing
a manual memory management scheme that is almost always going to be a
performance loser over just leaving it alone.

It’s still my position that if one has an application that, through
testing, has been shown to have a use case where it can actually
benefit from manual GC.foo management, then one can trivially create a
mongrel handler that will do this for you.

I still disagree on this point - I doubt that Rails is the only web
framework that would benefit from being able to control when the GC is run.
This is going to be a common problem across frameworks whenever web
applications are consuming then releasing large amounts of memory - I’d say
it can be a pretty common use case for certain types of web applications.

My point is that if it is Rails code that is causing the problem,
that’s a Rails problem.

My point is also that manual GC.foo management is going to cause more
problems than it helps for the vast majority of applications. GC
cycles aren’t that slow, especially compared to the speed of a typical
Rails app, and certainly not when compared to the speed of a Rails
request that makes a lot of objects and does any sort of intensive,
time consuming operations.

Kirk H.

On Tue, Mar 25, 2008 at 11:53 AM, Zed A. Shaw [email protected]
wrote:

…For those applications, I may have a large amount of memory used for each request, which will increase until the GC is run.

Well, does that mean you DO have this problem or DO NOT have this
problem? If you aren’t actually facing a real problem that could be
solved by this change then you most likely won’t get very far. Any
imagined scenario you come up with could easily just be avoided.

Right now my current deployment configuration for all my rails applications is using apache + fastcgi.

With this deployment strategy, if I don’t set the garbage collection in my dispatch.fcgi, any rails application I use that uses image magick (for resizing/effects/etc) eats memory like a hog.
In my dispatch…
http://dev.rubyonrails.org/browser/trunk/railties/dispatches/dispatch.fcgi
I usually set this to around 50 executions per GC run and my rails apps seem pretty happy.
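
For context, the knob being referred to is the optional GC-period argument that the stock Rails FastCGI dispatcher accepts. If memory serves, the relevant dispatch.fcgi line looks something like this, with 50 being the “executions per GC run” figure mentioned above:

#!/usr/bin/env ruby
# Excerpt of a Rails dispatch.fcgi, from memory rather than verbatim.
# The second argument asks the handler to run GC.start every 50 requests.
require File.dirname(__FILE__) + "/../config/environment"
require 'fcgi_handler'

RailsFCGIHandler.process! nil, 50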

This has been working great for me thus far, but using mod_fastcgi leaves zombie processes occasionally during restart. Checking in with the docs, mod_fastcgi is more or less deprecated and mod_fcgid is preferred. mod_fcgid has all sorts of issues (random 500s and the like), and to boot the documentation is quite poor.

So, I’ve decided to move my apps over to using nginx proxying to mongrel. The decision to move to nginx is pretty minor (it’s lighter weight and easier to configure), but my decision to move to mongrel warranted a bit of research. I do want to ensure that all of my applications behave properly in terms of memory consumption, and the first thing I’ve noticed is that mongrel doesn’t have the same options available for customizing when the GC runs.

This leads me to believe that either there’s something specific to rails running under FastCGI that requires the GC to be disabled/enabled during request processing, or mongrel hasn’t implemented the feature yet.

If you want to do this then you’ll have to write code, and you’ll have to learn how to make a Mongrel handler that is registered before and after the regular rails handler. All you do is have this before handler analyze the request and disable the GC on the way in. In the after handler you just have to re-enable the GC and make it do the work.

It’s pretty simple, but you will have to go read Mongrel code and understand it first. Otherwise you’re just wasting your time really.
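
A rough sketch of the before/after pair being described, assuming the usual Mongrel handler interface (an HttpHandler subclass with a process(request, response) method) and that handlers registered on the same URI run in order; the registration lines at the end are shown only as comments and are not verbatim mongrel_rails configuration:

require 'mongrel'

# Before the Rails handler: turn the GC off for the duration of the request.
class GCOffHandler < Mongrel::HttpHandler
  def process(request, response)
    GC.disable
  end
end

# After the Rails handler: turn the GC back on and do the work now,
# once the response has already been produced.
class GCOnHandler < Mongrel::HttpHandler
  def process(request, response)
    GC.enable
    GC.start
  end
end

# Registration, roughly, inside a mongrel_rails config script:
#   uri "/", :handler => GCOffHandler.new   # runs first
#   uri "/", :handler => rails_handler      # the regular Rails handler
#   uri "/", :handler => GCOnHandler.new    # runs last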

Zed A. Shaw

Sounds good to me - I don’t mind writing code, I just want to know, if I do spend the time, whether it’s something the mongrel community would accept…

Quick question about the code change…

Counting the number of requests served and determining the GC behavior should be done inside a mutex (or we start to run the risk of running the GC twice or mis-counting the number of requests processed).

I don’t see any common mutex used for all mongrel dispatchers, but the logic is specific to each type of http handler (rails, camping, etc). Would it make sense then to put the optional GC run check (and GC run, if applicable) within the synchronize block for each http handler, or is this something that should live in the base HttpHandler class?
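
Wherever it ends up living, the bookkeeping itself is only a few lines. A minimal, thread-safe sketch of the “run the GC every N requests” counter (the class and method names are invented for illustration):

require 'thread'

class GCScheduler
  def initialize(interval)
    @interval = interval    # run GC after this many requests
    @count    = 0
    @mutex    = Mutex.new
  end

  # Called once per request by whichever handler owns the scheduler.
  def request_served
    run = false
    @mutex.synchronize do
      @count += 1
      if @count >= @interval
        @count = 0
        run = true
      end
    end
    GC.start if run         # do the actual collection outside the lock
  end
end

# scheduler = GCScheduler.new(50)
# scheduler.request_served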

  • scott

On Mon, Mar 24, 2008 at 3:58 PM, Scott W. [email protected]
wrote:

pretty happy.

You’re using RMagick, not ImageMagick directly. If you used the latter (via system calls) there would be no memory leakage to worry about.

This has been working great for me thus far, but using mod_fastcgi leaves zombie processes occasionally during restart. Checking in with the docs, mod_fastcgi is more or less deprecated and mod_fcgid is preferred. mod_fcgid has all sorts of issues (random 500s and the like), and to boot the documentation is quite poor.

Moving from FastCGI to Mongrel will also require you to monitor your cluster processes with external tools, since you’re using things that leak too much memory, like RMagick, and that requires restarting the process.

To make it clear: the memory leaked by RMagick cannot be recovered with the garbage collection mechanism. I tried that several times, but in the long run I was required to restart and hunt down all the zombie processes left by Apache.

So, I’ve decided to move my apps over to using nginx proxying to mongrel. The decision to move to nginx is pretty minor (it’s lighter weight and easier to configure), but my decision to move to mongrel warranted a bit of research. I do want to ensure that all of my applications behave properly in terms of memory consumption, and the first thing I’ve noticed is that mongrel doesn’t have the same options available for customizing when the GC runs.

Can you tell me how you addressed the “schedule” of the garbage collection execution in your previous scenario? AFAIK most of the frameworks or servers don’t impose on the user how often GC should be performed.

This leads me to believe that either there’s something specific to rails running under FastCGI that requires the GC to be disabled/enabled during request processing, or mongrel hasn’t implemented the feature yet.

I’ll bet it is rails-specific, or you should take a look at the fcgi ruby extension, since it is responsible, ruby-side, for bridging both worlds.

On a personal note, I believe it is not the responsibility of Mongrel, as a webserver, to take care of the garbage collection and leakage issues of the VM on which your application runs. In any case, the GC of the VM (MRI Ruby) should be enhanced to work better with heavy load and long-running environments.


Luis L.
Multimedia systems

Human beings, who are almost unique in having the ability to learn from
the experience of others, are also remarkable for their apparent
disinclination to do so.
Douglas Adams

On Mon, 24 Mar 2008 08:21:52 -0700
“Scott W.” [email protected] wrote:

Right now, my processes aren’t gigantic… I’m preparing for a ‘worst case’ scenario when I have extremely large processes or memory usage. This can easily happen in specific applications such as an image server (using image magick) or parsing/creating large xml payloads (a large REST server). For those applications, I may have a large amount of memory used for each request, which will increase until the GC is run.

Well, does that mean you DO have this problem or DO NOT have this
problem? If you aren’t actually facing a real problem that could be
solved by this change then you most likely won’t get very far. Any
imagined scenario you come up with could easily just be avoided.

If you want to do this then you’ll have to write code, and you’ll have to learn how to make a Mongrel handler that is registered before and after the regular rails handler. All you do is have this before handler analyze the request and disable the GC on the way in. In the after handler you just have to re-enable the GC and make it do the work.

It’s pretty simple, but you will have to go read Mongrel code and understand it first. Otherwise you’re just wasting your time really.


Zed A. Shaw

On Mon, Mar 24, 2008 at 12:18 PM, Luis L. [email protected]
wrote:

On Mon, Mar 24, 2008 at 3:58 PM, Scott W. [email protected] wrote:

You’re using RMagick, not ImageMagick directly. If you used the latter (via system calls) there would be no memory leakage to worry about.

You’re correct - I’m using ‘RMagick’ - and it uses a large amount of memory. But that’s not really the overall point. My overall point is how to properly handle a rails app that uses a great deal of memory during each request. I’m pretty sure this happens in other rails applications that don’t happen to use ‘RMagick’.

Moving from FastCGI to Mongrel will also require you to monitor your cluster processes with external tools, since you’re using things that leak too much memory, like RMagick, and that requires restarting the process.

Yes, although all monitoring will be able to do is kill off a misbehaved application. I’d much rather run garbage collection than kill off my application.

To make it clear: the memory leaked by RMagick cannot be recovered with the garbage collection mechanism. I tried that several times, but in the long run I was required to restart and hunt down all the zombie processes left by Apache.

So far, running the GC under fastcgi has given me pretty good results. The zombie issue with fastcgi is a known issue with mod_fastcgi, and I’m pretty sure it is unrelated to RMagick or garbage collection.

Can you tell me how you addressed the “schedule” of the garbage collection execution in your previous scenario? AFAIK most of the frameworks or servers don’t impose on the user how often GC should be performed.

In the previous scenario I was using fast_cgi with rails. In my previous reply I provided a link to the rails fastcgi dispatcher.
http://dev.rubyonrails.org/browser/trunk/railties/dispatches/dispatch.fcgi

In addition, other languages and other web frameworks have provisions to control garbage collection (for languages that have garbage collection, of course).

I’ll bet it is rails-specific, or you should take a look at the fcgi ruby extension, since it is responsible, ruby-side, for bridging both worlds.

This is done in the Rails FastCGI dispatcher. I believe that the equivalent of this in Mongrel is the Mongrel Rails dispatcher. Since the Mongrel Rails dispatcher is distributed as a part of Mongrel, I’d say this code is owned by Mongrel, which bridges these two worlds when using mongrel as a webserver.

On a personal note, I believe it is not the responsibility of Mongrel, as a webserver, to take care of the garbage collection and leakage issues of the VM on which your application runs. In any case, the GC of the VM (MRI Ruby) should be enhanced to work better with heavy load and long-running environments.

Ruby provides an API to access and call the garbage collector. This gives ruby application developers the ability to control when garbage collection is run, because in some cases there may be an application-specific reason to prevent or explicitly run the GC. Web servers are a good example of applications where state may help determine a better time to run the GC. As you’re serving each request, you’re generally allocating a number of objects, then rendering output, then moving on to the next request.

By limiting the GC to run in between requests rather than during requests, you are trading request time for latency between requests. This is a trade-off that I think web application developers should decide, but by no means should this be a default or a silver bullet for all. My position is that this should just be an option within Mongrel as a web server.

  • scott

On Mon, Mar 24, 2008 at 4:59 PM, Scott W. [email protected]
wrote:

…memory leakage to worry about.

You’re correct - I’m using ‘RMagick’ - and it uses a large amount of memory.
But that’s not really the overall point. My overall point is how to
properly handle a rails app that uses a great deal of memory during each
request. I’m pretty sure this happens in other rails applications that
don’t happen to use ‘RMagick’.

Yes, I faced huge memory usage issues with other things not related to image processing, and found that a good solution was to move them out of the request-response cycle and into an out-of-band background job.

So far, running the GC under fastcgi has given me pretty good results. The zombie issue with fastcgi is a known issue with mod_fastcgi, and I’m pretty sure it is unrelated to RMagick or garbage collection.

Yes, but even if you “reclaim” the memory with GC, there will be pieces that won’t ever be GC’ed, since they leaked on the C side, outside GC control (some of the RMagick and ImageMagick mysteries).

This is done in the Rails FastCGI dispatcher. I believe that the equivalent
of this in Mongrel is the Mongrel Rails dispatcher. Since the Mongrel Rails
dispatcher is distributed as a part of Mongrel, I’d say this code is owned
by Mongrel, which bridges these two worlds when using mongrel as a
webserver.

Then you could provide a different Mongrel Handler that could perform that, or even a series of GemPlugins that provide a gc:start instead of the plain ‘start’ command the mongrel_rails script provides.

…there may be an application-specific reason to prevent or explicitly run the GC. My position is that this should just be an option within Mongrel as a web server.

--gc-interval maybe?

Now that you convinced me and proved your point, having the option to
perform it (optionally, not forced) will be something good to have.

Patches are Welcome :wink:


Luis L.
Multimedia systems

Human beings, who are almost unique in having the ability to learn from
the experience of others, are also remarkable for their apparent
disinclination to do so.
Douglas Adams

At 08:21 AM 3/24/2008, [email protected] wrote:

scenario when I have extremely large processes or memory usage…

…intermediate object creation (good encapsulation, usage of another library/gem you can’t modify, large operations that you need to keep atomic). While ideally you’d fix the memory usage problem, this doesn’t solve all cases.

Hi Scott,

I hope this somewhat OT post is ok (feedback welcome). I’ve had memory
problems with image magick too - even when it runs out of process. On
certain (rare but reasonably sized) image files it seems to go memory
haywire, eating too much memory and throwing my app stack into swap.

So I wrote this simple rails plug-in which is very limited in function, but does mostly what I needed from an image processor. Notably, for your issue above, it lets you easily specify limits on how much memory image magick is allowed to consume while doing its work (thanks to Ara Howard for initial direction on that one). It might be of interest to you:

http://www.misuse.org/science/2008/01/30/mojomagick-ruby-image-library-for-imagemagick/

Best,

Steve

Forgive me for not having read the whole thread, however, there is one
thing that seems to be really important, and that is, ruby hardly ever
runs the damned GC. It certainly doesn’t do full runs nearly often
enough (IMO).

Also, implicit OOMEs or GC runs quite often DO NOT affect the
extensions correctly. I don’t know what rmagick is doing under the
hood in this area, but having been generating large portions of
country maps with it (and moving away from it very rapidly), I know
the GC doesn’t do “The Right Thing”.

The first port of call is GC_MALLOC_LIMIT and friends. For any small script that doesn’t breach that value, the GC simply doesn’t run. More than this, RMagick, in its apparent ‘wisdom’, never frees memory if the GC never runs. Seriously, check it out. Make a tiny script, and make a huge image with it. Hell, make 20, get an OOME, and watch for a run of the GC. The OOME will reach your code before the GC calls on RMagick to free.
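
A sketch of the kind of tiny script being suggested, assuming RMagick is installed; the image size and iteration count are arbitrary, and the point is to watch the process’s memory with and without the explicit GC.start:

require 'RMagick'

# Create a series of large images. The C-side pixel data doesn't count
# against GC_MALLOC_LIMIT, so without the explicit GC.start below Ruby
# may never collect the wrappers before memory runs out.
20.times do |i|
  img = Magick::Image.new(3000, 3000)   # a large blank image
  img = nil
  GC.start    # comment this out and watch memory climb instead
  puts "finished iteration #{i}"
end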

Now, add a call to GC.start, and no OOME. Despite the limitations of
it (ruby performance only IMO), most of the above experience was built
up on windows, and last usage was about 6 months ago, FYI.

On 24 Mar 2008, at 20:37, Luis L. wrote:

each request. I’m pretty sure this happens in other rails applications that don’t happen to use ‘RMagick’.

Personally, I’ll simply say call the GC more often. Seriously. I mean it. It’s not that slow, not at all. In fact, I call GC.start explicitly inside of my ubygems.rb due to stuff I have observed before:

http://blog.ra66i.org/archives/informatics/2007/10/05/calling-on-the-gc-after-rubygems/
- N.B. This isn’t “FIXED”; it’s still a good idea (gem 1.0.1).

Now, by my reckoning (and a few production apps seem to be showing it empirically (purely empirical, sorry)), we should be calling on the GC whilst loading up the apps. I mean, come on, when is a really serious number of temporary objects being created? Actually, it’s when rubygems loads, and that’s the first thing that happens in, hmm, probably over 90% of ruby processes out there.

sure unrelated to RMagick or garbage collection.

Yes, but even if you “reclaim” the memory with GC, there will be pieces that won’t ever be GC’ed, since they leaked on the C side, outside GC control (some of the RMagick and ImageMagick mysteries).

Sure, but leaks are odd things. Some processes that appear to be leaking are really just fragmenting (allocating more RAM due to lack of ‘usable’ space on ‘the heap’). Call the GC more often, take a 0.01% performance hit, and monitor. I bet it’ll get better. In fact, you can drop fragmentation of the first allocated segment significantly just by calling GC.start after a rubygems load, if you have more than a few gems.

Can you tell me how you addressed the “schedule” of the garbage collection execution in your previous scenario? AFAIK most of the frameworks or servers don’t impose on the user how often GC should be performed.

In fact there are many rubyists who hate the idea of splatting
GC.start into processes. Given what I’ve seen, I’m willing to reject
that notion completely. Test yourself, YMMV.

FYI, even on windows under the OCI, where performance for the interpreter sucks really, really hard, I couldn’t reliably measure the runtime of a call to GC.start after loading rubygems. I don’t know what kind of ‘performance’ people are after, but I can’t see the point in not running the GC more often, especially for ‘more common’ daemon load. Furthermore, hitting the kernel for more allocations more often is actually pretty slow too, so this may actually even result in faster processes under certain conditions.

Running a lib like RMagick, I would say you should be doing this,
straight up, no arguments.

…the equivalent of this in Mongrel is the Mongrel Rails dispatcher. Since the Mongrel Rails dispatcher is distributed as a part of Mongrel, I’d say this code is owned by Mongrel, which bridges these two worlds when using mongrel as a webserver.

It doesn’t really matter where you run the GC. It matters that it
runs, how often, and what it’s doing. If you’re actually calling on
the GC and freeing nothing, that’s stupid, but if you’ve run RMagick
up, just call GC.start anyway, and I’m pretty sure it’ll help. There’s
certainly no harm in investigating this, unless you’re doing something
silly with weakrefs.

Then you could provide a different Mongrel Handler that could perform
that, or even a series of GemPlugins that provide a gc:start instead
of plain ‘start’ command mongrel_rails scripts provides.

$occasional_gc_run_counter = 0
before_filter :occasional_gc_run

def occasional_gc_run
  $occasional_gc_run_counter += 1
  if $occasional_gc_run_counter > 1_000
    $occasional_gc_run_counter = 0
    GC.start
  end
end

Or whatever. It doesn’t really matter that much where you do this, or
when, it just needs to happen every now and then. More importantly,
add a GC.start to the end of environment.rb, and you will have
literally half the number of objects in ObjectSpace.
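
That claim is easy enough to check in your own app; a quick way to see the effect for yourself (counts will vary by app, gem set, and Ruby version):

# At the very end of environment.rb, or in script/console after boot:
before = ObjectSpace.each_object { }   # returns the number of objects walked
GC.start
after = ObjectSpace.each_object { }
puts "objects before GC: #{before}, after: #{after}"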

On a personal note, I believe it is not the responsibility of Mongrel, as a webserver, to take care of the garbage collection and leakage issues of the VM on which your application runs. In any case, the GC of the VM (MRI Ruby) should be enhanced to work better with heavy load and long-running environments.

Right, and it’s not just the interpreter, although indirection around
this stuff can help. (such as compacting).

…My position is that this should just be an option within Mongrel as a web server.

Right, I think this is important too. You’re absolutely right that there’s no specific place to provide a generic solution. In rails the answer may be simple, but that’s because rails’ outer architecture is simplistic. No threads, no out-of-request processing, and so on.

–gc-interval maybe?

Now that you convinced me and proved your point, having the option to
perform it (optionally, not forced) will be something good to have.

Surely you can just:

require 'thread'
Thread.new { loop { sleep GC_FORCE_INTERVAL; GC.start } }

In environment.rb in that case.

Of course, this is going to kill performance under evented_mongrel,
thin and so on. I’d stay away from threaded solutions. _why blogged
years ago about the GC, trying to remind people that we actually have
control. I know ruby is supposed to abstract memory problems etc away
from us, and for the most part it does, but hey, no one’s perfect,
right? :slight_smile:

http://whytheluckystiff.net/articles/theFullyUpturnedBin.html

Patches are Welcome :wink:

Have fun! :o)

My hunch is that rmagick is allocating large amounts of RAM outside of Ruby. It registers its objects with the interpreter, but the RAM usage in rmagick itself doesn’t count against GC_MALLOC_LIMIT because Ruby didn’t allocate it, so doesn’t know about it.

It’s allocating opaque objects on the Ruby heap but not using Ruby’s
built-in malloc? That seems pretty evil.

Evan