Ruby GC question (MRI, JRuby, etc)

My basic understanding of the garbage collectors in use by the various
Ruby runtimes is that they all search for objects from a “root” memory
object. If an object cannot be reached from this root, then it is
collected.

Here’s a snippet of ruby code. I’m not sure how the GC will treat it.

class Foo
  def initialize
    @baz = Baz.new
    @quxxo = Quxxo.new
  end
end

class Bar
  def run
    Foo.new
    nil
  end
end

bar = Bar.new
bar.run
bar.run
bar.run

What happens to the instances of Foo created in the call to #run? Since
I am not saving them somewhere (e.g. to an array), do they get
collected right away?

If the Foo instances get collected, is it safe to assume the Baz and
Quxxo instances are being collected at the same time? Does their
existence prevent the Foo instance from being collected?

cr

On Tue, Aug 17, 2010 at 11:19 AM, Chuck R. wrote:

My basic understanding of the garbage collectors in use by the various Ruby runtimes is that they all search for objects from a “root” memory object. If an object cannot be reached from this root, then it is collected.

It depends on the Ruby. JRuby and Rubinius have different garbage
collectors than MRI Ruby.

What happens to the instances of Foo created in the call to #run? Since I am not saving them somewhere (e.g. to an array), do they get collected right away?

Nothing is ever collected right away in the MRI rubies currently. The
object will exist in memory until a GC cycle runs. Unless a GC cycle
is started manually (GC.start), GC cycles only run when Ruby runs
short on preallocated memory. Take a look at this
http://www.engineyard.com/blog/2010/mri-memory-allocation-a-primer-for-developers/
or google on the subject and you’ll find a number of articles that
will explain how it works in more detail than you will get in an
email.
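
A minimal sketch of the point above, assuming MRI (ObjectSpace.each_object is not enabled by default on JRuby). Baz and Quxxo were not defined in the original post, so they are empty stand-ins here, and live_foos is a hypothetical helper. It counts the Foo instances the VM still tracks before and after a manually requested cycle; the counts themselves are not guaranteed, which is the point.

```ruby
# Stand-ins for classes the original post did not define (assumption).
class Baz; end
class Quxxo; end

class Foo
  def initialize
    @baz = Baz.new
    @quxxo = Quxxo.new
  end
end

class Bar
  def run
    Foo.new # allocated, then immediately abandoned
    nil
  end
end

# Hypothetical helper: how many Foo instances does the VM still track?
def live_foos
  ObjectSpace.each_object(Foo).count
end

bar = Bar.new
3.times { bar.run }

cycles_before = GC.count
puts "Foo instances before GC.start: #{live_foos}"

GC.start # request a full collection instead of waiting for memory pressure

puts "Foo instances after GC.start:  #{live_foos}"
puts "GC cycles run in between:      #{GC.count - cycles_before}"
```

On MRI the second count is usually lower than the first, but the conservative stack scan can keep an abandoned object around for a while, so neither number should be relied on.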

If the Foo instances get collected, is it safe to assume the Baz and Quxxo instances are being collected at the same time? Does their existence prevent the Foo instance from being collected?

It depends on why you are assuming it. If you have an implementation
that depends on specific garbage collection behaviors or collection in
specific chronologies in order to work right, it is probably not safe
to assume anything. If you are just trying to understand the memory
behavior of your code, and make sure you aren’t doing dumb things that
can lead to a memory leak, then yes, it is safe to assume that the Baz
and Quxxo instances will be collected along with the Foo.

Kirk H.

On Aug 17, 2010, at 12:42 PM, Kirk H. wrote:

Unless a GC cycle is started manually (GC.start), GC cycles only run
when Ruby runs short on preallocated memory.

[...]

If you are just trying to understand the memory
behavior of your code, and make sure you aren’t doing dumb things that
can lead to a memory leak, then yes, it is safe to assume that the Baz
and Quxxo instances will be collected along with the Foo.

Kirk,

thanks for the pointer to your write-up at engineyard. I’ll be sure to
read through it.

In the meantime, it looks like I need to save my Foo instances to an
array or something similar if I want to make sure that they do NOT get
collected until I’m ready.
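
A rough sketch of that idea, assuming MRI. The @foos array, the foos reader and the release method are hypothetical additions to the Bar class from the original example, and Baz and Quxxo are empty stand-ins. As long as the array references the Foo instances, they (and the Baz and Quxxo objects they hold) stay reachable and cannot be collected.

```ruby
# Stand-ins for classes the original post did not define (assumption).
class Baz; end
class Quxxo; end

class Foo
  def initialize
    @baz = Baz.new
    @quxxo = Quxxo.new
  end
end

class Bar
  attr_reader :foos # hypothetical reader, not in the original example

  def initialize
    @foos = [] # keeps every Foo reachable until release is called
  end

  def run
    @foos << Foo.new
    nil
  end

  # Hypothetical: drop the references so the Foos become eligible for GC.
  def release
    @foos.clear
  end
end

bar = Bar.new
3.times { bar.run }

GC.start
# At least 3 here, because bar.foos still references them.
puts "Foo instances while retained: #{ObjectSpace.each_object(Foo).count}"

bar.release
GC.start
# May or may not have dropped yet: eligible is not the same as collected.
puts "Foo instances after release:  #{ObjectSpace.each_object(Foo).count}"
```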

cr

On Aug 18, 2010, at 4:44 AM, Brian C. wrote:

Chuck R. wrote:

In the meantime, it looks like I need to save my Foo instances to an
array or something similar if I want to make sure that they do NOT get
collected until I’m ready.

In any case, if you don’t keep a reference to them somewhere, then you
can never call any method on them, so the objects are obviously useless
(which is why they are garbage-collected in the first place)

Not necessarily true. The Bar class in my example could have its own
internal lifecycle where it is generating events for Baz and Quxxo which
in turn are reacting to or generating events for Bar. Plus, they all may
be interacting with yet more objects on a local or remote system.
Retaining a reference to the Bar instance from Foo does not preclude
them from doing useful work.

cr

Chuck R. wrote:

In any case, if you don’t keep a reference to them somewhere, then you
can never call any method on them, so the objects are obviously useless
(which is why they are garbage-collected in the first place)

Not necessarily true. The Bar class in my example could have its own
internal lifecycle where it is generating events for Baz and Quxxo

Not unless it is running in its own thread. In that case, there will be
a reference to the object held within the thread - for example in a
local variable.

However, if the object exists solely to be shared by DRb, then yes you
will need to keep a handle to it to stop it being garbage-collected.
That’s because DRb uses ObjectSpace._id2ref to locate objects via just their id.
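
To illustrate the thread case, here is a small sketch, assuming MRI. The work method and the two queues are hypothetical, added only to make the example observable. The Foo is referenced by nothing except a local variable inside the thread, and that is enough to keep it alive across a GC cycle.

```ruby
require "thread" # no-op on modern Rubies, needed for Queue on older ones

class Foo
  # Hypothetical method so the thread has something to call.
  def work
    :working
  end
end

ready = Queue.new # signals that the thread has created its Foo
stop  = Queue.new # tells the thread to finish

worker = Thread.new do
  foo = Foo.new # referenced only by this thread's local variable
  ready << true
  stop.pop      # block until asked to finish
  foo.work      # the object is still usable here
end

ready.pop # wait until the Foo exists
GC.start
puts "Foo instances while the thread runs: #{ObjectSpace.each_object(Foo).count}"

stop << true
puts "Thread result: #{worker.value.inspect}"
```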

Chuck R. wrote:

In the meantime, it looks like I need to save my Foo instances to an
array or something similar if I want to make sure that they do NOT get
collected until I’m ready.

In any case, if you don’t keep a reference to them somewhere, then you
can never call any method on them, so the objects are obviously useless
(which is why they are garbage-collected in the first place)

On Aug 18, 2010, at 8:58 AM, Robert K. wrote:

If an object cannot be reached from this root, then it can be collected.

Small change, big difference. :)

Ha! Yes, quite true. Depending upon the GC algo in use, some objects may
never be collected even though they are eligible for collection.

cr

Chuck R. wrote:

If an object cannot be reached from this root, then it can be collected.

Small change, big difference. :)

Ha! Yes, quite true. Depending upon the GC algo in use, some objects
may never be collected even though they are eligible for
collection.

In particular, the GC algorithms in MRI and YARV are specifically
designed with the assumption that they will never actually run in
99.999% of all cases. They are designed for scripting, where a script
doesn’t even allocate enough memory to trigger a collection, runs for
a couple of seconds and then exits, after which the OS simply reclaims
the memory: no GC needed.

That’s why YARV and especially MRI are so exceptionally bad for server
loads. It’s also why REE can never be merged into mainline.

Unless specifically guaranteed by the language specification, you
simply cannot make any assumptions about when or even if objects get
collected. Not even Python makes such guarantees, popular myths
notwithstanding.
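
One way to observe this, sketched under the assumption of MRI: attach finalizers to abandoned objects and see how many have run after a requested GC cycle. The helper names finalizer_for and abandon_foos are hypothetical. The count printed can be anything from 0 to 3; whatever has not been finalized by then is normally finalized when the process exits.

```ruby
class Foo; end

# Build the finalizer in a separate method so the proc does not capture the
# Foo itself; a finalizer that references its own object keeps it alive.
def finalizer_for(label, log)
  proc { |_object_id| log << label }
end

# Create `count` Foo instances, attach a finalizer to each, keep none of them.
def abandon_foos(count, log)
  count.times do |i|
    ObjectSpace.define_finalizer(Foo.new, finalizer_for("foo-#{i}", log))
  end
  nil
end

finalized = []
abandon_foos(3, finalized)

GC.start
puts "Finalizers run so far: #{finalized.size} of 3"
```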

jwm

Hi,

In message “Re: Ruby GC question (MRI, JRuby, etc)”
on Sat, 21 Aug 2010 08:15:15 +0900, Jörg W Mittag writes:

|In particular, the GC algorithms in MRI and YARV are specifically
|designed with the assumption that they will never actually run in
|99.999% of all cases. They are designed for scripting, where a script
|doesn’t even allocate enough memory to trigger a collection, runs for
|a couple of seconds and then exits, after which the OS simply reclaims
|the memory: no GC needed.

99.999% is a bit of an exaggeration, but it is true that the garbage
collection algorithms of YARV and MRI focus on throughput for
short-running programs that are not memory-intensive, and the GC of
REE is not suitable for those programs.

          matz.

2010/8/17 Chuck R.:

My basic understanding of the garbage collectors in use by the various Ruby runtimes is
that they all search for objects from a “root” memory object. If an object cannot be reached
from this root, then it is collected.

There is a small error in the wording above. While the issue has been
explained already I want to stress this point because this is a
mistake many new to GC make and it explains some weird effects that
special tests show. It should have read

If an object cannot be reached from this root, then it can be
collected.

Small change, big difference. :)

Kind regards

robert

On Tue, Aug 17, 2010 at 12:19 PM, Chuck R. wrote:

My basic understanding of the garbage collectors in use by the various Ruby runtimes is that they all search for objects from a “root” memory object. If an object cannot be reached from this root, then it is collected.

Here’s a snippet of ruby code. I’m not sure how the GC will treat it.

What happens to the instances of Foo created in the call to #run? Since I am not saving them  somewhere (e.g. to an array), do they get collected right away?

Not right away on any impl; they’re allocated on the heap, so even
though they’re immediately abandoned they still require GC to run.

If the Foo instances get collected, is it safe to assume the Baz and Quxxo instances are being collected at the same time? Does their existence prevent the Foo instance from being collected?

They would not; no external references to the Foo instance ever exist
on the heap or on the stack.

As far as how this behaves in JRuby: since the object is short-lived,
it would never make it out of the “eden” space on the heap, and with
the JVM’s GC that means it would basically have no GC cost (young
objects that don’t survive even a single GC cycle are practically
free). The only cost you’d be paying would be the allocation and .new
costs.

- Charlie