Ruby's garbage collector

maz · October 24, 2006, 6:56am

Is there a name for Ruby’s garbage collecting strategy?
I’m on a web forum and the topic of garbage collection came up.
There’s a claim that you can never make a “perfect” garbage collector.
In
other words, there will always exist (pathological) conditions where
your
collection strategy will fail to find inaccessible objects. At least,
that’s my best understanding of the claim and it strikes me as… false.
I
can understand that the task is non-trivial but simply impossible? Can
that be right?
Furthermore, according to some, Java (a friendly Ruby competitor) is
incapable of collecting, specifically, circular references. I was just
wondering if Ruby had the same limitation?
Thank you…

maz · October 24, 2006, 7:02am

On Oct 24, 2006, at 8:55 AM, Just Another Victim of the Ambient
Morality wrote:

Is there a name for Ruby's garbage collecting strategy?

semi-conservative mark-and-sweep

maz · October 24, 2006, 7:40am

I do not know much about the Ruby Garbage collector, but I know Java can
easily collect circular references. Most modern Java VMs use
Generational garbage collection.

The only “garbage collection” technique I can think of right now that
fails to collect circular references is ref counting.

I’d be interested in hearing about what kind of “pathalogical”
conditions you are talking about, but my guess is that they do not
actually prevent collection of inaccessible objects. Normally, they
either increase response time or create additional memory overhead.

The claim that “there is no perfect garbage collector” can be
generalized to “there is no perfect general allocation model”. Any
memory allocation contains a host of trade offs. Java is actually
pretty advanced in a number of these, if you look at either the JRockit
VM or the more modern Sun VMs.

--Will

maz · October 24, 2006, 7:48am

Just in case you want to learn more, here is a pretty good survey of GC
techniques.

ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps

--Will

maz · October 24, 2006, 8:06am

There’s a claim that you can never make a “perfect” garbage
collector. In
other words, there will always exist (pathological) conditions
where your
collection strategy will fail to find inaccessible objects.

Conservative garbage collectors can leave some unused data if it
seems to look like
used object, however, ruby garbage collector can leave objects, that
have references from current stack.
So, after some calls, stack will change and unused objects will leave.

I don’t know, what are this ideas (about impossibility of perfect
collector) are based on.

Furthermore, according to some, Java (a friendly Ruby  
competitor) is
incapable of collecting, specifically, circular references. I was
just
wondering if Ruby had the same limitation?

No, Ruby doesn’t have this limitation and Sun Java machine (I don’t
know about others) doesn’t
have such limitation. Old Microsoft Java machine had this limitation,
because it used
only reference counting.
CORBA libraries have this limitation because of reference counting.

Python will not reclaim objects in circle with defined del
method. As ruby objects
have no destructors (finalizators), Ruby has no such problems.

maz · October 24, 2006, 3:57pm

On Tue, 24 Oct 2006 04:54:04 +0000, Just Another Victim of the Ambient
Morality wrote:

wondering if Ruby had the same limitation?
Thank you…

If it’s mathematically impossible for a garbage collector to find every
unused object, I certainly wasn’t told about it when I took compilers
class a couple years ago.

AIUI, Java uses mark-and-sweep garbage collection just like Ruby, and
both
can handle circular references. The downside to mark-and-sweep garbage
collection is the fact that mark-and-sweep can take a while at random
points in the program, so it’s not advisable for real-time programming,
and could cause noticable delays in interactive programs. It also means
that objects holding other resources (like file handles) may keep those
resources open indefinitely if they are not manually closed. Ruby
handles
this by allowing you to use a block to scope the sue of such resources,
such as

open(“file”) {|f| … } # after the block, f is now closed.

Reference counting (used by Perl) keeps a count of how many pointers
there
are to each object and frees those objects when the reference count
reaches zero. This can’t free circular references because every object
in
the circle has a pointer to it from something else in the circle. The
solution when using this kind of system is to keep circular structures
encapsulated in some other object which is written in such a way as to
break the cycle when its reference count drops to zero (in Perl where
all
pointers are reference counted), or which can use uncounted pointers
for the circular stuff can take care of disposing of the objects itself
(in
C++ where reference counting requires you to create a special pointer
class like a ref_ptr). The advantage to this method is that you know
exactly when your objects are disposed, it works well across processes
(for example COM uses it), and it can be mixed with uncounted code quite
easily (as I mentioned regarding C++).

maz · October 24, 2006, 4:11pm

On 24.10.2006 15:51, Ken B. wrote:

AIUI, Java uses mark-and-sweep garbage collection just like Ruby,

In fact Java’s GC is much more complex. It uses mark and sweep among
other mechanisms.

Kind regards

robert

maz · October 25, 2006, 9:02am

On 10/24/06, Just Another Victim of the Ambient M.
[email protected] wrote:

Is there a name for Ruby's garbage collecting strategy?

It’s “semi-conservative mark&sweep”.

I'm on a web forum and the topic of garbage collection came up.
There’s a claim that you can never make a “perfect” garbage collector. In
other words, there will always exist (pathological) conditions where your
collection strategy will fail to find inaccessible objects. At least,
that’s my best understanding of the claim and it strikes me as… false. I
can understand that the task is non-trivial but simply impossible? Can
that be right?

Depending on what you want, it might be impossible.
Garbage collection destroys objects if it is sure they will never be
used again. Normally it works by having a set of “obviously live”
objects (like global variables, local variables and so on), and
following pointers (like instance variables) they contain to find out
which objects are live.

If object cannot be reached that way, it can usually be considered dead
(disregarding stuff like C extensions). But the other way it isn’t
really true.
Objects often contain pointers that cannot be followed, but GC might
not know that.

One common example:
a_big_object = BigObject.new
an_array = [a_big_object, b, c]
an_array.shift
return an_array
Of course at this point a_big_object is dead - the returned array
contains only b and c. But Array#shift is most often implemented
not by copying second element to first position etc., but by
simply marking the first element as inactive - internally an_array
is something like {:size => 3, :ignore_first => 1, :contents =>
[a_big_object, b, c]}. In some languages GC doesn’t know how arrays
work,
so it thinks all objects linked from it are live - so it will not
reclaim a_big_object. Telling GC about standard arrays is easy, but
many data structures contain inactive links like that and
have similar problems.

Second common example (this is silly, but it actually happens a lot):
def sum_of_a_list(node, partial_sum=0)
if node == nil
return partial_sum
else
return sum_of_a_list(node.next, node.this + partial_sum)
end
end

So we ask for sum_of_a_list(make_a_huge_linked_list()).
Linked lists consists of many nodes - as we go deeper,
initial nodes of the list will become dead. But most GC systems
won’t release them, as they’re arguments to functions that didn’t
return yet. Only systems with tail-recursion optimization (like Scheme)
will be able to free such nodes.

More general case:
def foo
a_big_object = BigObject.new
bar(a_big_object)
do_something_else_but_dont_touch_a_big_object
end
a_big_object is dead after bar returns. Some compilers
will know that, but most won’t remove it as long as foo runs.

Then it gets even worse:
def foo
a_big_object = BigObject.new
a_widget.set_callback {
print “Called!”
}
end

Even when foo finished, a_big_object won’t be released
because the callback closure still “sees” it.

And we could go on and on. These are all things that
some actually used GCs can deal with, while other
actually used GCs don’t, not purely theoretical cases.

Perfect garbage collectors simply don’t exist.
Then, they usually work quite well, and when they don’t
there are simple workarounds like setting a_big_object = nil
after you’re done with it.

Furthermore, according to some, Java (a friendly Ruby competitor) is
incapable of collecting, specifically, circular references. I was just
wondering if Ruby had the same limitation?

No “real” garbage collection has such limitation,
certainly neither Java’s nor Ruby’s. Perl and Python
do not have real garbage collectors

maz · October 25, 2006, 10:12am

On 10/24/06, Just Another Victim of the Ambient M.
[email protected] wrote:

I'm on a web forum and the topic of garbage collection came up.
There’s a claim that you can never make a “perfect” garbage collector. In
other words, there will always exist (pathological) conditions where your
collection strategy will fail to find inaccessible objects. At least,
that’s my best understanding of the claim and it strikes me as… false. I
can understand that the task is non-trivial but simply impossible? Can
that be right?

The claim, as you have stated it, is in fact false. A similar claim,
that there will always exist pathological conditions where your
collection strategy will fail to find objects that are unused, is,
however true. The basic mark and sweep garbage collector will always
find all inaccessible objects, and all proper garbage collection
strategies can be shown to do this eventually once the program has
reached a steady state. But not all accessible objects will ever again
be used by the program, even though they are reachable; such objects
are called semantic garbage: they are reachable, but deallocating them
would have no consequences to the operation of the program. Safely
collecting semantic garbage is a difficult task, in its fullest
generality it is actually impossible, as its solution is equivalent to
the halting problem.

Furthermore, according to some, Java (a friendly Ruby competitor) is
incapable of collecting, specifically, circular references. I was just
wondering if Ruby had the same limitation?

This statement is false. Neither Java nor Ruby have trouble collecting
garbage consisting of circular references. Ruby and Perl on the other
hand, use a reference counting “garbage collector” that does have
this limitation.

maz · October 25, 2006, 3:33am

On 10/24/06, Ken B. [email protected] wrote:

If it’s mathematically impossible for a garbage collector to find every
unused object, I certainly wasn’t told about it when I took compilers
class a couple years ago.

It depends on how you define unused.

Actually a GC should remove unusable objects, i.e. objects which are
no longer reachable, and therefore can no longer affect future
computations.

Of course no GC can predict whether or not a reachable object will
ever be used in the future. So ‘memory leaks’ are possible even with a
GC. One of, but certainly not the only, reasons that global variables
are frowned upon is that they are opportunities to accumulate
reference chains to objects which otherwise should be GCed.

On the other hand the worst thing a GC can do is to reclaim objects
which ARE reachable. That’s why the Ruby GC doesn’t claim objects
which are reachable via references on an execution stack. Someone
mentioned this in another post to this thread.

Reference counting was probably the first GC algorithm, it’s safe in
the sense that it doesn’t free reachable objects, but it doesn’t free
non-reachable objects which form a self referencing set.

AIUI, Java uses mark-and-sweep garbage collection just like Ruby, and both
can handle circular references. The downside to mark-and-sweep garbage
collection is the fact that mark-and-sweep can take a while at random
points in the program, so it’s not advisable for real-time programming,
and could cause noticable delays in interactive programs.

It depends on the implementation, but most Java VMs use a form of
generational garbage collection. This was first invented by Dave
Ungar for the UC Berkeley implementation of Smalltalk, who observed
that most objects either quickly became dead (unreachable) or lived a
long time. Ungar’s original ‘generation scavenging’ GC algorithm
allocates new objects in a small space called newspace, and when
newspace fills, it moves the live objects in newspace to a new
newspace, leaving the others behind. After an object lives a certain
small number of generations, usually 2**n for a small n (like 2 or 3)
objects get tenured, to a larger “old space” Old space gets
infrequently reclaimed by another algorithm, often mark-sweep. It’s
called scavenging because, rather than freeing unused new objects, it
scavenges live ones before throwing away wholesale those left in the
old newspace.

Generational scavenging can also, but not necessarily, have nice
properties in virtual memory systems, since it tends to keep the young
object closer together, which tends to keep the working set smaller.

On the downside, these types of GC algorithms mean that the address of
an object changes over its lifetime, which makes the API for what ruby
calls extensions, and what Smalltalk calls primitives (i.e. code
written in C or another low-level language which deals with objects)
have to deal with volatile object pointers.

It also complicates implementing finalization since dead objects in
newspace are reclaimed by being left behind which makes their
detection more expensive.

It also means
that objects holding other resources (like file handles) may keep those
resources open indefinitely if they are not manually closed. Ruby handles
this by allowing you to use a block to scope the sue of such resources,
such as

open(“file”) {|f| … } # after the block, f is now closed.

Of course this is an issue regardless of the GC algorithm, it really
relates to what I said earlier about a GC collecting objects
potentially still in use as a cardinal sin.

Actually the nice thing about Ruby’s use of blocks for things like
IO#open is that it nicely handles a lot of cases for which
finalization is used in other languages, and finalization is best used
sparingly since it only gets triggered when and if an object is GCed.
Relying on finalization to free up non-object resources (e.g. to close
files) is not the best practice in general.

class like a ref_ptr). The advantage to this method is that you know
exactly when your objects are disposed, it works well across processes
(for example COM uses it), and it can be mixed with uncounted code quite
easily (as I mentioned regarding C++).

While mark and sweep does tend to ‘stop the world’, reference counting
is actually more expensive because of the amount of bookkeeping it
requires. Every assignment requires an increment of the ref count of
the object now referenced, an decrement of the ref count of the object
formerly referenced, and a check to see if that count has gone to
zero.

There are also variants of mark and sweep which allow it to be
incremental, which combines the lower overall overhead with the
ability to amorize the impact over time.

–
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

maz · October 25, 2006, 10:36am

On 25.10.2006 10:10, Dido S. wrote:

This statement is false. Neither Java nor Ruby have trouble collecting
garbage consisting of circular references. Ruby and Perl on the other
hand, use a reference counting “garbage collector” that does have
this limitation.

Um, this sounds pretty contradictory.

robert

maz · October 25, 2006, 11:04am

On 10/25/06, Robert K. [email protected] wrote:

On 25.10.2006 10:10, Dido S. wrote:

This statement is false. Neither Java nor Ruby have trouble collecting
garbage consisting of circular references. Ruby and Perl on the other
hand, use a reference counting “garbage collector” that does have
this limitation.

Um, this sounds pretty contradictory.

Oops, I meant to say Python and Perl.

maz · October 27, 2006, 9:36pm

On Wed, 25 Oct 2006 18:03:29 +0900,
Dido S. [email protected] wrote:

Oops, I meant to say Python and Perl.

Correction: Python has had the ability to collect most cycles since
version 2.0. (Cycles of objects with finalizers aren’t collected.)

–amk