Soft object reference for mark and sweep

aris · January 26, 2013, 7:14am

Hi.

I recall a mention a bit back about wanting to see more interesting
problems on ruby-talk. Let’s see if I can help out.

I have a system in which a Ruby script has a number of remote object
references (to a script in another process), each represented by an object
with a unique string ID, which is used to make remote calls on or using
these objects. The system is all in place and working fine; it’s
actually multi-way and each connection is bidirectional, but that is
unimportant for now.

I would like the script to be able to effectively ignore whether
something is remote or not, which means I’d like it to be able to pass
around these remote objects and assume that once an object is no longer
used, it’ll simply drop out of use and be garbage collected in time, like
everything else.

One problem: the store keeps a reference to the object in its Hash, for
use in lookups and incoming calls. Because of this, the object will never be
collected. I’d like it to be collected once the store’s entry is the only
remaining “reference”, with that information available to the store, so it
knows it can lose (or has just lost) the object.

The problem is essentially similar to the idea of weak pointers in a
reference-counted system: something by which you can hold a weak
reference that can detect, but doesn’t prevent, cleanup. If this were a
C++ problem, I’d be using Boost smart pointers: the store would use a
weak_ptr, and everything else would use a shared_ptr.

However, this doesn’t really make sense in a mark-and-sweep context, nor
does the idea of an object being “partially freeable”, i.e. aware that it
is just about to vanish or can be freed.

Any ideas as to a good way or suitable mechanisms to implement this? Is
there some way to look at an object and be able to detect that this is
the last reference that exists to the object in a script?

NB: I already have one idea: the Ruby script is embedded, so I might be
able to do something clever on the C/C++ side with Data_Wrap_Struct and
family, so that when a certain object is freed, I change something on
the C side, and the objects are stored in a container that doesn’t pass
on any of the mark calls. I’m not sure of the specifics yet, though I’m
sure it’s quite possible; I just thought I’d dig around for suggestions
beforehand.

Cheers,
Garth

Garthy_D · January 26, 2013, 9:53am

> Any ideas as to a good way or suitable mechanisms to implement this?

Just a silly one: if you seldom use that Hash in the store, you can
eliminate it. You could still look up your objects by iterating through
ObjectSpace and finding them. But this way the GC will be able to drop
objects when there’s no reference to them. Basically, what you are dealing
with here is the same old trade-off between memory and CPU. You have to
decide whether lookup speed is more important than memory footprint or the
other way around. (…or wait for a less silly answer here…)
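A minimal sketch of the ObjectSpace approach, assuming the remote objects are all instances of one proxy class. `RemoteProxy` and `find_proxy` are hypothetical names, not from the thread. This assumes MRI: as Robert notes in his reply, ObjectSpace iteration is not available everywhere (JRuby disables it unless explicitly enabled). Also note that an object the script has already dropped can still be found until the GC actually reclaims it.

```ruby
# Hypothetical stand-in for the poster's remote-object proxies.
class RemoteProxy
  attr_reader :remote_id

  def initialize(remote_id)
    @remote_id = remote_id
  end
end

# Look up a proxy by scanning the heap instead of consulting a Hash.
# Nothing here holds a reference to the proxies, so the GC stays free to
# collect them; the price is an O(n) walk over every RemoteProxy instance.
def find_proxy(remote_id)
  ObjectSpace.each_object(RemoteProxy).find { |proxy| proxy.remote_id == remote_id }
end

# Usage: a proxy is found for as long as something else references it.
proxy = RemoteProxy.new("session-42")
p find_proxy("session-42").equal?(proxy) # => true
p find_proxy("no-such-id")               # => nil
```

This is exactly the memory-versus-CPU trade-off described above: no table keeps the proxies alive, but every lookup costs a walk over the heap.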

Garthy_D · January 26, 2013, 10:12am

Hi Nokan,

Thank you for the input, and no, it’s not silly. It’s opened up the
possibility of another approach: don’t use the Hash at all, and merely
dig through ObjectSpace when needed, possibly even externally to the
script in some way. Combined with a cache, it might form an interesting
first-pass solution. In any case, it raises the question of whether
looking through things externally (i.e. via ObjectSpace) could offer a
potential solution. I’m not 100% sure how it’d all fit together, but
it’s definitely got me thinking.

Cheers,
Garth

PS. I’m also relieved that my original post was understandable; it’s a
problem that’s a little hard to explain.

Garthy_D · January 26, 2013, 5:17pm

On Sat, Jan 26, 2013 at 9:52 AM, Nokan E. wrote:

> what you do here the same old stuff: a trade off between memory and CPU.
> You have to decide if lookup speed is more important than memory
> footprint or the other way around. (…or wait for a less silly answer
> here…

I think using WeakRef (from the weakref standard library) is even better,
because ObjectSpace lookup is slow and does not work in some circumstances
(JRuby with specific settings).

http://www.ruby-doc.org/stdlib-1.9.3/libdoc/weakref/rdoc/index.html

If needed, one can always create a special Hash wrapper which converts
to and from WeakRef on insert and retrieval.
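A minimal sketch of such a wrapper, using the standard library's `WeakRef`. `WeakValueHash` and its methods are hypothetical. Retrieval goes through `__getobj__`, which raises `WeakRef::RefError` once the referent has been collected; the sketch turns that into `nil`, so callers never go through the delegate interface.

```ruby
require 'weakref'

# Hypothetical Hash wrapper: values are stored as WeakRefs, so being in
# the table does not keep them alive. Keys are held strongly as usual.
class WeakValueHash
  def initialize
    @refs = {}
  end

  # Wrap the value on insert.
  def []=(key, value)
    @refs[key] = WeakRef.new(value)
  end

  # Unwrap on retrieval: returns the live value, or nil if the key is
  # absent or its value has already been collected.
  def [](key)
    ref = @refs[key]
    return nil if ref.nil?
    ref.__getobj__
  rescue WeakRef::RefError
    @refs.delete(key) # the value is gone; drop the stale entry
    nil
  end

  def delete(key)
    @refs.delete(key)
    nil
  end

  # Entries whose values were collected are otherwise only dropped on
  # lookup; call this now and then to sweep them out.
  def purge
    @refs.delete_if { |_key, ref| !ref.weakref_alive? }
  end
end
```

Keeping the stale-entry cleanup lazy (on lookup, or via `purge`) avoids needing any hook into the GC at all.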

Kind regards

robert

Garthy_D · February 8, 2013, 5:43am

On Sun, Jan 27, 2013 at 8:21 AM, Garthy D wrote:

> The WeakRef interface worries me though. You’d normally expect to see just
> one call on such a thing (lock, which turns weak to strong/fail) and maybe a
> check call, with a big warning that the result might change post-check. A
> delegation-style interface with a check only seems a bit unusual. However, I
> may just not have a proper understanding of how it works yet.

I strongly discourage using WeakRef for its delegate interface. It’s a
terrible pattern that simply needs to go away. If you use WeakRef,
just use it as a weak object holder and always check for nil on the
return value. You should be ok.
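A short sketch of the “weak object holder” usage recommended here, which is also the lock-style call asked about in the quoted text. The `lock` helper is hypothetical, not part of the weakref library: it resolves the WeakRef once and returns either a strong reference or `nil`.

```ruby
require 'weakref'

# Hypothetical helper in the spirit of a weak pointer's lock(): turn the
# WeakRef into a strong reference once, or return nil if it is already gone.
def lock(ref)
  ref.__getobj__
rescue WeakRef::RefError
  nil
end

ref = WeakRef.new(String.new("payload"))

# Discouraged: every delegated call re-resolves the referent, so
# `ref.upcase` followed by `ref.size` can fail halfway through with
# WeakRef::RefError if a collection happens in between.

# Preferred: lock once, check for nil, then only use the strong local.
obj = lock(ref)
if obj.nil?
  puts "referent was collected"
else
  puts obj.upcase # `obj` keeps the string alive while it is in use
end
```

The strong local is what makes this safe: once `lock` has returned a non-nil value, the object cannot vanish while the code is still using it.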

You might also check out the “weakling” gem, which provides some other
nice features from the JVM, like reference queues, where your WeakRefs get
inserted as their referents get collected. It’s a bit more efficient
(or at least doesn’t impact GC performance) than using finalizers.

Charlie

Garthy_D · February 16, 2013, 7:47am

Hi Charlie,

On 08/02/13 15:12, Charles Oliver N. wrote:

> just use it as a weak object holder and always check for nil on the
> return value. You should be ok.
>
> You might also check out the “weakling” gem, which provides some other
> nice features from JVM like reference queues, where your WeakRefs get
> inserted as their references get collected. It’s a bit more efficient
> (or at least doesn’t impact GC performance) than using finalizers.

Thank you for your thoughts on this one. Apologies for my slow reply:
some time had elapsed and I’ve got nothing set up to notify me of new
replies to the thread. I only just noticed this one.

Doing some research on WeakRef revealed it had significant problems
under MRI: not just with the delegate interface, but also that it
sometimes got the object references wrong. That pretty much rules it out
for me, and I didn’t proceed with WeakRef in the end. This still isn’t a
solved problem for me, so I can’t say how I’ve overcome it yet, but if my
current approach works out, hopefully I’ll have something interesting to
report back.

Cheers,
Garth

Garthy_D · January 27, 2013, 8:22am

Hi Robert,

On 27/01/13 02:46, Robert K. wrote:

>> GC will be able to drop objects if there’s no reference to them. Basically
>
> http://www.ruby-doc.org/stdlib-1.9.3/libdoc/weakref/rdoc/index.html
>
> If need by one can always create a special Hash wrapper which converts
> from and to WeakReference on insert and retrieval.

Thanks for that. I didn’t even know WeakRef existed! That’s going to
open up a wide range of solutions. Looking at the implementation,
it uses finalizers, and I had been wondering whether finalizers would
lead to a potential solution.

The WeakRef interface worries me though. You’d normally expect to see
just one call on such a thing (a lock, which either turns the weak
reference into a strong one or fails) and maybe a check call, with a big
warning that the result might change after the check. A delegation-style
interface with only a check seems a bit unusual. However, I may just not
properly understand how it works yet.

Still, it’s opened up a bunch of possibilities and ideas, with lots of
things to look into and explore. Thank you again, Robert, for yet another
one of your excellent suggestions.

Cheers,
Garth

Garthy_D · February 18, 2013, 8:26am

Hi all,

To anybody interested, I’ve recently finished implementing a solution to
my original problem, and I thought I’d share the results.

In the end, I basically used two main tools to solve the entire problem:

1. I created my own C/C++-based data type which did one thing: hold a
   Ruby value without marking it during mark and sweep. Nothing fancy, and
   it probably could have been done in pure Ruby by storing/using object_id
   and _id2ref. In fact, part of the first-pass solution did exactly that.
2. Each time one of these weak reference objects was created, I registered
   a finalizer that, when called, amongst other tasks, wiped the value in
   the weak reference.
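A pure-Ruby sketch of the two tools, along the lines of the object_id/_id2ref first pass mentioned above. `IdWeakRef` is a hypothetical name, and this is not the C/C++ type described. Assumptions: MRI, and ordinary heap objects (not immediates such as Integers or Symbols, which cannot carry finalizers). `ObjectSpace._id2ref` is an underscore-prefixed, implementation-specific method and may warn or be unavailable on other Ruby implementations or newer versions. On older MRI versions an object_id could also be reused by a later object, which is one reason to have the finalizer wipe the stored id.

```ruby
# Hypothetical pure-Ruby weak reference (tool 1): remembers only the
# referent's object_id, so the GC never sees a reference to it from here.
class IdWeakRef
  def initialize(obj)
    @id = obj.object_id
    # Tool 2: a finalizer on the referent wipes the stored id. The proc is
    # built in a separate method so its closure cannot capture `obj`;
    # capturing it would keep `obj` alive forever.
    ObjectSpace.define_finalizer(obj, IdWeakRef.wiper(self))
  end

  def self.wiper(weak_ref)
    proc { |_object_id| weak_ref.clear }
  end

  # Called by the finalizer (or manually) once the referent is gone.
  def clear
    @id = nil
  end

  # Returns the referent, or nil if it was wiped or can't be resolved.
  def get
    id = @id
    return nil if id.nil?
    ObjectSpace._id2ref(id)
  rescue RangeError
    # _id2ref raises RangeError for ids that no longer name a live object.
    nil
  end
end

target = Object.new
ref = IdWeakRef.new(target)
p ref.get.equal?(target) # => true
```

The finalizer proc holds the IdWeakRef itself, so the holder stays alive until its referent is finalized; that is usually harmless, since the holder only carries an integer.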

I then used these two things together as a mostly-functional form of weak
reference. The whole problem essentially reduced to applying these two
tools in some way. Well, that, and rewriting a whole bunch of code that
made assumptions that clashed with how it worked.
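A sketch of how the two tools might be applied to the original store problem, under the same MRI assumptions. `RemoteProxy`, `RemoteStore` and the release callback are hypothetical names; the callback is where the store would learn that it “has just lost” an object (for example, to tell the other process). The sketch is single-threaded only: finalizers run at points the program does not control, so a real store would need more care.

```ruby
# Hypothetical proxy for an object that lives in the remote process.
class RemoteProxy
  attr_reader :remote_id

  def initialize(remote_id)
    @remote_id = remote_id
  end
end

# Hypothetical store: maps string IDs to proxies without keeping them
# alive, and reports via a callback when a proxy has been collected.
class RemoteStore
  def initialize(&on_release)
    @ids = {} # remote_id => object_id of the current proxy
    @on_release = on_release
  end

  # Returns the existing live proxy for remote_id, or creates a new one.
  def proxy_for(remote_id)
    existing = lookup(remote_id)
    return existing if existing

    proxy = RemoteProxy.new(remote_id)
    @ids[remote_id] = proxy.object_id
    ObjectSpace.define_finalizer(proxy, release_proc(remote_id))
    proxy
  end

  # For incoming calls: the live proxy, or nil if it has been collected.
  def lookup(remote_id)
    id = @ids[remote_id]
    return nil if id.nil?
    obj = ObjectSpace._id2ref(id)
    # Guard against an object_id that now names some other object.
    obj.is_a?(RemoteProxy) && obj.remote_id == remote_id ? obj : nil
  rescue RangeError
    nil
  end

  private

  # Built outside proxy_for so the finalizer cannot capture the proxy.
  def release_proc(remote_id)
    proc do |dead_id|
      # Skip if a newer proxy for the same remote_id has replaced this one.
      if @ids[remote_id] == dead_id
        @ids.delete(remote_id)
        @on_release&.call(remote_id)
      end
    end
  end
end
```

Checking `dead_id` against the stored id before deleting matters because a fresh proxy for the same remote object may be created after the old one became unreachable but before its finalizer ran.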

Cheers,
Garth
