C extension: How to check if a VALUE is still alive (not being GC'ed)?

dubstep · February 6, 2011, 12:03am

Hi, I’m coding an async DNS resolver for EventMachine based on udns (a
C library).

In the Ruby C extension, when submitting a DNS query I pass the
current object (VALUE pointer) as an argument to the DNS query C
function.
Later on, when the DNS server replies, EventMachine (which watches
the UDP socket of udns) invokes the C callback function belonging to
that query, and the callback receives the given VALUE as one of its
arguments. So I invoke a Ruby method on that VALUE (rb_funcall).

The problem is that the VALUE could already be GC’ed (or marked for
collection) if the programmer assigns “nil” to it before the callback
is executed, so we get a core dump.

So when the callback is executed I need to check that the given VALUE
is still alive (neither GC’ed nor marked for being GC’ed). How can I
inspect that? (I mean in C).

Thanks a lot.

SSISSaki_Baz_C.SS_S · February 6, 2011, 12:35am

On Sun, 6 Feb 2011 08:02:39 +0900
Iñaki Baz C. wrote:

The problem is that the VALUE could already be GC’ed (or marked for
collection) if the programmer assigns “nil” to it before the callback
is executed, so we get a core dump.

So when the callback is executed I need to check that the given VALUE
is still alive (neither GC’ed nor marked for being GC’ed). How can I
inspect that? (I mean in C).

The easiest way would be to assign this pointer to the resolver object
as an instance variable (with rb_iv_set), or perhaps store it inside a
hash which maps request ID’s to pointers, if you want to be able to use
the same resolver for several simultaneous callbacks.
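
A minimal Ruby-level sketch of the hash idea, assuming a hypothetical `PendingQueries` registry (the class and its method names are illustrative and are not part of udns or EventMachine). Each query is held under a request ID until its reply arrives, so the GC always sees a live reference:

```ruby
# Hypothetical registry; names are illustrative only.
class PendingQueries
  def initialize
    @pending = {}   # request ID => query object (strong references)
    @next_id = 0
  end

  # Store the query and return the ID to hand to the C library as user data.
  def register(query)
    id = (@next_id += 1)
    @pending[id] = query
    id
  end

  # Called when the reply arrives: remove and return the query,
  # or nil if the ID is unknown or already completed.
  def complete(id)
    @pending.delete(id)
  end

  def size
    @pending.size
  end
end

registry = PendingQueries.new
id = registry.register(Object.new)
query = registry.complete(id)   # the registry no longer references the query
```

Handing an integer ID to the C library instead of the VALUE itself is a design choice of this sketch: a late or duplicate callback then finds nothing in the hash and gets nil, rather than touching an object that may already have been collected.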

SSISSaki_Baz_C.SS_S · February 6, 2011, 8:02pm

On Sat, Feb 5, 2011 at 4:02 PM, Iñaki Baz C. wrote:

The problem is that the VALUE could already be GC’ed (or marked for
collection) if the programmer assigns “nil” to it before the callback
is executed, so we get a core dump.

So when the callback is executed I need to check that the given VALUE
is still alive (neither GC’ed nor marked for being GC’ed). How can I
inspect that? (I mean in C).

How is rb_gc_mark() insufficient here? Marking prevents objects from
being GCed… you seem to be suggesting it’s the other way around.

If you mark all VALUEs your C extension is internally referencing in
your C extension’s “mark” function, it will prevent them from being
garbage collected.

As Peter said though, if you use rb_iv_set, you get this behavior for
free, as Ruby automatically marks instance variables for you.

SSISSaki_Baz_C.SS_S · February 6, 2011, 10:55am

2011/2/6 Peter Z.:

The easiest way would be to assign this pointer to the resolver object
as an instance variable (with rb_iv_set), or perhaps store it inside a
hash which maps request ID’s to pointers, if you want to be able to use
the same resolver for several simultaneous callbacks.

Yes, in fact I must do that to keep the Query object from being GC’ed.
So it is better if I handle that hash internally within the resolver
rather than letting the user access it.

Thanks.

SSISSaki_Baz_C.SS_S · February 6, 2011, 10:17pm

2011/2/6 Tony A.:

GCed… you seem to be suggesting it’s the other way around.

Sorry, I was wrong about the meaning of “marking an object”.

If you mark all VALUEs your C extension is internally referencing in your C
extension’s “mark” function it will prevent them from being garbage
collected.

But this is not my case. My case is as follows (example code):

domain = ARGV[0]

module UdnsWatcher
  def initialize(resolver)
    @resolver = resolver
  end

  def notify_readable
    @resolver.ioevent
  end
end

EM.run do

  resolver = EM::Udns::Resolver.new

  EM.watch resolver.fd, UdnsWatcher, resolver do |conn|
    conn.notify_readable = true
  end

  query = EM::Udns::Query.new
  query.submit_A resolver, domain
  query.callback do |result|
    puts "DEFERRABLE CALLBACK: result = #{result.inspect}"
  end

end

The “Resolver” class is my C extension, which wraps a C struct
‘dns_ctx’ and holds no other Ruby objects for now.
The “Query” class is pure Ruby and includes EM::Deferrable (it has nothing
else).
“query.submit_A resolver, domain” invokes a function of the udns C
library:
```
VALUE Query_submit_query_A(VALUE self, VALUE context, VALUE str)
{
  struct dns_ctx *dns_context = NULL;
  char *domain;

  /* Unwrap the C struct held by the Resolver object. */
  Data_Get_Struct(context, struct dns_ctx, dns_context);
  domain = StringValueCStr(str);

  /* self is handed to udns as opaque user data for dns_res_A_cb(). */
  dns_submit_a4(dns_context, domain, 0, dns_res_A_cb, (void*)self);
  [...]
}
```

As you can see, “self” is passed as an argument to dns_submit_a4(). This
is because when the DNS response arrives, the callback function
“dns_res_A_cb()” is called, and it receives the same (void*)self as an
argument, so I know which Query instance the response
belongs to and can invoke “set_deferred_status” on it with
rb_funcall().

But in my above Ruby code I don’t store “query” in a hash or array, so
it could be GC’ed before the DNS response arrives, so when the
callback is called I’d get a coredump. I don’t want to store “query”
in a Hash or Array since it requires inserting and deleting it (so
wasted time), I just want “query” not to be GC’ed until udns callback
function is executed.

So, if I call rb_gc_mark(self) in the Query_submit_query_A() function,
would it prevent “query” from being GC’ed?
But in that case, how do I unmark it so it can be GC’ed after the query
completes? Wouldn’t it leak otherwise? (Note that the Resolver instance
lives forever.)

Thanks a lot.

SSISSaki_Baz_C.SS_S · February 6, 2011, 11:54pm

2011/2/6 Iñaki Baz C.:

But in my above Ruby code I don’t store “query” in a hash or array, so
it could be GC’ed before the DNS response arrives, so when the
callback is called I’d get a coredump. I don’t want to store “query”
in a Hash or Array since it requires inserting and deleting it (so
wasted time), I just want “query” not to be GC’ed until udns callback
function is executed.

After rechecking it I strongly think I must store the object “query”
in a Hash. If not, the object clearly “disappears” and could be
legitimately GC’ed at any time.

SSISSaki_Baz_C.SS_S · February 8, 2011, 3:32am

On Sun, Feb 6, 2011 at 3:53 PM, Iñaki Baz C. wrote:

After rechecking it I strongly think I must store the object “query”
in a Hash. If not, the object clearly “disappears” and could be
legitimately GC’ed at any time.

I’d think the resolver object would hold on to all active queries until
completed.

SSISSaki_Baz_C.SS_S · February 8, 2011, 11:45pm

On Feb 8, 2011, at 12:53 AM, Iñaki Baz C. wrote:

2011/2/8 Tony A.:

After rechecking it I strongly think I must store the object “query”
in a Hash. If not, the object clearly “disappears” and could be
legitimately GC’ed at any time.

I’d think the resolver object would hold on to all active queries until
completed.

Yes, that works: the resolver object contains a hash attribute in
which queries are stored until completed.

Use a ruby hash and supply mark and GC callbacks to Data_Wrap_Struct so
the GC will keep track of it.

SSISSaki_Baz_C.SS_S · February 9, 2011, 12:28am

2011/2/8 Eric H.:

Yes, that works: the resolver object contains a hash attribute in
which queries are stored until completed.

Use a ruby hash and supply mark and GC callbacks to Data_Wrap_Struct so the GC
will keep track of it.

Thanks, but I don’t understand why I must use a mark callback:

My class uses Data_Wrap_Struct with an xxx_free function to
free the C structure when the object is GC’ed.
An instance of my class contains an @hash attribute (which is set
to an empty hash in the initialize method).
That @hash is populated with some normal Ruby objects at runtime.
Nothing special here.
So when my instance is GC’ed, at some point Ruby will GC the @hash
attribute and also the objects it contains.

Then… why do I need a mark callback? Maybe I’m missing something, but
I tested my code under high load and it doesn’t seem to leak.
Thanks a lot.

SSISSaki_Baz_C.SS_S · February 8, 2011, 9:54am

2011/2/8 Tony A.:

After rechecking it I strongly think I must store the object “query”
in a Hash. If not, the object clearly “disappears” and could be
legitimately GC’ed at any time.

I’d think the resolver object would hold on to all active queries until
completed.

Yes, that works: the resolver object contains a hash attribute in
which queries are stored until completed.

SSISSaki_Baz_C.SS_S · February 9, 2011, 1:30am

2011/2/9 Eric H.:

Then… why do I need a mark callback? Maybe I’m missing something, but
I tested my code under high load and it doesn’t seem to leak.
Thanks a lot.

Since you use the instance variable you do not need the mark callback. The mark
callback is only required if you are storing ruby objects in a structure that
hides them from ruby.

Ok. So if for example my class stores some VALUE objects in a pure C
array, and these objects are not referenced at Ruby level, then I must
mark them. If not, Ruby GC could remove them from memory at any time,
am I right?

Thanks a lot.

SSISSaki_Baz_C.SS_S · February 9, 2011, 1:22am

On Feb 8, 2011, at 3:18 PM, Iñaki Baz C. wrote:

I tested my code under high load and it doesn’t seem to leak.
Thanks a lot.

Since you use the instance variable you do not need the mark callback.
The mark callback is only required if you are storing ruby objects in a
structure that hides them from ruby.
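
A small sketch of why the instance variable is enough, assuming a hypothetical pure-Ruby `Resolver` stand-in (not the real extension class). A `WeakRef` from the standard library lets us observe the query without keeping it alive ourselves:

```ruby
require "weakref"

# Hypothetical stand-in for the resolver; names are illustrative only.
class Resolver
  def initialize
    @queries = {}   # instance variable, so the GC marks it automatically
  end

  def track(key, query)
    @queries[key] = query
  end

  def finish(key)
    @queries.delete(key)
  end
end

resolver = Resolver.new
query = Object.new
ref = WeakRef.new(query)   # weak reference: does not keep the query alive
resolver.track(:a, query)
query = nil                # drop our own strong reference

GC.start
# Still truthy: the query remains reachable through the resolver's @queries.
still_alive = ref.weakref_alive?
```

Once `finish(:a)` removes the entry, nothing in this sketch references the query any more and it becomes eligible for collection, although exactly when the GC reclaims it is not deterministic.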

SSISSaki_Baz_C.SS_S · February 9, 2011, 1:59am

On Feb 8, 2011, at 4:29 PM, Iñaki Baz C. wrote:

am I right?

Thanks a lot.

Correct.