This question is no doubt a function of my own lack of understanding,
but I think that asking it will at least help some other folks see
what's going on with the internals during garbage collection.
It's best summarized by example code and a summary of my understanding of
the resulting output, so far as marking an object is concerned.
The question in short: when an object goes out of scope and has no
references left to it, how does it get collected? Conceptually
this seems easy - the GC walks the heap to see if the contents are still
valid pointers to the heap. Once the pointer is invalid, it makes total
sense to me how the rest proceeds. But how does the pointer to the heap
ever become invalid?
Hopefully my example and summary will clarify:
require 'pp'
class Hash::Weak < Hash
  def []( key )
    # get the stored ID - a Fixnum, not an object reference to our weak-referenced object
    obj_id = super( key.to_sym )
    # theoretically this should cause non-referenced objects to get cleaned up
    # so long as nothing looks like a pointer or reference to it
    ObjectSpace.garbage_collect
    # now get our object from ID
    # if it had no references it should have been GC'd and we should get an
    # rb_eRangeError "is not id value" (expected) or "is recycled object" (possible)
    obj = ObjectSpace._id2ref( obj_id )
    return obj
  end
  def []=( key, object )
    # a Fixnum has a constant ID for its value, so it can't be copied and can't be garbage collected
    # so object.__id__ cannot be a reference to a child of object and therefore cannot prevent
    # garbage collection on the object
    super( key.to_sym, object.__id__ )
  end
end
##################################################
weak_hash = Hash::Weak.new
class TestClass
end
test_object = TestClass.new
puts 'storing test object'
weak_hash[ :key ] = test_object
puts 'hash now contains object id: ' + weak_hash.pretty_inspect
print 'retrieving stored test object from hash (should work/be non-nil): '
valid_key = weak_hash[ :key ]
pp valid_key
class AnotherClass
end
puts 'setting variable referring to test object (ID: ' +
test_object.__id__.to_s + ') to nil'
test_object = nil
puts 'ID for variable referring to test object is now: ' +
test_object.__id__.to_s
print 'getting test object (should fail with rb_eRangeError): '
invalid_key = weak_hash[ :key ]
pp invalid_key
# error - returns valid key
##################################################
# object is created, given an id
# variable is assigned to id
# variable is changed to new object (including nil)
# variable gets the id of new object
# previous reference made by variable remains in object space (no valid references)
# gc starts
# rb_gc_mark calls gc_mark, marks VM instance (#<RubyVM:0x000001008700b8>)
# gc_mark calls gc_mark_children, marks all children of VM
# first, the VM class (RubyVM), and then its children
# then its class instance (#<Class:RubyVM>), and then its children
# then its class instance (#<Class:#<Class:RubyVM>>) and its children
# then its class instance (#<Class:#<Class:Class>>) and its children
# then gc_mark_children calls mark_tbl to mark its table (the class table)
# mark_tbl marks all children of the class table, starting with Class
# Class marks its children, first of all #<Class:Module>
# #<Class:Module> marks its children, first of all #<Class:#<Class:Module>>
# #<Class:#<Class:Module>> marks its children, which includes a table of classes
# mark_tbl marks each of classes, which first includes #<Class:Object>
# #<Class:Object> has a table of entries that it marks, first of all Object
# Object has a table that it marks, first of all its binding context (presumably main first?) #<Binding:0x00000100870068>
# #<Binding:0x00000100870068> marks its children, which calls binding_mark, which calls rb_gc_mark, which calls gc_mark on the Ruby environment: #<RubyVM::Env:0x00000100854bd8>
# #<RubyVM::Env:0x00000100854bd8> marks its children which calls env_mark
# env_mark calls rb_gc_mark_locations on the range covered by the environment's declared memory space, which calls gc_mark_locations
# gc_mark_locations calls mark_locations_array on the space marked by the start and length of environment
# mark_locations_array looks at the environment as an array of long, and calls is_pointer_to_heap on each one
# if (long)slice is the address of a valid pointer on the heap, returns TRUE, which causes gc_mark to be called on the object
#***** object, defined by ID, matches with (long)slice because it has not yet been collected; it is therefore marked as still existing because it has a valid pointer
# => if this were true, no object would ever be garbage collected; so how is any object ever garbage collected?
Any help understanding what's going on is much appreciated.
Thanks,
Asher
on 2010-08-25 05:42
on 2010-08-26 18:35
> The question in short: when an object goes out of scope and has no
> references that were left to it, how does it get collected? Conceptually
> this seems easy - the GC walks the heap to see if the contents are still
> valid pointers to the heap. Once the pointer is invalid, it makes total
> sense to me how the rest proceeds. But how does the pointer to the heap ever
> become invalid?

It walks the stack and (for ease of understanding) marks all pointers still on the stack as "live", then marks all of their children as "live", then all of their grandchildren, and so on. Then it traverses the entire heap, looking for objects that haven't been marked as live, and "frees" them.

NB that these two stages are separate, so they don't conflict.
HTH.
-r
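
The two stages described in this reply can be sketched in plain Ruby. This is only a toy model: ToyObject, mark, sweep and collect are made-up names for illustration, and MRI's real collector works on C structures rather than Ruby objects.

```ruby
# Toy model of mark-and-sweep. NOT how MRI is implemented, just the two phases.
ToyObject = Struct.new(:name, :children, :marked)

# Phase 1 helper: mark an object and everything reachable from it.
def mark(object)
  return if object.marked          # already visited; this also stops cycles
  object.marked = true
  object.children.each { |child| mark(child) }
end

# Phase 2: split the heap into marked (live) and unmarked (dead) objects.
def sweep(heap)
  live, dead = heap.partition(&:marked)
  live.each { |object| object.marked = false }   # reset marks for the next cycle
  [live, dead]
end

def collect(heap, roots)
  roots.each { |root| mark(root) }   # mark everything reachable from the roots
  sweep(heap)                        # then free whatever was not marked
end

a = ToyObject.new('a', [], false)
b = ToyObject.new('b', [a], false)
c = ToyObject.new('c', [], false)    # nothing refers to c
heap  = [a, b, c]
roots = [b]                          # stand-in for "pointers still on the stack"

live, dead = collect(heap, roots)
puts "live: #{live.map(&:name).inspect}"   # a and b survive
puts "dead: #{dead.map(&:name).inspect}"   # c is swept
```

Note that c is only freed because nothing in roots leads to it. If a stale word in roots still pointed at c, the same code would keep it alive, which is the situation discussed in the rest of this thread.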
on 2010-08-26 18:51
Right - so how does a pointer ever get off the stack?

For instance, in my example, where the variable with reference to the object has been assigned nil - the same thing occurs if the variable goes out of scope.

So in both of those cases, the object "should" be garbage collected; I understand that it's possible, due to conservative GC, that it might mistake a number on the stack (a long), etc. as a valid pointer, but generally when GC runs it should decide that the var (which has no valid ruby references) is no longer live and should be GC'd. Or am I missing something?

So we have a var with no references in Ruby that is being marked as live by the GC because the pointer has not yet been deallocated. So how does it ever get deallocated in order to not be marked as live?

If what I am seeing is the case (and I assume it cannot be and that I am missing something) then the object would never be garbage collected.

So how does GC actually occur? What causes the pointer to be deallocated?

Asher
on 2010-08-26 20:36
> Right - so how does a pointer ever get off the stack?
>
> For instance, in my example, where the variable with reference to the object has been assigned nil - the same thing occurs if the variable goes out of scope.
>
> So in both of those cases, the object "should" be garbage collected; I understand that it's possible, due to conservative GC, that it might mistake a number on the stack (a long), etc. as a valid pointer, but generally when GC runs it should decide that the var (which has no valid ruby references) is no longer live and should be GC'd. Or am I missing something?

That's right.

> So we have a var with no references in Ruby that is being marked as live by the GC because the pointer has not yet been deallocated. So how does it ever get deallocated in order to not be marked as live?

Presumably it is "not being collected" because of a false positive on the stack.
So if you go "up and down" long enough on the stack, it will overwrite the false positive eventually (it's hoped), and thus clear the false positive.
on 2010-08-26 21:35
On Thu, Aug 26, 2010 at 2:36 PM, Roger Pack <rogerdpack2@gmail.com> wrote:
> Presumably it is "not being collected" because of a false positive on the stack.
> So if you go "up and down" long enough on the stack, it will overwrite
> the false positive eventually (it's hoped), and thus clear the false
> positive.

https://sites.google.com/site/brentsrubypatches/

MBARI3.patch:

Ruby's conservative garbage collector cannot tell whether machine words on the 'C' stack are object pointers or integers, etc. because there is no type information associated with them. A conservative collector works by "conserving" every object to which there could possibly be a reference. In the 1.8 and 1.6 series Ruby implementations, this means scanning the stack of each Thread and Continuation assuming that every word is an object pointer if it has a value that could be so interpreted. In practice, this is not as bad as it may seem, as Ruby's collector does not consider pointers "inside" an object to be valid -- only those that point to its exact base address. So, even assuming thousands of live objects, a 32-bit address space will remain very sparsely populated with valid object pointers.

The garbage collector's leaking memory is not really its own fault. The trouble is that the 'C' machine stack is filled with object references. The main reason for this is that gcc compilers create overly large stack frames and do not initialize many values in them. Certain 'C' constructs used in the Ruby interpreter's core recursive expression evaluator generate especially large, sparse stack frames. The function rb_eval() is the worst offender, creating kilobyte sized stack frames for each invocation of a function that may call itself hundreds of times. This results in stacks that are hundreds of kilobytes, often full of old, dead object references that may never go away.

If there were a gcc compiler option to initialize all local variables to zero whenever a new stack frame is built, that would let the collector do its work properly, but no such option exists.
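
To make the "every word might be a pointer" idea concrete, here is a small simulation in Ruby. Everything in it is an assumption for illustration: the addresses, SLOT_SIZE, HEAP_BASE and the method names are invented, and the real check in MRI is done in C against its actual heap layout.

```ruby
# Toy model of conservative stack scanning. Addresses and sizes are made up.
SLOT_SIZE  = 40       # assumed size of one object slot, in bytes
HEAP_BASE  = 0x1000   # assumed start address of the toy heap
HEAP_SLOTS = 4        # slots start at 0x1000, 0x1028, 0x1050 and 0x1078

# A word counts as a pointer only if it is the exact base address of a slot.
def pointer_to_heap?(word)
  offset = word - HEAP_BASE
  offset >= 0 &&
    offset < SLOT_SIZE * HEAP_SLOTS &&
    (offset % SLOT_SIZE).zero?
end

# The scanner has no type information, so it tests every word it sees.
def conservatively_marked(stack_words)
  stack_words.select { |word| pointer_to_heap?(word) }.uniq
end

stack = [
  0x1028,   # a genuine reference that is still in use
  42,       # a plain integer
  0x1030,   # points inside a slot, so it is ignored
  0x1050    # stale word left behind by a call that has already returned
]

marked = conservatively_marked(stack)
puts marked.map { |address| format('0x%x', address) }.inspect
```

The stale word 0x1050 is indistinguishable from the genuine reference 0x1028, so the slot it names stays marked until something overwrites that part of the stack.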
on 2010-08-27 04:43
On 8/26/10 11:51 AM, Asher wrote:
> Right - so how does a pointer ever get off the stack?
>

When a C function returns, the C stack pointer register (usually called "SP") is reset to the frame pointer (sometimes this register is called "FP"). The FP points to the current function arguments. The area between the SP and the FP +- the space for arguments (and the other machine registers) represents the local variables, temporaries and arguments of the current function call (sometimes called an "activation record"). Load any C program under a debugger and you can see the assembly code.

The MRI GC knows where the "top" (SP) and the bottom of the stack are because of mostly portable conventions on how C compilers generate code that manipulates SP and FP and how the operating system lays out the process' memory.

The stack, the machine registers and some global variables are part of what is sometimes called the "root set". The MRI GC scans the root set for values that "look like they point to Ruby objects" and "marks" those objects recursively as "in use". Any unmarked objects ("not in use") are definitely not referenced by anything else and can be deallocated ("swept"). The GC must "stop-the-world" while it does this "marking" and "sweeping" -- nothing else can happen till this finishes. If the GC couldn't sweep anything, it allocates more memory from the OS (by calling malloc(), which calls something at a much lower level: sbrk() or mmap() or something else).

> For instance, in my example, where the variable with reference to the object has been assigned nil - the same thing occurs if the variable goes out of scope.
>
> So in both of those cases, the object "should" be garbage collected; I understand that it's possible, due to conservative GC, that it might mistake a number on the stack (a long), etc. as a valid pointer, but generally when GC runs it should decide that the var (which has no valid ruby references) is no longer live and should be GC'd. Or am I missing something?
>
> So we have a var with no references in Ruby that is being marked as live by the GC because the pointer has not yet been deallocated. So how does it ever get deallocated in order to not be marked as live?
>
> If what I am seeing is the case (and I assume it cannot be and that I am missing something) then the object would never be garbage collected.
>
> So how does GC actually occur?

Collection occurs in MRI when a new object is needed and there are no unused objects left around and/or there was a certain number of allocations since the last GC.

> What causes the pointer to be deallocated?
>

"Pointers" are never allocated or deallocated as in malloc()/free(). Only objects that have no references to them are deallocated.

The C compiler generates code that simply increments or decrements the SP or changes the FP -- Stacks are LIFOs.

The MRI GC is a very simple "stop-the-world", "mark-and-sweep" "conservative" collector. Conservative meaning "treat anything that looks like a pointer to an object as a pointer to an object". This can cause conservative collectors to keep some objects around longer than they should. This is also because most C compilers leave garbage (old pointers) on the stack.

The Rubinius GC is different. Ruby Enterprise Edition uses additional techniques on top of the standard MRI GC to improve performance in web servers and long-running processes.

>> been marked as live, and "frees" them.
>>
>> NB that these two stages are separate, so they don't conflict.
>> HTH.
>> -r
>

Yea, what Roger said. :)

More here:

http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29

A widely-ported and long-used GC can be downloaded here:

http://www.hpl.hp.com/personal/Hans_Boehm/gc/

ACM has a long rich history of GC research -- there's even a yearly symposium on the subject.

Appel's book is a great reference.
This book (sadly out of print) is a good introduction:

http://www.amazon.com/Topics-Advanced-Language-Implementation-Peter/dp/0262121514

There are far more complex GC algorithms that perform better in most cases. Conservative mark-and-sweep collectors are far easier to interface with C code than other approaches -- most others require considerable cooperation between the code and the collector.

HTH^2,
-- KAS
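
The point about when collection occurs can be observed from Ruby itself. The sketch below assumes a Ruby that provides GC.count (1.9 and later); the number of allocation-triggered runs depends on the Ruby version and the state of the heap, and may well be zero.

```ruby
# Observe how many collections have run, using GC.count (Ruby 1.9+).
before = GC.count
GC.start                      # explicitly request a collection
after_forced = GC.count

# Allocate a lot of short-lived objects; the collector decides on its own
# whether and when this allocation pressure triggers a run.
100_000.times { Object.new }
after_allocating = GC.count

puts "collections caused by GC.start:   #{after_forced - before}"
puts "collections caused by allocation: #{after_allocating - after_forced}"
```

The first number shows that GC.start really does run the collector; the second shows that ordinary allocation can do so too, at moments the program does not choose.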
on 2010-08-27 17:23
I very much appreciate the response, and this is helpful in describing the narrative, but it's still a few steps behind my question - but it may very well have clarified some points that help us get there. Let's stick with the example: a local variable is set as a reference to an object, the local variable is then set to nil so there is no longer a live reference to the object. No other ruby space commands have gone on, so unless Ruby is keeping junk behind the scenes, there should be no references - not on the Ruby stack, not on the C stack. How does this object get collected? As shown by the example, it is missed during the next attempt to GC (as well as any repeated attempts at this point). So what will change to make that object collectable? Are you suggesting that because it is in Ruby's root node that it gets treated such that it won't be GC'd until the program terminates? Let's assume this is the case. I should therefore be able to write a script that creates a non-root object as a child to another object inside method scope, allow that method to go out of scope, and expect that the object will be GC'd (as it is neither a root node nor does it have any live references). 
This seems to be validated by the following Ruby code:

> # so long as nothing looks like a pointer or reference to it
>
>
> $weak_hash[ :key ] = child_test_object
> puts 'hash now contains object id: ' + $weak_hash.pretty_inspect
> end
> end
> test_object = TestClass.new
>
> test_object.test_method
> puts 'id in hash should no longer be valid, as it is out of scope: '
> invalid_key = $weak_hash[ :key ]
> pp invalid_key

Output:

> storing test object
> hash now contains object id: {:key=>2160173880}
> id in hash should no longer be valid, as it is out of scope:
> /Users/ahaig/Projects/rp/ruby/weakhash/projects/RPWeakHash/weakhash.rb:21:in `_id2ref': 0x00000080c1a338 is recycled object (RangeError)
> from /Users/ahaig/Projects/rp/ruby/weakhash/projects/RPWeakHash/weakhash.rb:21:in `[]'
> from /Users/ahaig/Projects/rp/ruby/weakhash/projects/RPWeakHash/weakhash.rb:56:in `<main>'

So that works as expected (so long as GC is run manually; if not run, object is obviously still valid).

So let's try the same thing in this example that we did with the root node example:

> $weak_hash[ :key ] = child_test_object
> end
> test_object = TestClass.new
>
> test_object.test_method
> puts 'id in hash should no longer be valid, as it is out of scope: '
> invalid_key = $weak_hash[ :key ]
> pp invalid_key

Output:

>
> from /Users/ahaig/Projects/rp/ruby/weakhash/projects/RPWeakHash/weakhash.rb:62:in `<main>'

So that also works as expected.

It seems the only time that it does not work as expected, then, is when the object was instantiated with a reference in the root node. Of course, that is one of the most likely places for people to instantiate a reference.

So can anything be done about this? Are we simply doomed to wait until program termination for any objects allocated in the root node to disappear?
I intend to look into the patch suggested by brabuhr@gmail.com (https://sites.google.com/site/brentsrubypatches/), which (so far as this issue is concerned) appears to amount to:

> VALUE *rb_gc_stack_end = (VALUE *)STACK_GROW_DIRECTION;
> #define rb_gc_wipe_stack() { \
> VALUE *sp = alloca(0); \
> VALUE *end = rb_gc_stack_end; \
> rb_gc_stack_end = sp; \
> __stack_zero(end, sp); \

And some other basic support. I will follow up on that once I have some time to experiment (particularly since the patch is intended for 1.8.7 not 1.9.2). Any particular thoughts on this approach? Presumably there is some reason it has not been patched to do so?

In any case, any thoughts on any of this will be much appreciated.

Asher

On Aug 26, 2010, at 10:43 PM, Kurt Stephens wrote:
>
> Collection occurs in MRI when a new object is needed and there are no unused objects left around and/or there was a certain number of allocations since the last GC.
>
>> What causes the pointer to be deallocated?
>>
> "Pointers" are never allocated or deallocated as in malloc()/free(). Only objects that have no references to them are deallocated.
>
> The C compiler generates code that simply increments or decrements the SP or changes the FP -- Stacks are LIFOs.
>
> The MRI GC is a very simple "stop-the-world", "mark-and-sweep" "conservative" collector. Conservative meaning "treat anything that looks like a pointer to an object as a pointer to an object". This can cause conservative collectors to keep some objects around longer than they should. This is also because most C compilers leave garbage (old pointers) on the stack.

<snip>
on 2010-08-27 18:16
You have introduced something called a "root node" without defining it. What do you mean by this?

I'm assuming here you mean that in your case, if you allocate the object in the script body, then set the local to nil, you can observe that the object appears to not be collected.

As has been stated in the thread already, this is an artifact of the conservative GC. Even though you have set the local to nil, a reference to the object may still remain on the C stack. That reference can't be seen by ruby code because it is in stack memory that gcc set up and didn't clear when the value wasn't needed anymore.

This is unfortunate, but not the end of the world. It doesn't happen with every object allocated in a script body, only sometimes.

The patch set you were pointed to goes to lengths to clear the stack space as much as it can so that there are none of these phantom references to confuse the GC. It does this by breaking up the main eval function into smaller functions (allowing stack space to be allocated and deallocated within the eval itself) and forcibly clearing the stack with memset.

I hope this clears it up.

- Evan

On Aug 27, 2010, at 8:22 AM, Asher wrote:
>> require 'pp'
>> ObjectSpace.garbage_collect
>> def []=( key, object )
>> ##################################################
>> puts 'hash now contains object id: ' + $weak_hash.pretty_inspect
>
>
>> $weak_hash[ :key ] = child_test_object
>> end
>> hash now contains object id: {:key=>2160328280}
>
>> VALUE *sp = alloca(0); \
>> VALUE *end = rb_gc_stack_end; \
>> rb_gc_stack_end = sp; \
>> __stack_zero(end, sp); \
>
> And some other basic support. I will follow up on that once I have some time to experiment (particularly sense the patch is intended for 1.8.7 not 1.9.2). Any particular thoughts on this approach? Presumably there is some reason it has not been patched to do so?
>
> In any case, any thoughts on any of this will be much appreciated.
>
> Asher

<snip>
on 2010-08-27 18:27
On Aug 27, 2010, at 11:22 AM, Asher wrote:
> I intend to look into the patch suggested by brabuhr@gmail.com (https://sites.google.com/site/brentsrubypatches/), which (so far as this issue is concerned) appears to amount to:
>
>> VALUE *rb_gc_stack_end = (VALUE *)STACK_GROW_DIRECTION;
>> #define rb_gc_wipe_stack() { \
>> VALUE *sp = alloca(0); \
>> VALUE *end = rb_gc_stack_end; \
>> rb_gc_stack_end = sp; \
>> __stack_zero(end, sp); \
>
> And some other basic support. I will follow up on that once I have some time to experiment (particularly sense the patch is intended for 1.8.7 not 1.9.2). Any particular thoughts on this approach? Presumably there is some reason it has not been patched to do so?

So as I understand it the problem is:

The basic Ruby stack looks like:

Ruby stack, root node
FP*
ruby root node locals => st_ivar_tbl
ruby root node stack
SP* (ruby stack frame 1 after the activation record)

So when a function call is made the stack grows to look like:

ruby root node locals => st_ivar_tbl
ruby root node stack
SP* (ruby stack frame 1 after the activation record)
ruby root first child node locals => st_ivar_tbl
ruby root first child node
CP*

So when the first child node finishes the CP* moves back to the SP and st_ivar_tbl is no longer part of the stack, which is why nested local variables get GC'd as expected.

But when the local variable in the root node is set to nil, the local var data for object ID in st_ivar_tbl is set to 4 instead of object ID. This leaves a valid pointer object ID with no references.

But "where" is this object ID pointer if its reference in the st_ivar_tbl is now replaced with Qnil? I presume the explanation for this is that the object actually lives on the heap in ObjectSpace rather than in local variable space, which means that the object is allocated and a reference is given to st_ivar_table, so when st_ivar_table's reference is gone there is still a valid reference in ObjectSpace (the heap).

So it seems that the root node's object is remaining around even though there are no references because its frame has not been cleared. Is this understanding correct?

So if the reference to the object is always in the heap, how does the heap's pointer become invalidated when st_ivar_tbl is cleared, as in the examples where it works "as expected"?

Perhaps there is something fundamental about local variables I am missing in my description here? I am trying to work through these things, so help is appreciated.

Thanks for patience,
Asher
on 2010-08-27 18:33
On Aug 27, 2010, at 12:09 PM, Evan Phoenix wrote: > You have introduced something called a "root node" without defining it. What do you mean by this? The first node that runs when you run a script (ie. call ruby_run_node ), which also defines the set of root references. > I'm assuming here you mean that in your case, if you allocate the object in the script body, then set the local to nil, you can observe that the object appears to not be collected. What you can see with my examples, though, is that it does happen with _all_ objects allocated on the root node > As has been stated in the thread already, this is an artifact of the conservative GC. Even though you have set the local to nil, a reference to the object may still remain on the C stack. That reference can't be seen by ruby code because it is in stack memory that gcc setup and didn't clear when the value wasn't needed anymore. Right - I understand this conceptually. I want to know "where" on the C stack this "might" remain. It shouldn't be an obtuse question - Ruby is allocating each and every object, and I'm not using any C pointers for the particular example, so there is nothing else in my C stack (in this case, "I" don't have a C stack, only Ruby does). So Ruby is holding a reference somewhere in its stack, possibly because of > This is unfortunate, but not the end of the world. In my particular use case (not the example), it is the end of the world and requires re-designing the entire way I'm handling T_DATA, such that I pass back a new T_DATA every time an existing underlying C object is requested. I want to store the first T_DATA created for this object in a weak hash and pass it back as requested - allowing it to be collected as appropriate. This seems to work in all contexts but the root node, where the result is that one expects to get a GC'd object (which can thus be caught and returned as nil) but ends up with a valid obj (which shouldn't be valid). 
The result is that one can ask for an object that doesn't exist, and instead of being told that it doesn't exist get back an old object that wasn't what one wanted (one wanted to know that it did not exist in this context, not get whatever random last object was created in the slot).

This example also, I believe, makes it evident that "root node" is not necessarily the actual root but can also be any root relative to execution context. In other words, a variable _will not_ be GC'd until one has left the frame in which it was defined, even if all references are set nil.

Example:

it "can be created with a name string and home directory string" do
  @environment = RPDB::Environment.new( $environment_name.to_s, $environment_path )
  @environment.should_not == nil
  @environment.is_a?( RPDB::Environment ).should == true
  @environment.directory.should == $environment_path
end

it "can be created with a name symbol" do
  environment = RPDB.environment_with_name( $environment_name )
  environment.should == nil
  @environment = RPDB::Environment.new( $environment_name )
  @environment.should_not == nil
  @environment.is_a?( RPDB::Environment ).should == true
  @environment.directory.should == './'
end

The last line of the second example does not end up with the default path ('./') because an existing reference is found when it should not be.

It seems, thus, that writing a weak hash is impossible given the current state of GC. This seems rather problematic.

> It doesn't happen with every object allocated in a script body, only sometimes.

No, it happens _every_ time. See examples.

> The patch set you were pointed to goes to lengths to clear the stack space as much as it can so that there are none of these phantom references to confuse the GC. It does this by breaking up the main eval function into smaller functions (allowing stack space to be allocated and deallocated within the eval itself) and forcibly clearing the stack with memset.

Right...
and I was trying to look where that would be appropriately integrated into 1.9.2, but my attempts have not been successful. I believe that this is an indication that that is not the problem in question here- that the problem has to do with the clearing of the present stack, rather than the clearing of stack frames that have been passed. In other words, the patch clears old stack frames, but the problem here is that we have data remaining in the present stack frame that is not expected to still exist. This is obviously a function of the GC's conservative nature, but I am trying to figure out what my best option is for circumventing the unexpected behavior. Additionally, On Aug 27, 2010, at 12:13 PM, Roger Pack wrote: > Unfortunately you'll have to assume that there is still some "bad ref" > around to it. > One trick is to try and nest whatever you "violently" need to be > collected deep in some sub routine, then call GC.start *after* > recursing back up from that sub routine. It does seem to be the answer that things are leaning toward, but I want to at least understand at a lower level precisely what is occurring to prevent this specific collection. It seems (based on my description of when it occurs) to be systemic rather than sporadic, so it should be possible to at least narrow it down to a specific place in code where a reference is being left, even if it is not so easy to adapt that code to do otherwise. Best, Asher
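
The trick quoted from Roger Pack (allocate deep inside a subroutine, then call GC.start only after returning) might look like the sketch below. The method names are hypothetical, and the standard library's WeakRef stands in for the hand-rolled object id lookup. Because the collector is conservative, the printed result is not guaranteed either way.

```ruby
require 'weakref'

# Hypothetical helper: create the object several frames down, so that no
# local variable in the caller's frame ever refers to it directly.
def allocate_in_nested_frame(depth)
  return WeakRef.new(Object.new) if depth.zero?
  allocate_in_nested_frame(depth - 1)
end

# Hypothetical helper: run GC only after the nested frames have returned.
def collected_after_return?
  ref = allocate_in_nested_frame(20)
  GC.start
  !ref.weakref_alive?
end

result = collected_after_return?
puts "collected after returning from nested frames: #{result}"
```

A result of false does not mean a Ruby-level reference exists; it may only mean that a stale word naming the object was still somewhere in the scanned stack or registers when GC.start ran.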
on 2010-08-27 19:44
On Fri, Aug 27, 2010 at 12:33 PM, Asher <asher@ridiculouspower.com> wrote:
> On Aug 27, 2010, at 12:09 PM, Evan Phoenix wrote:
>> It doesn't happen with every object allocated in a script body, only sometimes.
>
> No, it happens _every_ time. See examples.

Modified original program:

##################################################
weak_hash = Hash::Weak.new
class TestClass
end
1_000.times do |n|
  test_object = TestClass.new
  weak_hash[ n ] = test_object
  valid_key = weak_hash[ n ]
  p valid_key
end
class AnotherClass
end
test_object = nil
1_000.times do |n|
  print "getting test object (#{n}) (should fail with rb_eRangeError): "
  invalid_key = weak_hash[ n ]
  p invalid_key
  # error - returns valid key
end
##################################################

Output:

$ ruby -v gc.rb
ruby 1.8.7 (2010-01-10 patchlevel 249) [i486-linux]
#<TestClass:0xb73f6d64>
#<TestClass:0xb73f6d3c>
#<TestClass:0xb73f7e80>
[...]
#<TestClass:0xb73f7e30>
#<TestClass:0xb73f7e08>
#<TestClass:0xb73f7e80>
getting test object (0) (should fail with rb_eRangeError): #<TestClass:0xb73f7e80>
getting test object (1) (should fail with rb_eRangeError): #<TestClass:0xb73f7e80>
getting test object (2) (should fail with rb_eRangeError): #<TestClass:0xb73f7e80>
[...]
getting test object (31) (should fail with rb_eRangeError): #<TestClass:0xb73f7e80>
getting test object (32) (should fail with rb_eRangeError): #<TestClass:0xb73f7e80>
gc.rb:15:in `_id2ref': 0xdb9fbf04 is recycled object (RangeError)
from gc.rb:15:in `[]'
from gc.rb:54
from gc.rb:52:in `times'
from gc.rb:52
getting test object (33) (should fail with rb_eRangeError):
on 2010-08-27 20:05
Trying your example 1000 times simply loops infinitely on my ruby. Not sure why. Trying it 10.times and then 2.times works - throws expected error on first attempt to retrieve. Trying it 1.times does not - returns uncollected object. Your example of 1000 times causes it to work on the 33rd try. What gives? or is your point that there is no way to predict this? If that's the case, why does it work the first time consistently with my attempts 2.times, 10.times, etc.? Asher
on 2010-08-27 20:05
My knowledge about the insides of 1.9 is less strong than 1.8, so I'm not fully versed in how 1.9 now stores locals.

Anyway, one issue with your testing methodology is you don't define when the GC will happen. If you do some work in a method and return from it, even though there are no references to an object, if the GC hasn't run yet, then _id2ref will be able to return it.

If you demand that returning from a local scope will cause the object to be treated as garbage, then you can't blindly use _id2ref. Ruby, and just about all GC languages, don't work that way.

Your examples do not include calling GC.start to force a GC, thus I wonder if this is the source of your problem. Remember that by default, the GC runs whenever it wants. So you can't depend on it to run at certain times.

- Evan
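
One way to avoid relying on _id2ref is the standard library's WeakRef, which raises WeakRef::RefError once its target has been collected. The WeakValueHash class below is a hypothetical sketch for illustration, not the Hash::Weak from this thread, and it makes no promise about when an unreferenced value actually disappears.

```ruby
require 'weakref'

# Hypothetical weak-valued hash built on the standard library's WeakRef.
class WeakValueHash
  def initialize
    @refs = {}
  end

  # Store only a weak reference, so this hash does not keep the value alive.
  def []=(key, object)
    @refs[key] = WeakRef.new(object)
  end

  # Return the value if it is still alive, or nil if it is missing or collected.
  def [](key)
    return nil unless @refs.key?(key)
    @refs[key].__getobj__           # raises WeakRef::RefError if collected
  rescue WeakRef::RefError
    @refs.delete(key)               # drop the dead entry
    nil
  end
end

cache = WeakValueHash.new
held = Object.new                   # a strong reference we keep on purpose
cache[:key] = held
GC.start
puts cache[:key].equal?(held)       # still reachable through held
puts cache[:missing].inspect        # unknown keys give nil
```

Unlike an id lookup, a dead WeakRef can never hand back some other object that happens to have been allocated in the same slot, which is the failure mode described earlier in the thread.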
on 2010-08-27 20:08
It takes place in the Hash::Weak code in []:
class Hash::Weak < Hash
  def []( key )
    # get the stored ID - a Fixnum, not an object reference to our weak-referenced object
    obj_id = super( key )
    # theoretically this should cause non-referenced objects to get cleaned up
    # so long as nothing looks like a pointer or reference to it
    ObjectSpace.garbage_collect
    # now get our object from ID
    # if it had no references it should have been GC'd and we should get an
    # rb_eRangeError "is not id value" (expected) or "is recycled object" (possible)
    obj = ObjectSpace._id2ref( obj_id )
    return obj
  end
  def []=( key, object )
    # a Fixnum has a constant ID for its value, so it can't be copied and can't be garbage collected
    # so object.__id__ cannot be a reference to a child of object and therefore cannot prevent
    # garbage collection on the object
    super( key, object.__id__ )
  end
end
Asher
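For comparison, a weak-valued hash can also be sketched on top of the standard library's WeakRef instead of raw object IDs. This is an alternative technique, not the code under discussion, and the class name WeakValueHash is invented for this sketch. It avoids the "is recycled object" ambiguity because WeakRef raises WeakRef::RefError once its target is gone, but it is subject to exactly the same GC timing: a lookup keeps returning the object until a collection has actually reclaimed it.

```ruby
require 'weakref'

# Hypothetical class (name invented for this sketch): keeps WeakRef
# wrappers as values so the hash itself never pins the stored objects.
class WeakValueHash
  def initialize
    @refs = {}
  end

  def []=(key, object)
    @refs[key] = WeakRef.new(object)
  end

  # Returns the stored object, or nil if the key is unknown or the
  # object has already been collected.
  def [](key)
    ref = @refs[key]
    return nil unless ref
    ref.__getobj__
  rescue WeakRef::RefError
    # The target was collected; drop the dead entry.
    @refs.delete(key)
    nil
  end
end

cache = WeakValueHash.new
held = Object.new
cache[:key] = held

# While `held` is still referenced, the lookup returns the same object.
p cache[:key].equal?(held)
p cache[:missing]
```

The lookup calls __getobj__ directly instead of checking weakref_alive? first, so there is no window in which the object can disappear between the check and the fetch.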
on 2010-08-27 20:35
On Fri, Aug 27, 2010 at 2:04 PM, Asher <asher@ridiculouspower.com> wrote:
> Trying your example 1000 times simply loops infinitely on my ruby. Not sure why.
>
> Trying it 10.times and then 2.times works - throws expected error on first attempt to retrieve.
>
> Trying it 1.times does not - returns uncollected object.
>
> Your example of 1000 times causes it to work on the 33rd try.
>
> What gives? or is your point that there is no way to predict this? If that's the case, why does it work the first time consistently with my attempts 2.times, 10.times, etc.?
Beats me :)
ruby 1.8.7 (2010-01-10 patchlevel 249) [i486-linux]
Linux 2.6.32 Ubuntu i686 GNU/Linux
I consistently see no RangeError until 34.times then none until 38.times:
34.times RangeError when n = 0
38.times RangeError when n = 0
40.times RangeError when n = 38
43.times RangeError when n = 0
44.times RangeError when n = 42
46.times RangeError when n = 0
47.times RangeError when n = 42
48.times RangeError when n = 0
49.times RangeError when n = 38
50.times RangeError when n = 42
52.times RangeError when n = 38
53.times RangeError when n = 42
55.times RangeError when n = 38
56.times RangeError when n = 42
58.times RangeError when n = 38
59.times RangeError when n = 42
61.times RangeError when n = 0
ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-linux]
Linux 2.6.18 i686 i686 i386 GNU/Linux
34.times RangeError when n = 0
35.times RangeError when n = 33
36.times RangeError when n = 33
38.times RangeError when n = 0
39.times RangeError when n = 0
40.times RangeError when n = 33
41.times RangeError when n = 38
43.times RangeError when n = 0
44.times RangeError when n = 0
45.times RangeError when n = 38
46.times RangeError when n = 0
47.times RangeError when n = 33
48.times RangeError when n = 0
49.times RangeError when n = 38
50.times RangeError when n = 47
on 2010-08-30 05:46
On Aug 27, 2010, at 12:33 PM, Asher wrote:
> I want to know "where" on the C stack this "might" remain. It shouldn't be an obtuse question - Ruby is allocating each and every object, and I'm not using any C pointers for the particular example, so there is nothing else in my C stack (in this case, "I" don't have a C stack, only Ruby does).
So my question comes down to:
def random_method
  # demo_var is internally mapped as a pointer to the newly created
  # Object, which is instantiated on the heap.
  demo_var = Object.new
  # demo_var is internally mapped to 4 (the VALUE for nil)
  demo_var = nil
  # GC, in env_mark, walks (among others) space demarcated by
  # RubyVM::Env, which is defined by its length in objects (VALUE)
  ObjectSpace.garbage_collect
end
So the environment's memory space is evaluated as a series of long
values (which were allocated during the compilation of the iseq), each
of which is potentially a pointer pointing to the heap.
So as I understand, before the GC is called here we have 2 NODE_LASGN
nodes. Is this correct?
So the first one allocates Object and assigns the reference to demo_var
in the local var table on the stack.
The second one assigns demo_var in the local var table on the stack to
4.
So where does the GC discover a reference to Object to test in order to
mark? It is clear that if a reference to Object is left (invisibly) on
the stack then it will be marked until the stack gets cleaned up. This
would obviously not take place until the frame is taken off the stack.
But I can't find anywhere that this would make sense. The only place
that I see where a reference occurs that the GC is walking is in the
locals table. But the instruction for NODE_LASGN (setlocal) changes the
pointer value for the local variable reference. So there _shouldn't_, so
far as I can tell, be a reference to Object; yet insofar as Object gets
marked by gc_mark_locations (called by gc_env_mark), it has a reference
still existing.
Can anyone help me find where this reference is occurring? My read of
the code suggests that the GC should get "4" for the slot that would
have been a pointer to Object, yet this isn't what happens.
Insight appreciated.
Asher
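One way to observe the behaviour being asked about, without reading the interpreter source, is to count live instances of a marker class right after the forced GC inside the method and again after the method has returned. This is only an observation sketch: the class Probe and the helper live_probes are hypothetical names, and the counts are intentionally not predicted, because they depend on the Ruby version, the platform and whatever stale values happen to sit on the machine stack.

```ruby
# Hypothetical marker class, so its instances are easy to count.
class Probe
end

# Number of Probe instances the interpreter still considers live.
# With a block, ObjectSpace.each_object returns the number of objects found.
def live_probes
  ObjectSpace.each_object(Probe) {}
end

def random_method
  demo_var = Probe.new
  demo_var = nil   # the local slot now holds nil, not the Probe
  GC.start         # force a collection while the frame is still active
  live_probes
end

inside = random_method
GC.start           # collect again after the frame has been popped
after = live_probes

puts "live Probe instances inside the method: #{inside}"
puts "live Probe instances after it returned: #{after}"
```

A count of 1 inside the method would be consistent with the suspicion in the question: something other than the local variable slot still holds a value that looks like a pointer to the object.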
Re: Variable Allocation, Variable Reassignment, GC Pointer Testing (Was: Re: Garbage Collection Ques
on 2010-08-30 19:37
>> I want to know "where" on the C stack this "might" remain. It shouldn't be an obtuse question - Ruby is allocating each and every object, and I'm not using any C pointers for the particular example, so there is nothing else in my C stack (in this case, "I" don't have a C stack, only Ruby does).
> ObjectSpace.garbage_collect
> So where does the GC discover a reference to Object to test in order to mark? It is clear that if a reference to Object is left (invisibly) on the stack then it will be marked until the stack gets cleaned up. This would obviously not take place until the frame is taken off the stack. But I can't find anywhere that this would make sense. The only place that I see where a reference occurs that the GC is walking is in the locals table. But the instruction for NODE_LASGN (setlocal) changes the pointer value for the local variable reference. So there _shouldn't_, so far as I can tell, be a reference to Object; yet insofar as Object gets marked by gc_mark_locations (called by gc_env_mark), it has a reference still existing.
>
> Can anyone help me find where this reference is occurring? My read of the code suggests that the GC should get "4" for the slot that would have been a pointer to Object, yet this isn't what happens.
http://timetobleed.com/what-is-a-ruby-object-introducing-memprof-dump/
might help. Besides that just stepping through using GCC might help you.
NB that the GC marks both references from the stack and "global rooted" objects, like code segments which might be used later.
GL.
-r