JRuby disabling ObjectSpace: what implications?

headius · October 28, 2007, 7:54am

As some of you may have heard, we’re considering disabling
ObjectSpace.each_object by default in JRuby. Primarily, this is for
performance; to support each_object, we have to bend over backwards,
maintaining lists of weak references to all objects in the system and
periodically cleaning out those lists. Here’s some example performance,
from a fractal benchmark in the JRuby source:

With ObjectSpace: Ruby Elapsed 45.967000
Without ObjectSpace: Ruby Elapsed 4.280000

What’s most frustrating about this is that almost no libraries or apps
use each_object, and it’s a terrible performance hit for us.

The one really visible use of each_object is in test/unit, where the
default console-based runner does each_object(Class) to find all
subclasses of TestCase. Because this is a heavily-used library (to say
the least), I’ve made modifications to JRuby to always support
each_object(Class) by maintaining a bidirectional graph of parent and
child classes. So that much wouldn’t go away (but I’d prefer an
implementation that uses Class#inherited, since it would be cleaner,
faster, and deterministic).
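
To make the Class#inherited idea concrete, here is a minimal sketch of how a base class could record its own subclasses without walking ObjectSpace. The names TrackedBase, subclasses_seen and all_descendants are made up for illustration; this is not how test/unit or JRuby actually does it.

```ruby
# Hypothetical stand-in for a base class such as Test::Unit::TestCase.
class TrackedBase
  class << self
    # Direct subclasses recorded so far, in definition order.
    def subclasses_seen
      @subclasses_seen ||= []
    end

    # Ruby calls this hook every time a class inherits from the receiver.
    def inherited(subclass)
      super
      subclasses_seen << subclass
    end

    # Walk the recorded tree depth-first; no ObjectSpace involved.
    def all_descendants
      subclasses_seen.flat_map { |klass| [klass] + klass.all_descendants }
    end
  end
end

class FirstCase < TrackedBase; end
class SecondCase < TrackedBase; end
class NestedCase < FirstCase; end

puts TrackedBase.all_descendants.inspect
```

Because the list is filled in at class-definition time, the result does not depend on what the garbage collector has or has not done.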

So…I’m writing this to see what the general Ruby world thinks of us
having ObjectSpace disabled by default, enableable via a command line
option (or perhaps through a library? -robjectspace?).

I think more and more of you may want to give JRuby another look over
the next few months, so I think we need to involve you in such
decisions.

Charlie

headius · October 28, 2007, 7:59am

From: “Charles Oliver N.”

As some of you may have heard, we’re considering disabling
ObjectSpace.each_object by default in JRuby. Primarily, this is for
performance; to support each_object, we have to bend over backwards,
maintaining lists of weak references to all objects in the system and
periodically cleaning out those lists.

Is this also true for ObjectSpace#_id2ref ?

Regards,

Bill

headius · October 28, 2007, 8:17am

Bill K. wrote:

From: “Charles Oliver N.”

As some of you may have heard, we’re considering disabling
ObjectSpace.each_object by default in JRuby. Primarily, this is for
performance; to support each_object, we have to bend over backwards,
maintaining lists of weak references to all objects in the system and
periodically cleaning out those lists.

Is this also true for ObjectSpace#_id2ref ?

Not directly. _id2ref is handled in a similar way, but we have an event
we can trigger off to start tracking an object; namely, Object#id.

When you request an id, we start tracking that object for purposes of
_id2ref. Not until. So that would not be affected by disabling
ObjectSpace.

In actuality, however, _id2ref is primarily used for things like weak
references, so you can hold a virtual reference to an object without
preventing it from being collected. We could provide an implementation
of Ruby’s weak references using Java’s weak references that would allow
us to escape _id2ref entirely for that use case.

Are there other places _id2ref is used?

Charlie

headius · October 28, 2007, 8:59am

From: “Charles Oliver N.”

Bill K. wrote:

Is this also true for ObjectSpace#_id2ref ?

Not directly. _id2ref is handled in a similar way, but we have an event
we can trigger off to start tracking an object; namely, Object#id.

When you request an id, we start tracking that object for purposes of
_id2ref. Not until. So that would not be affected by disabling ObjectSpace.

I see, thanks. Nifty.

In actuality, however, _id2ref is primarily used for things like weak
references, so you can hold a virtual reference to an object without
preventing it from being collected. We could provide an implementation
of Ruby’s weak references using Java’s weak references that would allow
us to escape _id2ref entirely for that use case.

Are there other places _id2ref is used?

I think I’ve used _id2ref exactly twice. I can’t recall the first
usage; I don’t think it made it into production code. The most
recent use was to store some ruby object id’s in a separate C++
process, which was able to fire an event back to ruby and provide
the object id for the object to receive the event.

(I suppose DRb might do something similar?)

Regards,

Bill

headius · October 28, 2007, 8:07am

ara.t.howard wrote:

hmmm. ok i’m brainstorming here which you can ignore if you like as i
know less than nothing about jvms or implementing ruby but here goes:
what if you could invert the problem? what if objects knew about the
global ObjectSpaceThang and could be forced to register themselves on
demand somehow? without a reference i’ve no idea how, just throwing
that out there. or, another stupid idea, what if the objects themselves
were the tree/graph of weak references parent -> children. crawling it
would be, um, fun - but you could prune dead objects only when walking
the graph. this should be possible in ruby since you always have the
notion of a parent object - which is Object - so all objects should be
either reachable or leaks. now back to drinking my regularly scheduled
beer…

Continuing this discussion here…

Please, continue to brainstorm. I don’t claim to have thought out every
aspect of this problem or every possible solution. I’d love to
discover I’ve missed an obvious fix.

Your idea has come up in the past, and it would probably eliminate the
cost of an ObjectSpace list. However that doesn’t appear to be where we
pay the highest cost.

The two items that (we believe) cost the most for us on the JVM are:

1. Constructing an extra object for every Ruby object…namely, the
   WeakReference object to point to it. So we pay a
   memory/allocation/initialization cost.
2. WeakReference itself causes Java’s GC to have to do additional checks,
   so it can notify the WeakReference that the object it points at has
   gone away. So that slows down the legendary HotSpot GC and we pay again.

I believe the parent -> weakref -> children algorithm is used in some
implementations of ObjectSpace-like behavior, so it’s perfectly valid.
But again, there are certain aspects of ObjectSpace that are just
problematic…

* threading or concurrency of any kind? No, you can’t have
  multithreading with ObjectSpace, nor a concurrent/parallel GC (and it
  potentially excludes other advanced GC designs too).
* determinism? Matz told me that “ObjectSpace doesn’t have to be
  deterministic”…but when it starts getting wired into libraries like
  test/unit, it seems like people expect it to be. If we can say
  ObjectSpace isn’t deterministic, then nobody should be relying on its
  contents for core libraries, and we could reasonably claim that
  each_object will never return anything.

Charlie

headius · October 28, 2007, 1:26pm

Hi,

At Sun, 28 Oct 2007 16:16:25 +0900,
Charles Oliver N. wrote in [ruby-talk:276236]:

Are there other places _id2ref is used?

drb.

headius · October 28, 2007, 2:10pm

On 28.10.2007 08:06, Charles Oliver N. wrote:

notion of a parent object - which is Object - so all objects should be
either reachable or leaks. now back to drinking my regularly scheduled
beer…

Continuing this discussion here…

Please, continue to brainstorm. I don’t claim to have thought out every
aspect of this problem or every possible solution. I’d love to
discover I’ve missed an obvious fix.

IMHO ObjectSpace should not be implemented in Java land. Why? The JVM
has to keep track of instances anyway and implementing this in Java via
WeakReferences seems to duplicate functionality that is already there.
Did you consider using “Java Virtual Machine Tools Interface”?

You could either follow the same approach of the heapTracker presented
on that page and use a flag or require a lib that enables ObjectSpace
(because of the overhead of instrumentation).

Alternatively there may be another method that does not need
instrumentation and that can give you access to every (reachable) object
in the JVM.

[…]

determinism? Matz told me that “ObjectSpace doesn’t have to be
deterministic”…but when it starts getting wired into libraries like
test/unit, it seems like people expect it to be. If we can say OS isn’t
deterministic, then nobody should be relying on its contents for core
libraries, and we could reasonably claim that each_object will never
return anything.

I’d reformulate the requirement here: ObjectSpace.each_object must yield
every object that was existent before the invocation and that is
strongly reachable. I believe for the typical use case (e.g. traversing
all class instances) this is enough while leaving enough flexibility for
the implementation (i.e. create a snapshot of some form, iterate through
some internal structure that may change due to new objects being created
during #each_object etc.).

Kind regards

robert

headius · October 28, 2007, 3:27pm

On Oct 28, 2007, at 1:16 AM, Charles Oliver N. wrote:

Are there other places _id2ref is used?

i use it quite often as a way to have meta-programming ‘storage’
without polluting instances:

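  # instance_method returns an UnboundMethod, which is what #bind needs
  foo = instance_method :foo

  module_eval <<-code
    def foo(*a, &b)
      # the interpolated id maps the string back to the live object
      ObjectSpace._id2ref(#{ foo.object_id }).bind(self).call(*a, &b)
    end
  code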

which is fabricated - but you get the concept: string in eval maps to
live object at run time. when #define_method takes a block this
won’t be used much i think though…
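
For comparison, here is a small sketch of the define_method variant mentioned above, assuming a Ruby where the block given to define_method can itself take &block (1.9 and later). The Greeter and Person names are invented for the example.

```ruby
module Greeter
  def greet(name)
    "hello, #{name}"
  end

  # Capture the original method as an UnboundMethod in a local variable.
  original = instance_method(:greet)

  # The block closes over `original`, so no object id has to be
  # interpolated into a string and looked up again via _id2ref.
  define_method(:greet) do |*args, &block|
    original.bind(self).call(*args, &block).upcase
  end
end

class Person
  include Greeter
end

puts Person.new.greet("matz")
```

The closure keeps a normal strong reference to the captured method, so nothing here depends on ObjectSpace at all.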

cheers.

a @ http://codeforpeople.com/

headius · October 28, 2007, 5:10pm

Bill K. wrote:

I think I’ve used _id2ref exactly twice. I can’t recall the first
usage; I don’t think it made it into production code. The most
recent use was to store some ruby object id’s in a separate C++
process, which was able to fire an event back to ruby and provide
the object id for the object to receive the event.

(I suppose DRb might do something similar?)

Yeah, sounds like that’s mostly a “poor man’s remote hash”. I’d expect
that just creating a hash specifically for that purpose and passing a
key around would be a “better” way to do it.

_id2ref is just another one of those features that gets rarely used, and
whose use cases can often be implemented in “better” ways.
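
As a rough illustration of the "hash plus key" approach, the sketch below hands out integer handles that an external process could send back later. HandleRegistry and its methods are hypothetical names, not part of any library mentioned in this thread.

```ruby
# A purpose-built registry: objects are looked up by a key we issue
# ourselves rather than by their ObjectSpace id.
class HandleRegistry
  def initialize
    @objects = {}
    @next_key = 0
  end

  # Store the object and return the handle to pass around.
  def register(object)
    key = (@next_key += 1)
    @objects[key] = object
    key
  end

  # Raises KeyError for unknown or released handles.
  def fetch(key)
    @objects.fetch(key)
  end

  # The registry holds strong references, so entries must be released
  # explicitly once the external side is done with them.
  def release(key)
    @objects.delete(key)
  end
end

registry = HandleRegistry.new
handle = registry.register("event receiver")
puts registry.fetch(handle)
```

Unlike _id2ref, a stale handle fails loudly with KeyError instead of possibly resolving to some unrelated object.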

Charlie

headius · October 28, 2007, 2:15pm

On Oct 28, 12:53 am, Charles Oliver N. wrote:

So…I’m writing this to see what the general Ruby world thinks of us
having ObjectSpace disabled by default, enableable via a command line
option (or perhaps through a library? -robjectspace?).

.ext\common\win32\registry.rb:569: ObjectSpace.define_finalizer self, @@final.call(@hkeyfinal)
ext\dl\test\test.rb:187: ObjectSpace.define_finalizer(fp) {File.unlink("tmp.txt")}
ext\tk\lib\multi-tk.rb:493: ObjectSpace.each_object(TclTkIp){|obj|
ext\Win32API\lib\win32\registry.rb:569: ObjectSpace.define_finalizer self, @@final.call(@hkeyfinal)
lib\cgi\session.rb:299: ObjectSpace::define_finalizer(self, Session::callback(@dbprot))
lib\drb\drb.rb:337:# object's ObjectSpace id as its dRuby id. This means that the dRuby
lib\drb\drb.rb:361: # This, the default implementation, uses an object's local ObjectSpace
lib\drb\drb.rb:375: ObjectSpace._id2ref(ref)
lib\finalize.rb:59: ObjectSpace.call_finalizer(obj)
lib\finalize.rb:169: ObjectSpace.remove_finalizer(@proc)
lib\finalize.rb:173: ObjectSpace.add_finalizer(@proc)
lib\finalize.rb:180: # registering function to ObjectSpace#add_finalizer
lib\finalize.rb:192: ObjectSpace.add_finalizer(@proc)
lib\irb\completion.rb:152: ObjectSpace.each_object(Module){|m|
lib\irb\ext\save-history.rb:69: ObjectSpace.define_finalizer(obj, HistorySavingAbility.create_finalizer)
lib\shell\process-controller.rb:216: ObjectSpace.each_object(IO) do |io|
lib\singleton.rb:23:# ObjectSpace.each_object(OtherKlass){} # => 0.
lib\singleton.rb:190: "#{ObjectSpace.each_object(klass){}} #{klass} instance(s)"
lib\tempfile.rb:53: ObjectSpace.define_finalizer(self, @clean_proc)
lib\tempfile.rb:105: ObjectSpace.undefine_finalizer(self)
lib\tempfile.rb:118: ObjectSpace.undefine_finalizer(self)
lib\test\unit\autorunner.rb:17: ObjectSpace.each_object(Class) do |klass|
lib\test\unit\autorunner.rb:54: :objectspace => proc do |r|
lib\test\unit\autorunner.rb:55: require 'test/unit/collector/objectspace'
lib\test\unit\autorunner.rb:56: c = Collector::ObjectSpace.new
lib\test\unit\autorunner.rb:80: @collector = COLLECTORS[(standalone ? :dir : :objectspace)]
lib\test\unit\collector\dir.rb:13: def initialize(dir=::Dir, file=::File, object_space=::ObjectSpace, req=nil)
lib\test\unit\collector\objectspace.rb:10: class ObjectSpace
lib\test\unit\collector\objectspace.rb:13: NAME = 'collected from the ObjectSpace'
lib\test\unit\collector\objectspace.rb:15: def initialize(source=::ObjectSpace)
lib\test\unit.rb:252: # the ObjectSpace and wrap them up into a suite for you. It then runs
lib\weakref.rb:16:# ObjectSpace.garbage_collect
lib\weakref.rb:62: ObjectSpace._id2ref(@__id)
lib\weakref.rb:74: ObjectSpace.define_finalizer obj, @@final
lib\weakref.rb:75: ObjectSpace.define_finalizer self, @@final
lib\weakref.rb:98: ObjectSpace.garbage_collect
test\dbm\test_dbm.rb:45: ObjectSpace.each_object(DBM) do |obj|
test\gdbm\test_gdbm.rb:42: ObjectSpace.each_object(GDBM) do |obj|
test\ruby\test_objectspace.rb:3:class TestObjectSpace < Test::Unit::TestCase
test\ruby\test_objectspace.rb:10: o = ObjectSpace._id2ref(obj.object_id);
test\sdbm\test_sdbm.rb:15: ObjectSpace.each_object(SDBM) do |obj|
test\testunit\collector\test_dir.rb:62: class ObjectSpace
test\testunit\collector\test_dir.rb:81: @object_space = ObjectSpace.new
test\testunit\collector\test_objectspace.rb:6:require 'test/unit/collector/objectspace'
test\testunit\collector\test_objectspace.rb:11: class TC_ObjectSpace < TestCase
test\testunit\collector\test_objectspace.rb:41: @c = ObjectSpace.new(@object_space)
test\testunit\collector\test_objectspace.rb:44: def full_suite(name=ObjectSpace::NAME)
test\testunit\collector\test_objectspace.rb:51: TestSuite.new(ObjectSpace::NAME)
test\testunit\collector\test_objectspace.rb:83: expected = TestSuite.new(ObjectSpace::NAME)
test\testunit\collector\test_objectspace.rb:89: expected = TestSuite.new(ObjectSpace::NAME)
test\yaml\test_yaml.rb:1279: ObjectSpace.each_object(Class) do |klass|

So, in summary, if we exclude those libraries where only tests are
affected, this would affect:

win32-registry
tk
cgi
drb
finalize
irb
shell
singleton
tempfile
test-unit
weakref

Some comments on each of these as they relate to JRuby:

win32-registry: You have no hope of implementing this without JNA
anyway, unless there’s some Java binding I don’t know about. Besides,
I couldn’t tell you why on Earth win32-registry would need a
finalizer.

tk: No one will care. They’ll use SWT or Swing bindings. Besides, you
would need JNA.

cgi: This could be a problem. Then again, some people say this library
should be refactored or tossed.

drb: This could be a big deal.

finalize: Did anyone even know about this? Does anyone use it?

irb: You’ve got jirb.

shell: This could be a problem.

singleton: Ditto.

tempfile: Meh, I’m guessing Java has its own library for temp files. I
never liked the current implementation anyway (which is why I wrote
file-temp).

test-unit: Already mentioned.

weakref: You’ve stated that Java has its own implementation.

Regards,

Dan

headius · October 28, 2007, 5:20pm

Robert K. wrote:

IMHO ObjectSpace should not be implemented in Java land. Why? The JVM
has to keep track of instances anyway and implementing this in Java via
WeakReferences seems to duplicate functionality that is already there.
Did you consider using “Java Virtual Machine Tools Interface”?

Java SE 6 Release Notes

You could either follow the same approach of the heapTracker presented
on that page and use a flag or require a lib that enables ObjectSpace
(because of the overhead of instrumentation).

You just hit on exactly why we don’t use JVMTI for ObjectSpace. It would
certainly work, but it would add a lot of overhead we’d never expect
people to accept in a real application. Plus, it would track far more
object instances than we actually want tracked. We’d love to include a
JVMTI-based ObjectSpace implementation, however…it just hasn’t been a
high priority to implement since 99% of users never actually need
ObjectSpace.

Alternatively there may be another method that does not need
instrumentation and that can give you access to every (reachable) object
in the JVM.

If there is…we haven’t found it. The “linked weakref list” has been
the least overhead so far, and it’s still a lot of overhead.

I’d reformulate the requirement here: ObjectSpace.each_object must yield
every object that was existent before the invocation and that is
strongly reachable. I believe for the typical use case (e.g. traversing
all class instances) this is enough while leaving enough flexibility for
the implementation (i.e. create a snapshot of some form, iterate through
some internal structure that may change due to new objects being created
during #each_object etc.).

The problem here is “strongly reachable”. During ObjectSpace processing,
the last strong reference to an object may go away and the garbage
collector may run. Should ObjectSpace prevent GC from running if it’s
traversed and now references that object? If not, how should it be
handled if immediately before you return an object from each_object, it
gets garbage collected? There’s no way to catch that, so each_object may
end up returning a reference to an object that’s gone away, or
reconstituting an object whose finalization has already fired. Bad
things happen.

ObjectSpace is just not compatible with any GC that requires the ability
to move objects around in memory, run in parallel, and so on. It can
never be deterministic unless it can “stop the world”, so it should
not be used for algorithms that require any level of determinism, such
as the test search in test/unit.

Charlie

headius · October 28, 2007, 5:33pm

Daniel B. wrote:

[…]

Some comments on each of these as they relate to JRuby:

Of these, only the following would be affected, since only each_object
would be disabled by default:

tk: No one will care. They’ll use SWT or Swing bindings. Besides, you
would need JNA.
ext\tk\lib\multi-tk.rb:493: ObjectSpace.each_object(TclTkIp){|

Quite right, and there are currently no plans (or demand) for Tk support
in JRuby. Swing is a far better GUI API, especially when wrapped in
Ruby.

irb: You’ve got jirb.
lib\irb\completion.rb:152: ObjectSpace.each_object(Module){|m|

This could still be supported through a similar mechanism as
each_object(Class), by keeping a weak hash of all Module instances.

shell: This could be a problem.
lib\shell\process-controller.rb:216: ObjectSpace.each_object(IO)
do

I’d be surprised if shell worked 100% correctly right now anyway, due to
process-control requirements we can’t support well on JVM. But I would
also expect this use of each_object to have a “better” implementation,
and if not it could again be a specific-purpose weak hash for IO streams
(which we almost have already since we want to be able to clean them up
on exit).

singleton: Ditto.

I’d have to look at this one. This could be another good candidate for
reimplementation in a lot less Java code; singleton support would be
pretty easy to write up in a few lines of Java.

test-unit: Already mentioned.

So pretty few libraries would be affected, and I don’t think any
couldn’t be dealt with in other ways. And to reiterate: finalizers and
_id2ref wouldn’t be affected (though I’d prefer to find alternative
mechanisms for _id2ref).

Charlie

headius · October 28, 2007, 6:13pm

Ken B. wrote:

I don’t think they’re making ObjectSpace go away. Just
ObjectSpace#each_object.

Correct.

On Sun, 28 Oct 2007 22:13:38 +0900, Daniel B. wrote:

drb: This could be a big deal.
weakref: You’ve stated that Java has its own implementation.

This uses _id2ref, which doesn’t appear to be going away.

Not that I wouldn’t like it to

cgi: This could be a problem. Then again, some people say this library
should be refactored or tossed.
finalize: Did anyone even know about this? Does anyone use it?
tempfile: Meh, I’m guessing Java has its own library for temp files. I
never liked the current implementation anyway (which is why I wrote
file-temp).

Finalizers could be implemented using Java’s finalize() method for
classes that need it. This method of implementing finalizers could
probably be compatibly exposed using ObjectSpace.

Correct; we do support finalizers already. They weren’t actually that
hard to support, since as you say Java already supports finalization.

shell: This could be a problem.

This looks broken anyway since it uses fork.

Ahh yes, fork is a killer. We will never, ever support fork.

singleton: Ditto.

one is in a documentation comment, giving an example of a specific
behavior of the library. the other is in the same example, included
executably after an if __FILE__ == $0 condition. So no actual problem here.

Whew, that’s good to hear. I know singleton is used a bit in Rails, and
most people run JRuby on Rails with ObjectSpace disabled…so this seems
to fit with your findings.

irb: You’ve got jirb.

jirb is (at its core) still the Ruby 1.8 IRB codebase. The example you
pointed out is class (Module) iteration, used for completion, but it’s
more general iteration than test/unit uses, and #inherited techniques
that can be used for test/unit may not work here.

#inherited, perhaps not. But a JRuby-internal weak list of Module
instances would put this one to rest.

Charlie

headius · October 28, 2007, 5:55pm

I don’t think they’re making ObjectSpace go away. Just
ObjectSpace#each_object.

(I’m not a Jruby developer, so I don’t trust the correctness of anything
I say.)

On Sun, 28 Oct 2007 22:13:38 +0900, Daniel B. wrote:

drb: This could be a big deal.
weakref: You’ve stated that Java has its own implementation.

This uses _id2ref, which doesn’t appear to be going away.

cgi: This could be a problem. Then again, some people say this library
should be refactored or tossed.
finalize: Did anyone even know about this? Does anyone use it?
tempfile: Meh, I’m guessing Java has its own library for temp files. I
never liked the current implementation anyway (which is why I wrote
file-temp).

Finalizers could be implemented using Java’s finalize() method for
classes that need it. This method of implementing finalizers could
probably be compatibly exposed using ObjectSpace.

shell: This could be a problem.

This looks broken anyway since it uses fork.

singleton: Ditto.

one is in a documentation comment, giving an example of a specific
behavior of the library. the other is in the same example, included
executably after an if __FILE__ == $0 condition. So no actual problem here.

irb: You’ve got jirb.

jirb is (at its core) still the Ruby 1.8 IRB codebase. The example you
pointed out is class (Module) iteration, used for completion, but it’s
more general iteration than test/unit uses, and #inherited techniques
that can be used for test/unit may not work here.

–Ken

headius · October 28, 2007, 6:17pm

On 28.10.2007 17:19, Charles Oliver N. wrote:

You could either follow the same approach of the heapTracker presented
on that page and use a flag or require a lib that enables ObjectSpace
(because of the overhead of instrumentation).

You just hit on exactly why we don’t use JVMTI for ObjectSpace. It would
certainly work, but it would add a lot of overhead we’d never expect
people to accept in a real application. Plus, it would track far more
object instances than we actually want tracked.

Why is that? I mean, you could selectively decide which instances to
track.

If there is…we haven’t found it. The “linked weakref list” has been
the least overhead so far, and it’s still a lot of overhead.

Hmm, but there are iteration methods like #each_object:
JVM(TM) Tool Interface 1.0.38

Did you put them down because of the “stop the world” approach? I’d say
that would be ok - at least it’s better than not having ObjectSpace.
And also, there would be no overhead. Question is only whether it’s ok
to invoke arbitrary byte code (which would happen during the iteration
callback).

[…]

The problem here is “strongly reachable”. During ObjectSpace processing,
the last strong reference to an object may go away and the garbage
collector may run. Should ObjectSpace prevent GC from running if it’s
traversed and now references that object? If not, how should it be
handled if immediately before you return an object from each_object, it
gets garbage collected?

You are right: objects can “disappear” (i.e. lose their strong
reachability) during traversal. Obviously my suggested requirement was
still too strong.

There’s no way to catch that, so each_object may
end up returning a reference to an object that’s gone away, or
reconstituting an object whose finalization has already fired. Bad
things happen.

Recreation is a bad idea. I agree, objects that are no longer strongly
reachable at the moment they are about to be passed to the block should
not be passed.

ObjectSpace is just not compatible with any GC that requires the ability
to move objects around in memory,

I don’t think that moving is an issue. If it were, JVMs would not work
the way they do (object references are not pointers to memory locations).
In other words, all programs would have the same problems #each_object
had.

run in parallel, and so on. It can
never be deterministic unless it can “stop the world”, so it should
not be used for algorithms that require any level of determinism, such
as the test search in test/unit.

Right you are. #each_object should not be used in regular code - it’s
more for ad hoc statistics (“how many instances of a class?”) and the
like.
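
A quick sketch of that kind of ad hoc statistic, assuming an implementation where each_object is available for arbitrary classes (as it is on MRI). Widget is a throwaway class made up for the example.

```ruby
class Widget; end

# Hold strong references so the instances cannot be collected
# before we count them.
widgets = Array.new(3) { Widget.new }

# Without a block, each_object returns an enumerator over live instances.
live = ObjectSpace.each_object(Widget).count

puts "#{live} Widget instance(s) currently alive (#{widgets.size} held here)"
```

The number is only a snapshot: instances that are no longer referenced may or may not still be counted, depending on when the collector last ran.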

Kind regards

robert

headius · October 28, 2007, 8:08pm

Quoth Daniel B.:

…
shell: This could be a problem.
…

As far as I know, shell isn’t used extensively. From reading the source,
it appears to be very much linked to the host system’s processes, files,
etc, which may be inappropriate for JRuby anyways (I’m guessing here).

Regards,

headius · October 28, 2007, 6:40pm

Robert K. wrote:

On 28.10.2007 17:19, Charles Oliver N. wrote:

You just hit on exactly why we don’t use JVMTI for ObjectSpace. It
would certainly work, but it would add a lot of overhead we’d never
expect people to accept in a real application. Plus, it would track
far more object instances than we actually want tracked.

Why is that? I mean, you could selectively decide which instances to
track.

Actually, we do that a bit already. For example, we do not track arrays
constructed during argument processing, since they are typically
transient. The problem is that we could only choose to track all Ruby
objects, for example…which would cripple other JRuby apps running in
the same process.

In general, though, we haven’t explored JVMTI because we want JRuby to
be the best production environment for deploying apps, and nobody will
EVER turn on JVMTI on their production servers.

Alternatively there may be another method that does not need
instrumentation and that can give you access to every (reachable)
object in the JVM.

If there is…we haven’t found it. The “linked weakref list” has been
the least overhead so far, and it’s still a lot of overhead.

Hmm, but there are iteration methods like #each_object:
JVM(TM) Tool Interface 1.0.38

I was referring to non-JVMTI solutions, but you’re right, JVMTI does
provide this capability.

Did you put them down because of the “stop the world” approach? I’d say
that would be ok - at least it’s better than not having ObjectSpace. And
also, there would be no overhead. Question is only whether it’s ok to
invoke arbitrary byte code (which would happen during the iteration
callback).

Is it really ok? You need to remember that JRuby opens up the
possibility of running many, many applications in the same process, as
well as asynchronous algorithms with true parallel threads. We can’t
expect people to cripple all that so they can walk EVERY object in the
system. “Stop the world” is awful when you start breaking the ability to
do many things in parallel, as you can in JRuby.

But it may be that for cases where each_object is needed, this is a
reasonable thing to do. I think if someone were to submit an
implementation of each_object that uses JVMTI, we would certainly accept
it.

ObjectSpace is just not compatible with any GC that requires the
ability to move objects around in memory,

I don’t think that moving is an issue. If it were, JVMs would not work
the way they do (object references are not pointers to memory locations).
In other words, all programs would have the same problems #each_object
had.

The problem is not so much that the object references move as that you
would have to lock the memory locations for some period of time to be
able to walk the object table. And I think that’s bad especially when
we’re looking at JRuby allowing folks to run dozens of apps in the same
process and memory space out of the box. We can’t lock things down like
that.

Charlie

headius · October 29, 2007, 4:17am

Charles Oliver N. wrote:

Actually, we do that a bit already. For example, we do not track arrays
constructed during argument processing, since they are typically
transient. The problem is that we could only choose to track all Ruby
objects, for example…which would cripple other JRuby apps running in
the same process.

[…]

The problem is not so much that the object references move as that you
would have to lock the memory locations for some period of time to be
able to walk the object table. And I think that’s bad especially when
we’re looking at JRuby allowing folks to run dozens of apps in the same
process and memory space out of the box. We can’t lock things down like
that.

Sorry for the extremely uninitiated and naive question - but when you’re
about to enumerate each object in an application, aren’t you interested
only in this application’s objects anyway? So why would you have to lock
anything about the other ruby apps in the same process? Is that kind of
distinguishing objects impossible on the GC/enumeration level?

mortee

headius · October 29, 2007, 6:21am

On Oct 28, 9:19 am, Charles Oliver N. wrote:

ObjectSpace is just not compatible with any GC that requires the ability
to move objects around in memory, run in parallel, and so on. It can
never be deterministic unless it can “stop the world”, so it should
not be used for algorithms that require any level of determinism, such
as the test search in test/unit.

This is the exact reason we haven’t yet implemented each_object in
Rubinius.

With a generational GC that moves objects, iterating over all objects is
very, very non-deterministic unless the GC is totally turned off while
objects are walked.

That’s at least an option we have that we may roll with for the initial
release, but it’s less than ideal.

I think of each_object as very much an MRI implementation feature that
the rest of us implementors struggle to implement. Because of this, the
community and core members of each implementation need to really begin
discussing whether each_object is a Ruby feature or an MRI feature.

Evan

headius · October 29, 2007, 4:26am

Charles Oliver N. wrote:

I’d be surprised if shell worked 100% correctly right now anyway, due to
process-control requirements we can’t support well on JVM. But I would
also expect this use of each_object to have a “better” implementation,
and if not it could again be a specific-purpose weak hash for IO streams
(which we almost have already since we want to be able to clean them up
on exit).

Speaking of multiple cases of possible class-specific instance
tracking… isn’t it possible to register your interest in some such
class at some point explicitly from program code, so that any class
could then be made enumerable?
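
One way such opt-in tracking could look in plain Ruby is sketched below, using the standard weakref library so that tracking does not keep instances alive. InstanceTracking, each_instance and Connection are hypothetical names for illustration only; nothing like this is claimed to exist in JRuby.

```ruby
require 'weakref'

# Extend a class with this module to make its instances enumerable.
# Only classes that opt in pay the per-instance WeakRef cost.
module InstanceTracking
  def new(*args, **opts, &block)
    instance = super
    tracked_refs << WeakRef.new(instance)
    instance
  end

  def each_instance
    return enum_for(:each_instance) unless block_given?
    tracked_refs.each do |ref|
      object = begin
        ref.__getobj__          # take a strong reference while we yield
      rescue WeakRef::RefError
        next                    # the instance has already been collected
      end
      yield object
    end
  end

  private

  def tracked_refs
    @tracked_refs ||= []
  end
end

class Connection
  extend InstanceTracking
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

first = Connection.new("first")
second = Connection.new("second")
puts Connection.each_instance.map(&:name).inspect
```

A fuller version would also prune dead references from the list from time to time; that step is left out here to keep the sketch short.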

mortee