Using id2ref for anything?

ObjectSpace._id2ref is another of those peculiar methods, an artifact
of a particular implementation which, due to its lack of a
copying/compacting garbage collector, can always locate in memory an
object given its “id”. This is typically not easily possible on
other VMs, where objects move around and it may even be difficult to
get a unique “id” for a given object since memory locations keep
moving and adding a numeric ID would increase object or object handle
sizes.

On JRuby, _id2ref is implemented as a pair with Object#object_id/id.
The latter, when called on an object, atomically constructs a numeric
ID for the object in question. It then asks our ObjectSpace
implementation to insert a weak reference to the object into a table
keyed on numeric ID. This allows the resulting ID to be used later for
_id2ref to retrieve the object.
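
To make the moving parts concrete, here is a rough Ruby-level sketch of that pairing. It is not JRuby's actual code (which lives in Java); the class name IdRegistry and its methods are made up for illustration, and a Mutex stands in for whatever atomic primitive the real implementation uses.

```ruby
require "weakref"

# Hypothetical registry, for illustration only.
class IdRegistry
  def initialize
    @lock = Mutex.new
    @last_id = 0
    @table = {} # numeric ID => WeakRef to the object
  end

  # Plays the role of object_id: hand out a fresh numeric ID and
  # remember the object weakly so it can still be collected.
  def register(obj)
    @lock.synchronize do
      id = (@last_id += 1)
      @table[id] = WeakRef.new(obj)
      id
    end
  end

  # Plays the role of _id2ref: return the object for an ID, or nil
  # if the ID is unknown or the object has been collected.
  def id2ref(id)
    @lock.synchronize do
      ref = @table[id]
      return nil unless ref
      begin
        ref.__getobj__
      rescue WeakRef::RefError
        @table.delete(id) # drop the stale entry
        nil
      end
    end
  end
end

registry = IdRegistry.new
thing = Object.new
id = registry.register(thing)
registry.id2ref(id).equal?(thing)
```

Every call to register allocates a weak reference and touches a shared table under a lock, which is where the extra cost comes from.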

Unfortunately object_id, in its #id form, is often used to get a
unique non-#hash key for an object for purposes entirely unrelated to
_id2ref. As a result, any code using object_id or id on JRuby pays a
significantly higher cost than you might expect.

If we no longer supported _id2ref, the only cost would be in producing
an ID, probably with a strictly-increasing atomic 64-bit value. There
would be no weakref map and no cost of constructing and managing the
weakrefs within that map.

So I am asking you Rubyists…does this sound like a problem? In the
1.8/1.9 stdlib, the only reference to _id2ref is one in drb.rb, which
could be replaced with a “better way”. None of the gems I have
installed use _id2ref. Originally, weakref.rb used _id2ref, but we
have a native impl of weakref that uses Java’s built-in weakrefs.
Google code search only brings up about 353 hits for “lang:ruby
_id2ref”, most of them the already-mentioned cases.

One last demonstration of the perf difference between the current
Object#object_id and one that does not use the ObjectSpace weak map:

Current:

                                        user     system      total        real
1M calls to obj.object_id           0.658000   0.000000   0.658000 (  0.658000)
1M calls to Object.new.object_id    6.636000   0.000000   6.636000 (  6.636000)

Using object’s “identity hash”:

                                        user     system      total        real
1M calls to obj.object_id           0.356000   0.000000   0.356000 (  0.356000)
1M calls to Object.new.object_id    0.636000   0.000000   0.636000 (  0.636000)
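
The benchmark script itself is not included in this message. Something along the following lines, using the stdlib Benchmark module, would produce output in the same shape; the label width and variable names are assumptions, and the timings will of course differ by machine and implementation.

```ruby
require "benchmark"

N = 1_000_000
obj = Object.new

# Benchmark.bm prints a user/system/total/real table and returns
# one Benchmark::Tms per report.
results = Benchmark.bm(35) do |bm|
  # Repeated calls on one object: the ID is created once, then reused.
  bm.report("1M calls to obj.object_id") do
    N.times { obj.object_id }
  end

  # A fresh object each time: every call has to create a new ID.
  bm.report("1M calls to Object.new.object_id") do
    N.times { Object.new.object_id }
  end
end
```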

It’s also interesting to note that even maintaining the contract of
object_id being unique is hard. On the JVM, for example, it is not
possible to get a unique numeric id or pointer for a given object
unless you manage a weak map of objects on your own…

  • Charlie

On 10/25/2009 09:41 PM, Charles Oliver N. wrote:

So I am asking you Rubyists…does this sound like a problem? In the
1.8/1.9 stdlib, the only reference to _id2ref is one in drb.rb, which
could be replaced with a “better way”. None of the gems I have
installed use _id2ref. Originally, weakref.rb used _id2ref, but we
have a native impl of weakref that uses Java’s built-in weakrefs.
Google code search only brings up about 353 hits for “lang:ruby
_id2ref”, most of them the already-mentioned cases.

Charles, thanks for the elaborate report and request! I for my part do
not see an issue with removing _id2ref if a better solution for DRb can
be devised.

It’s also interesting to note that even maintaining the contract of
object_id being unique is hard. On the JVM, for example, it is not
possible to get a unique numeric id or pointer for a given object
unless you manage a weak map of objects on your own…

I believe there is an alternative solution which comes at the cost of
the memory overhead for every object: place the id in the instance and
use a central AtomicLong for “generating” ids. You also save the
overhead of map maintenance which would be a central synchronization
point.
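
A rough Ruby-level sketch of that idea, for illustration only. In JRuby this would be a Java field plus an AtomicLong; here the module name CountedId and the method counted_id are made up, and a Mutex stands in for the atomic counter.

```ruby
# Hypothetical mixin: the ID lives in the object itself, so no
# central table of weak references is needed.
module CountedId
  LOCK = Mutex.new
  @last_id = 0

  class << self
    # Strictly increasing counter. Callers must hold LOCK.
    def next_id
      @last_id += 1
    end
  end

  # Assign the ID lazily on first use, then return the stored value.
  def counted_id
    LOCK.synchronize { @counted_id ||= CountedId.next_id }
  end
end

class Widget
  include CountedId
end

a = Widget.new
b = Widget.new
a.counted_id == a.counted_id # stable for the same object
a.counted_id == b.counted_id # false: distinct objects get distinct IDs
```

The price is one extra slot per object that has asked for an ID, and there is no way to go from an ID back to the object.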

Kind regards

robert

Hi,

In message “Re: Using id2ref for anything?”
on Mon, 26 Oct 2009 05:41:52 +0900, Charles Oliver N. writes:

|ObjectSpace._id2ref is another of those peculiar methods, an artifact
|of a particular implementation which, due to its lack of a
|copying/compacting garbage collector, can always locate in memory an
|object given its “id”. This is typically not easily possible on
|other VMs, where objects move around and it may even be difficult to
|get a unique “id” for a given object since memory locations keep
|moving and adding a numeric ID would increase object or object handle
|sizes.

Originally, _id2ref is an implementation dependent hack for weakref,
so that you can remove it, if you can provide the better way.

          matz.

On Sun, Oct 25, 2009 at 5:02 PM, Yukihiro M. wrote:

Originally, _id2ref is an implementation dependent hack for weakref,
so that you can remove it, if you can provide the better way.

I suspected as much, since that seemed to be the primary place for it
to be used. I guess the remaining question is about the uniqueness of
object_id. I believe on Sun’s JVMs java.lang.System.identityHashCode
of an object will be unique for the lifetime of the object, but not
unique forever (which I’m sure is the case on MRI). However, I don’t
think there’s any guarantee that the identityHashCode will be unique
across JVMs, though the documentation says an implementation should
make a “best effort” to keep it unique.

We will look at removing _id2ref in 1.5 (or making it do nothing, with
a warning) as well as modifying the one stdlib that uses it (DRb, for
reasons I have not yet explored). And I will explore whether
identityHashCode will be “unique enough” as I suspect it should be.

  • Charlie

On 10/25/09, Charles Oliver N. wrote:

If we no longer supported _id2ref, the only cost would be in producing
an ID, probably with a strictly-increasing atomic 64-bit value. There
would be no weakref map and no cost of constructing and managing the
weakrefs within that map.

So I am asking you Rubyists…does this sound like a problem?

In my own projects, I use _id2ref/id in a couple places that I can
recall. So, I was about to object to it going away… but on
reflection, it seems like _id2ref can always be replaced by a WeakRef,
(at least when running in JRuby). So, removing it shouldn’t really be
a problem.
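
For example, where code used to stash target.object_id and call ObjectSpace._id2ref on it later, it can hold a WeakRef instead. This is only a sketch; the Handle class is hypothetical, while WeakRef, __getobj__ and WeakRef::RefError come from the stdlib weakref library.

```ruby
require "weakref"

# Hypothetical holder that keeps a weak reference rather than an ID.
class Handle
  def initialize(target)
    @ref = WeakRef.new(target)
  end

  # Returns the target, or nil once it has been garbage collected.
  def target
    @ref.__getobj__
  rescue WeakRef::RefError
    nil
  end
end

thing = Object.new
handle = Handle.new(thing)
handle.target.equal?(thing) # true while thing is still alive
```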

On Sun, Oct 25, 2009 at 4:05 PM, Robert K. wrote:

It’s also interesting to note that even maintaining the contract of
object_id being unique is hard. On the JVM, for example, it is not
possible to get a unique numeric id or pointer for a given object
unless you manage a weak map of objects on your own…

I believe there is an alternative solution which comes at the cost of the
memory overhead for every object: place the id in the instance and use a
central AtomicLong for “generating” ids. You also save the overhead of map
maintenance which would be a central synchronization point.

Yes, that may be too high a cost for us to pay. On 32/64-bit JVMs,
adding another field would cost 4 or 8 bytes per object. Stuffing it
into the instance variable table would force ivar tables to be created
when object_id is called, which comes with a base cost of 2 words (4
or 8-byte) plus the word for the reference to a Fixnum object (or 4-8
bytes for a reference to an int or long). It’s too high to put on all
Objects, especially if identityHashCode is unique enough.

  • Charlie

On Sun, Oct 25, 2009 at 6:34 PM, Caleb C. wrote:

In my own projects, I use _id2ref/id in a couple places that I can
recall. So, I was about to object to it going away… but on
reflection, it seems like _id2ref can always be replaced by a WeakRef,
(at least when running in JRuby). So, removing it shouldn’t really be
a problem.

Yes, all cases of _id2ref could be implemented yourself by building a
weak map from your own user-generated IDs (or from object_id) to objects.
So I think there’s probably no good reason we need to have _id2ref
support if we have our own weakref implementation.

  • Charlie