Understanding object_id in Ruby 2.7

ernestog85 · September 15, 2020, 9:56pm

Hi, I’ve been trying to make small changes to some objects using Fiddle, such as changing flags through Fiddle::Pointer, passing in the address derived from the object_id. As far as I know, in MRI an object_id corresponds to the VALUE of the object. Up to Ruby 2.6 I could get the correct Pointer from an object_id, and the address shown when inspecting an object is the same as the object_id shifted left by one bit and printed as hex:

[1] pry(main)> obj = Object.new
=> #<Object:0x000056374c25dad0>
[2] pry(main)> (obj.object_id << 1).to_s(16) 
=> "56374c25dad0"

However, on Ruby 2.7 I can’t get a match with the same code, so I can’t get the pointer for an object:

[1] pry(main)> obj = Object.new
=> #<Object:0x000055d5df82a630>
[2] pry(main)> (obj.object_id << 1).to_s(16)
=> "708"

Does anybody know, or have any insight into, where to find more detail on the changes made in Ruby 2.7 regarding object_id and its relation to the underlying MRI objects?

pcl · September 18, 2020, 8:57pm

I don’t know if this will help, but I took a little ‘dive’ to see.
The change in the source seems to occur in gc.c.

If you see the documentation for object_id in 2.6 vs 2.7, and click on ‘show source’, the latter calls rb_find_object_id: class Object - RDoc Documentation
This is a new function in 2.7 in the gc.c file. This calls a cached_object_id function which appears to do a lot of work in the symbol table, and uses an objspace ‘thing’ to get the next id.

Whereas in 2.6 the nonspecial_obj_id macro is defined in gc.c and appears to be (more or less) the VALUE of the object: class Object - RDoc Documentation
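To make the difference visible from plain Ruby, here is a minimal sketch. It assumes that on 2.7 and later an id is handed out from a counter the first time object_id is called, which is how I read cached_object_id; the method name ids_in_call_order is made up for the illustration.

```ruby
# Sketch only: assumes ids on Ruby 2.7+ are assigned lazily, on the first
# call to object_id, rather than being derived from the object's address.
# `ids_in_call_order` is a hypothetical name used for this illustration.
def ids_in_call_order
  created_first  = Object.new
  created_second = Object.new

  # Ask for the ids in the opposite order of creation.
  id_of_second = created_second.object_id
  id_of_first  = created_first.object_id

  [id_of_second, id_of_first]
end

asked_first, asked_second = ids_in_call_order
puts "Ruby #{RUBY_VERSION}: asked first=#{asked_first}, asked second=#{asked_second}"
puts "ids follow the order of the first call: #{asked_first < asked_second}"
```

On 2.6 and earlier the two numbers depend only on where the objects happen to live in memory, so the last line can print either true or false there.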

I don’t understand much of this, but if you want to see the changes made, then I think this is where to start looking!

It’s interesting how the documentation is identical for object_id and to_s in both 2.6 and 2.7, but the id value is completely different.

ernestog85 · September 21, 2020, 5:25pm

I did more or less the same digging through the Ruby sources, especially gc.c. The changes for 2.7 are in the new rb_find_object_id function:

https://github.com/ruby/ruby/blob/a0c7c23c9cec0d0ffcba012279cd652d28ad5bf3/gc.c#L3803

     *  fixnum  fffffffffffffffffffffffffffffff1        bignum if required
     *
     *  where A = sizeof(RVALUE)/4
     *
     *  sizeof(RVALUE) is
     *  20 if 32-bit, double is 4-byte aligned
     *  24 if 32-bit, double is 8-byte aligned
     *  40 if 64-bit
     */

    return rb_find_object_id(obj, cached_object_id);
}

For non-primitive objects, the cached_object_id function is called, which uses a symbol table as you mentioned. But if we print an object using puts, we still get the old address-based id in hex format, since nonspecial_obj_id is called when invoking the to_s method on an object, and it still exists in 2.7 (ruby/gc.c at a0c7c23c9cec0d0ffcba012279cd652d28ad5bf3 · ruby/ruby · GitHub).

So one way to get the object’s memory reference on 2.7, although a bit of a hack, is to call the to_s method and extract the id from there:

/:0x(.+)>$/.match(Kernel.instance_method(:to_s).bind_call(<your object instance>))[1].to_i(16)
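Wrapped in a method, the one-liner above might look like the sketch below. The name address_from_to_s is hypothetical, and I swapped (.+) for \h+ anchored with \z so that instances of anonymous classes, whose to_s contains two addresses, still yield the last one. It needs 2.7+ for bind_call and only makes sense for heap objects; special constants such as small Integers, nil, or true are not pointers to an object slot.

```ruby
# Hypothetical helper (not part of Ruby): extracts the address printed by
# the default Kernel#to_s, e.g. "#<Object:0x000055d5df82a630>".
def address_from_to_s(obj)
  # Bind Kernel#to_s explicitly so a class that overrides to_s cannot interfere.
  text = Kernel.instance_method(:to_s).bind_call(obj)
  # \h+ matches hex digits only; \z anchors the match at the end of the string.
  match = /:0x(\h+)>\z/.match(text)
  match && match[1].to_i(16)
end

obj = Object.new
puts format("address from to_s: 0x%x", address_from_to_s(obj))
puts format("object_id << 1:    0x%x", obj.object_id << 1)
```

On 2.6 and earlier (using bind(obj).call instead of bind_call) the two printed values should be equal; on 2.7 they are expected to differ.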

I’ll keep digging to see if I can find a better way, or create a Ruby C extension if I can’t get the real object_id from a Ruby call.

SouravGoswami · September 22, 2020, 1:30pm

Kind of weird:

RVM:
$ ~/.rvm/rubies/ruby-2.7.1/bin/ruby -e "puts Object.new.object_id.<<(1).to_s(16)"
78

$ ~/.rvm/rubies/ruby-2.6.3/bin/ruby -e "puts Object.new.object_id.<<(1).to_s(16)"
558cb78027f0

$ ~/.rvm/rubies/ruby-2.5.5/bin/ruby -e "puts Object.new.object_id.<<(1).to_s(16)"
55ca2d7b62c0

$ ~/.rvm/rubies/ruby-2.4.6/bin/ruby -e "puts Object.new.object_id.<<(1).to_s(16)"
56536c0e9190

$ ~/.rvm/rubies/ruby-2.3.8/bin/ruby -e "puts Object.new.object_id.<<(1).to_s(16)"
55d614d03a78

System ruby:
$ ruby -e "puts Object.new.object_id.<<(1).to_s(16)"
78

Yes, 2.7 changes it significantly.

$ for i in ~/.rvm/rubies/ruby-2.* ; do $i/bin/ruby -e "puts %Q(#{RUBY_VERSION} => #{''.object_id})" ; done
2.3.8 => 47416980288660
2.4.6 => 46983299641380
2.5.5 => 47335830802320
2.6.3 => 47393093047280
2.7.1 => 60

$ for i in ~/.rvm/rubies/ruby-2.* ; do $i/bin/ruby -e "puts %Q(#{RUBY_VERSION} => #{['a', 'b', 'c'].object_id})" ; done
2.3.8 => 47026568616500
2.4.6 => 47226028894120
2.5.5 => 47070111444620
2.6.3 => 47317018807680
2.7.1 => 60

Here’s more info:

It’s a good step towards performance, but if you ever need to modify those values:

> a = +true.to_s
=> "true"

> a << ?!
=> "true!"

ernestog85 · September 26, 2020, 7:57pm

Thanks for the replies. What I managed to do to get the same object_id on Ruby 2.7 was to create a simple C extension that uses the rb_memory_id function, which is similar to what the rb_obj_id function did prior to 2.7 (seen on ruby/gc.c at 27958c2bd64b27d529f81a130bd488ccc6b9b1d4 · ruby/ruby · GitHub).

That was the cleanest way I found on the latest Ruby to get the old object_id, and from it the memory address, to be able to access some of the internals through Fiddle.
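For anyone wondering what the Fiddle side can look like once an address is known, here is a read-only sketch. It assumes the first machine word of an object slot is RBasic.flags and that the low five bits hold the built-in type, as in ruby.h; the helper names are made up, and the address is taken from the default to_s output rather than from a C extension so the example stays self-contained.

```ruby
require "fiddle"

# Hypothetical helper: address taken from the default Kernel#to_s output.
def object_address(obj)
  text = Kernel.instance_method(:to_s).bind_call(obj)
  text[/:0x(\h+)>\z/, 1].to_i(16)
end

# Reads the first machine word of the object slot, assumed to be RBasic.flags.
# This only reads; writing through the pointer can crash the interpreter.
def read_flags(obj)
  pointer = Fiddle::Pointer.new(object_address(obj), Fiddle::SIZEOF_VOIDP)
  # "J" unpacks one native-endian, pointer-sized unsigned integer.
  pointer[0, Fiddle::SIZEOF_VOIDP].unpack1("J")
end

# Low bits of flags hold the built-in type (assumed from ruby.h):
# 0x01 is T_OBJECT, 0x05 is T_STRING.
T_MASK = 0x1f

str = String.new("hello")
flags = read_flags(str)
puts format("flags=0x%x type=0x%02x", flags, flags & T_MASK)
```

The object has to stay referenced while its slot is being read, otherwise the GC may reuse the slot.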

Another way is to do the following without creating an extension:

/:0x(.+)>$/.match(Kernel.instance_method(:to_s).bind_call(<your object instance>))[1].to_i(16)

which is a bit more cryptic, but it still gets the job done, since to_s for an instance still shows the MRI object id.