How can I pin a Ruby object in memory?

johnlester · June 1, 2006, 11:10pm

I have some data that I’m storing in a T_DATA VALUE. Is the data
that’s stored there part of the GC heap - IOW can it move in memory?
If so, is there a way to pin it so that it doesn’t move while I’m
using it?

Thanks!
-John

johnlester · June 1, 2006, 11:32pm

On 6/1/06, John L. [email protected] wrote:

I have some data that I’m storing in a T_DATA VALUE. Is the data
that’s stored there part of the GC heap - IOW can it move in memory?
If so, is there a way to pin it so that it doesn’t move while I’m
using it?

Not sure if I understand the question. A Data object has a pointer
(RDATA(obj)->data) to some block of memory that you’ve allocated, and
no, Ruby’s GC process isn’t going to assign some new value to that
pointer.

If you’re asking whether Ruby will move the address of the Data object
itself: I’m guessing that that’s possible.

johnlester · June 1, 2006, 11:35pm

Hi.

On 6/2/06, Lyle J. [email protected] wrote:

If you’re asking whether Ruby will move the address of the Data object
itself: I’m guessing that that’s possible.

I guess this is not true, because Ruby’s GC does not compact
memory (at least up to now).

Minkoo S.

johnlester · June 2, 2006, 12:44am

On 6/1/06, Lyle J. [email protected] wrote:

If you’re asking whether Ruby will move the address of the Data object
itself: I’m guessing that that’s possible.

I was wondering about the latter. I couldn’t find any APIs for pinning
objects in memory so I was worried that the object might move out from
underneath me. But on second thought I’d have the DATA pointer cached in
a register / call stack in any event so it probably doesn’t matter if the
object moves in the future.

Cheers,
-John

johnlester · June 2, 2006, 12:50am

Lyle J. wrote:

If you’re asking whether Ruby will move the address of the Data object
itself: I’m guessing that that’s possible.

If ruby moved objects like that (whether T_DATA or T_OBJECT, T_STRING,
etc), it would be a disaster. Every VALUE that referred to the object
(in other words every reference to it in a variable, array, hash, etc.)
would become invalid, since the VALUE type is actually a pointer in
these cases. (I may be misunderstanding the question though…)
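
To make that concrete, here is a minimal self-contained C sketch. It does not use ruby.h: `value_t`, `struct slot`, `wrap_data`, `unwrap_data` and `move_slot` are made-up stand-ins for VALUE, a T_DATA slot and its data pointer, chosen only for illustration. It shows that a reference is just the slot's address, so copying the slot elsewhere leaves every old reference stale, while the wrapped block itself never moves.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the definitions in ruby.h. */
typedef uintptr_t value_t;      /* plays the role of VALUE */

struct slot {                   /* plays the role of a T_DATA object slot */
    void *data;                 /* like RDATA(obj)->data: caller-owned memory */
};

/* Allocate a slot wrapping a caller-owned block; the reference handed back
   is simply the slot's address stored in an integer. Returns 0 on failure. */
value_t wrap_data(void *block)
{
    struct slot *s = malloc(sizeof *s);
    if (s == NULL)
        return 0;
    s->data = block;
    return (value_t)s;
}

/* Follow a reference back to the wrapped block. */
void *unwrap_data(value_t v)
{
    return ((struct slot *)v)->data;
}

/* What a moving collector would have to do: copy the slot somewhere else.
   Any value_t still holding the old address is stale afterwards unless the
   collector can find and rewrite every one of them. The wrapped block is
   not touched, so its address stays the same. Returns 0 on failure. */
value_t move_slot(value_t old)
{
    struct slot *fresh = malloc(sizeof *fresh);
    if (fresh == NULL)
        return 0;
    memcpy(fresh, (struct slot *)old, sizeof *fresh);
    return (value_t)fresh;
}
```

In this model two variables holding the same `value_t` both see the same block, and after `move_slot` the new reference differs from the old one even though `unwrap_data` still returns the original block address.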

johnlester · June 2, 2006, 1:18am

On Jun 1, 2006, at 6:47 PM, Joel VanderWerf wrote:

these cases. (I may be misunderstanding the question though…)

–
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

No, ruby does not move objects in memory. As to how horrible that
would be if it did, there are GCs that do work like this (copying
GCs). Believe it or not, there are speed advantages to copying GCs in
that the algorithm has runtime proportional to the number of
reachable objects, rather than the size of the heap like mark-and-sweep
(which is what ruby uses). Copying collectors also compact the
memory, reducing fragmentation. A copying GC would be difficult
in the current ruby implementation since a copying GC cannot really
be conservative (it has to change things in the root set), and ruby
uses the C stack so it is difficult to be sure if something is
definitely not a pointer. With mark-and-sweep false positives are
ok, since nothing ever gets moved. With a copying GC it could mistake
an int on the C stack for a pointer, “collect” the “object” it
“pointed” to, and then change the value, which of course would be the
cause of many odd and subtle bugs in ruby code.
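
A rough sketch of why false positives are harmless for a conservative mark phase, written as self-contained C. This is not code from gc.c; the tiny `heap` array, `struct obj`, `looks_like_object` and `mark_conservatively` are names invented for the illustration. The marker only sets a mark bit on anything that might be an object and never writes to the scanned words.

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_SLOTS 8

/* Hypothetical object slot: a mark bit plus some payload. */
struct obj {
    int marked;
    int payload;
};

struct obj heap[HEAP_SLOTS];

/* Does this word have the same bit pattern as the address of a slot? */
int looks_like_object(uintptr_t word)
{
    uintptr_t lo = (uintptr_t)&heap[0];
    uintptr_t hi = (uintptr_t)&heap[HEAP_SLOTS];
    if (word < lo || word >= hi)
        return 0;
    return (word - lo) % sizeof(struct obj) == 0;
}

/* Conservative mark: every word is treated as a possible pointer.
   A plain integer that happens to equal a slot address just keeps that
   slot alive a little longer. The words themselves are only read; a
   copying collector would have to overwrite them with the new address,
   which would silently corrupt such an integer.
   Returns the number of slots newly marked. */
size_t mark_conservatively(const uintptr_t *words, size_t n)
{
    size_t newly_marked = 0;
    for (size_t i = 0; i < n; i++) {
        if (looks_like_object(words[i])) {
            struct obj *o = (struct obj *)words[i];
            if (!o->marked) {
                o->marked = 1;
                newly_marked++;
            }
        }
    }
    return newly_marked;
}

/* Reset all mark bits before a new mark phase. */
void clear_marks(void)
{
    for (size_t i = 0; i < HEAP_SLOTS; i++)
        heap[i].marked = 0;
}
```

Passing an array that stands in for the C stack marks only the slots whose addresses appear in it, and leaves the array contents unchanged.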

johnlester · June 2, 2006, 2:25am

So I would guess that Ruby memory allocation is relatively expensive?
Certainly nowhere near as fast as allocating memory off of the “end”
of the heap or the stack, right? Does it have to search a free list of
blocks itself or does it delegate allocation to the system’s malloc()
implementation?

It’s tricky doing the interop with the CLR since things like boxed
value type objects can be moved in memory, so I need to create a pinned
GCHandle object to keep the GC from moving the object (this is also
bad as you could imagine since it leads to heap fragmentation). So
after spending most of the day thinking about the CLR side of the
house, I was a bit surprised to find that Ruby doesn’t move objects
around.

This makes me a bit happier in a way since I don’t have to worry about
the issues on both sides of the house, but since I figured out how to
do it on the CLR side, I was hoping to reuse that new-found experience
on the Ruby side

Thanks for the insights.
-John

johnlester · June 2, 2006, 10:00am

On Jun 1, 2006, at 8:23 PM, John L. wrote:

So I would guess that Ruby memory allocation is relatively expensive?
Certainly nowhere near as fast as allocating memory off of the “end”
of the heap or the stack, right? Does it have to search a free list of
blocks itself or does it delegate allocation to the system’s malloc()
implementation?

Speaking without any knowledge of ruby’s internals, I imagine it’s
actually just allocating from the end of some pre-allocated buffer
until it reaches the end of the buffer. So if you never run out of
room in the buffer the allocation is just incrementing a pointer.
When you reach the end you do the first GC and subsequent allocations
have to search the freelist for a big enough chunk.
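
The "just incrementing a pointer" scheme guessed at above can be sketched in a few lines of self-contained C. This is only an illustration of bump allocation in general, not a description of what ruby does; `arena`, `bump_alloc`, `bump_remaining` and `bump_reset` are hypothetical names, and the fixed 1024-byte arena is an arbitrary choice.

```c
#include <stddef.h>

#define ARENA_BYTES 1024

/* Pre-allocated buffer, aligned so that any object type can live in it. */
static _Alignas(max_align_t) unsigned char arena[ARENA_BYTES];
static size_t arena_used = 0;

/* Round n up to a multiple of the strictest alignment. */
static size_t round_up(size_t n)
{
    size_t a = _Alignof(max_align_t);
    return (n + a - 1) / a * a;
}

/* Bump allocation: hand out the next free bytes and advance the offset.
   Returns NULL when the arena is exhausted; that is the point where a
   collector using this scheme would have to run its first GC. */
void *bump_alloc(size_t n)
{
    if (n > ARENA_BYTES)
        return NULL;
    size_t need = round_up(n);
    if (need > ARENA_BYTES - arena_used)
        return NULL;
    void *p = arena + arena_used;
    arena_used += need;
    return p;
}

/* Bytes still available in the arena. */
size_t bump_remaining(void)
{
    return ARENA_BYTES - arena_used;
}

/* Throw everything away at once (no per-object free in this scheme). */
void bump_reset(void)
{
    arena_used = 0;
}
```

Allocation here costs one comparison and one addition, which is why the approach is attractive; the price is that individual objects cannot be freed without some other mechanism.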

johnlester · June 2, 2006, 8:26pm

On Jun 2, 2006, at 4:44 AM, Mauricio F. wrote:

[…]

ruby relies on malloc(3) for low-level allocation, instead of doing it all
with sbrk(2) and friends.

–
Mauricio F. - http://eigenclass.org - singular Ruby

Interesting. (-- takes notes --). Almost seems like cheating :). But
in a good way. I’m going to have to read gc.c. Speaking of reading ruby
source, is there an order you would recommend? Every time I look at
it I get overwhelmed by a) not knowing where to start and b) K&R C. I
can power-through the K&R C for the most part I think, but figuring
out what to read when is tougher.

johnlester · June 2, 2006, 8:48pm

On 6/1/06, Joel VanderWerf [email protected] wrote:

If ruby moved objects like that (whether T_DATA or T_OBJECT, T_STRING,
etc), it would be a disaster. Every VALUE that referred to the object
(in other words every reference to it in a variable, array, hash, etc.)
would become invalid, since the VALUE type is actually a pointer in
these cases.

You know, I did know that, but it didn’t occur to me at the time. Good
point.

johnlester · June 2, 2006, 10:47am

On Fri, Jun 02, 2006 at 04:57:54PM +0900, Logan C. wrote:

until it reaches the end of the buffer. So if you never run out of
room in the buffer the allocation is just incrementing a pointer.
When you reach the end you do the first GC and subsequent allocations
have to search the freelist for a big enough chunk.

Ruby does not use a compacting GC and doesn’t manage memory itself (the
way a normal memory allocator does) either.
There are two parts to allocating an object:

* each non-immediate object takes a sizeof(RVALUE)-sized slot (typically 20
  bytes) from one of the heaps managed by ruby (look for RVALUE and heaps in
  gc.c). It’s sizeof(RVALUE) for any object so there’s no problem with “chunk
  sizes” and fragmentation (iow. all chunks are ~20 bytes long). A freelist
  is used to find unused slots in said heaps. Additional heaps of increasing
  size will be created when there are no free slots or too few were freed in
  a GC run.
* most objects need additional memory (pointed to by fields in their
  corresponding slots): instance variable tables, char* for Strings, VALUE*
  for Arrays… these are allocated with malloc and will be freed when the
  corresponding object is reclaimed.

ruby relies on malloc(3) for low-level allocation, instead of doing it all
with sbrk(2) and friends.
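
The two parts described above can be modelled in a short self-contained C sketch. It is a toy, not an excerpt from gc.c: `struct slot`, `heap`, `freelist`, `new_string` and `reclaim` are hypothetical names, there is a single heap of four slots, and running out of slots simply returns NULL where ruby would run a GC or create another heap.

```c
#include <stdlib.h>
#include <string.h>

#define SLOTS_PER_HEAP 4

/* Hypothetical fixed-size slot, standing in for an RVALUE. */
struct slot {
    struct slot *next_free;  /* link used only while the slot is unused */
    char *extra;             /* malloc'd storage, e.g. a String's bytes */
    int in_use;
};

struct slot heap[SLOTS_PER_HEAP];
struct slot *freelist = NULL;

/* Thread every slot onto the freelist, lowest index first. */
void heap_init(void)
{
    freelist = NULL;
    for (size_t i = SLOTS_PER_HEAP; i-- > 0; ) {
        heap[i].in_use = 0;
        heap[i].extra = NULL;
        heap[i].next_free = freelist;
        freelist = &heap[i];
    }
}

/* Part 1: take a fixed-size slot from the freelist.
   Part 2: malloc the variable-sized storage the slot points to.
   Returns NULL if there is no free slot or malloc fails. */
struct slot *new_string(const char *text)
{
    struct slot *s = freelist;
    if (s == NULL)
        return NULL;
    char *copy = malloc(strlen(text) + 1);
    if (copy == NULL)
        return NULL;
    strcpy(copy, text);
    freelist = s->next_free;
    s->next_free = NULL;
    s->extra = copy;
    s->in_use = 1;
    return s;
}

/* What a sweep does for one dead object: free its extra storage and put
   the slot back on the freelist. The slot itself never moves. */
void reclaim(struct slot *s)
{
    free(s->extra);
    s->extra = NULL;
    s->in_use = 0;
    s->next_free = freelist;
    freelist = s;
}
```

Because every slot has the same size, a reclaimed slot can be reused by any later allocation, which is the "no problem with chunk sizes" point; fragmentation can only arise in the malloc'd part.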

johnlester · June 2, 2006, 8:56pm

2006/6/2, Logan C. [email protected]:

Interesting. (-- takes notes --). Almost seems like cheating :). But
in a good way. I’m going to have read gc.c. Speaking of reading ruby
source, is there an order you would recommend? Every time I look at
it I get overwhelmed by a) not knowing where to start and b) K&R C. I
can power-through the K&R C for the most part I think, but figuring
out what to read when is tougher.

_why had an interesting article last summer about the internals of
Ruby’s memory management and how to use it efficiently:
http://whytheluckystiff.net/articles/theFullyUpturnedBin.html

johnlester · June 2, 2006, 10:26pm

Hi,

On Fri, 02 Jun 2006 20:25:02 +0200, Logan C.
[email protected]
wrote:

Interesting. (-- takes notes --). Almost seems like cheating :). But in
a good way. I’m going to have read gc.c. Speaking of reading ruby
source, is there an order you would recommend? Every time I look at it I
get overwhelmed by a) not knowing where to start and b) K&R C. I can
power-through the K&R C for the most part I think, but figuring out what
to read when is tougher.

Have you seen the “Ruby Hacking Guide” translation at
http://rhg.rubyforge.org/ ?

It’s not complete, but it should definitely get you started.

Dominik

johnlester · June 4, 2006, 10:33pm

Lyle J. wrote:

On 6/1/06, Joel VanderWerf [email protected] wrote:

If ruby moved objects like that (whether T_DATA or T_OBJECT, T_STRING,
etc), it would be a disaster. Every VALUE that referred to the object
(in other words every reference to it in a variable, array, hash, etc.)
would become invalid, since the VALUE type is actually a pointer in
these cases.

You know, I did know that, but it didn’t occur to me at the time. Good
point.

I had no doubt that you knew it; we would not have FXRuby otherwise

johnlester · June 5, 2006, 10:04pm

On Sat, Jun 03, 2006 at 03:25:02AM +0900, Logan C. wrote:

On Jun 2, 2006, at 4:44 AM, Mauricio F. wrote:
[…]

ruby relies on malloc(3) for low-level allocation, instead of doing it all
with sbrk(2) and friends.

Interesting. (-- takes notes --). Almost seems like cheating :). But
in a good way. I’m going to have read gc.c. Speaking of reading ruby
source, is there an order you would recommend? Every time I look at
it I get overwhelmed by a) not knowing where to start and b) K&R C. I
can power-through the K&R C for the most part I think, but figuring
out what to read when is tougher.

It depends on what you’re interested in (/me slaps self). The easiest
starting points would be array.c, hash.c (st.c if you really want to see the
underlying st_table implementation, but it’s just your regular hash table),
string.c… that is, the core data structures. They are very easy to read, but
maybe not that interesting ultimately due to this very straightforwardness.

As for the more interesting stuff, here are some functions to begin with:

eval.c:
- rb_eval: the basic AST walker
- rb_call, rb_get_method_body: method dispatching (+method cache) at
  work
- rb_add_method: managing the method tables (m_tbl)
- rb_include_module: to see how proxy classes (T_ICLASS) work; bits of
  Ruby’s object model
  …
parse.y: the grammar + yylex (tricky)

This is what I answered to a similar question 3 years ago in [74002]:

Ruby Core
 * dln.c: wraps dlopen or the equiv. function of your platform, not very
   interesting
 * gc.c: quite easy to follow, of interest only if you want to know how
   the GC works internally, but it’s just mark & sweep doing “common
   sense” things so you can safely skip it.
 * st.c: a hash table implementation used internally by Ruby, quite
   straightforward
 * eval.c: much harder to read as you have to know the node types to
   follow it; several functions are essentially a big switch() statement
   for a node
 * parse.y: this can help you see what different node types correspond
   to by having a look at the grammar.
 * regex.c: whatever, don’t read it

some other .c files contain only support code

Built-in classes
Take the class you like, scroll down to the Init_xxx() function and
locate the C function that implements the method you want to study. No
particular order required.

Hope this helps,

johnlester · June 6, 2006, 3:32pm

On Jun 5, 2006, at 4:01 PM, Mauricio F. wrote:

[snip my “homework” for the rest of the summer]

Hope this helps,

It does, thanks.