So there I was this morning, staring at an ObjectSpace counter telling
me that I was allocating 1500 Arrays and 10000 Floats per frame. That
pretty much ground my framerate to a halt by requiring a 0.2s GC run
every other frame. So I decided to get down and rid my code of as many
allocations as possible.
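For anyone wanting to reproduce that kind of counter, here is a minimal sketch. The helper name allocations_of is made up for illustration, and this is not necessarily how the counter above was built. It relies on ObjectSpace.each_object returning the number of objects it visited, and it turns the GC off while measuring so that nothing is collected before it gets counted. Treat the result as approximate, since the measurement itself can add an object or two.

```ruby
# Hypothetical helper: roughly how many objects of klass does the block allocate?
def allocations_of(klass)
  gc_was_disabled = GC.disable          # true if the GC was already off
  before = ObjectSpace.each_object(klass) {}
  yield
  after = ObjectSpace.each_object(klass) {}
  after - before
ensure
  GC.enable unless gc_was_disabled      # restore the previous GC state
end

# Example: a stand-in "frame" that builds 100 small Arrays.
arrays = allocations_of(Array) do
  100.times { |i| [i, i + 1] }
end
puts "Arrays allocated: #{arrays}"
```

Wrapping one frame of the render loop in such a block gives a per-frame figure. Calling it once for Array and once for Float keeps the two counts separate.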
The first thing I discovered was a bit of code looking like this (not
to mention that it was actually getting called several times per
frame due to a bug):
    @mtime = ([@mtime] + ([@stroke||nil,
      @fill||nil].compact+children).map{|c| c.mtime}).max
Quite unreadable, and it was responsible for a large part of the Array
allocations too. A quick change whittled Array allocation count for
that method to 0, with the price of making it less idiomatic:
    children.each{|c|
      cm = c.mtime
      @mtime = cm if cm > @mtime
    }
    if @stroke
      sm = @stroke.mtime
      @mtime = sm if sm > @mtime
    end
    if @fill
      fm = @fill.mtime
      @mtime = fm if fm > @mtime
    end
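To check that the two versions really compute the same thing, here is a small self-contained sketch. Node and Paint are hypothetical stand-ins for the scene objects (only #mtime matters), and both methods return the newest mtime instead of assigning @mtime so they can be compared side by side.

```ruby
# Hypothetical stand-in for a stroke or fill: anything with an mtime.
Paint = Struct.new(:mtime)

class Node
  attr_reader :mtime, :children

  def initialize(mtime, children = [], stroke = nil, fill = nil)
    @mtime, @children, @stroke, @fill = mtime, children, stroke, fill
  end

  # The one-liner: builds several temporary Arrays on every call.
  def newest_mtime_idiomatic
    ([@mtime] + ([@stroke, @fill].compact + children).map { |c| c.mtime }).max
  end

  # The loop version: same result, no temporary Arrays.
  def newest_mtime_loop
    newest = @mtime
    children.each { |c|
      cm = c.mtime
      newest = cm if cm > newest
    }
    if @stroke
      sm = @stroke.mtime
      newest = sm if sm > newest
    end
    if @fill
      fm = @fill.mtime
      newest = fm if fm > newest
    end
    newest
  end
end

node = Node.new(3, [Node.new(7), Node.new(5)], Paint.new(9))
puts node.newest_mtime_idiomatic   # 9
puts node.newest_mtime_loop        # 9
```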
Now the Array allocations dropped down to hundreds, a much more
reasonable number, but still way too much compared to what was
happening in the frame. The only thing that should’ve changed was one
number. So the extra 500 Arrays were a bit of a mystery.
Some investigation revealed places where I was using
Array#each_with_index. Very nice, very idiomatic, very allocating a
new Array on each iteration. So replace it with the following and watch
the alloc counts fall:
    i = 0
    arr.each{|e|
      do_stuff_with e
      i += 1
    }
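One thing to watch with the hand-rolled counter: an early `next` in the block jumps past the increment, so the index silently falls behind. A small sketch of the pitfall and one way around it (the data is made up for illustration):

```ruby
arr = [10, nil, 30]

# Pitfall: `next` skips the increment, so i no longer matches the position.
wrong = []
i = 0
arr.each { |e|
  next if e.nil?
  wrong << i
  i += 1
}

# Incrementing first thing in the block keeps i in step on every path.
right = []
i = -1
arr.each { |e|
  i += 1
  next if e.nil?
  right << i
}

p wrong  # [0, 1]
p right  # [0, 2]
```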
By doing that in a couple of strategic places and some other
optimizations, the Array allocation count fell to 150, of which 90
were allocated in the object Z-sorting method, which’d require a C
implementation to get its allocation count to 0. The Array allocation
fight was heading towards diminishing returns, and my current scene
didn’t need to use Z-sorting, so I turned my attention to the Floats.
By now, the Float count had also dropped a great deal, but it was
still a hefty 3000 Floats per frame. With each float weighing 16
bytes, that was nearly 3MB per second when running at 60fps. Searching
for the method that was allocating all those Floats, I ran into
something weird. #transform was allocating 6-32 Floats per call. And
it’s one of the functions that get called for every scene object, in
every frame. Also, it’s written in C.
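The "nearly 3MB per second" figure above is just the three quoted numbers multiplied together, spelled out here as a quick check:

```ruby
floats_per_frame  = 3000
bytes_per_float   = 16    # per-Float size quoted above
frames_per_second = 60

bytes_per_second = floats_per_frame * bytes_per_float * frames_per_second
puts bytes_per_second                 # 2880000
puts bytes_per_second / 1_000_000.0   # 2.88 (MB per second)
```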
That left me stymied. Surely there must be some mistake, I thought,
the C function didn’t seem to be allocating any Ruby objects. But
little did I know.
The C function called the NUM2DBL macro in several places to turn Ruby
numbers into doubles. Reading the source for NUM2DBL told me that it
calls the rb_num2dbl C function, which takes a Ruby number and returns
a C double. Reading the source of rb_num2dbl revealed this:
    double
    rb_num2dbl(val)
        VALUE val;
    {
        switch (TYPE(val)) {
          case T_FLOAT:
            return RFLOAT(val)->value;

          case T_STRING:
            rb_raise(rb_eTypeError, "no implicit conversion to float from string");
            break;

          case T_NIL:
            rb_raise(rb_eTypeError, "no implicit conversion to float from nil");
            break;

          default:
            break;
        }

        return RFLOAT(rb_Float(val))->value;
    }
rb_Float gets called on all Fixnums and Bignums, and there happened
to be quite a few of those in my scene state arrays. Checking out
rb_Float gave the explanation for the Float allocations:
    switch (TYPE(val)) {
      case T_FIXNUM:
        return rb_float_new((double)FIX2LONG(val));

      /* ... */

      case T_BIGNUM:
        return rb_float_new(rb_big2dbl(val));
In order to turn a Fixnum into a double, it’s allocating a new Float!
With that figured out, I rewrote rb_num2dbl as rb_num_to_dbl,
this time handling Fixnums and Bignums as special cases as well:
    double rb_num_to_dbl( VALUE val )
    {
        switch (TYPE(val)) {
          case T_FLOAT:
            return RFLOAT(val)->value;
          case T_FIXNUM:
            /* convert directly, no intermediate Float object */
            return (double)FIX2LONG(val);
          case T_BIGNUM:
            /* same for Bignums */
            return rb_big2dbl(val);
          case T_STRING:
            rb_raise(rb_eTypeError, "no implicit conversion to float from string");
            break;
          case T_NIL:
            rb_raise(rb_eTypeError, "no implicit conversion to float from nil");
            break;
          default:
            break;
        }
        /* everything else still goes through rb_Float */
        return RFLOAT(rb_Float(val))->value;
    }
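A complementary option on the Ruby side, which is not what was done above, is to convert the integers in the state arrays to Floats once, when the state is written, so that the C code only ever sees Floats. The sketch below assumes the state is written far less often than it is read. floatify! and the sample array are hypothetical names for illustration.

```ruby
# Hypothetical helper: replace every Integer in a state array with a Float,
# in place, leaving all other values untouched.
def floatify!(state)
  state.map! { |v| v.is_a?(Integer) ? v.to_f : v }
end

matrix = [1, 0, 0, 1, 12.5, 40]   # made-up transform state
floatify!(matrix)
p matrix   # [1.0, 0.0, 0.0, 1.0, 12.5, 40.0]
```

This pays the conversion cost once per write instead of once per frame, at the price of having to remember to call it wherever the state is set.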
The result? Float allocations fell from 3000 to 700 per frame. And now
I’m getting a GC run “only” every 36 frames. Not perfect
by any means, but a decent start.
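One way to put a number on "a GC run every N frames" is to compare GC.count before and after a batch of frames. This is a minimal sketch assuming a Ruby version that provides GC.count. gc_runs_during is a hypothetical helper, and the block stands in for rendering one frame.

```ruby
# Hypothetical helper: run the block `frames` times and return how many
# GC runs happened in the meantime.
def gc_runs_during(frames)
  before = GC.count
  frames.times { |n| yield n }
  GC.count - before
end

# Stand-in "frame" that allocates some Arrays and Floats.
runs = gc_runs_during(360) { |n| Array.new(100) { |i| i * 1.5 + n } }
puts "GC runs in 360 frames: #{runs}"
```

Dividing the frame count by the number of runs gives the average number of frames between collections.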
Have stories of your own? Tips for memory management? Ways to track
allocations? Post them, please.
Cheers,
Ilmari