Minimizing memory allocations


#1

So there I was this morning, staring at an ObjectSpace counter telling
me that I’m allocating 1500 Arrays and 10000 Floats per frame. Which
pretty much ground my framerate to ground by requiring a 0.2s GC run
every other frame. So I decided to get down and rid my code of as many
allocations as possible.

The first thing I discovered was a bit of code looking like this (not
to mention that each was actually getting called several times per
frame due to a bug):

@mtime = ([@mtime] + ([@stroke||nil,
@fill||nil].compact+children).map{|c| c.mtime}).max

Quite unreadable, and it was responsible for a large part of the Array
allocations too. A quick change whittled Array allocation count for
that method to 0, with the price of making it less idiomatic:

children.each{|c|
cm = c.mtime
@mtime = cm if cm > @mtime
}
if @stroke
sm = @stroke.mtime
@mtime = sm if sm > @mtime
end
if @fill
fm = @fill.mtime
@mtime = fm if fm > @mtime
end

Now the Array allocations dropped down to hundreds, a much more
reasonable number, but still way too much compared to what was
happening in the frame. The only thing that should’ve changed was one
number. So the extra 500 Arrays were a bit of a mystery.

Some investigation revealed places where I was using
Array#each_with_index. Very nice, very idiomatic, very allocating a
new Array on each iteration. So replace by the following and watch the
alloc counts fall:

i = 0
arr.each{|e|
do_stuff_with e
i += 1
}

By doing that in a couple of strategic places and some other
optimizations, the Array allocation count fell to 150. Of which 90
were allocated in the object Z-sorting method, which’d require a C
implementation to get its allocation count to 0. The Array allocation
fight was heading towards diminishing returns, and my current scene
didn’t need to use Z-sorting, so I turned my attention to the Floats.

By now, the Float count had also dropped a great deal, but it was
still a hefty 3000 Floats per frame. With each float weighing 16
bytes, that was nearly 3MB per second when running at 60fps. Searching
for the method that was allocating all those Floats, i ran into
something weird. #transform was allocating 6-32 Floats per call. And
it’s one of the functions that get called for every scene object, in
every frame. Also, it’s written in C.

That left me stymied. Surely there must be some mistake, I thought,
the C function didn’t seem to be allocating any Ruby objects. But
little did I know.

The C function called the NUM2DBL-macro in several places to turn Ruby
numbers into doubles. Reading the source for NUM2DBL told that it
calls the rb_num2dbl C function. Which takes a Ruby number and returns
a C double. Reading the source to rb_num2dbl revealed this:

01361 double
01362 rb_num2dbl(val)
01363 VALUE val;
01364 {
01365 switch (TYPE(val)) {
01366 case T_FLOAT:
01367 return RFLOAT(val)->value;
01368
01369 case T_STRING:
01370 rb_raise(rb_eTypeError, “no implicit conversion to float
from string”);
01371 break;
01372
01373 case T_NIL:
01374 rb_raise(rb_eTypeError, “no implicit conversion to float
from nil”);
01375 break;
01376
01377 default:
01378 break;
01379 }
01380
01381 return RFLOAT(rb_Float(val))->value;
01382 }

rb_Float gets called on all Fixnums and Bignums, which there happened
to be quite a deal of in my scene state arrays. Checking out rb_Float
gave the explanation for the Float allocations:

01326 switch (TYPE(val)) {
01327 case T_FIXNUM:
01328 return rb_float_new((double)FIX2LONG(val));
01329
01333 case T_BIGNUM:
01334 return rb_float_new(rb_big2dbl(val));

In order to turn a Fixnum into a double, it’s allocating a new Float!
With that figured out, I took and rewrote rb_num2dbl as rb_num_to_dbl,
this time handling Fixnums and Bignums as special cases as well:

double rb_num_to_dbl( VALUE val )
{
switch (TYPE(val)) {
case T_FLOAT:
return RFLOAT(val)->value;

case T_FIXNUM:
  return (double)FIX2LONG(val);

case T_BIGNUM:
  return rb_big2dbl(val);

case T_STRING:
  rb_raise(rb_eTypeError, "no implicit conversion to float from 

string");
break;

case T_NIL:
  rb_raise(rb_eTypeError, "no implicit conversion to float from 

nil");
break;

default:
  break;

}
return RFLOAT(rb_Float(val))->value;
}

The result? Float allocations fell to 700 per frame from the original
3000. And now I’m getting a GC run “only” every 36 frames. Not perfect
by any means, but a decent start.

Have stories of your own? Tips for memory management? Ways to track
allocations? Post them, please.

Cheers,
Ilmari


#2

Ilmari H. wrote:

So there I was this morning, staring at an ObjectSpace counter telling
me that I’m allocating 1500 Arrays and 10000 Floats per frame. Which
pretty much ground my framerate to ground by requiring a 0.2s GC run
every other frame. So I decided to get down and rid my code of as many
allocations as possible.

< snip due to ruby-forum restrictions />

In order to turn a Fixnum into a double, it’s allocating a new Float!
With that figured out, I took and rewrote rb_num2dbl as rb_num_to_dbl,
this time handling Fixnums and Bignums as special cases as well:

< snip />

The result? Float allocations fell to 700 per frame from the original
3000. And now I’m getting a GC run “only” every 36 frames. Not perfect
by any means, but a decent start.

Nice! I wonder if this would be eligible for core patching?

Have stories of your own? Tips for memory management? Ways to track
allocations? Post them, please.

Nope, but I enjoyed reading this one, thanks!

Cheers,
Ilmari

E


#3

On 1/22/06, Ilmari H. removed_email_address@domain.invalid wrote:

pretty much ground my framerate to ground by requiring a 0.2s GC run

Argh, sorry, magnitude error. The correct GC run time is 0.02s. Not so
bad, but still a 60fps -> 20fps glitch.


#4

On Sun, 22 Jan 2006, Ilmari H. wrote:

The result? Float allocations fell to 700 per frame from the original
3000. And now I’m getting a GC run “only” every 36 frames. Not perfect
by any means, but a decent start.

Good Stuff.

Have stories of your own? Tips for memory management? Ways to track
allocations? Post them, please.

Ok, here goes…

Useful trick one…

Just because Ruby has GC, doesn’t mean it knows what is Garbage, merely
what isn’t currently referenced.

ie. Remember to actively drop references you will never need via things
like…

$big_hairy_global = nil
or
@fat_instance_variable = nil

Trick Two…

Memoization

class Foo
def initialize( thing)
end
end

foo = Foo.new( thing)

becomes…

class Foo
@@memo = Hash.new{|hash,key| hash[key] = Foo.new( key)}

def create_foo( thing)
@@memo[thing]
end

def initialize( thing)
end

end

foo = Foo.create_foo( thing)

Trick 3

Sometimes it doesn’t pay to iterate over each line.

Sometimes just read the whole ruddy file in in one bang shoot and regex
over the whole thing.

ie. Instead of…
open( ‘foo.txt’) do |inf|
inf.each_line do |line|
line.chomp!
if line =~ /thingy/
# do stuff
end
end
end

Try…

IO.read( ‘foo.txt’).scan(/thingy/){ #do stuff}

John C. Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : removed_email_address@domain.invalid
New Zealand

Carter’s Clarification of Murphy’s Law.

“Things only ever go right so that they may go more spectacularly wrong
later.”

From this principle, all of life and physics may be deduced.


#5

On Jan 22, 2006, at 8:54 AM, Ilmari H. wrote:

On 1/22/06, Ilmari H. removed_email_address@domain.invalid wrote:

pretty much ground my framerate to ground by requiring a 0.2s GC run

Argh, sorry, magnitude error. The correct GC run time is 0.02s. Not so
bad, but still a 60fps -> 20fps glitch.

In a completely separate world (Lua code running under a scene graph
written in C++; no Ruby anywhere) I recently hit a place where I
thought I needed to allocate ~200 Vector objects per frame.

(I was using a recursive function to calculate 3D bezier curves[1],
which needed to allocate and preserve 4 new Vector objects each call.)

It was causing noticeable lurching when the GC kicked in
occasionally. I found two interesting things:

  1. Running Lua’s GC manually every frame resulted in better memory
    performance and faster overall framerate than running it every 2 or
    10 or 50 or 100 frames. My only (lame) guess was that waiting longer
    yielded a larger memory pool too wade through when looking for items
    to GC. (?)

  2. Because I really didn’t need to preserve the 200 Vectors from
    frame to frame (the final results of the recursive calculation were
    copied into the position vectors for points on the line), I was able
    to remove the per-frame memory allocations altogether by abstracting
    the Vector allocation into a pooled-vector manager. Doing this gave
    me far-better frame rates than I was getting with the GC-every-frame
    approach.

This isn’t applicable specifically to Ruby, but applicable generally:
when you can’t remove memory allocations, see if you can at least re-
use them.

In an attempt to make this post Ruby-specific, I give you a pooled
object manager that I’ve just written, based on the Lua version I
wrote at work. You create a pool by specifying an object that is the
template/factory, and a method to call on that object (defaults to
‘new’). Every time you ask for an object, it will hand you one from
the pool, or create a new instance. The #reset method makes all items
in the pool re-usable again (i.e. call at the start of a new frame).

class ObjectPool
def initialize( template, method=:new, template_in_pool=false )
@template = template
@method = method
@pool = [ ]
if template_in_pool
@pool << template
end
reset
end

Make all items in the pool available again

def reset
@next_available = 0
end

Remove references to all items not currently in use

def drain
@pool[ @next_available…-1 ] = nil
end

Return a new item from the pool, creating a new one if needed

def next
unless item = @pool[ @next_available ]
@pool << ( item = @template.send( @method ) )
end
@next_available = @next_available + 1
item
end

def inspect
“<ObjectPool of #{@pool.size} #{@template}>”
end
end

class Vector
attr_accessor :x, :y, :z
def initialize( x=0, y=0, z=0 ) @x, @y, @z = x,y,z end
def clone() self.class.new( @x, @y, @z ) end
def inspect() “<Vector:0x#{object_id.to_s(16)} #@x,#@y,#@z>” end
end

##################################################################

Showing how to create a pool of class instances

##################################################################
pool = ObjectPool.new( Vector )
p pool
#=> <ObjectPool of 0 Vector>

3.times{ |i|
v = pool.next
v.x = v.y = v.z = i
p v
}
#=> <Vector:0x195518 0,0,0>
#=> <Vector:0x195392 1,1,1>
#=> <Vector:0x19520c 2,2,2>

p pool
#=> <ObjectPool of 3 Vector>

pool.reset
3.times{ p pool.next }
#=> <Vector:0x195518 0,0,0>
#=> <Vector:0x195392 1,1,1>
#=> <Vector:0x19520c 2,2,2>

p pool
#=> <ObjectPool of 3 Vector>

##################################################################

Showing how to create a pool based off of a template object

##################################################################
v = Vector.new( 1, 2, 3 )
pool2 = ObjectPool.new( v, :clone, true )
p pool2
#=> <ObjectPool of 1 #Vector:0x32ad64>

3.times{ p pool2.next }
#=> <Vector:0x1956b2 1,2,3>
#=> <Vector:0x194672 1,2,3>
#=> <Vector:0x1944ec 1,2,3>

pool2.reset
3.times{ p pool2.next }
#=> <Vector:0x1956b2 1,2,3>
#=> <Vector:0x194672 1,2,3>
#=> <Vector:0x1944ec 1,2,3>

p pool2
#=> <ObjectPool of 3 #Vector:0x32ad64>

[1] http://www.antigrain.com/research/adaptive_bezier/index.html#toc0003


#6

I don’t know anything about Lua, but Ruby is unlikely to see and speed
benefits from garbage collecting every frame unless you’re deep in
swap. Ruby uses a “mark and sweep” garbage collection scheme, which I
believe means that garbage collecting time is mostly proportional to
the number of current, referenced objects, not the amount of junk left
behind.


#7

On Wed, Jan 25, 2006 at 12:45:41AM +0900, Timothy G. wrote:

I don’t know anything about Lua, but Ruby is unlikely to see and speed
benefits from garbage collecting every frame unless you’re deep in
swap. Ruby uses a “mark and sweep” garbage collection scheme, which I
believe means that garbage collecting time is mostly proportional to
the number of current, referenced objects, not the amount of junk left
behind.

The marking time is proportional to the number of referenced objects,
and the sweeping time is proportional to the number of objects that get
swept. In general, the longer the time between sweeps, the more objects
need to be swept.

Invoking the GC every frame probably won’t improve average frame rate,
but it may help decrease the variance of the time it takes to process
each frame (so you get more consistent performance).

Paul


#8

On Jan 22, 2006, at 19:14, John C. wrote:

end

end

foo = Foo.create_foo( thing)

Er, um, huh?

All foo-links are tapping into a global-to-class hash called @@memo.
OK…
Foo.new has been rendered useless, apparently torturing anybody who
forgets by returning nil as a non-error?
Foo.create_foo(thing) does, well, I have no idea. What’s “thing”
supposed to be? And why is create_foo using square brackets?

Sigh. Sometimes Ruby is too idiomatic.

What is “memo-ization?”


#9

On Jan 26, 2006, at 10:16, removed_email_address@domain.invalid wrote:

Quoting Dave H. removed_email_address@domain.invalid:

Sigh. Sometimes Ruby is too idiomatic.

What is “memo-ization?”

It isn’t actually a ruby-specific term:

I didn’t think it was. That’s why the comment and the question are in
separate paragraphs.

I might be able to guess what it is if I could read the Ruby code. But
I can’t figure out what the code does.


#10

Quoting Dave H. removed_email_address@domain.invalid:

Sigh. Sometimes Ruby is too idiomatic.

What is “memo-ization?”

It isn’t actually a ruby-specific term:

http://en.wikipedia.org/wiki/Memoization

-mental


#11

On Fri, 27 Jan 2006, Dave H. wrote:

end
end

All foo-links are tapping into a global-to-class hash called @@memo. OK…
Foo.new has been rendered useless, apparently torturing anybody who forgets
by returning nil as a non-error?
Foo.create_foo(thing) does, well, I have no idea. What’s “thing” supposed to
be? And why is create_foo using square brackets?

Sigh. Sometimes Ruby is too idiomatic.

What is “memo-ization?”

I sense deep confuzzlement so let’s take that a lot slower.

OK, so this trick cannot be used everywhere. For example if I ask you to
give me a new pink car, I expect you to go off, build a new car, paint
it
pink and give it to me. If are ask you again, I expect you to do it
again
and give me another (different) brand new pink car.

Suppose in the particular application I had I didn’t really care it was
a
different one, I just wanted a pink car NOW, fast. If it’s the same one
as
you gave me last time, in this particular application, I don’t care so
long as it isn’t blue. I could have kept hold of the old pink car, but
that’s too much work, sometimes I use green, sometimes red cars,
sometimes
pink. This is a garbage collected system, so it’s easier for me grab a
new
pink car when I need one than to keep track of my old ones.

This form of memo-ization is just to skip the “new” when I already have
made one pink car, and just give exactly the same one as I made last
time.

Ok, now the little idiomatic bit.

Suppose @@memo is a brand spanking new empty Hash object.

What does @@memo[“pink”] return?

By default it returns nil.

But you can make it return whatever you damn well feel like.

@@memo = Hash.new{|hash,key| hash[key] = Car.new(key)}

@@memo[“pink”]

Will say, oo, I have no “pink” element, I better go off and make one
then, fortunately I have a block on my “new” so I know how to do that.

ps.

If everybody in your country had a pink car, what would you have?

Answer:

A Pink Carnation.

Ok, so using pink cars was a “blooming” silly example… :-))

John C. Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : removed_email_address@domain.invalid
New Zealand

Carter’s Clarification of Murphy’s Law.

“Things only ever go right so that they may go more spectacularly wrong
later.”

From this principle, all of life and physics may be deduced.


#12

On Fri, 27 Jan 2006, Dave H. wrote:

If everybody in your country had a pink car, what would you have?

Ha ha. Except that if everybody in my country had a Car.create_car(“pink”)
car, then the entire country would have exactly one car.

Exactly. Which is why it’s a lousy example, except in the sense it
clearly
alerts you to the deficiency of the scheme.

Where I used the trick most recently was in parsing .c and .h files for
various useful info I needed to process the header files #include’d
as well.

ie. #include “pink_car.h”

Initially my code was just…

subresult = ParseFile.new( “pink_car.h”)
but then I noticed I was reparsing common header files far more often
than
I needed to. Hence memoization trick. A very simple two line change and
off I go again with a good burst of speed.

Of course after parsing all the files I need to remember to @@memo = nil
to actually release the memory…

Why use a class variable for this? It’s Good Encapsulation. Who should
know whether you have ever produced a pink car before? All the pink
car
users? Or the car factory? Answer, the Car Factory, ie. the Car class.

John C. Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : removed_email_address@domain.invalid
New Zealand

Carter’s Clarification of Murphy’s Law.

“Things only ever go right so that they may go more spectacularly wrong
later.”

From this principle, all of life and physics may be deduced.


#13

On Jan 26, 2006, at 4:18 PM, Dave H. wrote:

ps.

Often memoization is used to speed up functions that are
referentially transparent

( Referentially transparent means that
given a function f, a == b implies that f(a) == f(b). In other words,
if you call the function with the same argument, more than once
you’ll always get back the same answer. def f(x); x +1; end is
referentially transparent. Well, unless someone has been playing with
Integer#+. IO#gets for instance is not.)

when they take a long time to process. One example is computing the
fibonacci sequence (an equally silly example as the car.). If you use
memoization, all of a sudden the naive recursive implementation isn’t
nearly as bad.

since f(0 or 1) = 1
and f(x) = f(x - 1) + f(x - 2)
you’ll notice that f(x) = f(x - 2) + f(x - 3) + f(x - 2). Since we
are memoizing, it only computes f(x-2) once, the second time it needs
f(x-2) its an O(1) hash lookup, and so on. Normally of course
recursive fibonacci is O(2**n).


#14

On Jan 26, 2006, at 12:59, John C. wrote:

@@memo = Hash.new{|hash,key| hash[key] = Car.new(key)}

@@memo[“pink”]

Will say, oo, I have no “pink” element, I better go off and make one
then, fortunately I have a block on my “new” so I know how to do that.

ps.

If everybody in your country had a pink car, what would you have?

Ha ha. Except that if everybody in my country had a
Car.create_car(“pink”) car, then the entire country would have exactly
one car. I can’t figure out what it’s good for; any example I can think
of in my head is just a complicated way of doing something simple.
Either “well, I’ll just keep the object around in the first place,” or
“no, I would have to have my very OWN pink car, not be FlexCar-ing it
with other parts of the program”


#15

Hello,
I would really like to learn more about memoization
but unfortunately the gem doesnt seem to install properly on
my system (Ubuntu Linux) and I was told that in order
to get downloaded gems to work I would probably have
to uninstall the apt versions of ruby and rebuild everything
from source.

I dont really want to do that, so I was wondering if
the memoization library is small enough that I could
just put a copy into my path and include it that way,
rather than as an installed gem.

Does that sound like a sane alternative, or is my only
option to rip the ruby deb’s out of my system and
start from scratch?


#16

Ruby Hashes, Blocks & Memoization

http://opengl.geek.nz/ruby.html


#17

On Jan 27, 2006, at 6:19 PM, Alex C. wrote:

I dont really want to do that, so I was wondering if
the memoization library is small enough that I could
just put a copy into my path and include it that way,
rather than as an installed gem.

You bet. It’s a single file. Just download the source from
RubyForge and move /lib/memoize.rb into your path. It will work just
fine.

James Edward G. II


#18

Alex C. wrote:

Hello,
I would really like to learn more about memoization
but unfortunately the gem doesnt seem to install properly on
my system (Ubuntu Linux) and I was told that in order
to get downloaded gems to work I would probably have
to uninstall the apt versions of ruby and rebuild everything
from source.

Do you know this to be an Ubuntu issue, or just a problem you recently
noticed? There have been, I believe, some problems with the gem servers
the last day or so, and it may worth taking another crack at gem
installation.

James


#19

On 1/27/06, James Edward G. II removed_email_address@domain.invalid wrote:

You bet. It’s a single file. Just download the source from
RubyForge and move /lib/memoize.rb into your path. It will work just
fine.

Thanks James!
Wow, dont want to fanboy here, but wow.

$cat fib.rb
def fib(n)
return n if n < 2
fib(n-1) + fib(n-2)
end
puts fib(35)

$ time ruby fib.rb
9227465

real 0m41.121s
user 0m37.128s
sys 0m3.854s

$cat mem_fib.rb
require “memoize”
include Memoize
def fib(n)
return n if n < 2
fib(n-1) + fib(n-2)
end
memoize(:fib)
puts fib(35)

$ time ruby mem_fib.rb
9227465

real 0m0.011s
user 0m0.008s
sys 0m0.003s

Yikes! This is definitly something I’m going to need to learn to use,
and use
as often as possible. So anywho, took a quick peek at memoize.rb and was
again just blown away but this time because the entire file was only 39
lines
of code including white space! How can this be?

Unfortuantely, there are no comments to be had, so I was wondering
if someone would be so kind as to reply with inserted comments in the
code,
(yes I know ruby is supposed to be readable, but im new, ok? :slight_smile:
Just as a precaution so that I don’t make any weird assumptions,
something I’m prone to do. This would be greatly appreciated.

Bored? Nothing to do on a Friday night?
Well here is some source. Comment away all you bored rubyists!

module Memoize
MEMOIZE_VERSION = “1.2.0”

Memoize the method +name+. If +file+ is provided, then the method

results

are stored on disk rather than in memory. This consumes virtually

no

memory and is persistant.

def memoize(name, file=nil)
meth = method(name)

  if file
     cache = Hash.new.update(Marshal.load(File.read(file))) rescue 

{}
else
cache = {}
end

  if file
     (class << self; self; end).class_eval do
        define_method(name) do |*args|
           unless cache.has_key?(args)
              cache[args] = meth.call(*args)
              File.open(file, "wb+"){|f| Marshal.dump(cache, f) }
           end
           cache[args]
        end
     end
  else
     (class << self; self; end).class_eval do
        define_method(name) do |*args|
           if cache.has_key?(args)
              cache[args]
           else
              cache[args] ||= meth.call(*args)
           end
        end
     end
  end
  cache

end


#20

On Jan 28, 2006, at 12:39 AM, James B. wrote:

Basically, the arguments to the target method are used as a hash
key into a cache. The cache hash is stored either in memory, or
(optionally) on disk.

It’s actually always in memory with the current implementation, and
can optionally be duplicated on disk.

(But note that an in-memory hash will exist for each instance. A
file allows for all instances to share values, and to have those
values persist from runs of the application, but may introduce race
conditions.)

The current file support doesn’t really get you all the way to
sharing a cache among objects, since it’s only read from at the time
the object is converted to use memoization.

The current file implementation is only for persistence, in my
opinion, and really for just one process at that.

See the “Rethinking Memoization” for more details on all of this…

James Edward G. II