Object#freeze as a basis for caching of method results?

Hello, ruby-talk.

After profiling my code I figured out I might want to attempt some
caching of results obtained in ‘heavy’ methods. Is there an idiomatic
way to do method-results caching in Ruby?

The first thing that came to my mind as a means of ensuring the cache’s
‘freshness’ is to freeze the object in question. Does this make sense?

For example, assume that a Blanket is a Set of Blocks, and
that there’s a computation-intensive method ‘separations’:

class Blanket < Set

  def separations
    seps = Set[]
    # some ‘heavy’ code that builds seps
    seps
  end

end

Would the below make sense?

class Blanket < Set

  def freeze
    each { |block| block.freeze }
    super
  end

  def separations
    return @seps if @seps
    @seps = Set[]
    # the ‘heavy’ code from above, building @seps this time
    freeze
    @seps
  end

end

I guess my question boils down to what exactly Object#freeze
prevents from being modified – simply all the properties?

If so, does it mean I can only have one method like the above (because
if some other method did any freezing, I couldn’t initialise the @seps
cache when the first call to separations occurs)?
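[Editorial sketch: a minimal check of what Object#freeze prevents. Any
instance-variable assignment on a frozen object raises – FrozenError on
modern Rubies; 1.8 raised TypeError ‘can’t modify frozen …’. The Widget
class and its method names are made up for illustration.]

```ruby
# Minimal demonstration: a frozen object rejects even the lazy
# initialisation of a cache instance variable.
class Widget
  def cache_result
    @cache ||= 42        # first call tries to set @cache
  end
end

w = Widget.new
w.freeze
begin
  w.cache_result         # raises: cannot initialise the cache any more
rescue FrozenError => e
  puts "raised #{e.class}"
end
```

So a method that freezes the object must populate *all* its caches first; any later lazy initialisation will raise.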

Is there any ‘best practice’ in Ruby
with regards to caching method results?

Is there any obvious other approach to such caching? (I thought about
having a @cache instance variable that would get reset on object
changes, but I’m not sure how to obtain the list of methods that
can change an object – i.e., methods that raise an error when a given
object is frozen.)

Thanks in advance for any replies/suggestions/insights!

– Shot

Hi~

On Feb 19, 2008, at 8:01 AM, Shot (Piotr S.) wrote:


The usual idiom for caching heavy results is something like this:

def foo
  @foo ||= some_long_calculation
end
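[Editorial note, not from Ezra’s post: one caveat with the ||= idiom is
that a computation which legitimately returns nil or false never hits
the cache and is redone on every call. A defined? guard caches any
result, including nil. All names below are illustrative.]

```ruby
# The ||= idiom re-runs the work when the cached value is nil/false;
# a defined? guard caches the result unconditionally.
class Report
  def heavy_lookup              # stand-in for the expensive method
    @calls = (@calls || 0) + 1
    nil                         # a legitimate nil result
  end

  def result
    return @result if defined?(@result)
    @result = heavy_lookup
  end

  def calls
    @calls
  end
end

r = Report.new
r.result
r.result
puts r.calls                    # 1 - computed only once
```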

Cheers-

Ezra Z.:

The usual idiom for caching heavy results is something like this:

def foo
  @foo ||= some_long_calculation
end

Thanks. I’m aware of the ||= construct, but it doesn’t really answer the
bigger issue of how to handle cache freshness (see my original email).

Basically, I was wondering whether freezing can be (sanely) used to make
sure we don’t have ‘stale’ cache (in a bit of a reverse/radical way – by
freezing the object when we do any caching).

Alternatively, does the approach of invalidating a cache on
object-changing method calls make sense? If so, is there a way
to get all the methods that change an object (say, a Set) – i.e.,
the ones that would trigger the ‘TypeError: can’t modify frozen X’
exception?

– Shot

2008/2/19, Shot (Piotr S.) [email protected]:

After profiling my code I figured out I might want to attempt some
caching of results obtained in ‘heavy’ methods. Is there an idiomatic
way to do method-results caching in Ruby?

http://raa.ruby-lang.org/project/memoize/

I guess my question boils down to what does exactly Object#freeze
prevent from being modified – simply all the properties?

Yes. You cannot assign instance variables any more once an instance is
frozen.

If so, does it mean I can only have one method like the above (because
if some other method did any freezing, I couldn’t initialise the @seps
cache when the first call to separations occurs)?

An alternative approach would be to use current state as cache key,
i.e. create an immutable copy and stuff that along with calculation
results into a Hash.
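[Editorial sketch of Robert’s suggestion. It relies on Set implementing
#hash and #eql? by contents, so a frozen duplicate of the current state
works as a Hash key; CACHE and compute_separations are illustrative
names, not from the thread.]

```ruby
require 'set'

# Key a Hash by an immutable snapshot of the current state, so each
# distinct state gets its own cached result.
class Blanket < Set
  CACHE = {}

  def separations
    key = dup.freeze                  # immutable snapshot of the state
    CACHE[key] ||= compute_separations
  end

  private

  def compute_separations             # stand-in for the 'heavy' code
    Set[]
  end
end
```

Mutating the Blanket changes the snapshot, so a stale entry is simply never looked up again (at the cost of keeping old entries around).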

Is there any ‘best practice’ in Ruby
with regards to caching method results?

Memoize, see above.

Is there any obvious other approach to such caching? (I thought about
having a @cache instance variable that would get reset on object
changes, but I’m not sure how to obtain all the list of methods that
can change an object – i.e., methods that throw error when a given
object is frozen.)

I suggest you do not inherit Set. In that case it’s easy: you
define which methods change the state of your class.

Kind regards

robert

2008/2/20, Shot (Piotr S.) [email protected]:

memoized method’s result does not depend on the object state or the
object state doesn’t change, right?

Correct. IIRC Memoize uses the method argument array to do cache
lookups. I don’t know whether Memoize takes measures to avoid
aliasing effects but that can be tested easily, e.g.

a = [1,2,3]
foo(a)
a << 4
foo(a)

foo must of course print something to the screen or such so you see
when it’s invoked.
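[Editorial version of Robert’s test, made runnable with a hand-rolled
memo Hash standing in for the Memoize library; MEMO and foo are
illustrative names.]

```ruby
# Aliasing test: mutating an argument that was used as a cache key
# changes its #hash, so the cached entry can no longer be found.
MEMO = {}

def foo(arr)
  MEMO.fetch(arr) do
    puts "computing for #{arr.inspect}"
    MEMO[arr] = arr.reduce(0, :+)   # arbitrary 'heavy' work
  end
end

a = [1, 2, 3]
foo(a)      # computes
a << 4
foo(a)      # computes again - the old entry is stranded under a
            # stale hash value
```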

#eql? and #hash are sensibly implemented (because Hash uses them for key
comparison), right?

Correct. But I believe this is true for Memoize also, i.e. if you
decide to use something as key to determine whether a calculation has
to be redone you better make sure it properly implements #hash, #eql?
etc.

I suggest you do not inherit Set. In that case it’s easy:
you define which methods change the state of your class.

Hm, that’s true; also, if I get all this right, the ones
that change the state could simply clear Memoize’s cache.

I’ll see how much of the stuff inherited from Set I actually use –
quite a bit, I assume, but then I could simply make an instance variable
of @set and pass all these method calls to it (while selectively
invalidating cache)…

Delegator may help here although in this case I’d probably rather
explicitly forward method invocations because automatic delegation
does have issues of its own, for example it does not “correct” return
values:

$ irb -r delegate
irb(main):001:0> Delegat
DelegateClass Delegater Delegator
irb(main):001:0> s=[1,2,3]
=> [1, 2, 3]
irb(main):003:0> so = SimpleDelegator.new(s)
=> [1, 2, 3]
irb(main):004:0> so.size
=> 3
irb(main):005:0> so.object_id
=> 1073413300
irb(main):006:0> s.object_id
=> 1073463120
irb(main):007:0> so.each {}.object_id
=> 1073463120
irb(main):008:0>

/so.each/ should rather return /so/ but it returns /s/.

Thanks a ton, Robert; as usual, your reply is both
invaluable and makes me think in the right direction.

Thank you! You’re welcome! I am glad that my post proved useful.

Kind regards

robert

Robert K.:

2008/2/19, Shot (Piotr S.) [email protected]:

Is there an idiomatic way to do method-results caching in Ruby?

http://raa.ruby-lang.org/project/memoize/

Ah, perfect, thanks. From both the gem’s docs and the source itself,
it looks like it nicely handles the caching, provided that either the
memoized method’s result does not depend on the object state or the
object state doesn’t change, right?

An alternative approach would be to use current state as cache key,
i.e. create an immutable copy and stuff that along with calculation
results into a Hash.

Hm, that’s actually an approach worth considering. I do need to take
memory use into account, unfortunately, but it seems this is worth
testing.

I assume this approach highly depends on (a) making sure #freeze freezes
also all referenced objects (like in my original example) and (b) #==,
#eql? and #hash are sensibly implemented (because Hash uses them for key
comparison), right?

I suggest you do not inherit Set. In that case it’s easy:
you define which methods change the state of your class.

Hm, that’s true; also, if I get all this right, the ones
that change the state could simply clear Memoize’s cache.

I’ll see how much of the stuff inherited from Set I actually use –
quite a bit, I assume, but then I could simply make an instance variable
of @set and pass all these method calls to it (while selectively
invalidating cache)…

Thanks a ton, Robert; as usual, your reply is both
invaluable and makes me think in the right direction.

– Shot

Robert K.:

An alternative approach would be to use current state as cache key,
i.e. create an immutable copy and stuff that along with calculation
results into a Hash.

What would you say about the following approach? I’m not sure it scales
well, but it seems to work properly, to have the smallest diff impact on
the existing codebase, and to take object state into account…

shot@devielle:~/work/PhD/bzr/trunk$ bzr diff
=== modified file ‘trunk/lib/art-decomp/fsm.rb’
--- trunk/lib/art-decomp/fsm.rb 2008-02-17 21:36:39 +0000
+++ trunk/lib/art-decomp/fsm.rb 2008-02-21 09:03:53 +0000
@@ -1,5 +1,7 @@
 class ArtDecomp::FSM < ArtDecomp::TruthTable
 
+  @@cache = Hash.new { |hash, key| hash[key] = {} }
+
   attr_accessor :q, :qp
 
   def self.from_kiss string
@@ -31,7 +33,7 @@
   end
 
   def beta_f
-    outputs.to_blanket
+    @@cache[Marshal.dump self][:beta_f] ||= outputs.to_blanket
   end
 
   def beta_q
– Shot

2008/2/21, Shot (Piotr S.) [email protected]:

shot@devielle:~/work/PhD/bzr/trunk$ bzr diff
  def self.from_kiss string
@@ -31,7 +33,7 @@
   end
 
   def beta_f
-    outputs.to_blanket
+    @@cache[Marshal.dump self][:beta_f] ||= outputs.to_blanket
   end
 
   def beta_q

Funny that you mention it: I had thought of copying the key via
Marshal#dump and #load. Using the marshaled string is a nice idea!
You can make this a tad more efficient by doing

@@cache[Marshal.dump(self).freeze][:beta_f] ||= outputs.to_blanket

because there is an optimization in Hash that copies unfrozen Strings
that are used as Hash keys in order to avoid aliasing effects.
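[The Hash behaviour Robert refers to can be observed directly; this is
an editorial sketch, with the key strings standing in for marshalled
state.]

```ruby
# Hash copies (dups and freezes) an unfrozen String key on insertion,
# but stores an already-frozen String as-is - so freezing the dump
# saves one copy of a potentially large string.
h = {}
unfrozen = "some marshalled state"
frozen   = "other marshalled state".freeze

h[unfrozen] = 1
h[frozen]   = 2

puts h.keys[0].equal?(unfrozen)   # false - Hash stored a private copy
puts h.keys[1].equal?(frozen)     # true  - stored the very same object
```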

Now, whether you use the String or demarshal probably mainly depends
on memory usage. If the String is short enough that approach is
certainly preferable because it incurs less processing overhead
(demarshaling).

So the Ether Bunny goes hippety hopity down the garden path, waylaying
innocent fieldmice and anesthetising them, so he can sell their teeth
to the Tooth Fairy to support his milk-and-cookies habit.

What kind of dope are you smoking? :)

Cheers

robert

// Thanks a lot for that delegate tutorial in the other post –
// I finally understood what delegation is for and how to do it! :)

Robert K.:

2008/2/21, Shot (Piotr S.) [email protected]:

+  @@cache = Hash.new { |hash, key| hash[key] = {} }

  def beta_f
-    outputs.to_blanket
+    @@cache[Marshal.dump self][:beta_f] ||= outputs.to_blanket
  end

Funny that you mention it: I had thought of copying the key via
Marshal#dump and #load. Using the marshaled string is a nice idea!

You can make this a tad more efficient by doing

@@cache[Marshal.dump(self).freeze][:beta_f] ||= outputs.to_blanket

because there is an optimization in Hash that copies unfrozen Strings
that are used as Hash keys in order to avoid aliasing effects.

I ran some very rough and unscientific tests
with ruby-prof and the results are as follows:

uncached: 44.42s
Marshal.dump: 0.60s
Marshal.dump.freeze: 0.59s
freezing the object: 0.43s

This is for 96 calls, though, so I assume freezing the dump will have
a bigger impact on more-often-called methods (but then the gain from
simply caching the value ‘as is’ and freezing the object in question
will be bigger as well, as marshalling takes its toll on every call).

Running the whole application in Ruby 1.9 is roughly
three times faster than in 1.8, but I can’t¹ profile it. :(

¹
http://groups.google.com/group/ruby-talk-google/browse_thread/thread/b3d26892c642779c

Now, whether you use the String or demarshal probably mainly depends
on memory usage. If the String is short enough that approach is
certainly preferable because it incurs less processing overhead
(demarshaling).

Ok, I’m totally lost here. I don’t see any demarshalling happening,
just marshalling (and using that as a key)… What am I missing?

If keeping the marshalled objects in the memory turns out to be
a problem, I can try trading it for speed (and a tiny bit of confidence)
by MD5-hashing them, but I assume MD5 is relatively slow, so it’d be an
exchange at a rather bad rate.

So the Ether Bunny goes hippety hopity down the garden path, waylaying
innocent fieldmice and anesthetising them, so he can sell their teeth
to the Tooth Fairy to support his milk-and-cookies habit.

What kind of dope are you smoking? :)

I’m a PHP programmer² by day.

² http://civicrm.org/

– Shot (seriously, though, it’s a sig of unknown origin from Stewart
Stremler’s collection:
http://www-rohan.sdsu.edu/~stremler/sigs/sigs.html)

Shot (Piotr S.):

Robert K.:

Now, whether you use the String or demarshal probably mainly depends
on memory usage. If the String is short enough that approach is
certainly preferable because it incurs less processing overhead
(demarshaling).

Ok, I’m totally lost here. I don’t see any demarshalling happening,
just marshalling (and using that as a key)… What am I missing?

Ok, I got it now (stupid me) – you discuss keying the
cache with Marshal.dump(self) as opposed to keying it
with Marshal.load(Marshal.dump(self)), right?

Are marshalled objects in general much
bigger (memory-wise) than the ‘originals’?

Also, I usually define dup as Marshal.load(Marshal.dump(self))
– provided it’s not a bottleneck, does this approach have any vices?
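[Editorial illustration of what the Marshal round-trip buys over
Object#dup: dup is shallow, so contained objects are shared, while
Marshal copies the whole graph. Its main vices are cost and the fact
that some objects (IO, Proc, singletons) cannot be marshalled at all.]

```ruby
require 'set'

# Shallow vs deep copy: dup shares the inner arrays, the Marshal
# round-trip copies them too.
original = Set[[1, 2], [3, 4]]
shallow  = original.dup
deep     = Marshal.load(Marshal.dump(original))

inner = original.to_a.first
puts shallow.to_a.first.equal?(inner)   # true  - inner arrays shared
puts deep.to_a.first.equal?(inner)      # false - fully independent copy
```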

Lastly, assuming I have a working #dup (Marshal-based or otherwise),
wouldn’t dup.freeze be a better/more elegant key than dump+load? Does
it have any vices?

– Shot

2008/2/21, Shot (Piotr S.) [email protected]:

// Thanks a lot for that delegate tutorial in the other post –
// I finally understood what’s delegation for and how to do it! :slight_smile:

Nice side effect of this thread. :)

Robert K.:

2008/2/21, Shot (Piotr S.) [email protected]:

Now, whether you use the String or demarshal probably mainly depends
on memory usage. If the String is short enough that approach is
certainly preferable because it incurs less processing overhead
(demarshaling).

Ok, I’m totally lost here. I don’t see any demarshalling happening,
just marshalling (and using that as a key)… What am I missing?

Demarshalling was the option I had thought of. A copy through
marshalling might use less memory than the marshaled form (the
String).

What kind of dope are you smoking? :slight_smile:

I’m a PHP programmer² by day.

Uh, oh. *cough* :)

² http://civicrm.org/

– Shot (seriously, though, it’s a sig of unknown origin from Stewart
Stremler’s collection: http://www-rohan.sdsu.edu/~stremler/sigs/sigs.html)

:)

Cheers

robert

Robert K.:

Demarshalling was the option I had thought of. A copy through
marshalling might use less memory than the marshaled form
(the String).

Right, but then I’m back to defining a sensible
#==, #eql? and #hash for the class in question.

On the other hand, with serialization I’m just silently handing over the
decision to Marshal, so I might as well (from the logical point of view,
barring the resource overhead) assume they’re defined as

def == other
  Marshal.dump(self) == Marshal.dump(other)
end

def eql? other
  Marshal.dump(self).eql? Marshal.dump(other)
end

def hash
  Marshal.dump(self).hash
end

Hmmm. :)
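[Editorial caveat on this Marshal-based equality: the dump reflects
internal details such as insertion order, so two objects that are ==
can still produce different dumps. For a cache that only means spurious
misses, not wrong results.]

```ruby
# Two == hashes with different insertion order serialize differently,
# so Marshal-based equality would treat these equal states as distinct.
h1 = { a: 1, b: 2 }
h2 = { b: 2, a: 1 }

puts h1 == h2                               # true
puts Marshal.dump(h1) == Marshal.dump(h2)   # false - order differs
```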

I’ll go with using the Marshal strings as keys and see how
expensive it turns out to be. Thanks a lot again, Robert!

– Shot