Forum: Ruby Rethinking Memoization

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
4299e35bacef054df40583da2d51edea?d=identicon&s=25 James Gray (bbazzarrakk)
on 2006-01-24 18:26
(Received via mailing list)
Daniel Berger and I have been having an off-list discussion about his
memoize.rb library.  You can find that library at:

http://raa.ruby-lang.org/project/memoize/

In the course of the discussion, I ended up building a library of my
own, to show Daniel what I was talking about.  Daniel thought it
might be worth moving the discussion of my new library here, to get
community feedback.

The primary difference between our two approaches is that Daniel's
library is built to memoize individual objects, while mine is
intended to link all instance method calls for a class of objects to
a single cache.

I wanted this behavior because I felt it would increase cache hits
and make memoizing methods that much more worthwhile.  Daniel pointed
out though that a finer grained control can be important, to avoid
exhausting memory with the cache.  Luckily, Ruby's syntax makes it
trivial to use my library to alter a unique object as well.

Here's an example of using my library to memoize an instance method,
the intended usage:

#!/usr/local/bin/ruby -w

# instance_methods.rb
#
#  Created by James Edward Gray II on 2006-01-23.
#  Copyright 2006 Gray Productions. All rights reserved.

require "memoizable"

class Fibonacci
   extend Memoizable

   def fib( num )
     return num if num < 2
     fib(num - 1) + fib(num - 2)
   end
   memoize :fib
end

puts "This method is memoized, and will run very fast..."
start = Time.now
puts "fib(100):  #{Fibonacci.new.fib(100)}"
puts "Run time:  #{Time.now - start} seconds"

puts
puts "All objects share a cache, so this call, is even faster..."
start = Time.now
puts "fib(100):  #{Fibonacci.new.fib(100)}"  # simple cache hit
puts "Run time:  #{Time.now - start} seconds"

__END__

Also, here is how you would use the library to affect individual
objects:

#!/usr/local/bin/ruby -w

# singleton_objects.rb
#
#  Created by James Edward Gray II on 2006-01-23.
#  Copyright 2006 Gray Productions. All rights reserved.

require "memoizable"

class Fibonacci
   def fib( num )
     return num if num < 2
     fib(num - 1) + fib(num - 2)
   end
end

slow = Fibonacci.new

puts "This method is not memoized and thus slow..."
start = Time.now
puts "slow.fib(30):  #{slow.fib(30)}"
puts "    Run time:  #{Time.now - start} seconds"

fast = Fibonacci.new
class << fast  # memoize just this object
   extend Memoizable
   memoize :fib
end

puts
puts "We can fix that for a unique object..."
start = Time.now
puts "fast.fib(30):  #{fast.fib(30)}"
puts "    Run time:  #{Time.now - start} seconds"

puts
puts "But the original is still slow..."
start = Time.now
puts "slow.fib(30):  #{slow.fib(30)}"
puts "    Run time:  #{Time.now - start} seconds"

__END__

My library also works for class/module methods and even top-level
methods, though I will spare you those examples.

The other difference between our libraries is that Daniel's supports
using a file for persistent caching, while my library supports using
a custom cache object.  That means that it's a little more work to
cache to a file using mine, but you can do other kinds of caching as
well.  Here's a file cache example:

#!/usr/local/bin/ruby -w

# file_persistance.rb
#
#  Created by James Edward Gray II on 2006-01-23.
#  Copyright 2006 Gray Productions. All rights reserved.

require "memoizable"

#
# A trivial implementation of a custom cache.  This cache uses disk
storage,
# instead of a Hash in memory.  Access is slower than using an in-
memory cache,
# though still much faster than a non-memoized method, but persistant
between
# program runs.
#
# *WARNING*:  This implementation is not thread-safe!
#
class FileCache
   def initialize( path )
     @path = path
   end

   def []( key )
     if File.exist? @path
       File.foreach(@path) do |entry|
         return entry.split(" ").last.to_i if entry =~ /\A#{key}: /
       end
     end
     nil
   end

   def []=( key, value )
     File.open(@path, "a") { |cache| cache.puts "#{key}: #{value}" }
   end
end

class Fibonacci
   extend Memoizable

   def fib( num )
     return num if num < 2
     fib(num - 1) + fib(num - 2)
   end
   memoize :fib, FileCache.new("fib_cache.txt")
end

puts "This method is memoized using a file-based cache.  See
fib_cache.txt..."
start = Time.now
puts "fib(100):  #{Fibonacci.new.fib(100)}"
puts "Run time:  #{Time.now - start} seconds"

puts
puts "Run again to see the file cache at work."

__END__

You can find an example using weak references and the actual library
code in the "Memoization" section of the following article from my blog:

http://blog.grayproductions.net/articles/2006/01/2...
memoization

The point of posting all this here is to give people a chance to
express concerns over my implementation.  Daniel was avoiding going
down this road because of issues raised by this community.  Raise
away.  ;)

If there is any interest, and we don't prove the library horribly
broken, I would be happy to package it up.

Thanks.

James Edward Gray II
Fe9b2d0628c0943af374b2fe5b320a82?d=identicon&s=25 Eero Saynatkari (rue)
on 2006-01-24 21:21
James Gray wrote:
> Daniel Berger and I have been having an off-list discussion about his
> memoize.rb library.  You can find that library at:
>
> http://raa.ruby-lang.org/project/memoize/
>
> In the course of the discussion, I ended up building a library of my
> own, to show Daniel what I was talking about.  Daniel thought it
> might be worth moving the discussion of my new library here, to get
> community feedback.
>
> < snip feature discussion due to RForum />

Good stuff! The only idea I have just from an end-user
perspective is that you could hook into Module#append_features
to enable using #include instead of #extend at the class
level (this should also allow just using #extend on individual
objects without having to explicitly open the singleton class).

> Thanks.
>
> James Edward Gray II


E
Dd76a12d66f843de5c5f8782668e7127?d=identicon&s=25 Mauricio Fernandez (Guest)
on 2006-01-25 00:33
(Received via mailing list)
On Wed, Jan 25, 2006 at 01:27:25AM +0900, James Edward Gray II wrote:
> Daniel Berger and I have been having an off-list discussion about his
> memoize.rb library.
[...]
> The primary difference between our two approaches is that Daniel's
> library is built to memoize individual objects, while mine is
> intended to link all instance method calls for a class of objects to
> a single cache.

I also generalized Daniel's memoize.rb to support class-level
memoization some time ago[1]:

 http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...

Should I also release it? ;)

> class Fibonacci
>   extend Memoizable
>
>   def fib( num )
>     return num if num < 2
>     fib(num - 1) + fib(num - 2)
>   end
>   memoize :fib, FileCache.new("fib_cache.txt")
> end

I kept memoize.rb's interface, but I think I prefer this approach; maybe
special-casing for strings could make sense though...

> You can find an example using weak references and the actual library
> code in the "Memoization" section of the following article from my blog:
>
> http://blog.grayproductions.net/articles/2006/01/2...
>
> The point of posting all this here is to give people a chance to
> express concerns over my implementation.  Daniel was avoiding going
> down this road because of issues raised by this community.  Raise
> away.  ;)

I'm not sure about the
  ([Class, Module].include?(self.class) ? self : self.class)
part; that way, you cannot memoize singleton methods from Module/Class
objects. In my implementation, I distinguished between
Module#instance_memoize and Object#memoize (after including Memoize).

Also,
    original =   "_#{name}"
would need a longer prefix IMO.

Finally, not that it really matters, but the WeakCache is fairly
inefficient (one lambda per key); see that other thread (WeakRef hash)
or
  http://eigenclass.org/hiki.rb?weakhash+and+weakref
for other implementations.

[1] IIRC persistent caching in Daniel Berger's memoize.rb was inspired
by
something I wrote, I did a sort of rewrite of memoize.rb and now see my
name
mentioned in your blog; it seems memoize.rb and I are bound by the
string of
destiny (*^_^*)
4299e35bacef054df40583da2d51edea?d=identicon&s=25 James Gray (bbazzarrakk)
on 2006-01-25 05:11
(Received via mailing list)
On Jan 24, 2006, at 2:21 PM, Eero Saynatkari wrote:

>>
>> < snip feature discussion due to RForum />
>
> Good stuff! The only idea I have just from an end-user
> perspective is that you could hook into Module#append_features
> to enable using #include instead of #extend at the class
> level (this should also allow just using #extend on individual
> objects without having to explicitly open the singleton class).

That's an interesting idea.  It does seem this is what extend() is
for though and it's certainly been used like this in the past (see
Forwardable).

I do like the idea of not having to use the singleton class on
individual objects though.  Hmm, wait, doesn't that work now...
<checks>  Yep, seems to.

What do others think, is extend() okay in the classes?

James Edward Gray II
4299e35bacef054df40583da2d51edea?d=identicon&s=25 James Gray (bbazzarrakk)
on 2006-01-25 16:50
(Received via mailing list)
On Jan 24, 2006, at 5:31 PM, Mauricio Fernandez wrote:

> I also generalized Daniel's memoize.rb to support class-level
> memoization some time ago[1]:
>
>  http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...

I should have mentioned that.  Daniel did point me to this message
during our conversation.

> I kept memoize.rb's interface, but I think I prefer this approach;
> maybe
> special-casing for strings could make sense though...

Honestly, I wasn't planning to provide any caches at all, just let
users make their own.  Now that you mention it though, pre-loading
some is a neat idea.  People could have access to Memoize::WeakCache
and Memoize::FileCache for example.

I haven't seen a file cache I like yet though.  :)

You use a file if you want persistence or to save memory.  The file
cache used by memoize.rb gets you persistence, but really only for
one instance of your program running simultaneously.  It doesn't save
memory at all.

If we want to go all the way, we have to start reading from the file
cache too.  Then we need to make the check for an entry in the file
cache and storing it if it's not there atomic, which I guess we could
do with flock().  Of course, then another process using the file
could halt for a good period while we calculate what to add.

I guess there's no harm in checking for a value and finding it not
there, then starting our calculation.  If another process added it
before we were done, we could just save over it.  (The whole point of
memoization is that we should get the same answer.)  For that though,
we really need a database format of some sort.  We could probably use
SQLite or KirbyBase here.

My own file example does read from the cache, but is incredibly
trivial.  It also can't deal with multiple processes using the same
cache.  My point in showing it was that it's not too difficult to
roll your own, knowing what tradeoffs you can accept.

What do you (and others) think?  Should we provide default cache
objects?  If so, how should the file cache work?

>> away.  ;)
>
> I'm not sure about the
>   ([Class, Module].include?(self.class) ? self : self.class)
> part; that way, you cannot memoize singleton methods from Module/Class
> objects.

I agree.  That's my least favorite part.

> In my implementation, I distinguished between
> Module#instance_memoize and Object#memoize (after including Memoize).

If we're going to go ahead and add a method to Module, why don't we
just eliminate the need for include/extend altogether?  If I put a
method in Module and a method in Objet, they can have identical names
and interface, just work differently under the hood.

Is there any reason not to do this?

> Also,
>     original =   "_#{name}"
> would need a longer prefix IMO.

How does "__unmemoized_#{name}__" grab you?

> Finally, not that it really matters, but the WeakCache is fairly
> inefficient (one lambda per key); see that other thread (WeakRef hash)
> or
>   http://eigenclass.org/hiki.rb?weakhash+and+weakref
> for other implementations.

My goal was to keep the examples very simple, since it's really about
memoize().  However, I loved SimpleWeakCache from your blog and am
now using that.  Thanks!

James Edward Gray II
E0d864d9677f3c1482a20152b7cac0e2?d=identicon&s=25 Robert Klemme (Guest)
on 2006-01-25 19:55
(Received via mailing list)
2006/1/24, James Edward Gray II <james@grayproductions.net>:

Going back to the original question which approach to favour...

> The primary difference between our two approaches is that Daniel's
> library is built to memoize individual objects, while mine is
> intended to link all instance method calls for a class of objects to
> a single cache.
>
> I wanted this behavior because I felt it would increase cache hits
> and make memoizing methods that much more worthwhile.  Daniel pointed
> out though that a finer grained control can be important, to avoid
> exhausting memory with the cache.  Luckily, Ruby's syntax makes it
> trivial to use my library to alter a unique object as well.

I may have missed something but from what I understand I see a
complete different problem: if you memoize for all objects of a class
then you have two options: either use only method name and args as key
(which I believe you do because you want better cache hits) *or* store
add something that identifies the instance (object id, the object
itself) which clearly would lead to cache hit ratios like in Daniel's
approach.

If you have a single cache for all (first alternative) you risk that
cached results are wrong because they may depend on individiual
object's state which you do not reflect in the cache. If you choose
the second approach you make things more complicated than necessary
(think of GC for example and how the cache gets notified if content is
invalidated etc.).

So, basically I prefer Daniel's simpler approach - especially since
you can achieve the same: just create an instance that contains all
code (either directly or by delegation) that is expected to be slow
(and thus should be memoized) and memoize on that class.

> My library also works for class/module methods and even top-level
> methods, though I will spare you those examples.
>
> The other difference between our libraries is that Daniel's supports
> using a file for persistent caching, while my library supports using
> a custom cache object.  That means that it's a little more work to
> cache to a file using mine, but you can do other kinds of caching as
> well.  Here's a file cache example:

I like the idea with a generalized cache.  Default is a hash but it
can be replaced by some file handling object that adhers to the same
interface (even multilel caching with a LRU cache in mem and the whole
cache on disk can be done... :-)

Kind regards

robert
4299e35bacef054df40583da2d51edea?d=identicon&s=25 James Gray (bbazzarrakk)
on 2006-01-25 20:26
(Received via mailing list)
On Jan 25, 2006, at 12:55 PM, Robert Klemme wrote:

> So, basically I prefer Daniel's simpler approach - especially since
> you can achieve the same: just create an instance that contains all
> code (either directly or by delegation) that is expected to be slow
> (and thus should be memoized) and memoize on that class.

Can you show an example or two here?  Using the current memoize.rb,
how do we get objects A, B, and C using the same cache?  I guess I'm
having trouble getting my head around it.

Thanks.

James Edward Gray II
5befe95e6648daec3dd5728cd36602d0?d=identicon&s=25 Robert Klemme (Guest)
on 2006-01-26 14:10
(Received via mailing list)
James Edward Gray II wrote:
> On Jan 25, 2006, at 12:55 PM, Robert Klemme wrote:
>
>> So, basically I prefer Daniel's simpler approach - especially since
>> you can achieve the same: just create an instance that contains all
>> code (either directly or by delegation) that is expected to be slow
>> (and thus should be memoized) and memoize on that class.
>
> Can you show an example or two here?  Using the current memoize.rb,
> how do we get objects A, B, and C using the same cache?  I guess I'm
> having trouble getting my head around it.

class SillySomething
  extend Memoizable

  def initialize()
    @a,@b,@c = A.new, B.new, C.new
    # or other initialization
  end

  def calculate_foo(quirk, quark, quork)
     x = @a.foo(quirk) + @b.doit(quark)
     if x > 10
        x += @c.doh(quork, @a)
     else
        x - 2
     end
     x
  end

  memoize :calculate_foo
end

The idea is to group relevant parts of your object model in a single
class
and use an instance of that for memoization.

Maybe I can add an explanation to make it more clear: if you have a
class
X with method Y that calculates something results of invoking Y on
several
instances of X can only be memoized in the same cache if (i) either they
are not influenced by state of X instances or (ii) memoization somehow
uses this state to control lookup.

If (i) I don't see the point in having several objects that do exactly
the
same calculation - methods might even be class methods (i.e. singleton
methods of the *single* class instance X).  So there is actually no need
for a cache that covers several instances.

If (ii) you gain nothing by having a centralized cache because you'll
have
to keep memoized values from different instances separate; in fact you
add
a level of lookup (first object, then method and args - or do it in one
request with a larger key) and separate instance state from the
instance.
IMHO that's inefficient, memory leaking and error prone.

Kind regards

    robert
4299e35bacef054df40583da2d51edea?d=identicon&s=25 James Gray (bbazzarrakk)
on 2006-01-26 15:47
(Received via mailing list)
On Jan 26, 2006, at 7:08 AM, Robert Klemme wrote:

>      if x > 10
>         x += @c.doh(quork, @a)
>      else
>         x - 2
>      end
>      x
>   end
>
>   memoize :calculate_foo
> end

This example runs as written with the module discussed in this
thread.  It requires multiple changes to work with the current
memoize.rb library.

> you add
> a level of lookup (first object, then method and args - or do it in
> one
> request with a larger key) and separate instance state from the
> instance.
> IMHO that's inefficient, memory leaking and error prone.

I understand know.  You're just against this feature.  Fair enough.

Thanks for taking the time to explain it to me.

James Edward Gray II
This topic is locked and can not be replied to.