Memoization, files and Marshal?

Hi all,

Inspired by a recent blog entry by Mauricio F., I decided to try
to add a method to the memoize package that would allow users to
memoize using a file based approach rather than memory, either to
conserve memory or for persistance.

However, I’m not sure it’s possible. If it is, I’d love to know how.
If it’s not, well, pstore it is.

Here’s what I tried:

def memoize_to_file(name, file)
meth = method(name)
cache = Hash.new.update(Marshal.load(File.read(file))) rescue nil
cache ||= {}
(class << self; self; end).class_eval do
define_method(name) do |*args|
if cache.has_key?(args)
cache[args]
else
cache[args] ||= meth.call(*args)
File.open(file, “wb+”){ |f| Marshal.dump(cache, f) }
end
end
end
cache
end

Code snippet

include Memoize
def fib(n)
return n if n < 2
fib(n-1) + fib(n-2)
end
memoize_to_file(:fib, “temp.txt”)
fib(10)

However, running that gives me this error: in fib': undefined method+’ for #<File:temp.txt (closed)> (NoMethodError)

Any ideas?

Thanks,

Dan

On Sun, 1 Jan 2006, Daniel B. wrote:

Here’s what I tried:
cache[args] ||= meth.call(*args)
return n if n < 2
Thanks,
harp:~ > cat a.rb
def memoize_to_file(name, file)
cache = Hash.new.update(Marshal.load(File.read(file))) rescue {}
(class << self; self; end).class_eval do
define_method(name) do |*args|
unless cache.has_key?(args)
cache[args] = method(name).call(*args)
File.open(file, “wb+”){|f| Marshal.dump(cache, f) }
end
cache[args]
end
end
end

def fib(n)
return n if n < 2
fib(n-1) + fib(n-2)
end

memoize_to_file “fib”, “fib.cache”
n = fib 10
p n

harp:~ > ruby a.rb
55

regards.

-a

[email protected] wrote:

On Sun, 1 Jan 2006, Daniel B. wrote:

    end

p n
As is, I get a stack level too deep error on Windows. If I change the
“cache[args] =” to “cache[args] ||=”, I always get back nil.

I’m probably being thick here, but any help is appreciated.

Thanks,

Dan

[email protected] wrote:

 (class << self; self; end).class_eval do

def fib(n)

  (class << self; self; end).class_eval do

cheers.

-a

Thanks, that did the trick nicely. It’s now part of memoize 1.2.0. :slight_smile:

Regards,

Dan

On Sun, 1 Jan 2006, Daniel B. wrote:

       unless cache.has_key?(args)
fib(n-1) + fib(n-2)

end

memoize_to_file “fib”, “fib.cache”
n = fib 10
p n

As is, I get a stack level too deep error on Windows. If I change the
“cache[args] =” to “cache[args] ||=”, I always get back nil.

I’m probably being thick here, but any help is appreciated.

sorry, don’t know how that’s running on linux in the first place… you
need:

— a.rb 2005-12-31 20:30:59.000000000 -0700
+++ b.rb 2005-12-31 20:31:10.000000000 -0700
@@ -1,9 +1,10 @@
def memoize_to_file(name, file)

  • meth = method(name)
    cache = Hash.new.update(Marshal.load(File.read(file))) rescue {}
    (class << self; self; end).class_eval do
    define_method(name) do |*args|
    unless cache.has_key?(args)
  •      cache[args] = method(name).call(*args)
    
  •      cache[args] = meth.call(*args)
         File.open(file, "wb+"){|f| Marshal.dump(cache, f) }
       end
       cache[args]
    

with that it works for me on linux and windows.

cheers.

-a

On Sun, Jan 01, 2006 at 09:32:56PM +0900, Daniel B. wrote:

        unless cache.has_key?(args)
  •      cache[args] = method(name).call(*args)
    
  •      cache[args] = meth.call(*args)
         File.open(file, "wb+"){|f| Marshal.dump(cache, f) }
       end
       cache[args]
    

[…]

Thanks, that did the trick nicely. It’s now part of memoize 1.2.0. :slight_smile:

Dumping the cache on every miss seems expensive.
I changed the code to save it once every N misses, and made the cache
update
atomic to prevent problems with:

  • concurrent executions of the memoized method (processes and threads)
  • no HD space being left (the previous cache state is preserved)

I also applied those changes to memoize.rb 1.2.0 and generalized it to
work
with instance methods, see the patch after memo.rb.

batsman@tux-chan:/tmp$ cat memo.rb
def memoize_to_file(name, file = nil, period = -1)
meth = method(name)
cache = Hash.new.update(Marshal.load(File.read(file))) rescue {}
save_cache_proc = lambda do
tmpfile = “#{file}.#{Process.pid}”
begin
File.open(tmpfile, “wb+”){|f| Marshal.dump(cache, f) }
File.rename tmpfile, file
rescue Exception
end
end
at_exit { save_cache_proc.call } if file
calls = period
(class << self; self; end).class_eval do
define_method(name) do |*args|
unless cache.has_key?(args)
cache[args] = meth.call(*args)
if file && period != -1 && (calls -= 1) <= 0
calls = period
save_cache_proc.call
end
end
cache[args]
end
end
end

def fib(n)
return n if n < 2
fib(n-1) + fib(n-2)
end

def fib2(n)
return n if n < 2
fib2(n-1) + fib2(n-2)
end

def fib3(n)
return n if n < 2
fib3(n-1) + fib3(n-2)
end

memoize_to_file “fib”, “fib.cache”, 1
memoize_to_file “fib2”, “fib2.cache”
memoize_to_file “fib3”, “fib3.cache”, 10
require ‘benchmark’
Benchmark.bmbm(10) do |bm|
bm.report(“period 1”) { fib 1000 }
bm.report(“period 10”){ fib3 1000 }
bm.report(“at_exit”) { fib2 1000 }
end
batsman@tux-chan:/tmp$ ruby memo.rb
Rehearsal ---------------------------------------------
period 1 4.030000 0.510000 4.540000 ( 5.507344)
period 10 0.430000 0.060000 0.490000 ( 0.570224)
at_exit 0.030000 0.000000 0.030000 ( 0.039652)
------------------------------------ total: 5.060000sec

            user     system      total        real

period 1 0.000000 0.000000 0.000000 ( 0.000035)
period 10 0.000000 0.000000 0.000000 ( 0.000033)
at_exit 0.000000 0.000000 0.000000 ( 0.000035)

The patch to memoize.rb:

— memoize.rb.orig 2006-01-02 11:02:54.000000000 +0100
+++ memoize.rb 2006-01-02 12:31:46.000000000 +0100
@@ -1,39 +1,106 @@
module Memoize
MEMOIZE_VERSION = “1.2.0”

  • def self.memoize(recv, name, file=nil, period=-1)
  •  class << recv; self end.instance_eval{ instance_memoize(name, 
    

file, period) }

  • end
  • def memoize(name, file=nil, period=-1)
  •  Memoize.memoize(self, name, file, period)
    
  • end
    +end

+class Module
# Memoize the method +name+. If +file+ is provided, then the method
results

  • are stored on disk rather than in memory. This consumes virtually

no

  • memory and is persistant.

  • def memoize(name, file=nil)
  •  meth = method(name)
    
  • are stored on disk in addition to the in-memory cache.

  • def instance_memoize(name, file=nil, period=-1)
  •  meth = instance_method(name)
    
  •  if file
        cache = Hash.new.update(Marshal.load(File.read(file))) rescue 
    

{}
else
cache = {}
end

  •  if file
    
  •     (class << self; self; end).class_eval do
    
  •        define_method(name) do |*args|
    
  •           unless cache.has_key?(args)
    
  •              cache[args] = meth.call(*args)
    
  •              File.open(file, "wb+"){|f| Marshal.dump(cache, f) }
    
  •           end
    
  •           cache[args]
    
  •        end
    
  •  save_cache_proc = lambda do
    
  •     tmpfile = "#{file}.#{Process.pid}"
    
  •     begin
    
  •        File.open(tmpfile, "wb+"){|f| Marshal.dump(cache, f) }
    
  •        File.rename tmpfile, file
    
  •     rescue Exception
        end
    
  •  else
    
  •     (class << self; self; end).class_eval do
    
  •        define_method(name) do |*args|
    
  •           if cache.has_key?(args)
    
  •              cache[args]
    
  •           else
    
  •              cache[args] ||= meth.call(*args)
    
  •           end
    
  •  end
    
  •  at_exit { save_cache_proc.call } if file
    
  •  calls = period
    
  •  define_method(name) do |*args|
    
  •     unless cache.has_key?(args)
    
  •        cache[args] = meth.bind(self).call(*args)
    
  •        if file && period != -1 && (calls -= 1) <= 0
    
  •           calls = period
    
  •           save_cache_proc.call
           end
        end
    
  •     cache[args]
     end
    
  •  cache
    
    end
    -end
    \ No newline at end of file
    +end

+if FILE == $0

  • def fib(n)
  •  return n if n < 2
    
  •  fib(n-1) + fib(n-2)
    
  • end
  • def fib2(n)
  •  return n if n < 2
    
  •  fib2(n-1) + fib2(n-2)
    
  • end
  • def fib3(n)
  •  return n if n < 2
    
  •  fib3(n-1) + fib3(n-2)
    
  • end
  • include Memoize
  • module DumbFib
  •  def fib(n)
    
  •     return n if n < 2
    
  •     fib(n-1) + fib(n-2)
    
  •  end
    
  • end
  • DumbFib2 = DumbFib.clone
  • memoize :fib, “xfib.memoize.cache”, 1
  • memoize :fib2, “xfib2.memoize.cache”
  • memoize :fib3, “xfib3.memoize.cache”, 10
  • class Dummy1
  •  include DumbFib
    
  •  instance_memoize :fib, "xfib.dumb1.cache"
    
  • end
  • class Dummy2
  •  include DumbFib2
    
  • end
  • module DumbFib2
  •  instance_memoize :fib, "xfib.dumb2.cache"
    
  • end
  • require ‘benchmark’
  • runs = 1
  • Benchmark.bmbm(10) do |bm|
  •  dummy1 = Dummy1.new
    
  •  dummy2 = Dummy2.new
    
  •  bm.report("period 1") { runs.times{ fib  500  } }
    
  •  bm.report("period 10"){ runs.times{ fib3 500  } }
    
  •  bm.report("at_exit")  { runs.times{ fib2 500  } }
    
  •  bm.report("at_exit''")  { runs.times{ dummy1.fib 500  } }
    
  •  bm.report("at_exit'''")  { runs.times{ dummy2.fib 500  }; runs = 
    

100000}

  • end

+end

batsman@tux-chan:/tmp$ ruby memoize.rb
Rehearsal ----------------------------------------------
period 1 0.870000 0.160000 1.030000 ( 1.247975)
period 10 0.100000 0.010000 0.110000 ( 0.151390)
at_exit 0.010000 0.000000 0.010000 ( 0.034512)
at_exit’’ 0.020000 0.000000 0.020000 ( 0.030174)
at_exit’’’ 0.010000 0.000000 0.010000 ( 0.030094)
------------------------------------- total: 1.180000sec

             user     system      total        real

period 1 0.490000 0.000000 0.490000 ( 0.548635)
period 10 0.500000 0.000000 0.500000 ( 0.539279)
at_exit 0.500000 0.000000 0.500000 ( 0.562339)
at_exit’’ 0.490000 0.000000 0.490000 ( 0.534087)
at_exit’’’ 0.490000 0.000000 0.490000 ( 0.541301)
batsman@tux-chan:/tmp$ ruby memoize.rb
Rehearsal ----------------------------------------------
period 1 0.000000 0.000000 0.000000 ( 0.000048)
period 10 0.000000 0.000000 0.000000 ( 0.000019)
at_exit 0.000000 0.000000 0.000000 ( 0.000017)
at_exit’’ 0.000000 0.000000 0.000000 ( 0.000019)
at_exit’’’ 0.000000 0.000000 0.000000 ( 0.000018)
------------------------------------- total: 0.000000sec

             user     system      total        real

period 1 0.480000 0.000000 0.480000 ( 0.533512)
period 10 0.490000 0.000000 0.490000 ( 0.542617)
at_exit 0.500000 0.000000 0.500000 ( 0.544705)
at_exit’’ 0.500000 0.000000 0.500000 ( 0.551834)
at_exit’’’ 0.490000 0.000000 0.490000 ( 0.538747)

I also attached the full sources for your convenience.

[email protected] wrote:

  • no HD space being left (the previous cache state is preserved)

I also applied those changes to memoize.rb 1.2.0 and generalized it to work
with instance methods, see the patch after memo.rb.

nice idea.

Looks interesting but I get this with windows xp using his sample code:

ruby memoize.rb
Rehearsal ----------------------------------------------
period 1 memoize.rb:39:in has_key?': stack level too deep (SystemStackError) from memoize.rb:39:infib’
from memoize.rb:56:in fib' from memoize.rb:40:infib’
from memoize.rb:56:in fib' from memoize.rb:40:infib’
from memoize.rb:56:in fib' from memoize.rb:40:infib’
from memoize.rb:56:in fib' ... 539 levels... from c:/ruby184/lib/ruby/1.8/benchmark.rb:293:inmeasure’
from c:/ruby184/lib/ruby/1.8/benchmark.rb:261:in bmbm' from c:/ruby184/lib/ruby/1.8/benchmark.rb:259:inbmbm’
from memoize.rb:97

Regards,

Dan

On Mon, 2 Jan 2006, Mauricio F. wrote:

Dumping the cache on every miss seems expensive.
I changed the code to save it once every N misses, and made the cache update
atomic to prevent problems with:

  • concurrent executions of the memoized method (processes and threads)
  • no HD space being left (the previous cache state is preserved)

I also applied those changes to memoize.rb 1.2.0 and generalized it to work
with instance methods, see the patch after memo.rb.

nice idea.

seems like the perfect time to introduce a Memoize::Cache class wrapping
pstore though doesn’t it?

cheers.

-a

On Tue, Jan 03, 2006 at 04:32:57AM +0900, Daniel B. wrote:

    from memoize.rb:56:in `fib'
    from memoize.rb:40:in `fib'
    from memoize.rb:56:in `fib'
    from memoize.rb:40:in `fib'
    from memoize.rb:56:in `fib'
    from memoize.rb:40:in `fib'
    from memoize.rb:56:in `fib'
     ... 539 levels...

Your stack is too small; fib(500) goes 1000 levels deep. Change
those examples to fib(100) etc.:

batsman@tux-chan:/tmp$ ruby memoize.rb
Rehearsal ---------------------------------------------
period 1 memoize.rb:54:in fib': let's see how deep we get (RuntimeError) from memoize.rb:39:infib’
from memoize.rb:55:in fib' from memoize.rb:39:infib’
from memoize.rb:55:in fib' from memoize.rb:39:infib’
from memoize.rb:55:in fib' from memoize.rb:39:infib’
from memoize.rb:55:in fib' ... 993 levels... from /home/batsman/usr/lib/ruby/1.8/benchmark.rb:293:inmeasure’
from /home/batsman/usr/lib/ruby/1.8/benchmark.rb:261:in bmbm' from /home/batsman/usr/lib/ruby/1.8/benchmark.rb:259:inbmbm’
from memoize.rb:96

given

batsman@tux-chan:/tmp$ diff -u memoize.rb.mod memoize.rb
— memoize.rb.mod 2006-01-02 21:38:18.000000000 +0100
+++ memoize.rb 2006-01-02 21:46:10.000000000 +0100
@@ -51,7 +51,7 @@

if FILE == $0
def fib(n)

  •  return n if n < 2
    
  •  raise "let's see how deep we get" if n < 2
     fib(n-1) + fib(n-2)
    
    end

@@ -97,10 +97,10 @@
dummy1 = Dummy1.new
dummy2 = Dummy2.new
bm.report(“period 1”) { runs.times{ fib 500 } }

  •  bm.report("period 10"){ runs.times{ fib3 500  } }
    
  •  bm.report("at_exit")  { runs.times{ fib2 500  } }
    
  •  bm.report("at_exit''")  { runs.times{ dummy1.fib 500  } }
    
  •  bm.report("at_exit'''")  { runs.times{ dummy2.fib 500  }; runs = 
    

100000}

  •  #bm.report("period 10"){ runs.times{ fib3 500  } }
    
  •  #bm.report("at_exit")  { runs.times{ fib2 500  } }
    
  •  #bm.report("at_exit''")  { runs.times{ dummy1.fib 500  } }
    
  •  #bm.report("at_exit'''")  { runs.times{ dummy2.fib 500  }; runs = 
    

100000}
end

end