Forum: Ruby Memoization, files and Marshal?

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
Aee77dba395ece0a04c688b05b07cd63?d=identicon&s=25 Daniel Berger (Guest)
on 2006-01-01 01:24
(Received via mailing list)
Hi all,

Inspired by a recent blog entry by Mauricio Fernandez, I decided to try
to add a method to the memoize package that would allow users to
memoize using a file based approach rather than memory, either to
conserve memory or for persistance.

However, I'm not sure it's possible.  If it is, I'd love to know how.
If it's not, well, pstore it is.

Here's what I tried:

def memoize_to_file(name, file)
   meth = method(name)
   cache = Hash.new.update(Marshal.load(File.read(file))) rescue nil
   cache ||= {}
   (class << self; self; end).class_eval do
      define_method(name) do |*args|
         if cache.has_key?(args)
            cache[args]
         else
            cache[args] ||= meth.call(*args)
            File.open(file, "wb+"){ |f| Marshal.dump(cache, f) }
         end
      end
   end
   cache
end

# Code snippet
include Memoize
def fib(n)
   return n if n < 2
   fib(n-1) + fib(n-2)
end
memoize_to_file(:fib, "temp.txt")
fib(10)

However, running that gives me this error: in `fib': undefined method
`+' for #<File:temp.txt (closed)> (NoMethodError)

Any ideas?

Thanks,

Dan
Cb48ca5059faf7409a5ab3745a964696?d=identicon&s=25 unknown (Guest)
on 2006-01-01 02:30
(Received via mailing list)
On Sun, 1 Jan 2006, Daniel Berger wrote:

> Here's what I tried:
>            cache[args] ||= meth.call(*args)
>   return n if n < 2
> Thanks,
harp:~ > cat a.rb
   def memoize_to_file(name, file)
     cache = Hash.new.update(Marshal.load(File.read(file))) rescue {}
     (class << self; self; end).class_eval do
        define_method(name) do |*args|
           unless cache.has_key?(args)
             cache[args] = method(name).call(*args)
             File.open(file, "wb+"){|f| Marshal.dump(cache, f) }
           end
           cache[args]
        end
     end
   end

   def fib(n)
    return n if n < 2
    fib(n-1) + fib(n-2)
   end

   memoize_to_file "fib", "fib.cache"
   n = fib 10
   p n



   harp:~ > ruby a.rb
   55


regards.

-a
Aee77dba395ece0a04c688b05b07cd63?d=identicon&s=25 Daniel Berger (Guest)
on 2006-01-01 03:00
(Received via mailing list)
ara.t.howard@noaa.gov wrote:
> On Sun, 1 Jan 2006, Daniel Berger wrote:

<snip>

>         end
>    p n
As is, I get a stack level too deep error on Windows.  If I change the
"cache[args] =" to "cache[args] ||=", I always get back nil.

I'm probably being thick here, but any help is appreciated.

Thanks,

Dan
Cb48ca5059faf7409a5ab3745a964696?d=identicon&s=25 unknown (Guest)
on 2006-01-01 04:33
(Received via mailing list)
On Sun, 1 Jan 2006, Daniel Berger wrote:

>>            unless cache.has_key?(args)
>>     fib(n-1) + fib(n-2)
>>    end
>>
>>    memoize_to_file "fib", "fib.cache"
>>    n = fib 10
>>    p n
>
> As is, I get a stack level too deep error on Windows.  If I change the
> "cache[args] =" to "cache[args] ||=", I always get back nil.
>
> I'm probably being thick here, but any help is appreciated.

sorry, don't know how that's running on linux in the first place... you
need:

   --- a.rb	2005-12-31 20:30:59.000000000 -0700
   +++ b.rb	2005-12-31 20:31:10.000000000 -0700
   @@ -1,9 +1,10 @@
    def memoize_to_file(name, file)
   +  meth = method(name)
      cache = Hash.new.update(Marshal.load(File.read(file))) rescue {}
      (class << self; self; end).class_eval do
         define_method(name) do |*args|
            unless cache.has_key?(args)
   -          cache[args] = method(name).call(*args)
   +          cache[args] = meth.call(*args)
              File.open(file, "wb+"){|f| Marshal.dump(cache, f) }
            end
            cache[args]

with that it works for me on linux and windows.

cheers.

-a
Aee77dba395ece0a04c688b05b07cd63?d=identicon&s=25 Daniel Berger (Guest)
on 2006-01-01 13:34
(Received via mailing list)
ara.t.howard@noaa.gov wrote:
> >>      (class << self; self; end).class_eval do
> >>    def fib(n)
> >
>       (class << self; self; end).class_eval do
> cheers.
>
> -a
> --
>

Thanks, that did the trick nicely.  It's now part of memoize 1.2.0. :)

Regards,

Dan
Dd76a12d66f843de5c5f8782668e7127?d=identicon&s=25 Mauricio Fernandez (Guest)
on 2006-01-02 12:39
(Received via mailing list)
On Sun, Jan 01, 2006 at 09:32:56PM +0900, Daniel Berger wrote:
> >             unless cache.has_key?(args)
> >    -          cache[args] = method(name).call(*args)
> >    +          cache[args] = meth.call(*args)
> >               File.open(file, "wb+"){|f| Marshal.dump(cache, f) }
> >             end
> >             cache[args]
[...]
> Thanks, that did the trick nicely.  It's now part of memoize 1.2.0. :)

Dumping the cache on every miss seems expensive.
I changed the code to save it once every N misses, and made the cache
update
atomic to prevent problems with:
* concurrent executions of the memoized method (processes and threads)
* no HD space being left (the previous cache state is preserved)

I also applied those changes to memoize.rb 1.2.0 and generalized it to
work
with instance methods, see the patch after memo.rb.

batsman@tux-chan:/tmp$ cat memo.rb
def memoize_to_file(name, file = nil, period = -1)
  meth = method(name)
  cache = Hash.new.update(Marshal.load(File.read(file))) rescue {}
  save_cache_proc = lambda do
    tmpfile = "#{file}.#{Process.pid}"
    begin
      File.open(tmpfile, "wb+"){|f| Marshal.dump(cache, f) }
      File.rename tmpfile, file
    rescue Exception
    end
  end
  at_exit { save_cache_proc.call } if file
  calls = period
  (class << self; self; end).class_eval do
     define_method(name) do |*args|
        unless cache.has_key?(args)
          cache[args] = meth.call(*args)
          if file && period != -1 && (calls -= 1) <= 0
            calls = period
            save_cache_proc.call
          end
        end
        cache[args]
     end
  end
end

def fib(n)
 return n if n < 2
 fib(n-1) + fib(n-2)
end

def fib2(n)
 return n if n < 2
 fib2(n-1) + fib2(n-2)
end

def fib3(n)
 return n if n < 2
 fib3(n-1) + fib3(n-2)
end


memoize_to_file "fib", "fib.cache", 1
memoize_to_file "fib2", "fib2.cache"
memoize_to_file "fib3", "fib3.cache", 10
require 'benchmark'
Benchmark.bmbm(10) do |bm|
  bm.report("period 1") { fib  1000  }
  bm.report("period 10"){ fib3 1000  }
  bm.report("at_exit")  { fib2 1000  }
end
batsman@tux-chan:/tmp$ ruby memo.rb
Rehearsal ---------------------------------------------
period 1    4.030000   0.510000   4.540000 (  5.507344)
period 10   0.430000   0.060000   0.490000 (  0.570224)
at_exit     0.030000   0.000000   0.030000 (  0.039652)
------------------------------------ total: 5.060000sec

                user     system      total        real
period 1    0.000000   0.000000   0.000000 (  0.000035)
period 10   0.000000   0.000000   0.000000 (  0.000033)
at_exit     0.000000   0.000000   0.000000 (  0.000035)




The patch to memoize.rb:

--- memoize.rb.orig	2006-01-02 11:02:54.000000000 +0100
+++ memoize.rb	2006-01-02 12:31:46.000000000 +0100
@@ -1,39 +1,106 @@
 module Memoize
    MEMOIZE_VERSION = "1.2.0"
-
+
+   def self.memoize(recv, name, file=nil, period=-1)
+      class << recv; self end.instance_eval{ instance_memoize(name,
file, period) }
+   end
+
+   def memoize(name, file=nil, period=-1)
+      Memoize.memoize(self, name, file, period)
+   end
+end
+
+class Module
    # Memoize the method +name+.  If +file+ is provided, then the method
results
-   # are stored on disk rather than in memory.  This consumes virtually
no
-   # memory and is persistant.
-   def memoize(name, file=nil)
-      meth = method(name)
-
+   # are stored on disk in addition to the in-memory cache.
+   def instance_memoize(name, file=nil, period=-1)
+      meth = instance_method(name)
+
       if file
          cache = Hash.new.update(Marshal.load(File.read(file))) rescue
{}
       else
          cache = {}
       end
-
-      if file
-         (class << self; self; end).class_eval do
-            define_method(name) do |*args|
-               unless cache.has_key?(args)
-                  cache[args] = meth.call(*args)
-                  File.open(file, "wb+"){|f| Marshal.dump(cache, f) }
-               end
-               cache[args]
-            end
+
+      save_cache_proc = lambda do
+         tmpfile = "#{file}.#{Process.pid}"
+         begin
+            File.open(tmpfile, "wb+"){|f| Marshal.dump(cache, f) }
+            File.rename tmpfile, file
+         rescue Exception
          end
-      else
-         (class << self; self; end).class_eval do
-            define_method(name) do |*args|
-               if cache.has_key?(args)
-                  cache[args]
-               else
-                  cache[args] ||= meth.call(*args)
-               end
+      end
+
+      at_exit { save_cache_proc.call } if file
+
+      calls = period
+      define_method(name) do |*args|
+         unless cache.has_key?(args)
+            cache[args] = meth.bind(self).call(*args)
+            if file && period != -1 && (calls -= 1) <= 0
+               calls = period
+               save_cache_proc.call
             end
          end
+         cache[args]
       end
+
       cache
    end
-end
\ No newline at end of file
+end
+
+if __FILE__ == $0
+   def fib(n)
+      return n if n < 2
+      fib(n-1) + fib(n-2)
+   end
+
+   def fib2(n)
+      return n if n < 2
+      fib2(n-1) + fib2(n-2)
+   end
+
+   def fib3(n)
+      return n if n < 2
+      fib3(n-1) + fib3(n-2)
+   end
+
+   include Memoize
+
+   module DumbFib
+      def fib(n)
+         return n if n < 2
+         fib(n-1) + fib(n-2)
+      end
+   end
+
+   DumbFib2 = DumbFib.clone
+
+   memoize :fib, "xfib.memoize.cache", 1
+   memoize :fib2, "xfib2.memoize.cache"
+   memoize :fib3, "xfib3.memoize.cache", 10
+
+   class Dummy1
+      include DumbFib
+      instance_memoize :fib, "xfib.dumb1.cache"
+   end
+   class Dummy2
+      include DumbFib2
+   end
+   module DumbFib2
+      instance_memoize :fib, "xfib.dumb2.cache"
+   end
+
+   require 'benchmark'
+   runs = 1
+   Benchmark.bmbm(10) do |bm|
+      dummy1 = Dummy1.new
+      dummy2 = Dummy2.new
+      bm.report("period 1") { runs.times{ fib  500  } }
+      bm.report("period 10"){ runs.times{ fib3 500  } }
+      bm.report("at_exit")  { runs.times{ fib2 500  } }
+      bm.report("at_exit''")  { runs.times{ dummy1.fib 500  } }
+      bm.report("at_exit'''")  { runs.times{ dummy2.fib 500  }; runs =
100000}
+   end
+
+end


batsman@tux-chan:/tmp$ ruby memoize.rb
Rehearsal ----------------------------------------------
period 1     0.870000   0.160000   1.030000 (  1.247975)
period 10    0.100000   0.010000   0.110000 (  0.151390)
at_exit      0.010000   0.000000   0.010000 (  0.034512)
at_exit''    0.020000   0.000000   0.020000 (  0.030174)
at_exit'''   0.010000   0.000000   0.010000 (  0.030094)
------------------------------------- total: 1.180000sec

                 user     system      total        real
period 1     0.490000   0.000000   0.490000 (  0.548635)
period 10    0.500000   0.000000   0.500000 (  0.539279)
at_exit      0.500000   0.000000   0.500000 (  0.562339)
at_exit''    0.490000   0.000000   0.490000 (  0.534087)
at_exit'''   0.490000   0.000000   0.490000 (  0.541301)
batsman@tux-chan:/tmp$ ruby memoize.rb
Rehearsal ----------------------------------------------
period 1     0.000000   0.000000   0.000000 (  0.000048)
period 10    0.000000   0.000000   0.000000 (  0.000019)
at_exit      0.000000   0.000000   0.000000 (  0.000017)
at_exit''    0.000000   0.000000   0.000000 (  0.000019)
at_exit'''   0.000000   0.000000   0.000000 (  0.000018)
------------------------------------- total: 0.000000sec

                 user     system      total        real
period 1     0.480000   0.000000   0.480000 (  0.533512)
period 10    0.490000   0.000000   0.490000 (  0.542617)
at_exit      0.500000   0.000000   0.500000 (  0.544705)
at_exit''    0.500000   0.000000   0.500000 (  0.551834)
at_exit'''   0.490000   0.000000   0.490000 (  0.538747)


I also attached the full sources for your convenience.
Cb48ca5059faf7409a5ab3745a964696?d=identicon&s=25 unknown (Guest)
on 2006-01-02 18:26
(Received via mailing list)
On Mon, 2 Jan 2006, Mauricio Fernandez wrote:

>
> Dumping the cache on every miss seems expensive.
> I changed the code to save it once every N misses, and made the cache update
> atomic to prevent problems with:
> * concurrent executions of the memoized method (processes and threads)
> * no HD space being left (the previous cache state is preserved)
>
> I also applied those changes to memoize.rb 1.2.0 and generalized it to work
> with instance methods, see the patch after memo.rb.

nice idea.

seems like the perfect time to introduce a Memoize::Cache class wrapping
pstore though doesn't it?

cheers.

-a
Aee77dba395ece0a04c688b05b07cd63?d=identicon&s=25 Daniel Berger (Guest)
on 2006-01-02 20:33
(Received via mailing list)
ara.t.howard@noaa.gov wrote:
> > * no HD space being left (the previous cache state is preserved)
> >
> > I also applied those changes to memoize.rb 1.2.0 and generalized it to work
> > with instance methods, see the patch after memo.rb.
>
> nice idea.

Looks interesting but I get this with windows xp using his sample code:

>ruby memoize.rb
Rehearsal ----------------------------------------------
period 1   memoize.rb:39:in `has_key?': stack level too deep
(SystemStackError)
        from memoize.rb:39:in `fib'
        from memoize.rb:56:in `fib'
        from memoize.rb:40:in `fib'
        from memoize.rb:56:in `fib'
        from memoize.rb:40:in `fib'
        from memoize.rb:56:in `fib'
        from memoize.rb:40:in `fib'
        from memoize.rb:56:in `fib'
         ... 539 levels...
        from c:/ruby184/lib/ruby/1.8/benchmark.rb:293:in `measure'
        from c:/ruby184/lib/ruby/1.8/benchmark.rb:261:in `bmbm'
        from c:/ruby184/lib/ruby/1.8/benchmark.rb:259:in `bmbm'
        from memoize.rb:97

Regards,

Dan
Dd76a12d66f843de5c5f8782668e7127?d=identicon&s=25 Mauricio Fernandez (Guest)
on 2006-01-02 21:52
(Received via mailing list)
On Tue, Jan 03, 2006 at 04:32:57AM +0900, Daniel Berger wrote:
>         from memoize.rb:56:in `fib'
>         from memoize.rb:40:in `fib'
>         from memoize.rb:56:in `fib'
>         from memoize.rb:40:in `fib'
>         from memoize.rb:56:in `fib'
>         from memoize.rb:40:in `fib'
>         from memoize.rb:56:in `fib'
>          ... 539 levels...

Your stack is too small; fib(500) goes 1000 levels deep. Change
those examples to fib(100) etc.:

batsman@tux-chan:/tmp$ ruby memoize.rb
Rehearsal ---------------------------------------------
period 1  memoize.rb:54:in `fib': let's see how deep we get
(RuntimeError)
        from memoize.rb:39:in `fib'
        from memoize.rb:55:in `fib'
        from memoize.rb:39:in `fib'
        from memoize.rb:55:in `fib'
        from memoize.rb:39:in `fib'
        from memoize.rb:55:in `fib'
        from memoize.rb:39:in `fib'
        from memoize.rb:55:in `fib'
         ... 993 levels...
        from /home/batsman/usr/lib/ruby/1.8/benchmark.rb:293:in
`measure'
        from /home/batsman/usr/lib/ruby/1.8/benchmark.rb:261:in `bmbm'
        from /home/batsman/usr/lib/ruby/1.8/benchmark.rb:259:in `bmbm'
        from memoize.rb:96

given

batsman@tux-chan:/tmp$ diff -u memoize.rb.mod memoize.rb
--- memoize.rb.mod      2006-01-02 21:38:18.000000000 +0100
+++ memoize.rb  2006-01-02 21:46:10.000000000 +0100
@@ -51,7 +51,7 @@

 if __FILE__ == $0
    def fib(n)
-      return n if n < 2
+      raise "let's see how deep we get" if n < 2
       fib(n-1) + fib(n-2)
    end

@@ -97,10 +97,10 @@
       dummy1 = Dummy1.new
       dummy2 = Dummy2.new
       bm.report("period 1") { runs.times{ fib  500  } }
-      bm.report("period 10"){ runs.times{ fib3 500  } }
-      bm.report("at_exit")  { runs.times{ fib2 500  } }
-      bm.report("at_exit''")  { runs.times{ dummy1.fib 500  } }
-      bm.report("at_exit'''")  { runs.times{ dummy2.fib 500  }; runs =
100000}
+      #bm.report("period 10"){ runs.times{ fib3 500  } }
+      #bm.report("at_exit")  { runs.times{ fib2 500  } }
+      #bm.report("at_exit''")  { runs.times{ dummy1.fib 500  } }
+      #bm.report("at_exit'''")  { runs.times{ dummy2.fib 500  }; runs =
100000}
    end

 end
This topic is locked and can not be replied to.