Memoize to a file


#1

Hello all,

Using Memoize gem 1.2.0, memoizing TO a file appears to be working for me,
but subsequently reading that file (say, by rerunning the same script)
appears NOT to be working (the fib(n) calls are being run again).
Inspecting the Memoize module I changed the line

cache = Hash.new.update(Marshal.load(File.read(file))) rescue { }
to
cache = Hash.new.update(Marshal.load(File.read(file)))

and instead of silently failing I now see the error message:
“in `load': marshal data too short (ArgumentError)”

My questions:
1 What is causing this error? (possibly Windows related?)
2 What is the purpose of the rescue{} suppressing the error info in the
first place?
3 Instead of using Marshal, would using YAML be a reasonable alternative?
(I am thinking of readability of the cache file and also the capability to
pre-populate it)

Thanks.

– Brian B.


require 'memoize'
include Memoize
def fib(n)
  puts "running... n is #{n}"
  return n if n < 2
  fib(n-1) + fib(n-2)
end
h = memoize(:fib, "fib.cache")
puts fib(10)


#2

On Jan 31, 2006, at 10:32 PM, Brian B. wrote:

2 What is the purpose of the rescue{} suppressing the error info in the
first place?

Basically it’s using exceptions as flow control:

begin
cache = Hash.new.update(Marshal.load(File.read(file)))
rescue
cache = {} # empty hash
end

So for whatever reason, if loading the file fails (e.g., this is the
first time the program has been run) it just starts with an empty
cache. I don’t know why it’s failing to read the file.
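
For illustration, here is a minimal sketch of the same fallback written with explicit exception classes, so that only the expected failures are swallowed and anything else still surfaces. The helper name `load_cache` is hypothetical and not part of the Memoize gem, and the list of rescued exceptions is an assumption about which errors are safe to ignore.

```ruby
require "tmpdir"

# Hypothetical helper (not part of the Memoize gem): read a Marshal cache,
# falling back to an empty hash only for failures we expect.
def load_cache(path)
  # File.binread reads the raw bytes without any text-mode translation.
  Hash.new.update(Marshal.load(File.binread(path)))
rescue Errno::ENOENT
  {} # first run: the cache file does not exist yet
rescue ArgumentError, TypeError => e
  # Truncated or incompatible data: report it, then start with a fresh cache.
  warn "ignoring unreadable cache #{path}: #{e.message}"
  {}
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "fib.cache")
  p load_cache(path)                              # no file yet
  File.binwrite(path, Marshal.dump({ 10 => 55 }))
  p load_cache(path)                              # cache restored from disk
end
```

Unlike the blanket `rescue`, this version prints a warning when the file exists but cannot be decoded, which would have made the original problem visible right away.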


#3

On Wed, Feb 01, 2006 at 12:32:57PM +0900, Brian B. wrote:

My questions:
1 What is causing this error? (possibly Windows related?)

IIRC File.read(file) doesn’t open the file in binary mode; try
File.open(file, "rb"){|f| f.read}

2 What is the purpose of the rescue{} suppressing the error info in the
first place?

setting cache to {} if Marshal.load fails for some reason (e.g. a major
change in the Marshal format across Ruby versions).

3 Instead of using Marshal would using yaml be a reasonable alternative?
(I am thinking of readability of the cache file and also capability to
pre-populate it)

I wouldn’t do that:

  • Marshal is faster than Syck (especially when dumping data)
  • YAML takes more space than Marshal’ed data
  • there are still more bugs in Syck than in Marshal (the nastiest memory
    issues are believed to be fixed, but there is still occasional data
    corruption)
  • Marshal is more stable across Ruby releases

As for editing the cache, you can always do

require "yaml"
File.open("cache.yaml", "w") do |out|
  YAML.dump(Marshal.load(File.open("cache", "rb"){|f| f.read}), out)
end
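
To make the editing workflow concrete, here is a small sketch of the full round trip: Marshal cache to YAML, a hand edit, and back to Marshal. The helper names and the integer-keyed cache are assumptions for illustration; the actual key layout used by the Memoize gem is not shown in this thread and may differ.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical helper: dump an existing Marshal cache as editable YAML.
def marshal_to_yaml(cache_path, yaml_path)
  data = File.open(cache_path, "rb") { |f| Marshal.load(f) }
  File.open(yaml_path, "w") { |out| YAML.dump(data, out) }
end

# Hypothetical helper: turn the (possibly hand-edited) YAML back into Marshal.
def yaml_to_marshal(yaml_path, cache_path)
  data = YAML.load(File.read(yaml_path))
  File.open(cache_path, "wb") { |out| Marshal.dump(data, out) }
end

Dir.mktmpdir do |dir|
  cache = File.join(dir, "cache")
  yaml  = File.join(dir, "cache.yaml")

  File.open(cache, "wb") { |f| Marshal.dump({ 1 => 1, 2 => 1 }, f) }
  marshal_to_yaml(cache, yaml)
  File.open(yaml, "a") { |f| f.puts "3: 2" }   # pre-populate one more entry
  yaml_to_marshal(yaml, cache)

  p File.open(cache, "rb") { |f| Marshal.load(f) }
end
```

This keeps Marshal as the on-disk format the library reads, and uses YAML only as a temporary editing format.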


#4

On Wed, 1 Feb 2006, Mauricio F. wrote:

As for editing the cache, you can always do
File.open("cache.yaml", "w") do |out|
  YAML.dump(Marshal.load(File.open("cache", "rb"){|f| f.read}), out)
end

why not pstore - it’s done all that already and is built-in?

-a


#5

1 What is causing this error? (possibly Windows related?)

IIRC File.read(file) doesn’t open the file in binary mode; try
File.open(file, "rb"){|f| f.read}

Perfect. Changing

cache = Hash.new.update(Marshal.load(File.read(file))) rescue { }
to
cache = Hash.new.update(Marshal.load(File.open(file, "rb"){|f| f.read})) rescue { }

makes it work. Should this edit go into the gem (Daniel, if you’re
listening)?

2 What is the purpose of the rescue{} suppressing the error info in the
first place?

setting cache to {} if Marshal.load fails for some reason (e.g. a major
change in the Marshal format across Ruby versions).

Got it. The error suppression here is just about always the correct way to
handle the situation.

As for editing the cache, you can always do

File.open("cache.yaml", "w") do |out|
  YAML.dump(Marshal.load(File.open("cache", "rb"){|f| f.read}), out)
end

Ahhh. Populate that Marshal formatted file using YAML. Good thought.


#6

On Thu, 2 Feb 2006, James Edward G. II wrote:

On Feb 1, 2006, at 9:31 AM, removed_email_address@domain.invalid wrote:

why not pstore - it’s done all that already and is built-in?

PStore is just a wrapper on top of Marshal for transactional file storage.
If you need transactions, it’s great. Otherwise, you might as well just use
Marshal.

it’s not quite only that. it also

  • does some simple checks when creating the file (readability, etc)
  • allows db usage to be multi-processed
  • supports deletion
  • rolls back writes on exceptions / commits using ensure to avoid a
    corrupt data file
  • handles read vs write actions using shared/excl locks to boost
    concurrency
  • uses md5 check to avoid un-needed writes
  • opens in correct modes for all platforms

with no offense meant towards memoize authors - at least a few of the bugs
posted regarding that package would have been addressed by using a built-in
lib rather than rolling one’s own. and, of course, that’s the big thing: why
not use something already written and tested from the core instead of
re-inventing the wheel?

in any case, i think the pstore lib, simple as it is, is a very underrated
library since it provides simple transactional and concurrent persistence to
ruby apps in such an incredibly simple way. now if we could just get joel’s
fsdb in the core! :wink:

kind regards.

-a
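
As a rough illustration of the transactional behaviour described in the list above, the following sketch commits one write, then lets a second transaction fail part-way through. The file name and string keys are made up for the example; only standard `PStore` calls are used.

```ruby
require "pstore"
require "tmpdir"

# Store one value, then attempt a second write that fails before commit.
def pstore_demo(path)
  store = PStore.new(path)
  store.transaction { store["fib(10)"] = 55 }

  begin
    store.transaction do
      store["fib(11)"] = 89
      raise "simulated crash before commit"
    end
  rescue RuntimeError
    # PStore re-raises the error after discarding the uncommitted changes.
  end

  # Read-only transaction: returns [committed value, rolled-back value].
  store.transaction(true) { [store["fib(10)"], store["fib(11)"]] }
end

Dir.mktmpdir { |dir| p pstore_demo(File.join(dir, "fib.pstore")) }
```

The second key comes back as `nil` because the failed transaction never reached the file, which is the corruption protection mentioned above.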


#7

On Feb 1, 2006, at 9:56 AM, removed_email_address@domain.invalid wrote:

it’s not quite only that. it also

  • opens in correct modes for all platforms

These are all great points. Thanks for the lesson. :wink:

James Edward G. II


#8

Brian B. wrote:

and it instead of silently failing I now see the error message: “in `load’:
marshal data too short (ArgumentError)”

My questions:
1 What is causing this error? (possibly Windows related?)

That is odd. I’ve run it on Windows with no trouble in the past. Is
it possible you ran this program using 1.8.2, downloaded 1.8.4, then
re-ran the same code using the same cache? It would fail with that
error if such is the case, since Marshal is not compatible between
versions of Ruby - not even minor versions.

2 What is the purpose of the rescue{} suppressing the error info in the
first place?

The assumption (whoops!) was that if Hash.new.update failed it was
because there was no cache (i.e. first run), so just return an empty
hash.

3 Instead of using Marshal would using yaml be a reasonable alternative?
(I am thinking of readability of the cache file and also capability to
pre-populate it)

It will be slower, but it would work.

Regards,

Dan


#9

On Feb 1, 2006, at 9:31 AM, removed_email_address@domain.invalid wrote:

why not pstore - it’s done all that already and is built-in?

PStore is just a wrapper on top of Marshal for transactional file
storage. If you need transactions, it’s great. Otherwise, you might
as well just use Marshal.

James Edward G. II


#10

Just a thought, but you might like to load this file using the binary
option on Windows. Marshal uses a binary format and Windows does weird
things to binary files loaded without the binary option.
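
A short sketch of why this particular cache is sensitive to the open mode. Marshal stores small integers as a single byte offset by 5, so a cached value of 21 (fib(8)) is written as 0x1A and a key of 8 as 0x0D. As far as I know, text-mode reads on Windows treat 0x1A (Ctrl-Z) as end-of-file and translate CR/LF pairs, which would explain the “marshal data too short” error; treat that as a likely explanation rather than a confirmed diagnosis. The sample cache contents and helper names are illustrative only.

```ruby
require "tmpdir"

# Illustrative cache contents; the real Memoize cache layout may differ.
def sample_cache
  { 7 => 13, 8 => 21 }
end

# Bytes that text-mode I/O on Windows may treat specially: LF, CR, Ctrl-Z.
def risky_bytes(data)
  data.bytes.select { |b| [0x0A, 0x0D, 0x1A].include?(b) }
end

# Write and read the cache with explicit binary modes on both sides.
def binary_round_trip(path, cache)
  File.open(path, "wb") { |f| f.write(Marshal.dump(cache)) }
  Marshal.load(File.open(path, "rb") { |f| f.read })
end

p risky_bytes(Marshal.dump(sample_cache))
Dir.mktmpdir do |dir|
  p binary_round_trip(File.join(dir, "fib.cache"), sample_cache) == sample_cache
end
```

On Unix-like systems the "b" flag makes no difference to the bytes read, so adding it is harmless there and keeps the code portable.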


#11

Daniel B. wrote:

3 Instead of using Marshal would using yaml be a reasonable
alternative? (I am thinking of readability of the cache file and
also capability to pre-populate it)

It will be slower, but it would work.

As you and others have pointed out, this is likely a problem caused by not
opening the file in binary mode. IMHO lib code that uses Marshal should
always open files in binary mode (regardless of platform). The advantages
are twofold: we won’t see these kinds of errors (i.e. it’s cross-platform),
and it serves as documentation (you know from reading the code that the file
is expected to contain binary data).

Also, the line looks a bit strange to me. Creating a new hash and
updating it with a hash read from disk seems superfluous. I’d rather do
something like this:

cache = File.open(file, "rb") {|io| Marshal.load(io)} rescue {}

Marshal.load and Marshal.dump can actually read from and write to an IO
object. This seems most efficient because the file contents do not have to
be read into memory before demarshalling, and it’s fail-safe in the same way
as the old impl.

Kind regards

robert
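
A sketch of both directions of the IO-based approach, since the thread only shows the loading side. The helper names `dump_cache` and `read_cache` are hypothetical; `rescue StandardError` is used here as the spelled-out equivalent of the `rescue {}` modifier.

```ruby
require "tmpdir"

# Stream the cache straight to the file; no intermediate String is built.
def dump_cache(path, cache)
  File.open(path, "wb") { |io| Marshal.dump(cache, io) }
end

# Same fallback behaviour as the one-liner: any StandardError yields {}.
def read_cache(path)
  File.open(path, "rb") { |io| Marshal.load(io) }
rescue StandardError
  {}
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "fib.cache")
  p read_cache(path)                 # missing file falls back to {}
  dump_cache(path, { 10 => 55 })
  p read_cache(path)                 # cache restored from disk
end
```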

#12

That is odd. I’ve run it on Windows with no trouble in the past. Is
it possible you ran this program using 1.8.2, downloaded 1.8.4, then
re-ran the same code using the same cache? It would fail with that
error if such is the case, since Marshal is not compatible between
versions of Ruby - not even minor versions.

I have been on 1.8.2 on Windows straight through. Mauricio’s suggestion of
File.open instead of File.read made it work for me (see other posts).

Brian


#13

On Thu, Feb 02, 2006 at 06:49:49AM +0900, Daniel B. wrote:

and it instead of silently failing I now see the error message: “in `load’:
marshal data too short (ArgumentError)”

My questions:
1 What is causing this error? (possibly Windows related?)

That is odd. I’ve run it on Windows with no trouble in the past.

(FTR: file not opened in binary mode, [177651])

Is it possible you ran this program using 1.8.2, downloaded 1.8.4, then
re-ran the same code using the same cache? It would fail with that
error if such is the case, since Marshal is not compatible between
versions of Ruby - not even minor versions.

The Marshal format hasn’t changed for a while:

batsman@tux-chan:~/Anime$ ruby182 -v -e 'p [Marshal::MAJOR_VERSION, Marshal::MINOR_VERSION]'
ruby 1.8.2 (2004-12-25) [i686-linux]
[4, 8]
batsman@tux-chan:~/Anime$ ruby -v -e 'p [Marshal::MAJOR_VERSION, Marshal::MINOR_VERSION]'
ruby 1.8.4 (2005-12-24) [i686-linux]
[4, 8]

Also note that ruby can read Marshal data in older formats if the
MAJOR_VERSION hasn’t changed (i.e. if only the MINOR_VERSION was
increased):

if (major != MARSHAL_MAJOR || minor > MARSHAL_MINOR) {
    rb_raise(rb_eTypeError, "incompatible marshal file format (can't be read)\n\
\tformat version %d.%d required; %d.%d given",
             MARSHAL_MAJOR, MARSHAL_MINOR, major, minor);
}
if (RTEST(ruby_verbose) && minor != MARSHAL_MINOR) {
    rb_warn("incompatible marshal file format (can be read)\n\
\tformat version %d.%d required; %d.%d given",
            MARSHAL_MAJOR, MARSHAL_MINOR, major, minor);
}

(after some searching…)

Back in Apr. 2001, matz said that “Marshal should not change too much
(unless in upper compatible way)” [14063]. The last minor change
happened after 1.6.8 (6 -> 8), and MARSHAL_MAJOR was already 4 in v1_0_1,
7 years, 2 months ago (at which point I got tired of CVSweb).

Marshal’s format is more stable than we think.
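
The same compatibility rule can be sketched from the Ruby side: the first two bytes of any Marshal dump are the major and minor format version. The helper names below are hypothetical, and the check simply mirrors the C condition quoted above.

```ruby
# Return [major, minor] from the first two bytes of a Marshal dump.
def marshal_version(dump)
  dump.unpack("CC")
end

# Mirror of the C check: same major version, minor no newer than ours.
def readable_by_this_ruby?(dump)
  major, minor = marshal_version(dump)
  major == Marshal::MAJOR_VERSION && minor <= Marshal::MINOR_VERSION
end

dump = Marshal.dump({ 10 => 55 })
p marshal_version(dump)
p readable_by_this_ruby?(dump)
```

Running this against an old cache file (read in binary mode) is a quick way to rule out a format mismatch before looking for other causes.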


#14

On Feb 1, 2006, at 9:56 AM, removed_email_address@domain.invalid wrote:

it’s not quite only that. it also

  • opens in correct modes for all platforms

I’ve made a file caching example using PStore for my toy Memoizable
library. I just thought I would post it here, in case it helps/inspires
others.

#!/usr/local/bin/ruby -w

# pstore_caching.rb
#
# Created by James Edward G. II on 2006-02-03.
# Copyright 2006 Gray Productions. All rights reserved.

require "memoizable"
require "pstore"

#
# A trivial implementation of a custom cache. This cache uses PStore to provide
# a multi-processing safe disk cache. The downside is that the entire cache
# must be loaded for a key check. This can require significant memory for a
# large cache.
#
class PStoreCache
  def initialize( path )
    @cache = PStore.new(path)
  end

  # Read-only transaction: look up a cached value.
  def []( key )
    @cache.transaction(true) { @cache[key] }
  end

  # Read-write transaction: store a computed value.
  def []=( key, value )
    @cache.transaction { @cache[key] = value }
  end
end

class Fibonacci
  extend Memoizable

  def fib( num )
    return num if num < 2
    fib(num - 1) + fib(num - 2)
  end
  memoize :fib, PStoreCache.new("fib_cache.pstore")
end

puts "This method is memoized using a file-based cache..."
start = Time.now
puts "fib(100): #{Fibonacci.new.fib(100)}"
puts "Run time: #{Time.now - start} seconds"

puts
puts "Run again to see the file cache at work."

__END__

James Edward G. II
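
The `memoizable` library itself is not shown in this thread, so for readers who want to try the idea, here is a self-contained sketch of how a `memoize` method that accepts any cache object responding to `[]` and `[]=` might look. `MemoizeSketch` and `FibDemo` are hypothetical names, and this is not James’s actual implementation.

```ruby
# Hypothetical stand-in for a memoizing mixin; NOT the Memoizable library.
module MemoizeSketch
  # Wrap the instance method +name+ so results are looked up in +cache+,
  # which only needs to respond to [] and []= (a Hash, a PStore wrapper, ...).
  def memoize(name, cache = {})
    original = instance_method(name)
    define_method(name) do |*args|
      cached = cache[args]
      next cached unless cached.nil?      # cache hit (nil results are not cached)
      cache[args] = original.bind(self).call(*args)
    end
  end
end

class FibDemo
  extend MemoizeSketch

  def fib(num)
    return num if num < 2
    fib(num - 1) + fib(num - 2)
  end
  memoize :fib
end

puts FibDemo.new.fib(100)
```

Passing an object like the `PStoreCache` above as the second argument to `memoize` would swap the in-memory Hash for a disk-backed cache without touching `fib`.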