LazyLoad

Imagine you’re building a CVS-like repository. A repository
has modules, a module has branches, and a branch has a lot of
items (history!). This repository is served by a standalone
server with the FTP protocol as its “frontend”.

In a naive implementation, you simply load all items (except
the contents of the items) into memory, so they are readily
available. Pure OO modeling theory usually assumes exactly
this: “All objects live in core.”

In reality, you don’t want to do that. You only want to load
the items of a branch when they are actually referred to (and
optionally unload them after a while). So we introduce
Branch@loaded and move the invocation of Branch#load from
Branch#initialize to each place in the code where Branch@items
is used. Each place; don’t forget even one! Sooner or later,
you will forget one. It’s too tricky… and bad coding…
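To make the pitfall concrete, here is a minimal sketch of the manual approach. The class name ManualBranch, its methods and the sample data are all hypothetical and only serve the illustration; they are not taken from the real repository code.

```ruby
# Hypothetical sketch: manual lazy loading guarded by a @loaded flag.
class ManualBranch
  attr_reader :load_count

  def initialize
    @items = {}
    @loaded = false
    @load_count = 0
  end

  def item_names
    ensure_loaded            # every reader of @items needs this guard
    @items.keys
  end

  def item_count
    @items.size              # guard forgotten: reports 0 before anything loaded
  end

  private

  def ensure_loaded
    return if @loaded
    load
    @loaded = true
  end

  def load
    @load_count += 1
    # Stand-in for the expensive fill.
    @items = { "README" => 1, "Makefile" => 2 }
  end
end

branch = ManualBranch.new
stale_count = branch.item_count   # guard is missing, nothing loaded yet
names = branch.item_names         # triggers the load
fresh_count = branch.item_count   # correct only because something else loaded
```

The forgotten guard in item_count does not fail loudly; it just answers with stale data, which is why this style is so easy to get wrong.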

So I came up with this LazyLoad, a generic lazy-loading class.
We can initialize Branch@items to LazyLoad.new(self, :load,
:items) instead of Hash.new. Whenever this object is referred
to (e.g. with @items.keys), LazyLoad#method_missing is invoked.
This method invokes Branch#load, gets the object Branch@items
(which now refers to a filled Hash) and sends the original
message to this Branch@items. This instance of LazyLoad now
dies in peace.

I implemented LazyLoad (see below) and use it in a real
situation. Seems to work. The server starts really fast and the
user thinks that all branches are loaded.

I embedded the backend in the commandline tool as well. If you
use this commandline tool to synchronize the local workset with
the repository, you usually want to load only one branch, not
all of them. The speed benefit is huge, whereas the impact on
the code is close to zero!

The code below demonstrates this theory: step 1 is the naive
implementation of Branch, step 2 is the enhanced implementation
of Branch, and step 3 implements LazyLoad itself. (Steps 1 and
2 only illustrate how Branch changes when LazyLoad is used.
They are not complete.)

Comments? Ideas? Something I overlooked?

gegroet,
Erik V. - http://www.erikveen.dds.nl/


STEP 1, NAIVE IMPLEMENTATION

class Branch
  def initialize
    @items = {}

    load
  end

  def load
    @items = {}

    # Fill @items... EXPENSIVE, TIME CONSUMING, MEMORY HUNGRY!
  end
end


STEP 2, INTRODUCING LAZYLOAD

class Branch
  def initialize
    @items = LazyLoad.new(self, :load, :items)
  end

  def load
    @items = {}

    # Fill @items... EXPENSIVE, TIME CONSUMING, MEMORY HUNGRY!
  end
end


STEP 3, IMPLEMENTATION OF LAZYLOAD

class LazyLoad
  def initialize(object, load_method, property)
    @object = object
    @property = property
    @load_method = load_method
  end

  def method_missing(method_name, *parms, &block)
    # Let the owner load itself, then forward the original message
    # to the freshly assigned instance variable.
    @object.send(@load_method)
    @object.instance_eval("@#{@property.to_s}").send(method_name, *parms, &block)
  end
end

Erik V. wrote:

  def method_missing(method_name, *parms, &block)
    @object.send(@load_method)
    @object.instance_eval("@#{@property.to_s}").send(method_name, *parms, &block)
  end
end

Interesting. Here’s a potential problem. You won’t hit
method_missing for methods that Object has. In particular, a
client might use dup, ==, to_s, class, kind_of?, and so on.
You might try getting around some of this by using Delegate
from the standard library, or Facets’ BlankSlate IIRC.
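A small sketch of the problem described above. LeakyProxy is a hypothetical stand-in for a proxy that inherits from Object and relies on method_missing alone; it is not the LazyLoad class from the thread.

```ruby
# Hypothetical proxy: inherits everything from Object and forwards
# only the messages that Object does not already understand.
class LeakyProxy
  def initialize(&block)
    @block = block
  end

  def method_missing(name, *args, &blk)
    @block.call.send(name, *args, &blk)
  end
end

proxy = LeakyProxy.new { { "a" => 1 } }

forwarded = proxy.keys           # not defined on Object, so it is forwarded
own_class = proxy.class          # defined on Object, answered by the proxy
same = (proxy == { "a" => 1 })   # Object#== compares identity, Hash never sees it
```

So the proxy looks like a Hash for Hash-only methods, but gives itself away for anything Object already implements.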

I was thinking three parameters was too many for
LazyLoad#initialize (just provide a block which loads and
returns the loaded value), and started to rewrite it, when I
remembered MenTaLguY’s lazy.rb, which does something very
similar.

STEP 4, INTRODUCING lazy.rb

class Branch
  def initialize
    @items = promise do
      @items = {}
      # Fill @items… EXPENSIVE, TIME CONSUMING, MEMORY HUNGRY!
      @items
    end
  end
end

Cheers,
Dave

Interesting. Here’s a potential problem. You won’t hit
method_missing for methods that Object has. In particular, a
client might use dup, ==, to_s, class, kind_of?, and so on.
You might try getting around some of this by using Delegate
from the standard library, or Facets’ BlankSlate IIRC.

I was aware of this problem and was already working it out. My
solution is to overwrite all (except a few) already defined
methods, so they call method_missing.

I was thinking three parameters was too many for
LazyLoad#initialize (just provide a block which loads and
returns the loaded value), and started to rewrite it, when I
remembered MenTaLguY’s lazy.rb, which does something very
similar.

Right… Maybe using a block was just too obvious…

New versions below.

(I keep the method Branch#load, because it not only fills
Branch@items, but Branch@snapshots as well.)

Thanks.

More comments? More ideas? More things I overlooked?

gegroet,
Erik V. - http://www.erikveen.dds.nl/


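class LazyLoad
  # Redefine every inherited method (except __send__ and __id__)
  # so that it is routed through method_missing as well.
  # to_s keeps the check working whether instance_methods returns
  # strings or symbols.
  instance_methods.each do |method_name|
    unless ["__send__", "__id__"].include?(method_name.to_s)
      class_eval <<-"EOF"
        def #{method_name}(*parms, &block)
          method_missing(:#{method_name}, *parms, &block)
        end
      EOF
    end
  end

  def initialize(&block)
    @block = block
  end

  def method_missing(method_name, *parms, &block)
    @block.call.send(method_name, *parms, &block)
  end
end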


class Branch
  def initialize
    @items = LazyLoad.new{load; @items}
    @snapshots = LazyLoad.new{load; @snapshots}
  end

  def load
    @items = {}
    @snapshots = {}

    # Fill @items and @snapshots...
    # EXPENSIVE, TIME CONSUMING, MEMORY HUNGRY!
  end
end

As a side effect of the last improvement, the attr_reader now
works too.

gegroet,
Erik V. - http://www.erikveen.dds.nl/


require "lazyload"

class Thing
  attr_reader :prop1
  attr_reader :prop2
  attr_writer :prop2

  def initialize
    @prop1 = LazyLoad.new{:it_works}
    @prop2 = LazyLoad.new{:nothing}
  end
end

thing = Thing.new

thing.prop2 = :this_too

p thing.prop1
p thing.prop2

Quoting Erik V.:

instance_methods.each do |method_name|
  unless ["__send__", "__id__"].include?(method_name.to_s)
    class_eval <<-"EOF"
      def #{method_name}(*parms, &block)
        method_missing(:#{method_name}, *parms, &block)
      end
    EOF
  end
end

I’ve seen a lot of different ways of writing this; my personal
favorite is:

instance_methods.each { |m| undef_method m unless m =~ /^__/ }
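On Ruby 1.9 and later there is a third way to get a nearly blank slate: inherit from BasicObject, which defines only a handful of methods (among them ==, !, equal?, __send__ and __id__). The sketch below assumes such a Ruby version; BlankProxy is a hypothetical name, not something from this thread or from lazy.rb.

```ruby
# Hypothetical blank-slate proxy built on BasicObject instead of undef_method.
class BlankProxy < BasicObject
  def initialize(&block)
    @block = block
  end

  def method_missing(name, *args, &blk)
    # __send__ is used because BasicObject does not define send.
    @block.call.__send__(name, *args, &blk)
  end
end

proxy = BlankProxy.new { [3, 1, 2] }

klass = proxy.class    # forwarded to the Array
text = proxy.to_s      # forwarded as well
sorted = proxy.sort
```

Note that the few methods BasicObject does define, such as == and equal?, are still answered by the proxy itself, so this narrows the gap rather than closing it.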

(I think I picked this up from someone on ruby-talk)

  def method_missing(method_name, *parms, &block)
    @block.call.send(method_name, *parms, &block)
  end

One thing to be careful of – if the block can’t successfully
replace all references to the LazyLoad instance with the result
object, you will see really bizarre behavior, as each new
method call on the LazyLoad reruns the (potentially expensive)
computation, only to be routed to a completely different
object.

To get around that problem, you’d want to remember the block’s
result after calling it the first time (as lazy.rb does), rather
than calling it multiple times.
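A minimal sketch of what "remember the block's result" could look like. MemoLazy is a hypothetical class written for this illustration (it is not lazy.rb's implementation), it assumes Ruby 1.9+ for BasicObject, and it is deliberately not thread safe.

```ruby
# Hypothetical memoizing proxy: the block runs at most once.
class MemoLazy < BasicObject
  def initialize(&block)
    @block = block
    @evaluated = false
  end

  def method_missing(name, *args, &blk)
    unless @evaluated
      @value = @block.call
      @evaluated = true
      @block = nil           # the block is no longer needed
    end
    @value.__send__(name, *args, &blk)
  end
end

runs = 0
list = MemoLazy.new { runs += 1; [1, 2, 3] }
popped = [list.pop, list.pop, list.pop]
```

Because every call is routed to the same remembered array, it no longer matters how many references to the proxy are floating around.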

 # Fill @items and @snapshots...
 # EXPENSIVE, TIME CONSUMING, MEMORY HUNGRY!

end
end

You can do this with lazy.rb too:

class Branch
  def initialize
    @items = promise{load; @items}
    @snapshots = promise{load; @snapshots}
  end

  def load
    @items = {}
    @snapshots = {}

    # Fill @items and @snapshots...
    # EXPENSIVE, TIME CONSUMING, MEMORY HUNGRY!
  end
end

Since you mentioned you were using this in production, you really
might want to look into using lazy.rb instead.

While discovering the issues for yourself can be a valuable learning
experience, I’m not sure you want to be doing that in production
code…

-mental

Quoting Erik V.:

def initialize
  @prop1 = LazyLoad.new{:it_works}
  @prop2 = LazyLoad.new{:nothing}
end

Ah, careful. Every method call on @prop1 or @prop2 would rerun the
block. You won’t notice any problems if all the block does is
return a symbol, but for anything more complex … well, see my
last email.

-mental

Quoting Erik V.:

 @items     = LazyLoad.new{load; @items}
 @snapshots = LazyLoad.new{load; @snapshots}

I forgot to mention. I think you’ve hit on a nice idiom for using
lazy evaluation in Ruby here.

In simpler cases, it could look something like:

@blah = promise { @blah = expensive_computation }

Good find!

-mental

def initialize
  @prop1 = LazyLoad.new{:it_works}
  @prop2 = LazyLoad.new{:nothing}
end

Ah, careful. Every method call on @prop1 or @prop2 would
rerun the block. You won’t notice any problems if all the
block does is return a symbol, but for anything more complex
… well, see my last email.

Yeah, true. Bad example. The original code (the branches) uses
a common method “load” which sets both properties. If one of
them is used, the other gets set as well. No problem there.

gegroet,
Erik V. - http://www.erikveen.dds.nl/

To get around that problem, you’d want to remember the
block’s result after calling it the first time (as lazy.rb
does), rather than calling it multiple times.

I added some things to make it thread safe (see below). One of
them was @real_object. That’s what you mean, I think.

You can do this with lazy.rb too:

I’ve never seen lazy.rb before, but by now the package is
already on my desktop. Needs some investigation. Although doing
it yourself is a good learning experience indeed. :)

Since you mentioned you were using this in production, you
really might want to look into using lazy.rb instead.

My “real situation” is a shadow repository for CVS. It’s not
too bad when it dies. Running the conversion job will refill
the DB. :)

Thanks.

gegroet,
Erik V. - http://www.erikveen.dds.nl/


require "thread"

class LazyLoad
  instance_methods.each do |method_name|
    unless ["__send__", "__id__"].include?(method_name.to_s)
      class_eval <<-"EOF"
        def #{method_name}(*parms, &block)
          method_missing(:#{method_name}, *parms, &block)
        end
      EOF
    end
  end

  def initialize(*parms, &block)
    @parms = parms
    @block = block
    @mutex = Mutex.new
    @evaluated = false
    @real_object = nil
  end

  def method_missing(method_name, *parms, &block)
    # Evaluate the block at most once; later calls reuse @real_object.
    @mutex.synchronize do
      @real_object = @block.call(*@parms) unless @evaluated
      @evaluated = true
    end

    @real_object.send(method_name, *parms, &block)
  end
end

module Kernel
  def lazy(*parms, &block)
    LazyLoad.new(*parms, &block)
  end
end

It does work with the thread safe version I posted a couple of
minutes ago… :o)

gegroet,
Erik V. - http://www.erikveen.dds.nl/

Quoting Erik V.:

Yeah, true. Bad example. The original code (the branches) uses
a common method “load” which sets both properties. If one of
them is used, the other gets set as well. No problem there.

Actually there is a problem the moment such an instance variable is
assigned to another variable or passed as a parameter. The
LazyLoad instance escapes the block’s control.

class Example
  def initialize
    @prop = LazyLoad.new { @prop = [1, 2, 3] }
  end

  def show3
    helper( @prop )
  end

  def helper( arr )
    p arr.pop
    p arr.pop
    p arr.pop
  end
end

ex = Example.new
ex.show3

What would you expect this program to output? What actually
happens?
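For readers who want to check their answer: the sketch below replays the example with a non-memoizing proxy. RerunningLazy is a hypothetical stand-in for the LazyLoad version that calls its block on every message (built on BasicObject, so it assumes Ruby 1.9+), and helper collects the popped values instead of printing them.

```ruby
# Hypothetical stand-in for the non-memoizing LazyLoad.
class RerunningLazy < BasicObject
  def initialize(&block)
    @block = block
  end

  def method_missing(name, *args, &blk)
    @block.call.__send__(name, *args, &blk)   # reruns the block every time
  end
end

class Example
  def initialize
    @prop = RerunningLazy.new { @prop = [1, 2, 3] }
  end

  def show3
    helper(@prop)          # the proxy itself is passed along here
  end

  def helper(arr)
    [arr.pop, arr.pop, arr.pop]
  end
end

result = Example.new.show3
```

The local variable arr keeps pointing at the proxy even after @prop has been replaced, so each pop builds a fresh [1, 2, 3] and removes its last element. One would expect 3, 2, 1; the result is 3, 3, 3.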

-mental

Quoting Erik V.:

To get around that problem, you’d want to remember the
block’s result after calling it the first time (as lazy.rb
does), rather than calling it multiple times.

I added some things to make it thread safe (see below). One of
them was @real_object. That’s what you mean, I think.

Ah, okay, yeah. Disregard my last email then. That fixes the
problem I mention there, too.

You can do this with lazy.rb too:

I’ve never seen lazy.rb before, but by now the package is
already on my desktop. Needs some investigation.

Well, you’ve basically rewritten it at this point. :)

But there are some issues it covers that you still have to
address… for example, what happens if the block raises an
exception? Will normal code be able to distinguish exceptions
raised by implicitly evaluated code at random locations from those
which are expected?

Also, is passing parms to the block actually useful to you? It
seems like you could get anything you needed from the block’s
lexical environment.

On the other end of things – one issue that I’ve still got to
address for myself in lazy.rb is threadsafety. So you’ve improved
on mine in that respect.

-mental

Well, you’ve basically rewritten it at this point. :)

:o)

But there are some issues it covers that you still have to
address… for example, what happens if the block raises an
exception? Will normal code be able to distinguish exceptions
raised by implicitly evaluated code at random locations from
those which are expected?

Catching the exception, storing it and rethrowing it every time
it’s necessary (see below). Will that do?

Also, is passing parms to the block actually useful to you?

No. It’s just there. :)

On the other end of things – one issue that I’ve still got
to address for myself in lazy.rb is threadsafety. So you’ve
improved on mine in that respect.

That’s what I like about technical discussions like this:
Everybody can watch, learn, be part of the game and feel good.

gegroet,
Erik V. - http://www.erikveen.dds.nl/


class LazyLoad
  instance_methods.each do |method|
    undef_method(method) unless method =~ /^__/
  end

  def initialize(*parms, &block)
    @parms = parms
    @block = block
    @mutex = Mutex.new
    @evaluated = false
    @exception = nil
    @real_object = nil
  end

  def method_missing(method_name, *parms, &block)
    @mutex.synchronize do
      begin
        @real_object = @block.call(*@parms) unless @evaluated
      rescue Exception => e
        # Remember the failure so every later call raises it again.
        @exception = e
      ensure
        @evaluated = true
      end
    end

    raise @exception  if @exception

    @real_object.send(method_name, *parms, &block)
  end
end

Testscript:


require "my_lazy"

class Thing
  attr :prop

  def initialize
    @prop = lazy {@prop = expensive_load}
  end

  def expensive_load
    p :expensive_load

    raise "BOOM"       # Optionally

    7*8
  end
end

thing = Thing.new

3.times do
  begin
    p thing.prop
  rescue Exception => e
    p [e.message, e.backtrace]
  end
end

On the other end of things – one issue that I’ve still got
to address for myself in lazy.rb is threadsafety. So you’ve
improved on mine in that respect.

Well, there is a problem with my locking. Do you remember that
I set both Branch@items and Branch@snapshots in Branch#load,
like this?

@snapshots = lazy {load; @snapshots}
@items = lazy {load; @items}

Both Lazy objects don’t share the same Mutex object, so,
effectively, under these circumstances, there’s no locking.

So we can get rid of that parms thing and introduce an
external Mutex object:

mutex = Mutex.new
@snapshots = lazy(mutex) {load; @snapshots}
@items = lazy(mutex) {load; @items}

Thoughts?

gegroet,
Erik V. - http://www.erikveen.dds.nl/


require "thread"

module EV
  class Lazy
    instance_methods.each do |method|
      undef_method(method) unless method =~ /^__/
    end

    def initialize(mutex=Mutex.new, &block)
      @mutex       = mutex   # may be shared between several Lazy objects
      @block       = block
      @evaluated   = false
      @exception   = nil
      @real_object = nil
    end

    def method_missing(method_name, *parms, &block)
      @mutex.synchronize do
        begin
          @real_object = @block.call() unless @evaluated
        rescue Exception => e
          @exception   = e
        ensure
          @evaluated   = true
        end
      end

      raise @exception  if @exception

      @real_object.send(method_name, *parms, &block)
    end
  end
end

module Kernel
  def lazy(*parms, &block)
    EV::Lazy.new(*parms, &block)
  end
end

Will normal code be able to distinguish exceptions raised by
implicitly evaluated code at random locations from those
which are expected?

I don’t get this. Can you please explain a bit more
specifically?

gegroet,
Erik V. - http://www.erikveen.dds.nl/

Quoting Erik V.:

Will normal code be able to distinguish exceptions raised by
implicitly evaluated code at random locations from those
which are expected?

I don’t get this. Can you please explain a bit more
specifically?

For example (not a great example, but hopefully you’ll get a feel
for the issue):

MAPPING = lazy {
  # One "key value" pair per line of mapping.txt.
  Hash[ File.readlines( 'mapping.txt' ).map { |line|
    line.chomp.split
  } ].freeze
}

data = nil
begin
  File.open( 'data.txt', 'r' ) { |stream|
    data = stream.map { |line| MAPPING[line.chomp] }
  }
rescue Errno::ENOENT
  # create initial data.txt if it doesn't exist
  File.open( 'data.txt', 'w' ) { |stream|
    MAPPING.keys.each { |key| stream.puts key }
  }
  data = MAPPING.values
end

Nuking data.txt is probably not the right response to a missing
mapping.txt…

-mental

Quoting Erik V.:

Both Lazy objects don’t share the same Mutex object, so,
effectively, under these circumstances, there’s no locking.

Well, the way I would do it is to treat load as a single computation
(which I’d be inclined to do for conceptual reasons, thread safety
aside).

That means something like:

blah = lazy {load; [ @snapshots, @items ]}
@snapshots = lazy { blah[0] }
@items = lazy { blah[1] }

(note that Array#[] forces the promise, so we add extra laziness)

One computation, one promise, one mutex.
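A runnable sketch of this "one computation, one promise, one mutex" arrangement. OnceLazy is a hypothetical thread-safe memoizing proxy written for this illustration (neither lazy.rb nor the Lazy class from this thread), and load_all returns both hashes instead of assigning instance variables, which is an assumption made to keep the example small.

```ruby
# Hypothetical memoizing proxy with its own mutex (Ruby 1.9+).
class OnceLazy < BasicObject
  def initialize(&block)
    @block = block
    @mutex = ::Mutex.new
    @evaluated = false
  end

  def method_missing(name, *args, &blk)
    @mutex.synchronize do
      unless @evaluated
        @value = @block.call
        @evaluated = true
      end
    end
    @value.__send__(name, *args, &blk)
  end
end

class Branch
  attr_reader :items, :snapshots, :load_count

  def initialize
    @load_count = 0
    loaded = OnceLazy.new { load_all }        # the single expensive computation
    @snapshots = OnceLazy.new { loaded[0] }   # cheap views onto its result
    @items = OnceLazy.new { loaded[1] }
  end

  private

  def load_all
    @load_count += 1
    # Stand-in data for the expensive fill.
    [{ "v1" => "snapshot" }, { "README" => "item" }]
  end
end

branch = Branch.new
item_keys = branch.items.keys
snapshot_keys = branch.snapshots.keys
```

Whichever property is touched first forces the shared computation; the other one then finds it already evaluated, so the expensive load runs once and only its own mutex guards it.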

-mental

Quoting Erik V.:

(By the way, in your example, you use an exception to do some
business logic: “If the file doesn’t exist, create it”. In OO
theory, this isn’t the right way to go…)

Sometimes optimistically trying an operation and catching exceptions
is the only way to do something atomically.

data = nil
if File.exist? 'data.txt'
  File.open( 'data.txt', 'r' ) { |stream|
    data = stream.map { |line| MAPPING[line.chomp] }
  }
else
  File.open( 'data.txt', 'w' ) { |stream|
    MAPPING.keys.each { |key| stream.puts key }
  }
  data = MAPPING.values
end

Elsewhere, at just the wrong moment:

$ rm data.txt

Then:

theory-meets-reality.rb:3:in `open': No such file or directory - data.txt (Errno::ENOENT)
        from theory-meets-reality.rb:3
zsh: exit 1     ruby theory-meets-reality.rb

In general you want to avoid a check-then-do idiom with IO
operations, because:

  1. it introduces a race condition

  2. the check may not be reliable for other reasons (cf. the
     troubles with access() on POSIX systems, or on Windows,
     GetEffectiveRightsFromAcl, particularly with Samba shares in
     certain configurations)

Of course, even my exception-based version wasn’t free of race
conditions – see if you can spot the others that I didn’t address.
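As one possible answer to that exercise, the sketch below closes the "two processes both create the file" race by opening with File::EXCL, so the create step fails instead of silently overwriting. The helper name read_or_create and the sample lines are hypothetical. Other races remain, for instance a reader seeing a half-written file.

```ruby
require "tmpdir"

# Hypothetical helper: try to read first, create atomically only if missing.
def read_or_create(path, default_lines)
  File.readlines(path).map(&:chomp)
rescue Errno::ENOENT
  begin
    # File::EXCL makes the open fail if someone created the file meanwhile.
    File.open(path, File::WRONLY | File::CREAT | File::EXCL) do |stream|
      default_lines.each { |line| stream.puts(line) }
    end
    default_lines
  rescue Errno::EEXIST
    # Another process won the race; use its file instead of clobbering it.
    File.readlines(path).map(&:chomp)
  end
end

first = second = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "data.txt")
  first = read_or_create(path, ["a", "b"])    # file is missing, gets created
  second = read_or_create(path, ["x"])        # file exists, defaults are ignored
end
```

Both steps are still "try it and handle the exception"; there is no separate existence check that could go stale between the test and the open.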

-mental

Nuking data.txt is probably not the right response to a
missing mapping.txt…

I got it. I introduced LazyException (see below) as a wrapper
for the original Exception. All methods are (once again)
delegated to the original exception with method_missing, except
for LazyException#exception.
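The effect of wrapping can be shown with a much simpler, non-delegating wrapper. This is only a sketch of the idea: LazyEvaluationError and force are hypothetical names, and unlike LazyException the wrapper does not forward messages to the original exception.

```ruby
# Hypothetical wrapper class; it keeps the original exception for inspection.
class LazyEvaluationError < StandardError
  attr_reader :original

  def initialize(original)
    super("lazy evaluation failed: #{original.class}: #{original.message}")
    @original = original
  end
end

# Hypothetical helper standing in for the forcing of a lazy value.
def force(block)
  block.call
rescue StandardError => e
  raise LazyEvaluationError.new(e)
end

caught = nil
begin
  begin
    # Stand-in for a lazy MAPPING whose mapping.txt is missing.
    force(-> { raise Errno::ENOENT, "mapping.txt" })
  rescue Errno::ENOENT
    caught = :enoent   # the "create data.txt" branch; must not run here
  end
rescue LazyEvaluationError => e
  caught = e
end
```

The inner rescue, which is meant for a missing data.txt, no longer sees the failure of the lazy computation; the error surfaces under its own class and still carries the original exception.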

(By the way, in your example, you use an exception to do some
business logic: “If the file doesn’t exist, create it”. In OO
theory, this isn’t the right way to go…)

What more?

gegroet,
Erik V. - http://www.erikveen.dds.nl/


class LazyException < Exception
  superclass.instance_methods(false).each do |method|
    undef_method(method) unless method =~ /^__/
  end

  def initialize(exception)
    @original_exception = exception
  end

  def exception
    self
  end

  def method_missing(method_name, *parms, &block)
    @original_exception.send(method_name, *parms, &block)
  end
end

class Lazy
  instance_methods(true).each do |method|
    undef_method(method) unless method =~ /^__/
  end

  def initialize(mutex=Mutex.new, &block)
    @mutex = mutex
    @block = block
    @evaluated = false
    @exception = nil
    @real_object = nil
  end

  def method_missing(method_name, *parms, &block)
    @mutex.synchronize do
      begin
        @real_object = @block.call() unless @evaluated
      rescue Exception => e
        @exception = LazyException.new(e)
      ensure
        @evaluated = true
      end
    end

    raise @exception  if @exception

    @real_object.send(method_name, *parms, &block)
  end
end

module Kernel
  def lazy(*parms, &block)
    Lazy.new(*parms, &block)
  end
end