Port a Library (#64)

bbazzarrakk · February 2, 2006, 2:38pm

The great element of porting a library is that you get examine another
programmer’s ideas. If you’re lucky, that may teach you a new trick or
two.
I’ll use my experience as an example.

Using Buffers

When I decided to port File::ReadBackwards, the first question I asked
myself
was, how do you read a file backwards? I decided that you would need to
put the
read head at the end of the file, then work it backwards bit by bit.
You can’t
return a line until you have the whole thing, so I knew I would need to
buffer
the reads. I guessed I would be sure I had a line when I had seen two
line
endings (whatever is between them is a line) or run into the beginning
of the
file (no more data). That actually turns out to be the rough process,
but,
luckily for me, Uri Guttman is smarter than me and reading Uri’s code
taught me
a couple of tricks.

Here’s a simplified version of my port, showing only the interesting
methods:

# Based on Perl's File::ReadBackwards module, by Uri Guttman.
class Elif
  MAX_READ_SIZE = 1 << 10  # 1024

  def initialize( *args )
    @file = File.new(*args)
    @file.seek(0, IO::SEEK_END)

    @current_pos = @file.pos

    @read_size = @file.pos % MAX_READ_SIZE
    @read_size = MAX_READ_SIZE if @read_size.zero?

    @line_buffer = Array.new
  end

  def gets( sep_string = $/ )
    return @line_buffer.pop if @line_buffer.size > 2 or

@current_pos.zero?

    @current_pos -= @read_size
    @file.seek(@current_pos, IO::SEEK_SET)

    @line_buffer[0] = "#{@file.read(@read_size)}#{@line_buffer[0]}"
    @read_size      = MAX_READ_SIZE  # Set a size for the next read.

    @line_buffer[0] =
      @line_buffer[0].scan(/.*?#{Regexp.escape(sep_string)}|.+/)
    @line_buffer.flatten!

    gets(sep_string)
  end
end

You can see the first trick Uri taught me in initialize(). Working the
pointer
backwards can be messy. You have to keep track of where you are, shift
the read
pointer back, read some, but always make sure you have that many bytes
left.
There’s an easier way.

If you pick some number of bytes you are going to read, you can consider
the
file to be in chunks that size. For example, if we are going to read
ten bytes
at a time and have a twenty-four byte file, we can deal with it in
chunks of
ten, ten, and four. The only odd chunk size is at the end, where we
start, so
we can deal with that immediately and then all future reads can be
whatever
chunk size we selected.

That’s what initialize() is doing: Open the file, jump to the end, and
set that
initial read size to handle the dangling partial chunk.

Now we can tackle gets() and I’ll tell you about the other lesson Uri
taught me.
First, the function of gets() is pretty basic: If we know we have a
line or
that we are out of lines, return it or nil. Otherwise, read some more
data,
trying to make one of those two exit conditions true, and recurse. The
only
hard part is deciding when we have a line.

I was dreaming up a complicated solution to this, when reading Uri’s
code showed
me the light. You can store the data in the buffer exactly like it is
in the
file (or whatever we’ve read of it so far). We will break it at lines
though,
because that’s what we’re interested in reading. At any given time,
it’s very
likely the buffer holds a partial line (because we haven’t seen the
rest).
However, if we have at least two lines buffered, we can return one
immediately.
One of them is likely a partial sure, but the last one, the one the user
wants,
must be full now. This is easy to code, as you can see above.

In gets(), I prepend each read to the first (likely partial) line. Then
I use
scan() to find all the lines in there, creating an Array, then
flatten!() to
fold those lines back into the buffer, discarding the extra Array.

Thanks for the lessons Uri!

You really pick these insights up from reading the code of others, which
is why
I think reading code is important. This is one of the big perks of
running the
Ruby Q… I get to read a lot of code and the submissions are always
teaching
me things. This week was no different, so now let me tell you what I
learned
from others.

Lazy Evaluation

This next library came at just the right time for me. I’m reading
Higher-Order
Perl, by Mark Jason Dominus, and trying to apply the ideas I am learning
there
to my Ruby usage. One of the big concepts in the book is “lazy” code,
which is
just a fancy way of saying, I want to run this… later.

There are a lot of advantages to something like this. If an operation
is
expensive in computational terms, we can assure that it doesn’t happen
until it
is needed. The advantage to that is that it may not be needed at all,
which
keeps us from wasting time.

Another less-used example is that we can delay evaluation until we have
more
information. The standard PStore library is a great example here. You
pass it
the path to the cache file in initialize(), but it waits until
transaction() to
actually open the file. The reason is that you can start a “read-only”
transaction, or a normal transaction that allows reading and writing.
By
waiting to open the file, PStore knows the right mode to use, the right
level of
file locking to apply, etc.

The tricky part to lazy evaluation, for me at least, is just getting
your head
around it. When you decide you’re ready though, here is an excellent
first
step:

module Trampoline
   # Instance methods
   class Bounce
      def initialize(cons, klass, *args)
         @klass, @cons, @args = klass, cons, args
      end

      def method_missing(method, *args)
         @obj = @klass.send(@cons, *@args) unless @obj
         @obj.send(method, *args)
      end
   end

   # Class methods
   class << Bounce
      alias_method :old_new, :new

      def new(*args)
         old_new(:new, *args)
      end

      def method_missing(method, *args)
         old_new(method, *args)
      end
   end
end

That’s Matthew M.'s port of the Perl Trampoline module, by Steven
Lembark.
Let’s break down what this does.

First, the instance methods. It seems that initialize() doesn’t do
anything
except store some information. We will find out what for in
method_missing().

Remember that method_missing() will be called for any message we haven’t
defined. In this case, that’s pretty much everything. (More on that in
a
minute…) When called, method_missing() makes sure @obj is defined.
If it’s
not, it is created by calling the proper cons(tructor) with the args
initialize() tucked away. Then the message is just forwarded to @obj.
That
means the object is built just before the first method call, then reused
to
handle all future method calls.

The only problem with the above strategy is that Bounce includes some
default
Ruby methods inherited from Object. This means that something like a
to_s()
call isn’t forwarded. You can fix this by adding something like the
following
to Bounce:

instance_methods.each { |m| undef_method m unless m =~ /^__/ }

Now let’s look at the class methods. You can see that new() is moved and
redefined, to change its interface for calling code. Now we see another
method_missing(), this time for the class itself. It works just like
the
redefined new(), forwarding the message to the constructor. Remember,
not all
objects are constructed with new(). Singleton objects often use
instance(), for
example. This method allows for that. Whatever is called will later be
used to
build the object.

That’s a great introduction to lazy evaluation. When that sinks in a
bit and
you’re ready for more, see the excellent lazy.rb library by MenTaLguY:

http://moonbase.rydia.net/software/lazy.rb/

Currying

Another interesting technique discussed in Higher-Order Perl was also
represented in the solutions to this quiz. Ross B. ported Perl’s
Sub::Curry module by Johan Lodin. Currying is basically the process of
using
functions to manufacture other functions, as seen in these simple
examples:

require "curry"

scale = lambda { |size, object| object * size }

puts "3 scaleded to a size of 10 is #{scale[10, 3]}."  # 30
puts

# curry some functions
double = scale.curry(2)
triple = scale.curry(3)
halve  = scale.curry(0.5)

puts "4 doubled is #{double[4]}."   # 8
puts "1 tripled is #{triple[1]}."   # 3
puts "Half of 10 is #{halve[10]}."  # 5.0

The great side of this library is that it can handle much more
complicated
argument setups than this. For example, what if the arguments to scale
had been
reversed? No problem:

scale = lambda { |object, size| object * size }

puts "3 scaleded to a size of 10 is #{scale[10, 3]}."  # 30
puts

# we can leave "holes" in the argument list
double = scale.curry(Curry::HOLE, 2)
triple = scale.curry(Curry::HOLE, 3)
halve  = scale.curry(Curry::HOLE, 0.5)

puts "4 doubled is #{double[4]}."   # 8
puts "1 tripled is #{triple[1]}."   # 3
puts "Half of 10 is #{halve[10]}."  # 5.0

That works exactly the same, but if you’re not impressed yet we can get
even
fancier. The library already supports a bunch of “spices”, like HOLE
above, but
you can also add your own:

# Lazy Evaluation meets Currying...
class LazySpice < Curry::SpiceArg
  def initialize( &promise )
    super("LAZYSPICE")

    @promise = promise
  end

  def spice_arg( args )  # called to provide the missing argument
    [@promise.call]
  end
end

logger = lambda do |time, message|
  puts "[#{time.strftime('%I:%M:%S %p %m/%d/%y')}] #{message}"
end

log_now = logger.curry(LazySpice.new { Time.now })

log_now["First Message."]   # => [12:47:53 PM 02/01/06] First Message.
sleep 3
log_now["Second Message."]  # => [12:47:56 PM 02/01/06] Second Message.

Notice how the LazySpice isn’t evaluated until the time of the call.
That lazy
execution makes sure our message is stamped with the time it was
actually
logged.

I’m not going to show the library here, since I’ve gone on long enough,
but I
hope you will agree that it’s worth a look.

Wrap Up

Please take some time to look into the other solutions I didn’t cover
here. All
of them were interesting ideas and I hope their authors will consider
packaging
them up for all to use.

Myself, and others, were worried that this would not be a popular quiz.
It far
exceeded my expectations though and I owe a big thank you to all who
made that
happen! You are all so clever it makes even me look good.

We’ll go back to a more traditional Ruby Q. problem tomorrow, I
promise. This
time we will borrow a problem from the Code Katas…

bbazzarrakk · February 2, 2006, 4:06pm

On Thu, 2006-02-02 at 22:37 +0900, Ruby Q. wrote:

log_now[“First Message.”] # => [12:47:53 PM 02/01/06] First Message.
sleep 3
log_now[“Second Message.”] # => [12:47:56 PM 02/01/06] Second Message.

Notice how the LazySpice isn’t evaluated until the time of the call. That lazy
execution makes sure our message is stamped with the time it was actually
logged.

This is very cool, thanks for the great write-up. However, there is a
small bug (in curry.rb, not this code) that can cause problems with
this, e.g:

a = [1,3,5]
b = [2,4,6]

l = lambda do |ary,aa,ba|
  ary +	[aa,ba]
end.curry(Curry::HOLE,Curry::HOLE,TestSpice.new { b.shift })

l.call
# => [1,nil,3,nil,5,nil]    (expected: 1,2,3,4,5,6)

This only shows up when special spices are at the end of the argument
list, and happens because there are no args_remain left by the time
they’re seen, and I took a short-cut in the implementation. The attached
patch (against the documented version from
http://roscopeco.co.uk/code/ruby-quiz-entries/64/curry.rb) fixes things.
There maybe a slight impact on performance, though (if that matters).

I’m considering maybe packaging Ruby Murray up as a gem and releasing it
on RubyForge. I’ll add James’ LazySpice, and would like any suggestions
others may have (esp if anyone has something we could legitimately call
“SportySpice” ;D)

Thanks again for the quiz. Cool entries everyone

bbazzarrakk · February 2, 2006, 4:09pm

On Fri, 2006-02-03 at 00:03 +0900, Ross B. wrote:

l = lambda do |ary,aa,ba|
ary + [aa,ba]
end.curry(Curry::HOLE,Curry::HOLE,TestSpice.new { b.shift })

(Oops, TestSpice is a simplified LazySpice I used in the tests.
Substitute LazySpice there.)

bbazzarrakk · February 3, 2006, 2:04am

In article
[email protected],
Ruby Q. [email protected] wrote:

It would be cool to have a monthly Port-a-library exercise that might
run in
parallel to the Ruby Q.es. Lots of libraries are too large to port in
a
weekend. it might also be interesting to have a ‘hit-list’ of libraries
to port
each month based on input from ruby-talk and other sources.

Phil

bbazzarrakk · February 3, 2006, 11:49am

Ruby Q. [email protected] wrote:

The great element of porting a library is that you get examine another
programmer’s ideas. If you’re lucky, that may teach you a new trick or two.
I’ll use my experience as an example.

I also love the way that code seems to vanish when you port it to ruby

martin

bbazzarrakk · February 3, 2006, 2:16am

On Feb 2, 2006, at 7:03 PM, Phil T. wrote:

them up for all to use.
parallel to the Ruby Q.es. Lots of libraries are too large to
port in a
weekend.

Interesting thought. If you could divide up the bigger libraries,
maybe the whole group could help do one at a time.

it might also be interesting to have a ‘hit-list’ of libraries to port
each month based on input from ruby-talk and other sources.

Ooo, I really like that idea.

Sounds like a project you just have to start, to me.

James Edward G. II