Twisting a Rope (#137)

The discussion for this quiz was very interesting. It’s probably worth
your
time to go back and read those messages, if you haven’t already. Some
points I
got out of the discussion were:

  • It’s tricky to get ropes right. Clever implementations may loose
    key
    rope qualities, like the O(1) concatenation time.
  • The functional approach, building immutable ropes, is probably
    superior considering how ropes are intended to be used.
  • Autorebalancing didn’t hurt much, at least in the quiz test case.
  • Copy-On-Write is helpful, when it works.

Many of these points came out due to Eric M.'s work in benchmarking
the
submitted solutions. Eric also submitted several variations of rope
classes, a
few of which I want to take a look at below. Let’s begin with the class
Eric
uses to build the rope trees:

module Mahurin
class Rope
include Enumerable
# form a binary tree from two ropes (possibly sub-trees)
def initialize(left,right)
@left = left
@right = right
@llength = @left.length
@length = @removed_email_address@domain.invalid
@depth = [left.depth, right.depth].max+1
end
# number of elements in this rope
def length
@length
end
# depth of the tree (to help keep the tree balanced)
def depth
@depth
end
# left rope (not needed when depth==0)
def left
@left
end
# right rope (not needed when depth==0)
def right
@right
end

  # ...

Here we see the setup and relevant attributes of Rope objects. First we
have
the fact that they are binary trees, with left and right subtrees. Next
we see
that Eric is going to track two lengths for Rope objects, both the total
length
and the length of just the left subtree. The reasoning for that will
become
apparent when we examine indexing. Finally, Eric tracks a depth, for
use in
rebalancing.

There are really two major operations that are key to a Rope
implementation:
concatenation and indexing. Here’s the concatenation side of the
puzzle:

  # ...

  # appended rope (non-modifying)
  def +(other)
    # balance as an AVL tree
    balance = other.depth-@depth
    if balance>+1
      left = other.left
      right = other.right
      if left.depth>right.depth
        # rotate other to right before rotating self+other to left
        (self + left.left) + (left.right + right)
      else
        # rotate self+other to left
        (self + left) + right
      end
    elsif balance<-1
      if @right.depth>@left.depth
        # rotate self to left before rotating self+other to right
        (@left + @right.left) + (@right.right + other)
      else
        # rotate self+other to right
        @left + (@right + other)
      end
    else
      self.class.new(self, other)
    end
  end
  alias_method(:<<, :+)

  # ...

This method is only this long because it automatically rebalances the
tree as
needed. In fact, if you glance down to the final else clause, you will
see the
trivial implementation, which is just to construct a new Rope from the
current
Rope and the concatenated element.

The rebalancing done here is, as the comment suggests, a textbook AVL
implementation. With an AVL tree, you subtract the left depth from the
right
depth to get a tree’s balance factor. Anything in the range of -1 to 1
is a
balanced tree. If the factor is outside of that range, one or two
rotations are
required to rebalance the tree.

I’m not going to go into the specific rotations. If you would like to
read up
on them, I recommend:

http://fortheloot.com/public/AVLTreeTutorial.rtf

Let’s move on to the indexing methods used to pull information back out
of a
Rope:

# ...

# slice of the rope
def slice(start, len)
  return self if start.zero? and len==@length
  rstart = start-@llength
  return @right.slice(rstart, len) if rstart>=0
  llen = @llength-start
  rlen = len-llen
  if rlen>0
    @left.slice(start, llen) + @right.slice(0, rlen)
  else
    @left.slice(start, len)
  end
end
# element at a certain position in the rope
def at(index)
  rindex = index-@llength
  if rindex<0
    @left.at(index)
  else
    @right.at(rindex)
  end
end

# ...

Have a look at the at() method first, because it’s easier to digest and
it shows
the concept of indexing this tree structure. Essentially, to index in
the tree
you check a position against the left length. If it’s less than, index
the left
side. If it’s greater than, index the right. This search tactic is the
hallmark attribute of binary trees.

The slice() method works the same way. It’s just more complicated
because it
has to work with two indices instead of one. Finding the start index is
the
same strategy we saw in at(). If that index is in the right subtree,
the end
will be as well and the code makes the recursive hand-off without
further
checks. When it’s in the left, the end must be located. If that end
point
turns out to also be in the left, the hand-off is made to the left side.
When
it is in the right, a partial slice() is made from both halves and
combined.
This covers all the cases.

Eric added a couple more methods to the Rope class that cover iteration
and
converting to a String:

  # ...

  # iterate through the elements in the rope
  def each(&block)
    @left.each(&block)
    @right.each(&block)
  end
  # flatten the rope into a string (optionally starting with a 

prefix)
def to_s(s=“”)
@right.to_s(@left.to_s(s))
end
end

# ...

That covers the tree implementation. What we haven’t seen yet though,
are the
leaf nodes. We need two types for the implementation I want to examine,
EmptyRope and StringRope. Here’s the first of those:

# ...

EmptyRope = Object.new
class << EmptyRope
  include Enumerable
  def length
    0
  end
  def depth
    0
  end
  def +(other)
    other
  end
  alias_method(:<<, :+)
  def slice(start, len)
    self
  end
  def each
  end
  def to_s
    ""
  end
end

# ...

This implementation is kind of a poor man’s singleton instance (the
design
pattern, not the Ruby concept, though we see both here). There
shouldn’t be any
surprises in this code. The attributes are zeroed, concatenation
results in
whatever the concatenated element is, and slice()ing just returns self
which
doubles as an empty String.

That leaves just one last leaf, StringRope:

# ...

class StringRope
  include Enumerable
  def self.new(*args)
    if args.empty?
      EmptyRope
    else
      super
    end
  end
  def initialize(data)
    @data = data
  end
  def length
    @data.length
  end
  def depth
    0
  end

  # ...

This class just wraps a String to support the Rope API we’ve been
examining.
About the only interesting trick here is the is that the default new()
for this
class is overridden to allow returning an EmptyRope as needed. Anytime
the
argument is provided though, this method does hand-off to the default
new()
implementation.

Here’s the concatenation method:

# ...

def +(other)
  balance = other.depth
  if balance>1
    left = other.left
    right = other.right
    if left.depth>right.depth
      # rotate other to right before rotating self+other to left
      (self + left.left) + (left.right + right)
    else
      # rotate self+other to left
      (self + left) + right
    end
  else
    Rope.new(self, other)
  end
end
alias_method(:<<, :+)

# ...

We see some more balance work here, but the algorithm is simplified
since only
the right side can be a subtree. Beyond that, we’ve seen this code
before.

Here are the final few methods:

  # ..

  def slice(start, len)
    return self if start.zero? and [email protected]
    # depend on ruby's COW mechanism to just reference the slice 

data
self.class.new(@data.slice(start, len))
end
def at(index)
@data[index]
end
def each(&block)
@data.each_char(&block)
end
def to_s(s=“”)
s.concat(@data.to_s)
end
end
end

Note here that slice() was overridden to return a StringRope instead of
a
String. As the comment says, Ruby internally uses some Copy On Write
semantics
to reference sliced Strings. This should keep it from wildly
duplicating the
data, but it was found that a couple of solutions had problems with this
for
unknown reasons.

That covers a basic Rope implementation. We won’t bother to go into the
destructive methods as you are probably better off working without them.

My thanks to all who explored this newly popular data structure with us.
It
looks like there will be a talk on ropes at this year’s Rubyconf, so
hopefully
we gave the speaker some extra material to work with.

This week’s Ruby Q. will start a day early, to adjust for my Lone Star
Rubyconf schedule this weekend. The no-spoiler period is still 48 hours
and I
will still summarize it at the same time, you just get an extra day to
work on
it. Stay tuned for that problem in just a few minutes…

On 9/6/07, Ruby Q. [email protected] wrote:

Many of these points came out due to Eric M.'s work in benchmarking the
submitted solutions. Eric also submitted several variations of rope classes, a
few of which I want to take a look at below.

I’m glad to see that you didn’t let requests of the quiz get in the
way. There were a bunch of things that I didn’t do that the quiz
asked for (in the classes you presented):

  • be able to append/prepend/shift mutable ropes (made immutable ropes
    instead)
  • allow ropes to operate directly with strings (required a separate
    StringRope proxy class instead)
  • provide a normalize method (used auto-balancing concatenation instead)
  • O(1) concatenation - the max and average number of rotations for my
    auto-balancing concatenation is O(log(n)). Slicing is also O(log(n)),
    so it isn’t a big problem.

Eric

Hi All,

I’m very sorry that this quiz ended up coming out over the labor day
weekend and the start of my semester, I hardly had a chance to look at
it all week, but I’m excited about what I’ve read. I wanted to add a
couple of comments based on trying to use these thing in practice. The
example was a good test case because it showed a large difference
between strings and ropes. Real situations are not quite as neat and
clean. In particular concatenating single characters to a string as it
is build and removing them from the front as it is used are very common
operations that the test case didn’t measure. No one really addressed
these two operations because there was not a test case to show them.

James G. wrote:

The rebalancing done here is, as the comment suggests, a textbook AVL
implementation. With an AVL tree, you subtract the left depth from the
right
depth to get a tree’s balance factor. Anything in the range of -1 to 1
is a
balanced tree. If the factor is outside of that range, one or two
rotations are
required to rebalance the tree.

This was the other interesting part of the quiz. Every answer I saw
balanced the tree giving each leaf the same weight. The leaves on the
other hand varied in length between 8 characters and 512k. The
suggested way to balance a Rope was based on the length of the leaves in
such a way that longer leaves were nearer the root because presumably
they will be accessed more often.

This is to take nothing away from the solutions that came out. They
were very well written and I am most humbled by them. More this is an
address to the comment about “Where is the Gem?” If and when Ropes do
become a general use library, these are things that I believe need to be
addressed. I also am looking forward to hearing about the presentation
at RubyConf.

Thanks to everybody who worked on this. I wish I had more time to join
in.

John M.

“John M.” [email protected] wrote in message
news:[email protected]

Hi All,

This was the other interesting part of the quiz. Every answer I saw
balanced the tree giving each leaf the same weight. The leaves on the
other hand varied in length between 8 characters and 512k. The
suggested way to balance a Rope was based on the length of the leaves in
such a way that longer leaves were nearer the root because presumably
they will be accessed more often.

At least two implementations did this (James K.'s and mine). OTOH
this
added a lot of mess to the code, and a simple change on e.g. Eric’s
implementation (to use length instead of depth) will achieve practically
the
same.

–EK

“Eric M.” [email protected] wrote in message
news:[email protected]

such a way that longer leaves were nearer the root because presumably
–EK
You could do something like this (using depth=log2(length) and a little
math):

if other.length>2*@length

elsif @length>2*other.length

In reality all in-time short-cut rebalancings will perform worse than
the
full one. For now I do not see any algorithm better than in original
article, though it is also not ideal (but good on memory usage)

Everything is about probability of access to particular symbol. You can
optimize for particular scenario (in this one - you do not want any
rebalancing at all), but these optimizations will behave miserably in
any
other cases

–EK

On 9/7/07, Eugene K. [email protected] wrote:

At least two implementations did this (James K.'s and mine). OTOH this
added a lot of mess to the code, and a simple change on e.g. Eric’s
implementation (to use length instead of depth) will achieve practically the
same.

–EK

I’d think you’d treat depth as log2(length) and do something similar.
Instead of this for concatenation:

balance = other.depth-@depth
if balance>+1

elsif balance<-1

You could do something like this (using depth=log2(length) and a little
math):

if other.length>2*@length

elsif @length>2*other.length

Maybe you’d also need some info about the smallest, largest, leftmost,
and/or rightmost leaf lengths so that you aren’t trying to balance
something that can’t be. I haven’t thought this all the way through.

I think this is a good idea. It increases the max run-time for
slicing (but still O(log(n)), but hopefully reduces the average.

On the other hand, I don’t know that it is a good assumption that
you’ll be accessing longer leaves more often than shorter leaves.
This assumes random access of the elements in the rope. For example,
if you are using a rope for a text editor buffer, there will be a
you’ll only be accessing a small part of the rope you’ll be accessing
while typing stuff in. And at that point, you’ll probably be dealing
with short leaves since things are changing around where you are
editing. You may want the most recently used closer to the root of
the tree.