Editing Text (#145)

Implementing an efficient data structure for this problem can be tricky.
A
couple of the solutions turned out to have a higher complexity than it
first
appeared. That doesn’t mean they aren’t valuable to examine though.

Let’s have a look at Holger’s code below. It’s an implementation of the
quiz-recommended gap buffer data structure. It’s a very clean
implementation of
the idea that helped me understand what a gap buffer really is. Here’s
the
constructor:

class GapBuffer
def initialize(data=“”, i=0)
@data = data
@gap_start = i
@gap_len = 0
@GAP = " "*64
end

# ...

The actual content of our text editing session will be kept in @data.
However,
that data may contain some extra material called a “gap,” which will
begin at
the index @gap_start and be @gap_len characters long. @GAP is just a
constant
used to insert new gaps when that last one is exhausted.

How this interacts in usage is well shown by the insert methods:

# ...

def insert_before(ch)
  if @gap_len.zero?
    @data[@gap_start, 0] = @GAP
    @gap_len = @GAP.length
  end
  @data[@gap_start] = ch
  @gap_start += 1
  @gap_len -= 1
end

def insert_after(ch)
  if @gap_len.zero?
    @data[@gap_start, 0] = @GAP
    @gap_len = @GAP.length
  end
  @data[@gap_start+@gap_len-1] = ch
  @gap_len -= 1
end

# ...

The idea for both of these methods is simple. To add content, we place
it in
either end of the gap, shrinking the space available there. To insert a
character before the caret we must add the character to the beginning of
the
gap, bump the gap start index beyond the new character, and decrease our
count
of available space in the gap. Adding after the caret just means
placing the
character at the end and decreasing the length counter.

The if statement at the beginning of both of these methods watches for
the gap
to be exhausted. When we run out of space, a new gap is inserted and
our length
counter is reset.

Deleting characters is just the opposite operation:

# ...

def delete_before
  return if @gap_start.zero?
  @gap_start -= 1
  @gap_len += 1
  @data[@gap_start]
end

def delete_after
  return if @gap_start+@gap_len >= @data.length
  @gap_len += 1
  @data[@gap_start+@gap_len-1]
end

# ...

To remove something all we really have to do is to manipulate our index
and/or
length to put the discarded character back into the gap. Once there,
later
operations can overwrite it normally.

Note that the last lines of these methods return the “deleted” character
and the
first lines skip the operations when there are no characters beyond the
gap.

Now the harder operations of a gap buffer are moving the caret. We’ll
start
with left() and right(), which are the easier moves:

# ...

def left
  return if @gap_start.zero?
  @data[@gap_start+@gap_len-1] = @data[@gap_start-1]
  @gap_start -= 1
  @data[@gap_start]
end

def right
  return if @gap_start+@gap_len>[email protected]
  @data[@gap_start] = @data[@gap_start + @gap_len]
  @gap_start += 1
  @data[@gap_start - 1]
end

# ...

Moving the caret to either side means transferring a character from one
end of
the buffer to the other. Then we can just expand the gap to include the
moved
character’s previous location, so future editing operations will swallow
in up.
For example, given the following data:

The gap buffer[ ], or our caret, is shown here with brackets.

A move to the left() is gives us:

The gap buffe[r ]r, or our caret, is shown here with brackets.

The moved character is on the right of the buffer while the original is
inside.
Characters inside the buffer don’t really exist in the content so we
don’t have
any duplication here.

Now for the hard moves:

# ...

def up
  col = column
  cursor = @gap_start-col
  return if cursor.zero?
  cursor_line = @data.rindex(?\n, cursor-2)
  cursor_line = 0 if cursor_line.nil?
  cursor_line += col+1
  if cursor_line > cursor-1
    cursor_line = cursor-1
  end
  left while @gap_start > cursor_line
  true
end

def down
  col = column
  cursor = @data.index(?\n, @gap_start + @gap_len)
  return if cursor.nil?
  cursor_line = cursor+1+col
  cursor = @data.index(?\n, cursor+1)
  cursor = @data.length if cursor.nil?
  cursor_line = cursor if cursor_line > cursor
  right while @gap_start + @gap_len < cursor_line
  true
end

# ...

While this looks like a lot of code, the procedure is simple.

The column() helper method, which we will see shortly, figures out where
we are
in the current line. For the down() method, that’s really all we need
to mark
our current location. When moving up() though, we want to reach the
line before
ours, or somewhere between one and two newlines back. To make that
easier, up()
first backs up by the column count to start the search from the
beginning of the
line.

Then we have a terrific use index()/rindex() with the optional minimum
index
parameter to find the next newline past the end of the gap buffer. When
that
search fails, the caret is moved to the beginning or end of the data
which must
be the last line we needed to cross. Having found the desired line, the
current
column is added on, and trimmed to match the data if it exceeds the
line.

Together, these operations give us a where we are and where we need to
get to.
The actual move is performed with repeated calls to left()/right(), one
for each
character we need to travel.

Here are a couple of location methods, one of which we just saw in use:

# ...

def position
  @gap_start
end

def column
  lbreak = @data.rindex(?\n, @gap_start-1)
  lbreak.nil? ? @gap_start : (@gap_start-(lbreak+1))
end

# ...

As you can see, position() is just an alias for @gap_start. The
column() method
though is just more clever use of the rindex() method. It really works
just
like we saw in up(), save that it doesn’t need to skip over a newline.

The last operation supported by this solution is to stringify the data:

# ...

def to_s
  return @data[0, @gap_start] +
         @data[@gap_start+@gap_len, 

@data.length-@gap_start-@gap_len]
end
end

The actual data is just the concatenation of what’s on both sides of the
gap.
The code above assembles that with simple indexing.

Holger had hoped the solution was O(n), but unfortunately it turns out
to be
O(n**2). Eric explains why this is and provides tips for how to get it
to O(n)
in this email:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/277308

While the code does need the mentioned changes to reduce its complexity,
I still
felt it was a very clean implementation of the gap buffer and quite easy
to
follow. This should make fixing it up a snap.

My thanks to those who did manage to find the time for the quiz. It was
a
tricky problem to do well and all of the solutions showed off neat
approaches.

Tomorrow we will do some vehicle counting, programmer style…

On 11/8/07, Ruby Q. [email protected] wrote:

Implementing an efficient data structure for this problem can be
tricky. A
couple of the solutions turned out to have a higher complexity than it
first
appeared. That doesn’t mean they aren’t valuable to examine though.

I thought we had a good mix of data structures in the solutions.
Anybody
with an interest in data structures should take a look. Here are the
data
structures (or features) I saw in the solutions :

  • conventional gap buffer
  • “double-ended” gap buffer (one using Array and another String)
  • deferred gap movement until editing operations
  • linked list of lines (strings)
  • circular buffer
  • linked list of buffers (gap and file)

I thought all of them were very good solutions. Other possibilities
were
also mentioned.

Let’s have a look at Holger’s code below.

      def insert_before(ch)
        if @gap_len.zero?
          @data[@gap_start, 0] = @GAP
          @gap_len = @GAP.length
        end
        @data[@gap_start+@gap_len-1] = ch
        @gap_len -= 1
      end

Holger had hoped the solution was O(n), but unfortunately it turns out
to be

follow. This should make fixing it up a snap.
BTW, the simple way to make Holger’s code O(n) is to replace this line
(in
both inserts):

@data[@gap_start, 0] = @GAP

with this:

@data[@gap_start , 0] = @data

This makes the gap (garbage data) the same size as the real data. This
amortizes the O(n) gap increase across O(n) operations, so that on
average
each insert is still O(1). The gap just needs to be increased to a size
relative to the size of the real data. This is the same way the
reallocating arrays/strings (like ruby’s C-level Array/String
implementations) are done so that operations at the end (where the “gap”
is)
are amortized O(1).

Eric