Twisting a Rope (#137)

The three rules of Ruby Q.:

  1. Please do not post any solutions or spoiler discussion for this quiz
    until
    48 hours have passed from the time on this message.

  2. Support Ruby Q. by submitting ideas as often as you can:

http://www.rubyquiz.com/

  1. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps
everyone
on Ruby T. follow the discussion. Please reply to the original quiz
message,
if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by John M.

This week’s task is to implement the Rope data structure as a Ruby
class. This
topic comes out of the ICFP programming competition
(http://www.icfpcontest.com/) which had competitors manipulating a 7.5
million
character string this year.

What is a Rope:

You may not realize it, but for many changes the content to a String,
Ruby
creates a new copy of the original with the modifications applied. For
small
strings that are created once and read often this is actually a very
efficient
way to do thing, but what happens when the string starts to get long,
and is
undergoing a lot of changes? First, the program will spend more and
more of its
processing cycles just copying bits around. Second, the garbage
collector will
be called more and more often to pick up the little stringy scraps
you’ve left
all over the memory.

Ropes (the name is a pun for a heavy duty string) are tree structures
where a
node represents the concatenation of its left branch with its right, and
leaves
are flat strings. (This is a little sloppy. A rope may contain shared
subtrees,
and is thus really a directed acyclic graph, where the out-edges of each
vertex
are ordered. We will continue to be sloppy.) E.g. To prepend Text A to
Text B,
one creates a Node (call it N1) with A as its left branch and B as its
right.
To further append Text C create a new Node N2 with its left branch
pointing to
N1 and its right to C. Easy, right? To find out more see Boehm,
Atkinson and
Plass “Ropes: an Alternative to Strings” at:

http://rubyurl.com/2FRbO

The task comes in three parts, each increasing in difficulty:

Part one:

Create a Rope class that can do the following:

a. ‘append’ or ‘prepend’ a String or another Rope
(alias the << operator to the append function)
b. Return the fully concatenated text with ‘to_s’
c. define ‘slice’ to call to_s.slice
d. provide a ‘length’ method

Part two:

Add the following:

a. Add the ability to ‘index’ a single character given a 0-based
offset
from the beginning of the string.
b. Add the ability to ‘shift’ a single character from the front of a
Rope.
(Remove and return the character)
c. Add your own ‘slice’ method that returns a String. Implement as
many of
the String method’s forms as possible. To run the example code
this
function will only need to understand the slice(offset,length)
form.
Major Bonus for Regex and Substring forms.
d. “Balance” the tree with a ‘normalize’ method.
(see Boehm, Atkinson and Plass 1319 Rebalancing)

Part three: (bonus)

Add the following:

a. Change the ‘slice’ method to return a Rope. Ideally this method
should
do as little string copying as possible. (Doing this will well
dramatically increase the speed of the example code)
b. Allow ‘shift’ to optionally accept an integer number of characters
to
remove and return.
c. modify the ‘<<’ operator so that can efficiently append a few
characters at a time. (see Boehm, Atkinson and Plass 1318 para. 4)
d. Major Bonus Add the ability to append and prepend IO classes in a
lazy fashion. (see Boehm, Atkinson and Plass 1318 para. 2)

The following code may help you explore how efficient your code is and
show
where Ropes are useful. ruby -r /path/to/your/rope/class this_script.rb Rope
will run the test with your code. Run the script without arguments to
see how
well String does. Also play around with the SIZE and CHUNKS constants
to get a
feel for how they affect performance.

require ‘benchmark’

#This code make a String/Rope of CHUNCKS chunks of text
#each chunck is SIZE bytes long. Each chunck starts with
#an 8 byte number. Initially the chuncks are shuffled the
#qsort method sorts them into ascending order.

#pass the name of the class to use as a parameter
Ruby -r rope.rb this_file Rope

puts ‘preparing data…’
TextClass = Object.const_get(ARGV.shift || :String)

def qsort(text)
return TextClass.new if text.length == 0
pivot = text.slice(0,8).to_s.to_i
less = TextClass.new
more = TextClass.new
offset = 8+SIZE
while (offset < text.length)
i = text.slice(offset,8).to_s.to_i
(i < pivot ? less : more) << text.slice(offset,8+SIZE)
offset = offset + 8+SIZE
end
print “*”
return qsort(less) << text.slice(0,8+SIZE) << qsort(more)
end

SIZE = 512 * 1024
CHUNCKS = 128
CHARS = %w[R O P E]
data = TextClass.new
bulk_string =
TextClass.new(Array.new(SIZE) { CHARS[rand(4)] }.join)
puts ‘Building Text…’
build = Benchmark.measure do
(0…CHUNCKS).sort_by { rand }.each do |n|
data<< sprintf(“%08i”,n) << bulk_string
end
data.normalize if data.respond_to? :normalize
end
GC.start
sort = Benchmark.measure do
puts “Sorting Text…”
qsort(data)
puts"\nEND"
end

puts “Build: #{build}Sort: #{sort}”

My apologies,

The first line of the quiz should read:
“You may not realize it, but for many changes to the content of a
String,”

I am really looking forward to seeing some ingenious solutions to this
problem. The concept seemed very simple to me but my first attempt kept
becoming mired down in edge cases. I hope everyone has a good weekend
and a good Labor Day weekend to those in the US (plenty of time to work
on this weeks quiz :wink:

John M.

On Sep 1, 2007, at 12:20 AM, John M. wrote:

The first line of the quiz should read:
“You may not realize it, but for many changes to the content of a
String,”

I’ve fixed this on the web site. Thanks for pointing it out.

James Edward G. II

Hi All -

I posted my solution in the ‘rope.rb’ file here:
http://pastie.caboo.se/93242
I slightly modified the nice benchmark tool because normalize doesn’t
work
on the rope itself for me.

I ran a few iterations with the benchmark with
SIZE = 512 * 1024
CHUNKS = 256.

The following is pretty representative of what I observed:

  1. Normal strings:
    Build: 0.733000 0.406000 1.139000 ( 1.193000)
    Sort: 7.800000 7.238000 15.038000 ( 15.924000)

  2. Ropes
    Build: 0.047000 0.016000 0.063000 ( 0.060000)
    Sort: 10.686000 0.016000 10.702000 ( 10.902000)

  3. Ropes with the normalization function
    Build: 0.156000 0.000000 0.156000 ( 0.156000)
    Sort: 5.741000 0.047000 5.788000 ( 6.212000)

Regards,
Himadri

I’ve modified my Rope so it runs with the benchmark (and fixed some
bugs).

http://pastie.caboo.se/93256

with the included benchmark:

badcarl@navi> ruby -r lib/rope.rb benchmark.rb String
Build: 0.150000 0.080000 0.230000 ( 0.234209)
Sort: 1.700000 1.850000 3.550000 ( 3.613295)

badcarl@navi> ruby -r lib/rope.rb benchmark.rb Rope
Build: 0.000000 0.000000 0.000000 ( 0.009324)
Sort: 0.280000 0.080000 0.360000 ( 0.372736)

I’m getting around 10x speedup on sorting.

On 8/31/07, Ruby Q. [email protected] wrote:

This week’s task is to implement the Rope data structure as a Ruby class.

My implementation is attached along with a modified test. I made a
few changes to what was asked, but this also gave more flexibility.
Here is what I did:

  • Just as the implementation in the paper referenced in this quiz, my
    implementation gives immutable ropes. #<< is non-destructive, so I
    had to change the test to do <<= instead of just <<. Also, I didn’t
    implement #shift, because it must be destructive. The same
    functionality can be achieved with 2 calls to #slice (one could be #at
    if you only need to shift one element). There are very good reasons
    to make ropes immutable. I’d imagine almost every other
    implementation with modifiable ropes will fail when someone starts
    modifying a sub-rope that is shared. You’d really need a good COW
    (copy-on-write) scheme to allow both transparent sharing and
    modification. I didn’t see that it was worth the
    effort/complexity/risk. I chose the simple functional programming
    approach (immutable objects).

  • I chose to automatically balance the ropes during concatenation (no
    normalize). I used the same tree rotations that are used with AVL
    trees. Another option could be to treat these as red-black trees
    which might save on some rotations. One reason I automatically
    balanced is that it simplifies the interface. The user of the API
    doesn’t have to worry about when to normalize. A second reason is
    that every other rope operation is O(log(n)), so there probably isn’t
    much benefit in making only concatenation O(1).

  • I don’t allow ropes to directly use Strings. Instead, a StringRope
    is used as a proxy for a String. To use a String directly, I would
    have had to check the class, use respond_to, modify String to look
    like a rope, etc. Instead, I just went with the pure duck-typing
    approach and made multiple Rope-like classes that use different types
    of data. My basis for these is the class ArrayRope. There is no
    reason why a rope data-structure can’t be used with any sequence of
    objects instead of just characters. ArrayRope takes an Array-like
    object. An rope built out of these is to Array as a conventional rope
    is to String. I also added an IORope class to lazily work with files.
    Using IORope you could make a text editor that didn’t have to read
    the whole file in at once. There is no reason you can’t mix and match
    any of these leaf rope classes (depth 0) within a rope tree.

  • #each iterates through elements (i.e. characters) in the rope. It
    annoys me that String#each (and IO#each for that matter) iterates over
    lines - it should be characters (not necessarily bytes). All of the
    Enumerable methods are accessible too.

  • I used #at instead of #index because #index is used for searching in
    String/Array. Array#at is an exact match.

The main thing I didn’t do was implement any regex stuff. I don’t see
how this is doable since all of the regex methods are completely tied
to String (not duck-typed). You’d have to convert the whole rope to
string to do anything (which defeats the purpose of the rope).

On 9/3/07, Eric M. [email protected] wrote:

On 8/31/07, Ruby Q. [email protected] wrote:

This week’s task is to implement the Rope data structure as a Ruby class.

My implementation is attached along with a modified test. I made a
few changes to what was asked, but this also gave more flexibility.

Forgot about the results. Here’s what I found on my machine:

String
Build: 0.120000 0.080000 0.200000 ( 0.206264)
Sort: 1.680000 0.420000 2.100000 ( 2.112934)
VmPeak: 1192112 kB
VmSize: 493708 kB

StringRope
Build: 0.010000 0.000000 0.010000 ( 0.014414)
Sort: 0.150000 0.020000 0.170000 ( 0.176734)
VmPeak: 38940 kB
VmSize: 37920 kB

so, 20X faster for build and 12.4X faster for sort. Memory looks to
be 10-30X smaller. I ran the build and sort 8 times during the
benchmark and took the min. The memory kept increasing for the String
runs for each of these 8 iterations which doesn’t make sense.

I’ve done parts one and two so far. I’ll try to add more in the next
few days.

For simplicity and speed, append and prepend don’t modify the
receiver.

If anyone has any questions about my code, I’ll be happy to answer.

Carl

require “rubygems”
require “facets”
require “kernel/with”
require “symbol/to_proc”

class String
def shift
return nil if empty?
returning self[0].chr do
self[0] = “”
end
end
end

class Rope
attr_reader :left, :right, :length

def Rope.new(*args)
if args.size <= 2
super
else # build balanced tree
mid = args.size / 2
args[mid,2] = Rope.new(*args[mid,2])
Rope.new(*args)
end
end

def initialize(left="",right="")
@left, @right = left, right
@length = @left.length + @right.length
end

def replace(rope)
initialize(rope.left,rope.right)
self
end

clean out empty strings and rebuild tree

def normalize
Rope.new(*flat_strings.reject(&:empty?))
end

def to_s
flat_strings.join(’’)
end

def append(str)
Rope.new(self,str)
end
alias_method :<<, :append

def prepend(str)
Rope.new(str,self)
end

def slice(start,length=@length-start)
if start > left.length # right
right.slice(start-left.length,length-left.length)
elsif left.length < length # left and right
left.slice(start,left.length) +
right.slice(start-left.length,length-left.length)
else # left
left.slice(start,length)
end
end

def shift
if letter = left.shift || right.shift
@length -= 1
letter
else
nil
end
end

def index(letter,start=0)
if start > left.length # right
left.length + right.index(letter,start - left.length)
else # left
left.index(letter,start) || left.length + right.index(letter)
end
rescue
nil
end

TODO implement rest of methods

def method_missing(method, *args, &block)
result = to_s.send(method, *args, &block)
if result.is_a?(String)
if method.to_s =~ /!$/
replace(Rope.new(result))
else
Rope.new(result)
end
else
result
end
end

protected

traverse the tree

def traverse(&block)
@left.is_a?(Rope) ? @left.traverse(&block) : block.call(@left)
@right.is_a?(Rope) ? @right.traverse(&block) : block.call(@right)
end

collect all the flat strings

def flat_strings
returning [] do |ary|
traverse { |str| ary << str }
end
end

end

On 8/31/07, Ruby Q. [email protected] wrote:

by John M.

This week’s task is to implement the Rope data structure as a Ruby class. This
topic comes out of the ICFP programming competition
(http://www.icfpcontest.com/) which had competitors manipulating a 7.5 million
character string this year.

My solution deviates slightly from the problem specification.
The most important difference is that instead of implementing
a binary tree to store the strings, they are all stored in an
array instead.

The class itself is quite long, since I wanted to implement
many of the methods of the built-in String class. Some of the
methods will require significant work to actually implement.
Most notably the Regexp-related methods, since there is no
way to instruct the Regexp matcher to ask for more characters
once it has reached the end of a string.

Actually, all String methods are implemented, by implementing
method missing and delegating to the composed string if the
string class can handle the method. After delegating to the
string class, we convert any String result to a new rope and
we also make sure to replace our content by the string we
delegated to, to make sure that destructive methods works as
expected.

In fact, we replace the content of our rope as soon as to_s
is called. since the reason for lazy concatenation is to
avoid the concatenation cost, we can just as well keep the
concatenated string when we already had to pay the price.

The benchmark results are:

String

Build: 0.170000 0.150000 0.320000 ( 0.324341)
Sort: 3.470000 3.630000 7.100000 ( 7.126741)

ArrayRope

Build: 0.010000 0.010000 0.020000 ( 0.009744)
Sort: 0.130000 0.000000 0.130000 ( 0.138330)

ArrayRope (with dup)

Build: 0.240000 0.160000 0.400000 ( 0.402201)
Sort: 0.160000 0.000000 0.160000 ( 0.163022)

For the unprotected case, sorting was ~52 times faster,
and building was ~33 times faster.

However, since the string class is not immutable, there is a
slight problem with this approach. The strings could added
to the rope could be modified “under the hood”. We can easily
protect against that by making copies of the strings when we
add them to the rope. Since the built-in String shares the
actual data between the two instances (until they change),
this is not so much of a memory issue as one could expect.

By adding dup (in initialize/append/prepend) we end up with
the following times, which trades of some of our speed/memory
for a bit of safety. (This is actually the submitted solution)

Compared with the string approach, building is now (for obvious
reasons) slower than the String, but only about 25%.
Sorting is still much faster than the String case, namely ~44
times as fast.

!g

On Fri, Aug 31, 2007 at 10:19:54PM +0900, Ruby Q. wrote:

by John M.

This week’s task is to implement the Rope data structure as a Ruby class. This
topic comes out of the ICFP programming competition
(http://www.icfpcontest.com/) which had competitors manipulating a 7.5 million
character string this year.

I happened to have implemented ropes in OCaml recently, so I generated a
Ruby
extension using rocaml to see how well it would perform.

Without further ado, here are the results I’m getting for SIZE = 512 *
1024,
CHUNKS = 512:

$ time ruby -r himadri_choudhury.rb bm.rb Rope
Build: 0.130000 0.000000 0.130000 ( 0.129476)
Sort: 10.340000 0.050000 10.390000 ( 10.648223)

$ time ruby -rocaml_rope bm.rb OCaml::Rope
Build: 0.020000 0.000000 0.020000 ( 0.018946)
Sort: 0.100000 0.000000 0.100000 ( 0.108499)

$ ruby eric_mahurin.rb StringRope
[…]
Build: 0.060000 0.000000 0.060000 ( 0.057299)
Sort: 0.870000 0.000000 0.870000 ( 0.896493)

For SIZE = 1024, CHUNKS = 16384:

$ ruby eric_mahurin.rb StringRope
[…]
Build: 3.470000 0.040000 3.510000 ( 3.588875)
Sort: 89.110000 0.700000 89.810000 ( 92.179962)

$ time ruby -rocaml_rope bm.rb OCaml::Rope
[…]
Build: 0.360000 0.000000 0.360000 ( 0.378352)
Sort: 3.940000 0.040000 3.980000 ( 4.079140)

At that point the pure Ruby rope is taking over 6 times more memory than
the OCaml one. I ascribe this to iv_tbls being very heavy and to memory
fragmentation.

I benchmarked Himadri’s implementation first and was surprised by the
exceedingly large speed difference — I expected one, not two orders of
magnitude for this code, as there’s enough Ruby code in common in qsort
to
mask the speed gains in the rope operations. However, Eric’s solution
proved
that it was just due to a slow Ruby implementation.

Here’s the interface definition (extconf.rb):

EXT_NAME = “ocaml_rope”
OCAML_PACKAGES = %w[]
CAML_LIBS = %w[]
CAML_OBJS = %w[]
CAML_FLAGS = “”
CAML_INCLUDES = []

require ‘rocaml’

Interface.generate(“ocaml_rope”) do |iface|
def_class(“Rope”, :under => “OCaml”) do |c|
rope = c.abstract_type

fun "empty", UNIT => rope, :as => "new_empty"
fun "of_string", STRING => rope, :as => "new_from_string"

method "sub", [rope, INT, INT] => rope, :as => "slice"
method "concat", [rope, rope] => rope
method "length", rope => INT
method "get", [rope, INT] => INT
method "to_string", rope => STRING, :as => "to_s"

end
end

require ‘rocaml_extconf’

As you can see, OCaml::Rope is purely functional, and the interface
differs a
bit from that expected by bm.rb (a modified version that works with
immutable
ropes is attached), so I adapted it with the following ocaml_rope.rb,
which
also loads the extension:

module OCaml # Rope will be placed in this module
end

require “ocaml_rope.so”

module OCaml
class Rope
def self.new(str = “”)
case str
when String; new_from_string str
when Rope; str
when “”; new_empty
else new_from_string(str.to_str) rescue new_from_string(str.to_s)
end
end

def prepend(rope)
  rope.append(self)
end

alias_method :append, :concat
alias_method :<<, :append

end
end

The OCaml code is attached, in case anybody wants to look at it.
Incidentally, it weighs a bit under 220 lines, which is also the amount
taken
by Himadri’s and Eric’s solutions. Unlike them, rope.ml features O(1)
concatenation for small elements; this accounts for a large part of the
code
and the complexity of some patterns. O(1) concatenation doesn’t really
affect
performance in the use case exerted by bm.rb anyway.

On Mon, Sep 03, 2007 at 05:52:31PM +0900, Mauricio F. wrote:

I happened to have implemented ropes in OCaml recently, so I generated a Ruby
extension using rocaml to see how well it would perform.

[…]

Sort: 3.940000 0.040000 3.980000 ( 4.079140)
Also for laughs, playing with the GC parameters and with a qsort
implemented
in OCaml:

$ OCAMLRUNPARAM=s=512k ruby -rocaml_rope bm.rb OCaml::Rope
[…]
Build: 0.290000 0.000000 0.290000 ( 0.290908)
Sort: 3.410000 0.040000 3.450000 ( 3.547792)
Sort’: 0.830000 0.000000 0.830000 ( 0.845180)

Yielding the expected >100x gain over Ruby.

The interface part:

method "qsort", [rope, INT] => rope
method "qsort2", [rope, INT] => rope

I implemented the qsort function both in functional and imperative
style, for
the sake of those who prefer something that reads similarly to the Ruby
version. The two variants are equally fast.

let (<<) = concat (* OCaml allows you to define new operators *)
let to_i = int_of_string
let to_s = to_string

let rec qsort’ size = function
Empty -> Empty
| rope ->
let pivot = to_i (to_s (sub 0 8 rope)) in
let len = 8 + size in
let less = ref Empty in
let more = ref Empty in
let off = ref len in
while !off < length rope do
let slice = sub !off len rope in
if to_i (to_s (sub !off 8 rope)) < pivot then
less := !less << slice
else
more := !more << slice;
off := !off + len
done;
qsort’ size !less << sub 0 len rope << qsort’ size !more

let rec qsort size = function
Empty -> Empty
| rope ->
let rec loop r pivot off len max less more =
if off < max then begin
if to_i (to_s (sub off 8 r)) < pivot then
loop r pivot (off+len) len max (less << (sub off len r))
more
else
loop r pivot (off+len) len max less (more << (sub off len
r))
end else (less, more) in

  let pivot = to_i (to_s (sub 0 8 rope)) in
  let len = 8 + size in
  let less, more = loop rope pivot len len (length rope) Empty Empty 

in
qsort size less << sub 0 len rope << qsort size more

Ok, So I’m still trying to figure out what stores the characters in a
Rope. A string or an array?

Judging from the fact that you have ArrayRope, I’m thinking it might
be an Array.

On Sep 3, 2007, at 5:42 AM, Gustav M. wrote:

ArrayRope

Build: 0.010000 0.010000 0.020000 ( 0.009744)
Sort: 0.130000 0.000000 0.130000 ( 0.138330)

ArrayRope (with dup)

Build: 0.240000 0.160000 0.400000 ( 0.402201)
Sort: 0.160000 0.000000 0.160000 ( 0.163022)

Help!
Ari
--------------------------------------------|
If you’re not living on the edge,
then you’re just wasting space.

On 9/3/07, Ari B. [email protected] wrote:

Ok, So I’m still trying to figure out what stores the characters in a
Rope. A string or an array?

Judging from the fact that you have ArrayRope, I’m thinking it might
be an Array.

The actual characters are stored in strings. The rope is an array of
strings,
stored in the @segments instance variable.

!g

Here is my solution. As a side note, it was probably not the best idea
to
stick to the benchmarking code in the quiz, as it forced not the best
design
decisions (I’d rather not have default constructor for Rope, and for
sure -
no normalizing ‘in place’). But - what’s done is done.
For slice variants I’ve decided to have two argument ones in #slice, and
one
arg - in #[]

Results:
String

Build: 0.188000 0.032000 0.220000 ( 0.219000)
Sort: 1.578000 0.843000 2.421000 ( 2.422000)

Rope

Build: 0.031000 0.000000 0.031000 ( 0.031000)
Sort: 0.203000 0.000000 0.203000 ( 0.234000)

Code:

class NilClass
def length; 0; end
end

class String
def shift
return nil if empty?
res=self[0]
self[0]=""
res
end
end

class Rope
attr_reader :length, :left, :right

def initialize(left=nil,right=nil)
@left=left if left
@right=right if right
@length=left.length+right.length
end

def append(what)
len=what.length
if (len>0)
@left=self.dup if @right.length>0
@right=what
@length+=len
end
self
end

alias << append

def prepend(what)
len=what.length
if (len>0)
@right=self.dup if @left.length>0
@left=what
@length+=len
end
self
end

def to_s
@removed_email_address@domain.invalid_s
end

def
return i.match(self.to_s)[0] if i.kind_of? Regexp
if i.kind_of? Range
pos,last=i.first,i.last
pos=@length+pos if pos<0
last=@length+last if last<0
return nil if pos<0 || last<0
return slice(pos,last-pos+1)
end
i=@length+i if i<0
return nil if i<0 || i>@length-1
[email protected]
i<llen ? @left[i] : @right[i-llen]
end

def []=(i,val)
#fixnum only
i=@length+i if i<0
“”[i]=0 if i<0 || i>@length-1
@length+=val.length-1
[email protected]
i<llen ? @left[i]=val : @right[i-llen]=val
end

def slice(pos,len)
return pos.match(self.to_s)[len] if pos.kind_of? Regexp
pos=@length+pos if pos<0
return nil if pos<0 || len<0 || pos>@length-1
[email protected]
return @left.slice(pos,len) if pos+len<=llen
return @right.slice(pos-llen, len) if pos>=llen
Rope.new(@left.slice(pos,len),@right.slice(0,len+pos-llen))
end

def shift
return nil if @length==0
@length-=1
[email protected]>0 ? @left.shift : @right.shift
@left=nil if @left.length==0
@right=nil if @right.length==0
res
end

def normalize
r=Rebalancer.new(@length)
self.traverse { |str| r.append(str) }
@left,@right=r.get_ropes
end

def traverse(&blck)
@left.kind_of?(String) ? yield(@left) : @left.traverse(&blck) if
@left
@right.kind_of?(String) ? yield(@right) : @right.traverse(&blck) if
@right
end

end

class Rebalancer
def initialize len
@limits=[1,2]
@slots=[]
n=2
@limits<<n=@limits[-2]+@limits[-1] while n<len
end

def append str
@slots[0]=@slots[0] ? Rope.new(@slots[0],str) : str
i=0
while @slots[i].length>@limits[i]
@slots[i+1]=@slots[i+1] ? Rope.new(@slots[i+1],@slots[i]) :
@slots[i]
@slots[i]=nil
i+=1
end
end

def get_ropes
@slots.compact!
(@slots.length-1).times { |i|
@slots[i+1]=@slots[i+1] ? Rope.new(@slots[i+1],@slots[i]) :
@slots[i]
@slots[i]=nil
i+=1
}
[@slots[-1].left,@slots[-1].right]
end
end

On 8/31/07, Ruby Q. [email protected] wrote:

This week’s task is to implement the Rope data structure as a Ruby class.

I modified my implementation a bit more and provided results along
with the other ruby implementations (sorry Mauricio) submitted. The
benchmark test I used is attached. It can run the original build/sort
that assumes mutable ropes and a build/sort that can also be used with
immutable ropes (in addition to mutable ropes). These tests assume
that << can only take another rope. I included some testing to ensure
the results are correct. I also used the linux /proc/$$/status to get
the memory.

Mahurin::StringRope is almost the same as my previous submission. The
main change was handling a boundary case better so I don’t
unecessarily concat an empty rope (a < should have been a <=) - this
almost doubled the performance.

I added the class Mahurin::MutableStringRope which is a wrapper around
an immutable rope. I just reassign the instance variable (@rope) to
make changes. I implemented a bunch of String/Array mutable methods
in this class. This wrapper class hurt performance much more than I
expected (double the run-time).

I also tried out not auto-balancing and using an explicit normalize
(Mahurin::DenormalStringRope). This gave much faster build time as
expected, but the sort time slowed down just as much. I guess for
this random data set (I use a fixed seed), qsort doesn’t keep the tree
balanced (pivot doesn’t necessarily partition equally). The larger
depth for the non-auto-balanced rope hurts the slice time. I think
biting the bullet for auto-balancing is the better way to go.

I added a subclass for handling flattening concatenations of short
strings (Mahurin::ShortStringRope) just to be complete. It isn’t
useful in this benchmark, but also doesn’t hurt much (within the 0.01
second error margin).

CPU(user+sys,sec) mem(peak,MB)


build sort total build sort class


0.10 1.70 1.80 287 1327 String
0.01 0.27 0.28 22 153 Porth::Rope
0.02 0.83 0.85 22 34 Choudhury::Rope *
0.02 0.06 0.08 22 29 Mahurin::StringRope +
0.00 0.08 0.08 22 30 Mahurin::DenormalStringRope +
0.02 0.14 0.16 22 29 Mahurin::MutableStringRope
0.07 0.10 0.17 151 151 Munkby::ArrayRope
0.02 0.73 0.75 22 655 Kalenkovich::Rope *

CPU : minimum from 40 iterations
mem : peak over the 40 iterations

  • : self-checking failed
  • : immutable benchmark needed

Eric, thanks for taking the time to run benchmarks for everyone’s
solutions. It clears things up when they’re run all on the same
machine.

Carl

“Eric M.” wrote in message
news:[email protected]

0.07 0.10 0.17 151 151 Munkby::ArrayRope
0.02 0.73 0.75 22 655 Kalenkovich::Rope *

CPU : minimum from 40 iterations
mem : peak over the 40 iterations

  • : self-checking failed
  • : immutable benchmark needed

Eric, thanks a lot for your test, it helped me to find a bug in my
solution.
The only thing I do not understand - you’ve changed the game rules on
flight
by making “data = data.normalize” instead of “data.normalize”. I’m
wondering
how could you get sorting time > 0 for my solution (as it played by old
rules and would give you nothing for such assignment)? :slight_smile:

Anyway, I’m posting a fixed solution (for the bug and for your test) in
the
next message, and I’d appreciate if you can re-run it

– EK

On Sep 3, 2007, at 7:48 PM, Eric M. wrote:

0.02 0.14 0.16 22 29 Mahurin::MutableStringRope
0.07 0.10 0.17 151 151 Munkby::ArrayRope
0.02 0.73 0.75 22 655 Kalenkovich::Rope *

CPU : minimum from 40 iterations
mem : peak over the 40 iterations

enough of this nonsense already! where is the gem! :wink:

a @ http://drawohara.com/

Fixes for previously posted solution:

  • empty node guard in slice
  • normalize returning self for Eric’s test

class NilClass
def length; 0; end
end

class String
def shift
return nil if empty?
res=self[0]
self[0]=""
res
end
end

class Rope
attr_reader :length, :left, :right

def initialize(left=nil,right=nil)
@left=left if left
@right=right if right
@length=left.length+right.length
end

def append(what)
len=what.length
if (len>0)
@left=self.dup if @right.length>0
@right=what
@length+=len
end
self
end

alias << append

def prepend(what)
len=what.length
if (len>0)
@right=self.dup if @left.length>0
@left=what
@length+=len
end
self
end

def to_s
@left.to_s + @right.to_s
end

def
return i.match(self.to_s)[0] if i.kind_of? Regexp
if i.kind_of? Range
pos,last=i.first,i.last
pos = @length+pos if pos<0
last = @length+last if last<0
return nil if pos<0 || last<0
return slice(pos,last-pos+1)
end
i = @length+i if i<0
return nil if i<0 || i>@length-1
llen = @left.length
i<llen ? @left[i] : @right[i-llen]
end

def []=(i,val)
#fixnum only
i = @length+i if i<0
“”[i]=0 if i<0 || i> @length-1
@length+=val.length-1
llen = @left.length
i<llen ? @left[i]=val : @right[i-llen]=val
end

def slice(pos,len)
return pos.match(self.to_s)[len] if pos.kind_of? Regexp
pos = @length+pos if pos<0
return nil if pos<0 || len<0 || pos>@length-1
llen = @left.length
return @left.slice(pos,len) if pos+len<=llen || ! @right
return @right.slice(pos-llen, len) if pos>=llen
Rope.new(@left.slice(pos,llen-pos),@right.slice(0,len+pos-llen))
end

def shift
return nil if @length==0
@length-=1
res = @left.length>0 ? @left.shift : @right.shift
@left=nil if @left.length==0
@right=nil if @right.length==0
res
end

def normalize
r=Rebalancer.new(@length)
self.traverse { |str| r.append(str) }
@left, @right=r.get_ropes
self
end

def traverse(&blck)
@left.kind_of?(String) ? yield( @left) : @left.traverse(&blck) if
@left
@right.kind_of?(String) ? yield( @right) : @right.traverse(&blck) if
@right
end

end

class Rebalancer
def initialize len
@limits=[1,2]
@slots=[]
n=2
@limits<< n = @limits[-2] + @limits[-1] while n<len
end

def append str
@slots[0] = @slots[0] ? Rope.new( @slots[0],str) : str
i=0
while @slots[i].length>@limits[i]
@slots[i+1] = @slots[i+1] ? Rope.new( @slots[i+1],@slots[i]) :
@slots[i]
@slots[i] = nil
i+=1
end
end

def get_ropes
@slots.compact!
(@slots.length-1).times { |i|
@slots[i+1]=@slots[i+1] ? Rope.new(@slots[i+1],@slots[i]) :
@slots[i]
@slots[i]=nil
i+=1
}
[@slots[-1].left,@slots[-1].right]
end
end

0.07 0.10 0.17 151 151 Munkby::ArrayRope
0.02 0.73 0.75 22 655 Kalenkovich::Rope *

Thanks for running the tests! I do think it is a shame, however,
that the table does not include the notion of safety that I mentioned
in my submission. If any of these classes should every go into a
gem as ara mentioned, the user of that gem at least ought to be
able to choose the safe version.

Here is a very simple scenario where such a problem is introduced.
One could probably do worse changes to the source strings, but this
was the first example I came up with.

ss = %w[test append]
r1, r2 = ss.map {|x| Mahurin::StringRope.new(x) }
r1 <<= r2
ss.first << “ing”
r1.length # => 10
r1.to_s.length # => 13

I have only tested Mahurin::StringRope, but would like to know for
which other classes I need to be extra careful when adding strings
to the rope. I did find, however, that by removing my “safety dups”,
the combined running time (on my machine) is about the same for
my solution as for Mahurin::StringRope.

!g