Rope Data Structures

Greetings All,

I have been plugging away a a parser for ruby Endo’s DNA from this years
International Functional Programming Completion.
(http://www.icfpcontest.org/) Most teems who have put up there reviews
have talked about having to switch programming languages to something
faster. Most did this before trying to change algorithms. Many of the
successful teems ended up using the ‘rope’ from the C++ Standard
Template Library.

please see:



http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/spe986.pdf

I tried writing a rope library in ruby and got an order of magnitude
better performance when compared to a String implementation (6.5 to 48.5
iterations per second – to successfully solve the problem I estimated I
would need ~2000 iterations per second)

I learned several things from this attempt:

  • Avoid creating edge cases they are bad for business and my poor rope
    implementation had way too many of them.
  • Working with long strings is a slow process (endo’s dna is a string
    ~7.5M chars long)
  • For long running programs calling GC.start at points where the number
    of living object is at a minimum will save time and Hard Drive actuator
    motors.
  • The folks who wrote the ruby-prof gem deserve an award for making an
    excelent tool.

One thing that got me thinking however was that even my poorly
implemented pure Ruby version of ropes gave me an order of magnitude
extra speed. It made me wonder: How hard would it be to implement ropes
as a compiled binary library for ruby, and how much would using such a
library (even a pure ruby version) speed up rails, which does a lot of
string concatenation?

On Aug 22, 2007, at 2:03 PM, John M. wrote:

Most teems who have put up there reviews
have talked about having to switch programming languages to something
faster. Most did this before trying to change algorithms.

We used Ruby and didn’t switch, but the code was slow.

I tried writing a rope library in ruby and got an order of magnitude
better performance when compared to a String implementation (6.5 to
48.5
iterations per second – to successfully solve the problem I
estimated I
would need ~2000 iterations per second)

I debated for some time about making this task, building a rope for
Ruby, a Ruby Q… Can I ask how long it took you? Also, just as a
rough metric, how many lines of code did you write?

James Edward G. II

James G. wrote:

On Aug 22, 2007, at 2:03 PM, John M. wrote:

Most teems who have put up there reviews
have talked about having to switch programming languages to something
faster. Most did this before trying to change algorithms.

We used Ruby and didn’t switch, but the code was slow.

I tried writing a rope library in ruby and got an order of magnitude
better performance when compared to a String implementation (6.5 to
48.5
iterations per second – to successfully solve the problem I
estimated I
would need ~2000 iterations per second)

I debated for some time about making this task, building a rope for
Ruby, a Ruby Q… Can I ask how long it took you? Also, just as a
rough metric, how many lines of code did you write?

James Edward G. II

It took about a week of causal programming to get a String version
running with no RNA output. To write writing a Rope class was another
three days of sloppy programming. Replacing String with Rope took about
2 hours. I then spent more time then I can recall fixing edge cases in
the Rope class. There are still a few loose ends, and the code has
become so full of edge case checks that it really isn’t as fast as it
should be.

Looking only at the first 10,000 iterations, the fast majority of
instructions were copying huge swaths of the original dna. I decided
that a node should be able to represent a ‘slice’ of another node
without copying when these slices start having to span concatenations
life gets messy. When I added a pop command that could make the left
branch unused things got even more messy. When I wanted a << operator
that would add characters directly to the end of ‘short’ node things
became atrocious. And when finally I required the Environment DNA from
matchreplace(where my code spent 94.2% of it’s time) not create new
copies of the Ropes it was holding when it was perpended to the dna the
code pretty much buckled under its own weight. At the moment there is a
bug in the normalization code that is introducing errors. Without being
able to renormalizes the data structure on th rope gets too deep and
everything grinds to a halt.

Lesson: Keep you data structures simple and elegant.

My rope implementation was initially ~120 (source file) lines. Now that
it is doing all this stuff it is 366. 44 of that is for a kmp_search
algorithm.

I would love to see Ropes as a Ruby quiz.

John M.

Just a musing on this, but if we want a fast and scalable rope structure
in Ruby (and speed and scalability are the reasons for using a rope, I
believe?), wouldn’t it be better (though not quite so much fun) to
simply wrap a c / c++ rope library, rather than implement it from
scratch in Ruby?

Perhaps it’d even be possible to get Ruby’s string to switch internal
implementation using some huristics about its size.

Cheers,
Benjohn

On Behalf Of John M.:

I tried writing a rope library in ruby and got an order of magnitude

better performance when compared to a String implementation (6.5 to

48.5

wow, that’s quite a helpful boost. specially since a large percentage of
the web and database time is spent on string manipulation. post your
work as a gem, pls.

kind regards -botp

On Aug 22, 2007, at 9:59 PM, John M. wrote:

James Edward G. II

It took about a week of causal programming to get a String version
running with no RNA output.

Thanks for all of the information.

I would love to see Ropes as a Ruby quiz.

Contact me off-list if you are interested in helping me put it together.

James Edward G. II

Just a musing on this, but if we want a fast and scalable rope structure
in Ruby (and speed and scalability are the reasons for using a rope, I
believe?), wouldn’t it be better (though not quite so much fun) to
simply wrap a c / c++ rope library, rather than implement it from
scratch in Ruby?

Perhaps it’d even be possible to get Ruby’s string to switch internal
implementation using some huristics about its size.

Cheers,
Benjohn

I agree that for something like this a binary distribution is best, but
they still cause problems in cross platform environments. (eg. JRuby) I
think the best solutions are 1.) Have Multiple Gem distros (java,
mswin32, source) and include a pure ruby version. Or 2.) integrate this
into the interpreter core as another built in class.

Peña, Botp wrote:

On Behalf Of John M.:

I tried writing a rope library in ruby and got an order of magnitude

better performance when compared to a String implementation (6.5 to

48.5

wow, that’s quite a helpful boost. specially since a large percentage of
the web and database time is spent on string manipulation. post your
work as a gem, pls.

kind regards -botp

The ICFP task was written in such a way as to make maximum use of Ropes.
I think that processing eRB template would also see pretty good gains.
However, one must be careful to use Ropes only when string concatenation
and slicing (Ropes do these in O(1) ) are much more common then
accessing the string content (O(log(n)). An application where a string
is build once and the accessed repeatedly will have poorer performance
if it is converted to a rope. On the other hand a string that is
repeatedly manipulated before finally being printed a single time will
see performance gains proportional to the average length of the
String/Rope.

Databases do spend lots of time with strings, but there are many more
accesses to strings then changes. Beyond that most of the strings that
databases deal with are “short.”

I will see what I can do about getting out some sample code.

John M.

Mauricio F., the author of rocaml, is working on a ropes
implementation called oropes, cf. http://eigenclass.org/repos/oropes/
and
http://eigenclass.org/repos/oropes/head/doc/Rope.html . For a discussion
see “Ropes and rope-like functional extensible vectors with O(1)
prepend/append”,
http://groups.google.com/group/fa.caml/browse_thread/thread/4e70beff0f714229
.

Cheers,

j.k.

John M. wrote:

please see:

http://en.wikipedia.org/wiki/Rope_(computer_science)
http://www.sgi.com/tech/stl/ropeimpl.html
http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/spe986.pdf

I tried writing a rope library in ruby and got an order of magnitude
better performance when compared to a String implementation (6.5 to 48.5
iterations per second – to successfully solve the problem I estimated I
would need ~2000 iterations per second)

Did you ever publish this?

Regards,

Dan

This forum is not affiliated to the Ruby language, Ruby on Rails framework, nor any Ruby applications discussed here.

| Privacy Policy | Terms of Service | Remote Ruby Jobs