Which library to write a parser

luislavena · January 16, 2012, 8:48am

Hi,

I need to write a simple parser, and I don’t know whether I can use an
existing library.

Thanks

tamc · January 16, 2012, 9:07am

thomas carlier wrote on 16.01.2012 11:48:

Hi,

I need to write a simple parser, and I don’t know whether I can use an
existing library.

Thanks

Try Treetop:
http://treetop.rubyforge.org/

You may find one of my blog posts useful:
http://whitequark.org/blog/2011/09/08/treetop-typical-errors/

tamc · January 16, 2012, 9:34am

Try Treetop:
http://treetop.rubyforge.org/

And then try parslet:
http://kschiess.github.com/parslet/

k

tamc · January 16, 2012, 10:40am

Peter Z. wrote in post #1041059:

thomas carlier wrote on 16.01.2012 11:48:

Hi,

I need to write a simple parser, and I don’t know whether I can use an
existing library.

Thanks

Try Treetop:
http://treetop.rubyforge.org/

You may find one of my blog posts useful:
http://whitequark.org/blog/2011/09/08/treetop-typical-errors/

Thanks, and nice blog: very useful information.
I’ll try Treetop.

tamc · January 16, 2012, 9:48am

Gotta give a nod to kpeg:

tamc · January 16, 2012, 11:32am

Tony A. wrote on 16.01.2012 12:47:

And then try parslet:

http://kschiess.github.com/parslet/

k

Is there a PEG parser around here which does not keep all the symbols
each in its own node? Or, maybe, any one which is faster than a dying
snail? Just wondering.

tamc · January 18, 2012, 4:18am

Kaspar S. wrote in post #1041114:

Is there a PEG parser around here which does not keep all the symbols
each in its own node? Or, maybe, any one which is faster than a dying
snail? Just wondering.

As one of the authors of one of these libraries which are slow as a
dying snail, I am wondering: How do you measure? Would you like to
contribute your benchmark, so that our (quite extensive) optimization
efforts can go your way? And finally, what is the ground speed of a
dying snail?

My measurements (to contribute to the thread as well) are here:
http://blog.absurd.li/2011/02/02/parslet_and_its_friends.html

k

Hi,

Which one would you recommend for a simple grammar like this:

expressions : expression+
            ;
expression  : [{]content+[}]
            ;
content     : token
            | TOKEN[|]content
            | TOKEN[|]expression
            ;
TOKEN       : .*
            ;

The grammar is not correct, but you’ll understand the main idea

Example : some text {text {text|text}} some text {text|text|text}

input size [100 - 5000] chars

tamc · January 18, 2012, 5:07am

On Jan 17, 2012, at 19:18 , thomas carlier wrote:

TOKEN : .*
;

The grammar is not correct, but you’ll understand the main idea

Example : some text {text {text|text}} some text {text|text|text}

input size [100 - 5000] chars

Write your own by hand. It’ll be faster and smaller than anything
mentioned above.

http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf
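
A minimal sketch of what such a hand-written parser could look like for the brace grammar quoted above, assuming the intent is nested {a|b|c} groups inside free text. The class name BraceParser, the [:choice, ...] node shape, and the rule that {, } and | never appear as literal text are assumptions made for illustration, not anything the posters specified. It uses only Ruby's standard strscan library.

```ruby
require 'strscan'

# Hypothetical recursive-descent parser for the brace grammar.
# Output shape (an assumption): a sequence is an Array whose items are
# Strings (plain text) or [:choice, [sequence, ...]] nodes.
class BraceParser
  def initialize(input)
    @scanner = StringScanner.new(input)
  end

  # Parses the whole input; raises ArgumentError on stray '}' or '|'.
  def parse
    result = sequence
    unless @scanner.eos?
      raise ArgumentError,
            "unexpected #{@scanner.peek(1).inspect} at #{@scanner.pos}"
    end
    result
  end

  private

  # sequence := (text | group)*
  def sequence
    items = []
    loop do
      if (text = @scanner.scan(/[^{}|]+/))
        items << text
      elsif @scanner.scan(/\{/)
        items << group
      else
        break
      end
    end
    items
  end

  # group := '{' sequence ('|' sequence)* '}'   ('{' is already consumed)
  def group
    alternatives = [sequence]
    alternatives << sequence while @scanner.scan(/\|/)
    raise ArgumentError, "expected '}' at #{@scanner.pos}" unless @scanner.scan(/\}/)
    [:choice, alternatives]
  end
end
```

For example, BraceParser.new('a {b|c}').parse returns ['a ', [:choice, [['b'], ['c']]]]. Each rule of the grammar maps to one method, which is the structure the Wirth text describes.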

tamc · January 25, 2012, 9:21am

On 18.01.12 05:07, Ryan D. wrote:

Write your own by hand. It’ll be faster

And for another definition of faster (faster to code): parslet.

greetings,
k

tamc · January 16, 2012, 4:59pm

Is there a PEG parser around here which does not keep all the symbols
each in its own node? Or, maybe, any one which is faster than a dying
snail? Just wondering.

As one of the authors of one of these libraries which are slow as a
dying snail, I am wondering: How do you measure? Would you like to
contribute your benchmark, so that our (quite extensive) optimization
efforts can go your way? And finally, what is the ground speed of a
dying snail?

My measurements (to contribute to the thread as well) are here:
http://blog.absurd.li/2011/02/02/parslet_and_its_friends.html

k

tamc · January 30, 2012, 11:58pm

Kaspar S. wrote in post #1042425:

On 18.01.12 05:07, Ryan D. wrote:

Write your own by hand. It’ll be faster

And for another definition of faster (faster to code): parslet.

greetings,
k

Hello,

I tried out parslet today, and I really appreciate its design (i.e.,
what parsers look like) as well as the beautiful webpage you built.

I could not use it for my current project, however, as it was way too
slow. I need to be able to parse millions of constraint specifications,
each based on a fairly simple grammar (20 rules or so). I found my
original implementation (regular expression matching and scanning) to
be ugly, inefficient (scanning a string a few times) and rather
slow (~15K constraints per second). The goal was to get both more
beautiful and faster by using a parser framework. It got more beautiful,
but too slow to be useful to me.

If I had benchmarked the MiniP parser posted on the homepage of parslet,
it would have been obvious that parslet is too slow for my purposes: On
my machine, it parses 180 lines / second, given the test string
“puts(3 + 2 + 61235 + 24 + 51, 252 + 235 + 11, 2, 3, 5, 7, 11,19)”
So while I appreciate the neat interface of those fancy new parsers,
people should know that they are slow.

But, I think it would be awesome to have a beautiful parser lib that is
fast, and given that the performance of the regular expression engine is
quite decent, it should be feasible to build such a parser.
a) Do you think there is any chance to get a faster (say, factor 100)
implementation for parslet with the same (or a similar) interface?
b) If not, is there any maintained ruby parsing library or parser
generator (no need to be in pure ruby) which is fast enough? How fast is
antlr for ruby?
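
To illustrate the idea of leaning on the regular expression engine while visiting the input only once, here is a rough sketch of a single-pass parser for the MiniP test string, built on Ruby's standard StringScanner. The module name MiniP, the node shapes, and the grammar in the comments are assumptions inferred from the test string; this is not parslet's MiniP implementation and makes no claim about its speed.

```ruby
require 'strscan'

# Hypothetical single-pass parser for strings like "puts(3 + 2, 7)".
# Grammar (assumed from the test string, not taken from parslet):
#   expr    := funcall | int ('+' expr)?
#   funcall := id '(' expr (',' expr)* ')'
module MiniP
  module_function

  def parse(source)
    scanner = StringScanner.new(source)
    tree = expr(scanner)
    scanner.skip(/\s*/)
    raise ArgumentError, "trailing input at #{scanner.pos}" unless scanner.eos?
    tree
  end

  def expr(scanner)
    scanner.skip(/\s*/)
    if (name = scanner.scan(/[a-z]+/))
      funcall(scanner, name)
    elsif (digits = scanner.scan(/[0-9]+/))
      left = [:int, digits.to_i]
      # A failed scan leaves the position untouched, so no backtracking code
      scanner.scan(/\s*\+/) ? [:sum, left, expr(scanner)] : left
    else
      raise ArgumentError, "expected expression at #{scanner.pos}"
    end
  end

  def funcall(scanner, name)
    raise ArgumentError, "expected '(' at #{scanner.pos}" unless scanner.scan(/\s*\(/)
    args = [expr(scanner)]
    args << expr(scanner) while scanner.scan(/\s*,/)
    raise ArgumentError, "expected ')' at #{scanner.pos}" unless scanner.scan(/\s*\)/)
    [:funcall, name, args]
  end
end
```

With these assumptions, MiniP.parse('puts(3 + 2, 7)') returns [:funcall, 'puts', [[:sum, [:int, 3], [:int, 2]], [:int, 7]]].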

Kind Regards,
Benedikt

tamc · January 31, 2012, 1:05pm

Hello again,
I found one combinator-based parser library which seems to provide quite
decent performance:

rsec-ext (http://rsec.heroku.com/)

Here is the code implementing parslet’s MiniP parser:

require 'rubygems'
require 'rsec'
include Rsec::Helpers

# Terminals
id      = /[a-z]+/.r.fail 'id'
int     = /[0-9]+/.r.fail 'int'
op      = one_of_('+')
comma   = /\s*,\s*/.r

# Rules; lazy{expr} defers the recursive reference to expr
sum     = seq(int, op, lazy{expr})
arglist = '('.r >> lazy{expr}.join(comma).even << ')'
funcall = seq_(id, arglist)
expr    = funcall | sum | int
parser  = expr.eof

# Parse every line of the file named by the first command-line argument
File.readlines(ARGV.first).each { |line| parser.parse!(line) }

I do not claim that the parsers are equivalent (different datastructures
for the result), and so the comparison is a little bit unfair and only
shows a trend. I’d like to share it anyway:
Parsing 10000 lines, each containing
puts(3 + 2 + 61235 + 24 + 51, 252 + 235 + 23532 + 11, 2, 3, 5, 7, 11, 19)
the parsers need:

parslet / ruby 1.8: 162.8s
parslet / ruby 1.9: 49.5s
rsec-ext / ruby 1.9: 0.7s
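
For anyone who wants to repeat this kind of measurement, a sketch of a timing harness using Ruby's standard Benchmark module follows. The method name measure_throughput and the stand-in parser are assumptions for illustration; the numbers above were not produced with this code, and a real comparison would pass in a lambda that wraps the parslet or rsec parser instead.

```ruby
require 'benchmark'

# Hypothetical harness: times any callable parser over the same lines and
# reports throughput. `parser` is anything that responds to #call(line).
def measure_throughput(parser, lines)
  elapsed = Benchmark.realtime { lines.each { |line| parser.call(line) } }
  { seconds: elapsed, lines_per_second: lines.size / elapsed }
end

line  = 'puts(3 + 2 + 61235 + 24 + 51, 252 + 235 + 23532 + 11, 2, 3, 5, 7, 11, 19)'
lines = Array.new(10_000, line)

# Stand-in parser (an assumption): a regex that only extracts the integers.
stand_in = ->(text) { text.scan(/[0-9]+/) }

result = measure_throughput(stand_in, lines)
puts format('%.3fs, %.0f lines/s', result[:seconds], result[:lines_per_second])
```

Feeding every library the same array of lines keeps file I/O out of the timed section, so the comparison covers parsing only.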

Kind Regards,
Benedikt

tamc · January 31, 2012, 11:24pm

On Jan 31, 2012, at 09:30 , Bartosz Dziewoński wrote:

For heavy lifting, there’s always Racc. ruby_parser uses it, and it’s
pretty fast.

http://i.loveruby.net/en/projects/racc/
http://rubygems.org/gems/racc
http://rubydoc.info/gems/racc/1.4.7/frames

I did, but with all the complexity of ruby_parser, not the grammar in
this thread. I did both the 10k testcase above as well as 10k lines of
puts(2 + 3) on both ruby 1.8 and 1.9:

1.8:

116.05s: 86.17 l/s: 6.23 Kb/s: 722 Kb:10000 loc:…/dev/blah1_10k.rb
8.68s: 1152.15 l/s: 13.50 Kb/s: 117 Kb:10000 loc:…/dev/blah2_10k.rb

1.9:

84.80s: 117.92 l/s: 8.52 Kb/s: 722 Kb:10000 loc:…/dev/blah1_10k.rb
5.48s: 1825.23 l/s: 21.39 Kb/s: 117 Kb:10000 loc:…/dev/blah2_10k.rb

Not an entirely fair comparison by using ruby_parser instead of an
incredibly restrained grammar… but there you have it.

That said, I will say that I only barely tolerate LR based parser
generators. I would love to have a fully conformant LL-based parser for
ruby. I’m not convinced it is possible as ruby’s grammar is seriously
fucked up.

tamc · February 1, 2012, 12:48am

Hi Thomas,

I use ANTLR to generate C++ code which I subsequently extend into Ruby.
It’d be absolute overkill if you’re not already extending/embedding for
your project though!

I believe ANTLR is LL(*). It also does lexers.

There is apparently a Ruby target for ANTLR as well. I’ve never tried it
myself.

You’ll need Java to build, but not to run. That’s the case in my current
project, but I’m not using the latest ANTLR.

Some links:

http://www.antlr.org/

http://www.antlr.org/wiki/display/ANTLR3/Antlr3RubyTarget

Another thing to explore, maybe.

Good luck!

Garth

PS. If someone has already suggested ANTLR, my apologies. I skimmed the
replies, but it was hardly a detailed search.

tamc · January 31, 2012, 6:31pm

For heavy lifting, there’s always Racc. ruby_parser uses it, and it’s
pretty fast.

http://i.loveruby.net/en/projects/racc/
http://rubygems.org/gems/racc
http://rubydoc.info/gems/racc/1.4.7/frames

(Didn’t do benchmarks.)

– Matma R.

tamc · February 1, 2012, 1:57am

Ryan D. wrote in post #1043324:

On Jan 31, 2012, at 09:30 , Bartosz Dziewoński wrote:

For heavy lifting, there’s always Racc. ruby_parser uses it, and it’s
pretty fast.

http://i.loveruby.net/en/projects/racc/
http://rubygems.org/gems/racc
http://rubydoc.info/gems/racc/1.4.7/frames

I did, but with all the complexity of ruby_parser, not the grammar in
this thread.

Thanks for the numbers. I wrote a small grammar for the MiniP language
in racc, and repeated the experiments (on a different machine, ruby
1.9.3). racc’s speed is ok, but it seems to be slower than rsec-ext.

parslet: 53.10s
racc: 2.05s
rsec-ext: 0.72s

That said, I will say that I only barely tolerate LR based parser
generators. I would love to have a fully conformant LL-based parser for
ruby.

I believe neither racc nor ANTLR will be faster than rsec-ext, as they
generate ruby code. For many less-convoluted grammars (ruby is not a
good example ;)), a PEG-style parser library is a good, pleasant-to-use
alternative to a parser generator.

For my prototype, I rewrote a constraint file parser with rsec-ext,
which works great. I can’t use it at the moment, because it is Ruby 1.9
only, but that’s a different story.

Kind Regards,
Benedikt