Which library to write a parser

Hi,

I need to write an easy parser and i don’t know if i can use an existing
library.

Thanks

thomas carlier писал 16.01.2012 11:48:

Hi,

I need to write an easy parser and i don’t know if i can use an
existing
library.

Thanks

Try Treetop:
http://treetop.rubyforge.org/

You may find one of my blog posts useful:
http://whitequark.org/blog/2011/09/08/treetop-typical-errors/

Try Treetop:
http://treetop.rubyforge.org/

And then try parslet:
http://kschiess.github.com/parslet/

k

Peter Z. wrote in post #1041059:

thomas carlier писал 16.01.2012 11:48:

Hi,

I need to write an easy parser and i don’t know if i can use an
existing
library.

Thanks

Try Treetop:
http://treetop.rubyforge.org/

You may find one of my blog posts useful:
http://whitequark.org/blog/2011/09/08/treetop-typical-errors/

Thanks and nice blog, very useful informations.
I’ll try treetop

Gotta give a nod to kpeg:

Tony A. писал 16.01.2012 12:47:

And then try parslet:

http://kschiess.github.com/**parslet/http://kschiess.github.com/parslet/

k

Is there a PEG parser around here which does not keep all the symbols
each in
its own node? Or, maybe, any one which is faster than a dying snail?
Just
wondering.

Kaspar S. wrote in post #1041114:

Is there a PEG parser around here which does not keep all the symbols
each in
its own node? Or, maybe, any one which is faster than a dying snail? Just
wondering.

As one of the authors of one of these libraries which are slow as a
dying snail, I am wondering: How do you measure? Would you like to
contribute your benchmark, so that our (quite extensive) optimization
efforts can go your way? And finally, what is the ground speed of a
dying snail?

My measurements (to contribute to the thread as well) are here:
press play on tape – Parslet and its friends

k

Hi,

Which one will you recommend for a simple grammar like

expressions : expression+
;
expression : [{]content+[}]
;
content : token
| TOKEN[|]content
|TOKEN[|]expression
;
TOKEN : .*
;

The grammar is not correct, but you’ll understand the main idea

Example : some text {text {text|text}} some text {text|text|text}

input size [100 - 5000] chars

On Jan 17, 2012, at 19:18 , thomas carlier wrote:

TOKEN : .*
;

The grammar is not correct, but you’ll understand the main idea

Example : some text {text {text|text}} some text {text|text|text}

input size [100 - 5000] chars

Write your own by hand. It’ll be faster and smaller than anything
mentioned above.

http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf

On 18.01.12 05:07, Ryan D. wrote:

Write your own by hand. It’ll be faster

And for another definition of faster (faster to code): parslet.

greetings,
k

Is there a PEG parser around here which does not keep all the symbols
each in
its own node? Or, maybe, any one which is faster than a dying snail? Just
wondering.

As one of the authors of one of these libraries which are slow as a
dying snail, I am wondering: How do you measure? Would you like to
contribute your benchmark, so that our (quite extensive) optimization
efforts can go your way? And finally, what is the ground speed of a
dying snail?

My measurements (to contribute to the thread as well) are here:
http://blog.absurd.li/2011/02/02/parslet_and_its_friends.html

k

Kaspar S. wrote in post #1042425:

On 18.01.12 05:07, Ryan D. wrote:

Write your own by hand. It’ll be faster

And for another definition of faster (faster to code): parslet.

greetings,
k
Hello,

I tried out parslet today, and I really appreciate its design (i.e.,
what parsers look like) as well as the beautiful webpage you built.

I could not use it for my current project, however, as it was way to
slow. I need to be able to parse millions of constraint specifications,
each based on a fairly simple grammar (20 rules or so). I found my
original implementation (regular expression matching and scanning) to
be ugly, inefficient (scanning a string a few times) and rather
slow (~15K constraints per second). The goal was to get both more
beautiful and faster by using a parser framework. It got more beautiful,
but too slow to be useful to me.

If I had benchmarked the MiniP parser posted on the homepage of parslet,
it would have been obvious that parslet is too slow for my purposes: On
my machine, it parses 180 lines / second, given the test string
“puts(3 + 2 + 61235 + 24 + 51, 252 + 235 + 11, 2, 3, 5, 7, 11,19)”
So while I appreciate the neat interface of those fancy new parsers,
people should know that they are slow.

But, I think it would be awesome to have a beautiful parser lib that is
fast, and given that the performance of the regular expression engine is
quite decent, it should be feasible to build such a parser.
a) Do you think there is any chance to get a faster (say, factor 100)
implementation for parslet with the same (or a similar) interface?
b) If not, is there any maintained ruby parsing library or parser
generator (no need to be in pure ruby) which is fast enough? How fast is
antlr for ruby?

Kind Regards,
Benedikt

Hello again,
I found one combinator-based parser library which seems to provide quite
decent performance:

rsec-ext (http://rsec.heroku.com/)

Here is the code implementing parslet’s MiniP parser:

require ‘rubygems’
require “rsec”
include Rsec::Helpers

id = /[a-z]+/.r.fail ‘id’
int = /[0-9]+/.r.fail ‘int’
op = one_of_(‘+’)
comma = /\s*,\s*/.r
sum = seq(int, op, lazy{expr})
arglist= ‘(’.r >> lazy{expr}.join(comma).even << ‘)’
funcall= seq_(id, arglist)
expr = funcall | sum | int
parser = expr.eof

File.readlines(ARGV.first).each { |line| parser.parse!(line) }

I do not claim that the parsers are equivalent (different datastructures
for the result), and so the comparison is a little bit unfair and only
shows a trend. I’d like to share it anyway:
Parsing 10000 lines, each containing
puts(3 + 2 + 61235 + 24 + 51, 252 + 235 + 23532 + 11, 2, 3, 5, 7, 11,
19)
the parsers need:

parslet / ruby 1.8: 162.8s
parslet / ruby 1.9: 49.5s
rsec-ext / ruby 1.9: 0.7s

Kind Regards,
Benedikt

On Jan 31, 2012, at 09:30 , Bartosz Dziewoński wrote:

For heavy lifting, there’s always Racc. ruby_parser uses it, and it’s
pretty fast.

Racc
racc | RubyGems.org | your community gem host
File: README — Documentation for racc (1.4.7)

I did, but with all the complexity of ruby_parser, not the grammar in
this thread. I did both the 10k testcase above as well as 10k lines of
puts(2 + 3) on both ruby 1.8 and 1.9:

1.8:

116.05s: 86.17 l/s: 6.23 Kb/s: 722 Kb:10000
loc:…/dev/blah1_10k.rb
8.68s: 1152.15 l/s: 13.50 Kb/s: 117 Kb:10000
loc:…/dev/blah2_10k.rb

1.9:

84.80s: 117.92 l/s: 8.52 Kb/s: 722 Kb:10000
loc:…/dev/blah1_10k.rb
5.48s: 1825.23 l/s: 21.39 Kb/s: 117 Kb:10000
loc:…/dev/blah2_10k.rb

Not an entirely fair comparison by using ruby_parser instead of an
incredibly restrained grammar… but there you have it.

That said, I will say that I only barely tolerate LR based parser
generators. I would love to have a fully conformant LL-based parser for
ruby. I’m not convinced it is possible as ruby’s grammar is seriously
fucked up.

Hi Thomas,

I use ANTLR to generate C++ code which I subsequently extend into Ruby.
It’d be absolute overkill if you’re not already extending/embedding for
your project though!

I believe ANTLR is LL(*). It also does lexers.

There is apparently a Ruby target for ANTLR as well. I’ve never tried it
myself.

You’ll need Java to build, but not to run- that’s the case in my current
project but I’m not using the latest ANTLR.

Some links:

http://www.antlr.org/

http://www.antlr.org/wiki/display/ANTLR3/Antlr3RubyTarget

Another thing to explore, maybe. :slight_smile:

Good luck!

Garth

PS. If someone has already suggested ANTLR, my apologies. I skimmed the
replied but it was hardly a detailed search.

For heavy lifting, there’s always Racc. ruby_parser uses it, and it’s
pretty fast.

http://i.loveruby.net/en/projects/racc/
http://rubygems.org/gems/racc
http://rubydoc.info/gems/racc/1.4.7/frames

(Didn’t do benchmarks.)

– Matma R.

Ryan D. wrote in post #1043324:

On Jan 31, 2012, at 09:30 , Bartosz Dziewoński wrote:

For heavy lifting, there’s always Racc. ruby_parser uses it, and it’s
pretty fast.

Racc
racc | RubyGems.org | your community gem host
File: README — Documentation for racc (1.4.7)
I did, but with all the complexity of ruby_parser, not the grammar in
this thread.
Thanks for the numbers. I wrote a small grammar for the MiniP language
in racc, and repeated the experiments (on a different machine, ruby
1.9.3). racc’s speed is ok, but it seems to be slower than rsec-ext.

parslet
53.10s
racc
2.05s
rsec-ext
0.72s

That said, I will say that I only barely tolerate LR based parser
generators. I would love to have a fully conformant LL-based parser for
ruby.
I believe both racc and ANTLR won’t be faster than rsec-ext, as they
generate ruby code. For many less-convoluted grammars (ruby is not a
good example ;)), a PEG-style parser library is a good and pleasant to
use alternative to a parser generator.

For my prototype, I rewrote a constraint file parser to rsec-ext,
which works great. I can’t use it at the moment, because it is 1.9
only, but that’s a different story.

Kind Regards,
Benedikt