Forum: Ruby Which library to write a parser

Posted by thomas carlier (tomii)
on 2012-01-16 08:48
Hi,

I need to write an easy parser and i don't know if i can use an existing
library.

Thanks
Posted by Peter Zotov (Guest)
on 2012-01-16 09:07
(Received via mailing list)
thomas carlier писал 16.01.2012 11:48:
> Hi,
>
> I need to write an easy parser and i don't know if i can use an
> existing
> library.
>
> Thanks

Try Treetop:
http://treetop.rubyforge.org/

You may find one of my blog posts useful:
http://whitequark.org/blog/2011/09/08/treetop-typical-errors/
Posted by Kaspar Schiess (kaspar)
on 2012-01-16 09:34
(Received via mailing list)
> Try Treetop:
> http://treetop.rubyforge.org/

And then try parslet:
http://kschiess.github.com/parslet/

k
Posted by Tony Arcieri (Guest)
on 2012-01-16 09:48
(Received via mailing list)
Gotta give a nod to kpeg:

https://github.com/evanphx/kpeg
Posted by thomas carlier (tomii)
on 2012-01-16 10:40
Peter Zotov wrote in post #1041059:
> thomas carlier писал 16.01.2012 11:48:
>> Hi,
>>
>> I need to write an easy parser and i don't know if i can use an
>> existing
>> library.
>>
>> Thanks
>
> Try Treetop:
> http://treetop.rubyforge.org/
>
> You may find one of my blog posts useful:
> http://whitequark.org/blog/2011/09/08/treetop-typical-errors/

Thanks and nice blog, very useful informations.
I'll try treetop
Posted by Peter Zotov (Guest)
on 2012-01-16 11:32
(Received via mailing list)
Tony Arcieri писал 16.01.2012 12:47:
>>>
>>
>> And then try parslet:
>>
>> http://kschiess.github.com/**parslet/<http://kschiess.github.com/parslet/>
>>
>> k

Is there a PEG parser around here which does not keep all the symbols
each in
its own node? Or, maybe, any one which is faster than a dying snail?
Just
wondering.
Posted by Kaspar Schiess (kaspar)
on 2012-01-16 16:59
(Received via mailing list)
> Is there a PEG parser around here which does not keep all the symbols
> each in
> its own node? Or, maybe, any one which is faster than a dying snail? Just
> wondering.

As one of the authors of one of these libraries which are slow as a
dying snail, I am wondering: How do you measure? Would you like to
contribute your benchmark, so that our (quite extensive) optimization
efforts can go your way? And finally, what is the ground speed of a
dying snail?

My measurements (to contribute to the thread as well) are here:
http://blog.absurd.li/2011/02/02/parslet_and_its_friends.html

k
Posted by thomas carlier (tomii)
on 2012-01-18 04:18
Kaspar Schiess wrote in post #1041114:
>> Is there a PEG parser around here which does not keep all the symbols
>> each in
>> its own node? Or, maybe, any one which is faster than a dying snail? Just
>> wondering.
>
> As one of the authors of one of these libraries which are slow as a
> dying snail, I am wondering: How do you measure? Would you like to
> contribute your benchmark, so that our (quite extensive) optimization
> efforts can go your way? And finally, what is the ground speed of a
> dying snail?
>
> My measurements (to contribute to the thread as well) are here:
> http://blog.absurd.li/2011/02/02/parslet_and_its_friends.html
>
> k

Hi,

Which one will you recommend for a simple grammar like

expressions : expression+
;
expression : [{]content+[}]
;
content : token
              | TOKEN[|]content
              |TOKEN[|]expression
;
TOKEN : .*
;

The grammar is not correct, but you'll understand the main idea

Example : some text {text {text|text}} some text {text|text|text}

input size [100 - 5000] chars
Posted by Ryan Davis (Guest)
on 2012-01-18 05:07
(Received via mailing list)
On Jan 17, 2012, at 19:18 , thomas carlier wrote:

> TOKEN : .*
> ;
>
> The grammar is not correct, but you'll understand the main idea
>
> Example : some text {text {text|text}} some text {text|text|text}
>
> input size [100 - 5000] chars

Write your own by hand. It'll be faster and smaller than anything 
mentioned above.

http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf
Posted by Kaspar Schiess (kaspar)
on 2012-01-25 09:21
(Received via mailing list)
On 18.01.12 05:07, Ryan Davis wrote:
> Write your own by hand. It'll be faster

And for another definition of faster (faster to code): parslet.

greetings,
k
Posted by Benedikt Huber (visq)
on 2012-01-30 23:58
Kaspar Schiess wrote in post #1042425:
> On 18.01.12 05:07, Ryan Davis wrote:
>> Write your own by hand. It'll be faster
>
> And for another definition of faster (faster to code): parslet.
>
> greetings,
> k
Hello,

I tried out parslet today, and I really appreciate its design (i.e.,
what parsers look like) as well as the beautiful webpage you built.

I could not use it for my current project, however, as it was way to
slow. I need to be able to parse millions of constraint specifications,
each based on a fairly simple grammar (20 rules or so). I found my
original implementation (regular expression matching and scanning) to
be ugly, inefficient (scanning a string a few times) and rather
slow (~15K constraints per second). The goal was to get both more
beautiful and faster by using a parser framework. It got more beautiful,
but too slow to be useful to me.

If I had benchmarked the MiniP parser posted on the homepage of parslet,
it would have been obvious that parslet is too slow for my purposes: On
my machine, it parses 180 lines / second, given the test string
  "puts(3 + 2 + 61235 + 24 + 51, 252 + 235 + 11, 2, 3, 5, 7, 11,19)"
So while I appreciate the neat interface of those fancy new parsers,
people should know that they are slow.

But, I think it would be awesome to have a beautiful parser lib that is
fast, and given that the performance of the regular expression engine is
quite decent, it should be feasible to build such a parser.
a) Do you think there is any chance to get a faster (say, factor 100)
implementation for parslet with the same (or a similar) interface?
b) If not, is there any maintained ruby parsing library or parser
generator (no need to be in pure ruby) which is fast enough? How fast is
antlr for ruby?

Kind Regards,
Benedikt
Posted by Benedikt Huber (visq)
on 2012-01-31 13:05
Hello again,
I found one combinator-based parser library which seems to provide quite
decent performance:

  rsec-ext (http://rsec.heroku.com/)

Here is the code implementing parslet's MiniP parser:

  require 'rubygems'
  require "rsec"
  include Rsec::Helpers

  id     = /[a-z]+/.r.fail 'id'
  int    = /[0-9]+/.r.fail 'int'
  op     = one_of_('+')
  comma  = /\s*,\s*/.r
  sum    = seq(int, op, lazy{expr})
  arglist= '('.r >> lazy{expr}.join(comma).even << ')'
  funcall= seq_(id, arglist)
  expr   = funcall | sum | int
  parser = expr.eof

  File.readlines(ARGV.first).each { |line| parser.parse!(line) }

I do not claim that the parsers are equivalent (different datastructures
for the result), and so the comparison is a little bit unfair and only
shows a trend. I'd like to share it anyway:
Parsing 10000 lines, each containing
  puts(3 + 2 + 61235 + 24 + 51, 252 + 235 + 23532 + 11, 2, 3, 5, 7, 11,
19)
the parsers need:

  parslet / ruby 1.8: 162.8s
  parslet / ruby 1.9: 49.5s
  rsec-ext / ruby 1.9: 0.7s


Kind Regards,
Benedikt
Posted by Bartosz Dziewoński (matmarex)
on 2012-01-31 18:31
(Received via mailing list)
For heavy lifting, there's always Racc. ruby_parser uses it, and it's
pretty fast.

http://i.loveruby.net/en/projects/racc/
http://rubygems.org/gems/racc
http://rubydoc.info/gems/racc/1.4.7/frames

(Didn't do benchmarks.)

-- Matma Rex
Posted by Ryan Davis (Guest)
on 2012-01-31 23:24
(Received via mailing list)
On Jan 31, 2012, at 09:30 , Bartosz Dziewoński wrote:

> For heavy lifting, there's always Racc. ruby_parser uses it, and it's
> pretty fast.
>
> http://i.loveruby.net/en/projects/racc/
> http://rubygems.org/gems/racc
> http://rubydoc.info/gems/racc/1.4.7/frames

I did, but with all the complexity of ruby_parser, not the grammar in 
this thread. I did both the 10k testcase above as well as 10k lines of 
puts(2 + 3) on both ruby 1.8 and 1.9:

1.8:

116.05s:    86.17 l/s:    6.23 Kb/s:  722 Kb:10000 
loc:../dev/blah1_10k.rb
  8.68s:  1152.15 l/s:   13.50 Kb/s:  117 Kb:10000 
loc:../dev/blah2_10k.rb

1.9:

 84.80s:   117.92 l/s:    8.52 Kb/s:  722 Kb:10000 
loc:../dev/blah1_10k.rb
  5.48s:  1825.23 l/s:   21.39 Kb/s:  117 Kb:10000 
loc:../dev/blah2_10k.rb

Not an entirely fair comparison by using ruby_parser instead of an 
incredibly restrained grammar... but there you have it.

That said, I will say that I only barely tolerate LR based parser 
generators. I would love to have a fully conformant LL-based parser for 
ruby. I'm not convinced it is possible as ruby's grammar is seriously 
fucked up.
Posted by Garthy D (Guest)
on 2012-02-01 00:48
(Received via mailing list)
Hi Thomas,

I use ANTLR to generate C++ code which I subsequently extend into Ruby.
It'd be absolute overkill if you're not already extending/embedding for
your project though!

I believe ANTLR is LL(*). It also does lexers.

There is apparently a Ruby target for ANTLR as well. I've never tried it
myself.

You'll need Java to build, but not to run- that's the case in my current
project but I'm not using the latest ANTLR.

Some links:

http://www.antlr.org/
http://en.wikipedia.org/wiki/ANTLR
http://www.antlr.org/wiki/display/ANTLR3/Antlr3RubyTarget

Another thing to explore, maybe. :)

Good luck!

Garth

PS. If someone has already suggested ANTLR, my apologies. I skimmed the
replied but it was hardly a detailed search.
Posted by Benedikt Huber (visq)
on 2012-02-01 01:57
Ryan Davis wrote in post #1043324:
> On Jan 31, 2012, at 09:30 , Bartosz Dziewoński wrote:
>
>> For heavy lifting, there's always Racc. ruby_parser uses it, and it's
>> pretty fast.
>>
>> http://i.loveruby.net/en/projects/racc/
>> http://rubygems.org/gems/racc
>> http://rubydoc.info/gems/racc/1.4.7/frames
> I did, but with all the complexity of ruby_parser, not the grammar in
> this thread.
Thanks for the numbers. I wrote a small grammar for the MiniP language
in racc, and repeated the experiments (on a different machine, ruby
1.9.3). racc's speed is ok, but it seems to be slower than rsec-ext.

parslet
53.10s
racc
2.05s
rsec-ext
0.72s


> That said, I will say that I only barely tolerate LR based parser
> generators. I would love to have a fully conformant LL-based parser for
> ruby.
I believe both racc and ANTLR won't be faster than rsec-ext, as they
generate ruby code. For many less-convoluted grammars (ruby is not a
good example ;)), a PEG-style parser library is a good and pleasant to
use alternative to a parser generator.

For my prototype, I rewrote a constraint file parser to rsec-ext,
which works great. I can't use it at the moment, because it is 1.9
only, but that's a  different story.

Kind Regards,
Benedikt
Please log in before posting. Registration is free and takes only a minute.
Existing account (Switch to SSL-encrypted connection)
NEW: Do you have a Google/GoogleMail or Yahoo account? No registration required!
Log in with Google account | Log in with Yahoo account
No account? Register here.