Hi, I need to write an easy parser and i don't know if i can use an existing library. Thanks
on 2012-01-16 08:48
on 2012-01-16 09:07
thomas carlier писал 16.01.2012 11:48: > Hi, > > I need to write an easy parser and i don't know if i can use an > existing > library. > > Thanks Try Treetop: http://treetop.rubyforge.org/ You may find one of my blog posts useful: http://whitequark.org/blog/2011/09/08/treetop-typical-errors/
on 2012-01-16 09:34
> Try Treetop: > http://treetop.rubyforge.org/ And then try parslet: http://kschiess.github.com/parslet/ k
on 2012-01-16 10:40
Peter Zotov wrote in post #1041059: > thomas carlier писал 16.01.2012 11:48: >> Hi, >> >> I need to write an easy parser and i don't know if i can use an >> existing >> library. >> >> Thanks > > Try Treetop: > http://treetop.rubyforge.org/ > > You may find one of my blog posts useful: > http://whitequark.org/blog/2011/09/08/treetop-typical-errors/ Thanks and nice blog, very useful informations. I'll try treetop
on 2012-01-16 11:32
Tony Arcieri писал 16.01.2012 12:47: >>> >> >> And then try parslet: >> >> http://kschiess.github.com/**parslet/<http://kschiess.github.com/parslet/> >> >> k Is there a PEG parser around here which does not keep all the symbols each in its own node? Or, maybe, any one which is faster than a dying snail? Just wondering.
on 2012-01-16 16:59
> Is there a PEG parser around here which does not keep all the symbols > each in > its own node? Or, maybe, any one which is faster than a dying snail? Just > wondering. As one of the authors of one of these libraries which are slow as a dying snail, I am wondering: How do you measure? Would you like to contribute your benchmark, so that our (quite extensive) optimization efforts can go your way? And finally, what is the ground speed of a dying snail? My measurements (to contribute to the thread as well) are here: http://blog.absurd.li/2011/02/02/parslet_and_its_friends.html k
on 2012-01-18 04:18
Kaspar Schiess wrote in post #1041114: >> Is there a PEG parser around here which does not keep all the symbols >> each in >> its own node? Or, maybe, any one which is faster than a dying snail? Just >> wondering. > > As one of the authors of one of these libraries which are slow as a > dying snail, I am wondering: How do you measure? Would you like to > contribute your benchmark, so that our (quite extensive) optimization > efforts can go your way? And finally, what is the ground speed of a > dying snail? > > My measurements (to contribute to the thread as well) are here: > http://blog.absurd.li/2011/02/02/parslet_and_its_friends.html > > k Hi, Which one will you recommend for a simple grammar like expressions : expression+ ; expression : [{]content+[}] ; content : token | TOKEN[|]content |TOKEN[|]expression ; TOKEN : .* ; The grammar is not correct, but you'll understand the main idea Example : some text {text {text|text}} some text {text|text|text} input size [100 - 5000] chars
on 2012-01-18 05:07
On Jan 17, 2012, at 19:18 , thomas carlier wrote: > TOKEN : .* > ; > > The grammar is not correct, but you'll understand the main idea > > Example : some text {text {text|text}} some text {text|text|text} > > input size [100 - 5000] chars Write your own by hand. It'll be faster and smaller than anything mentioned above. http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf
on 2012-01-25 09:21
On 18.01.12 05:07, Ryan Davis wrote:
> Write your own by hand. It'll be faster
And for another definition of faster (faster to code): parslet.
greetings,
k
on 2012-01-30 23:58
Kaspar Schiess wrote in post #1042425: > On 18.01.12 05:07, Ryan Davis wrote: >> Write your own by hand. It'll be faster > > And for another definition of faster (faster to code): parslet. > > greetings, > k Hello, I tried out parslet today, and I really appreciate its design (i.e., what parsers look like) as well as the beautiful webpage you built. I could not use it for my current project, however, as it was way to slow. I need to be able to parse millions of constraint specifications, each based on a fairly simple grammar (20 rules or so). I found my original implementation (regular expression matching and scanning) to be ugly, inefficient (scanning a string a few times) and rather slow (~15K constraints per second). The goal was to get both more beautiful and faster by using a parser framework. It got more beautiful, but too slow to be useful to me. If I had benchmarked the MiniP parser posted on the homepage of parslet, it would have been obvious that parslet is too slow for my purposes: On my machine, it parses 180 lines / second, given the test string "puts(3 + 2 + 61235 + 24 + 51, 252 + 235 + 11, 2, 3, 5, 7, 11,19)" So while I appreciate the neat interface of those fancy new parsers, people should know that they are slow. But, I think it would be awesome to have a beautiful parser lib that is fast, and given that the performance of the regular expression engine is quite decent, it should be feasible to build such a parser. a) Do you think there is any chance to get a faster (say, factor 100) implementation for parslet with the same (or a similar) interface? b) If not, is there any maintained ruby parsing library or parser generator (no need to be in pure ruby) which is fast enough? How fast is antlr for ruby? Kind Regards, Benedikt
on 2012-01-31 13:05
Hello again,
I found one combinator-based parser library which seems to provide quite
decent performance:
rsec-ext (http://rsec.heroku.com/)
Here is the code implementing parslet's MiniP parser:
require 'rubygems'
require "rsec"
include Rsec::Helpers
id = /[a-z]+/.r.fail 'id'
int = /[0-9]+/.r.fail 'int'
op = one_of_('+')
comma = /\s*,\s*/.r
sum = seq(int, op, lazy{expr})
arglist= '('.r >> lazy{expr}.join(comma).even << ')'
funcall= seq_(id, arglist)
expr = funcall | sum | int
parser = expr.eof
File.readlines(ARGV.first).each { |line| parser.parse!(line) }
I do not claim that the parsers are equivalent (different datastructures
for the result), and so the comparison is a little bit unfair and only
shows a trend. I'd like to share it anyway:
Parsing 10000 lines, each containing
puts(3 + 2 + 61235 + 24 + 51, 252 + 235 + 23532 + 11, 2, 3, 5, 7, 11,
19)
the parsers need:
parslet / ruby 1.8: 162.8s
parslet / ruby 1.9: 49.5s
rsec-ext / ruby 1.9: 0.7s
Kind Regards,
Benedikt
on 2012-01-31 18:31
For heavy lifting, there's always Racc. ruby_parser uses it, and it's pretty fast. http://i.loveruby.net/en/projects/racc/ http://rubygems.org/gems/racc http://rubydoc.info/gems/racc/1.4.7/frames (Didn't do benchmarks.) -- Matma Rex
on 2012-01-31 23:24
On Jan 31, 2012, at 09:30 , Bartosz Dziewoński wrote: > For heavy lifting, there's always Racc. ruby_parser uses it, and it's > pretty fast. > > http://i.loveruby.net/en/projects/racc/ > http://rubygems.org/gems/racc > http://rubydoc.info/gems/racc/1.4.7/frames I did, but with all the complexity of ruby_parser, not the grammar in this thread. I did both the 10k testcase above as well as 10k lines of puts(2 + 3) on both ruby 1.8 and 1.9: 1.8: 116.05s: 86.17 l/s: 6.23 Kb/s: 722 Kb:10000 loc:../dev/blah1_10k.rb 8.68s: 1152.15 l/s: 13.50 Kb/s: 117 Kb:10000 loc:../dev/blah2_10k.rb 1.9: 84.80s: 117.92 l/s: 8.52 Kb/s: 722 Kb:10000 loc:../dev/blah1_10k.rb 5.48s: 1825.23 l/s: 21.39 Kb/s: 117 Kb:10000 loc:../dev/blah2_10k.rb Not an entirely fair comparison by using ruby_parser instead of an incredibly restrained grammar... but there you have it. That said, I will say that I only barely tolerate LR based parser generators. I would love to have a fully conformant LL-based parser for ruby. I'm not convinced it is possible as ruby's grammar is seriously fucked up.
on 2012-02-01 00:48
Hi Thomas, I use ANTLR to generate C++ code which I subsequently extend into Ruby. It'd be absolute overkill if you're not already extending/embedding for your project though! I believe ANTLR is LL(*). It also does lexers. There is apparently a Ruby target for ANTLR as well. I've never tried it myself. You'll need Java to build, but not to run- that's the case in my current project but I'm not using the latest ANTLR. Some links: http://www.antlr.org/ http://en.wikipedia.org/wiki/ANTLR http://www.antlr.org/wiki/display/ANTLR3/Antlr3RubyTarget Another thing to explore, maybe. :) Good luck! Garth PS. If someone has already suggested ANTLR, my apologies. I skimmed the replied but it was hardly a detailed search.
on 2012-02-01 01:57
Ryan Davis wrote in post #1043324: > On Jan 31, 2012, at 09:30 , Bartosz Dziewoński wrote: > >> For heavy lifting, there's always Racc. ruby_parser uses it, and it's >> pretty fast. >> >> http://i.loveruby.net/en/projects/racc/ >> http://rubygems.org/gems/racc >> http://rubydoc.info/gems/racc/1.4.7/frames > I did, but with all the complexity of ruby_parser, not the grammar in > this thread. Thanks for the numbers. I wrote a small grammar for the MiniP language in racc, and repeated the experiments (on a different machine, ruby 1.9.3). racc's speed is ok, but it seems to be slower than rsec-ext. parslet 53.10s racc 2.05s rsec-ext 0.72s > That said, I will say that I only barely tolerate LR based parser > generators. I would love to have a fully conformant LL-based parser for > ruby. I believe both racc and ANTLR won't be faster than rsec-ext, as they generate ruby code. For many less-convoluted grammars (ruby is not a good example ;)), a PEG-style parser library is a good and pleasant to use alternative to a parser generator. For my prototype, I rewrote a constraint file parser to rsec-ext, which works great. I can't use it at the moment, because it is 1.9 only, but that's a different story. Kind Regards, Benedikt
Please log in before posting. Registration is free and takes only a minute.
Existing account
(Switch to SSL-encrypted connection)
NEW: Do you have a Google/GoogleMail or Yahoo account? No registration required!
Log in with Google account | Log in with Yahoo account
Log in with Google account | Log in with Yahoo account
No account? Register here.