Since I was giving a presentation about Grammar at the Lonestar Ruby
Conference (which ended today), I thought it was about time to release
another version of Grammar. I posted it yesterday here:
http://rubyforge.org/projects/grammar/
and the presentation is here (before a few modifications):
http://grammar.rubyforge.org/0.8/grammar_lonestar2008.swf
Here are some release notes:
- second complete rewrite since Grammar 0.5
- BNF-like grammar written directly in Ruby
- LL(1) parser
- separation of Grammar and parser generator (“engine”)
- engines could generate a parser in a particular target language or do
something else with the Grammar tree
- current engines: Ruby (generate stand-alone very flat Ruby parser),
Ruby0
(directly parse while traversing Grammar), Ruby2CExt (Ruby + convert to
C
using ruby2cext)
- lexer and parser grammars are specified in exactly the same way - one
parses characters and the other tokens
- can write a lexer-free parser
- may define tokens however is appropriate
- methods for a lexer (or just token) Grammar to a parser Grammar -
including multi-threading
- arbitrary lookahead can be achieved with the backtrack method
- tail-call optimization of recursion (right recursion)
- directly handles left recursion (historically only doable with (LA)LR
parsers only).
- very very fast - on par with the best hand crafted/tuned recursive
descent
LL(1) or Regexp (LL(*)) parsers. 100X faster than many other parser
generators. And that is before Ruby2CExt.
Comments and feedback are quite welcome.
Eric
On Sat, Sep 6, 2008 at 9:20 PM, Eric M. [email protected]
wrote:
Here are some release notes:
- lexer and parser grammars are specified in exactly the same way - one
descent
LL(1) or Regexp (LL(*)) parsers. 100X faster than many other parser
generators. And that is before Ruby2CExt.
Comments and feedback are quite welcome.
Eric
Here is a benchmark for parsing JSON (JSON->ruby). “size” is the size
of
the parser specification (stripped out insignificant spacing).
“bandwidth”
is the size of the JSON / runtime. hand* and json* are manually created
(well optimized) JSON parsers. The others are from general parser
generators. *ext and *Ext have underlying C code to help (Regexp could
be
considered like that also). 3 engines are shown with Grammar 0.8 along
with
Grammar 0.5 (5X pure ruby improvement from 0.5->0.8). The benchmark
code
along with the JSON parser specs are in the gem (benchmark dir).
parser generator size bandwidth
json/ext 12041 4.25M/s
Grammar/Ruby2CExt 1083 1.36M/s
hand/RE 1482 619K/s
hand/LL1 3342 483K/s
Grammar/Ruby 1083 417K/s
RACC/ext 1389 296K/s
json/pure 3979 283K/s
Grammar/0.5 1242 83.7K/s
RACC/ruby 1389 77.1K/s
Treetop 1121 8.98K/s
GhostWheel 1076 6.88K/s
Grammar/Ruby0 1083 2.59K/s
Peggy 2975 1.7K/s
Also, for any that looked at the presentation, the demos referred to are
in
test/test_demo.rb.
On Sep 6, 2008, at 8:20 PM, Eric M. wrote:
Since I was giving a presentation about Grammar at the Lonestar Ruby
Conference (which ended today), I thought it was about time to release
another version of Grammar. I posted it yesterday here:
http://rubyforge.org/projects/grammar/
and the presentation is here (before a few modifications):
http://grammar.rubyforge.org/0.8/grammar_lonestar2008.swf
i look forward to using this - looks like a very nice piece of work!
a @ http://codeforpeople.com/
On Sun, Sep 7, 2008 at 11:22 PM, ara.t.howard
[email protected]wrote:
http://grammar.rubyforge.org/0.8/grammar_lonestar2008.swf
i look forward to using this - looks like a very nice piece of work!
Thanks!
I believe I finally have infrastructure I’m happy with that is very
extensible. At this point, it would be nice to have help with
development.
Here are some of the ideas I’m thinking:
- more engines (the things that traverse Grammar trees)
++ direct Ruby C (target: on par with Florian F.'s Ragel generated
JSON C
extension)
++ engines call lambda’s for actions instead of inlining (slower, but
more
usable/flexible)
++ generation in other languages using native (non-ruby) objects: C++,
Java,
python, …
++ generation of Graphiz dot diagram or other visualization formats
++ LALR or packrat parser generation and/or add some features of those
to
current engines (i.e. memoization)
- conversion mechanisms
++ from ANTLR
++ from YACC/RACC/RACC
++ possibly layer these so that these formats could be used directly
- specific language grammars
++ *ML formats - should be simple
++ parse Ruby itself - quite difficult, but should be possible with
Grammar
++ other programming languages - ANTLR grammars are probably a good
start
- better documentation and examples
++ especially a web page - not my forte
- additional functionality at the Grammar level
++ external object references (not just clones or Marshaled as now)
++ additional common Grammar patterns
- many other things I’m just not seeing
Contact me with any interest.
Eric