Grammar 0.8

Since I was giving a presentation about Grammar at the Lonestar Ruby
Conference (which ended today), I thought it was about time to release
another version of Grammar. I posted it yesterday here:

http://rubyforge.org/projects/grammar/

and the presentation is here (before a few modifications):

http://grammar.rubyforge.org/0.8/grammar_lonestar2008.swf

Here are some release notes:

  • second complete rewrite since Grammar 0.5
  • BNF-like grammar written directly in Ruby
  • LL(1) parser
  • separation of Grammar and parser generator (“engine”)
  • engines could generate a parser in a particular target language or do
    something else with the Grammar tree
  • current engines: Ruby (generate stand-alone very flat Ruby parser),
    Ruby0
    (directly parse while traversing Grammar), Ruby2CExt (Ruby + convert to
    C
    using ruby2cext)
  • lexer and parser grammars are specified in exactly the same way - one
    parses characters and the other tokens
  • can write a lexer-free parser
  • may define tokens however is appropriate
  • methods for a lexer (or just token) Grammar to a parser Grammar -
    including multi-threading
  • arbitrary lookahead can be achieved with the backtrack method
  • tail-call optimization of recursion (right recursion)
  • directly handles left recursion (historically only doable with (LA)LR
    parsers only).
  • very very fast - on par with the best hand crafted/tuned recursive
    descent
    LL(1) or Regexp (LL(*)) parsers. 100X faster than many other parser
    generators. And that is before Ruby2CExt.

Comments and feedback are quite welcome.

Eric

On Sat, Sep 6, 2008 at 9:20 PM, Eric M. [email protected]
wrote:

Here are some release notes:

  • lexer and parser grammars are specified in exactly the same way - one
    descent
    LL(1) or Regexp (LL(*)) parsers. 100X faster than many other parser
    generators. And that is before Ruby2CExt.

Comments and feedback are quite welcome.

Eric

Here is a benchmark for parsing JSON (JSON->ruby). “size” is the size
of
the parser specification (stripped out insignificant spacing).
“bandwidth”
is the size of the JSON / runtime. hand* and json* are manually created
(well optimized) JSON parsers. The others are from general parser
generators. *ext and *Ext have underlying C code to help (Regexp could
be
considered like that also). 3 engines are shown with Grammar 0.8 along
with
Grammar 0.5 (5X pure ruby improvement from 0.5->0.8). The benchmark
code
along with the JSON parser specs are in the gem (benchmark dir).

parser generator size bandwidth


json/ext 12041 4.25M/s
Grammar/Ruby2CExt 1083 1.36M/s
hand/RE 1482 619K/s
hand/LL1 3342 483K/s
Grammar/Ruby 1083 417K/s
RACC/ext 1389 296K/s
json/pure 3979 283K/s
Grammar/0.5 1242 83.7K/s
RACC/ruby 1389 77.1K/s
Treetop 1121 8.98K/s
GhostWheel 1076 6.88K/s
Grammar/Ruby0 1083 2.59K/s
Peggy 2975 1.7K/s

Also, for any that looked at the presentation, the demos referred to are
in
test/test_demo.rb.

On Sep 6, 2008, at 8:20 PM, Eric M. wrote:

Since I was giving a presentation about Grammar at the Lonestar Ruby
Conference (which ended today), I thought it was about time to release
another version of Grammar. I posted it yesterday here:

http://rubyforge.org/projects/grammar/

and the presentation is here (before a few modifications):

http://grammar.rubyforge.org/0.8/grammar_lonestar2008.swf

i look forward to using this - looks like a very nice piece of work!

a @ http://codeforpeople.com/

On Sun, Sep 7, 2008 at 11:22 PM, ara.t.howard
[email protected]wrote:

http://grammar.rubyforge.org/0.8/grammar_lonestar2008.swf

i look forward to using this - looks like a very nice piece of work!

Thanks!

I believe I finally have infrastructure I’m happy with that is very
extensible. At this point, it would be nice to have help with
development.
Here are some of the ideas I’m thinking:

  • more engines (the things that traverse Grammar trees)
    ++ direct Ruby C (target: on par with Florian F.'s Ragel generated
    JSON C
    extension)
    ++ engines call lambda’s for actions instead of inlining (slower, but
    more
    usable/flexible)
    ++ generation in other languages using native (non-ruby) objects: C++,
    Java,
    python, …
    ++ generation of Graphiz dot diagram or other visualization formats
    ++ LALR or packrat parser generation and/or add some features of those
    to
    current engines (i.e. memoization)
  • conversion mechanisms
    ++ from ANTLR
    ++ from YACC/RACC/RACC
    ++ possibly layer these so that these formats could be used directly
  • specific language grammars
    ++ *ML formats - should be simple
    ++ parse Ruby itself - quite difficult, but should be possible with
    Grammar
    ++ other programming languages - ANTLR grammars are probably a good
    start
  • better documentation and examples
    ++ especially a web page - not my forte
  • additional functionality at the Grammar level
    ++ external object references (not just clones or Marshaled as now)
    ++ additional common Grammar patterns
  • many other things I’m just not seeing

Contact me with any interest.

Eric