Grammar 0.8

snowbird122 · September 7, 2008, 4:27am

Since I was giving a presentation about Grammar at the Lonestar Ruby
Conference (which ended today), I thought it was about time to release
another version of Grammar. I posted it yesterday here:

http://rubyforge.org/projects/grammar/

and the presentation is here (before a few modifications):

http://grammar.rubyforge.org/0.8/grammar_lonestar2008.swf

Here are some release notes:

second complete rewrite since Grammar 0.5
BNF-like grammar written directly in Ruby
LL(1) parser
separation of Grammar and parser generator (“engine”)
engines could generate a parser in a particular target language or do
something else with the Grammar tree
current engines: Ruby (generate stand-alone very flat Ruby parser),
Ruby0
(directly parse while traversing Grammar), Ruby2CExt (Ruby + convert to
C
using ruby2cext)
lexer and parser grammars are specified in exactly the same way - one
parses characters and the other tokens
can write a lexer-free parser
may define tokens however is appropriate
methods for a lexer (or just token) Grammar to a parser Grammar -
including multi-threading
arbitrary lookahead can be achieved with the backtrack method
tail-call optimization of recursion (right recursion)
directly handles left recursion (historically only doable with (LA)LR
parsers only).
very very fast - on par with the best hand crafted/tuned recursive
descent
LL(1) or Regexp (LL(*)) parsers. 100X faster than many other parser
generators. And that is before Ruby2CExt.

Comments and feedback are quite welcome.

Eric

snowbird122 · September 7, 2008, 4:14pm

On Sat, Sep 6, 2008 at 9:20 PM, Eric M. [email protected]
wrote:

Here are some release notes:

lexer and parser grammars are specified in exactly the same way - one
descent
LL(1) or Regexp (LL(*)) parsers. 100X faster than many other parser
generators. And that is before Ruby2CExt.

Comments and feedback are quite welcome.

Eric

Here is a benchmark for parsing JSON (JSON->ruby). “size” is the size
of
the parser specification (stripped out insignificant spacing).
“bandwidth”
is the size of the JSON / runtime. hand* and json* are manually created
(well optimized) JSON parsers. The others are from general parser
generators. *ext and *Ext have underlying C code to help (Regexp could
be
considered like that also). 3 engines are shown with Grammar 0.8 along
with
Grammar 0.5 (5X pure ruby improvement from 0.5->0.8). The benchmark
code
along with the JSON parser specs are in the gem (benchmark dir).

parser generator size bandwidth

json/ext 12041 4.25M/s
Grammar/Ruby2CExt 1083 1.36M/s
hand/RE 1482 619K/s
hand/LL1 3342 483K/s
Grammar/Ruby 1083 417K/s
RACC/ext 1389 296K/s
json/pure 3979 283K/s
Grammar/0.5 1242 83.7K/s
RACC/ruby 1389 77.1K/s
Treetop 1121 8.98K/s
GhostWheel 1076 6.88K/s
Grammar/Ruby0 1083 2.59K/s
Peggy 2975 1.7K/s

Also, for any that looked at the presentation, the demos referred to are
in
test/test_demo.rb.

snowbird122 · September 8, 2008, 6:29am

On Sep 6, 2008, at 8:20 PM, Eric M. wrote:

Since I was giving a presentation about Grammar at the Lonestar Ruby
Conference (which ended today), I thought it was about time to release
another version of Grammar. I posted it yesterday here:

http://rubyforge.org/projects/grammar/

and the presentation is here (before a few modifications):

http://grammar.rubyforge.org/0.8/grammar_lonestar2008.swf

i look forward to using this - looks like a very nice piece of work!

a @ http://codeforpeople.com/

snowbird122 · September 8, 2008, 8:59pm

On Sun, Sep 7, 2008 at 11:22 PM, ara.t.howard
[email protected]wrote:

http://grammar.rubyforge.org/0.8/grammar_lonestar2008.swf

i look forward to using this - looks like a very nice piece of work!

Thanks!

I believe I finally have infrastructure I’m happy with that is very
extensible. At this point, it would be nice to have help with
development.
Here are some of the ideas I’m thinking:

more engines (the things that traverse Grammar trees)
++ direct Ruby C (target: on par with Florian F.'s Ragel generated
JSON C
extension)
++ engines call lambda’s for actions instead of inlining (slower, but
more
usable/flexible)
++ generation in other languages using native (non-ruby) objects: C++,
Java,
python, …
++ generation of Graphiz dot diagram or other visualization formats
++ LALR or packrat parser generation and/or add some features of those
to
current engines (i.e. memoization)
conversion mechanisms
++ from ANTLR
++ from YACC/RACC/RACC
++ possibly layer these so that these formats could be used directly
specific language grammars
++ *ML formats - should be simple
++ parse Ruby itself - quite difficult, but should be possible with
Grammar
++ other programming languages - ANTLR grammars are probably a good
start
better documentation and examples
++ especially a web page - not my forte
additional functionality at the Grammar level
++ external object references (not just clones or Marshaled as now)
++ additional common Grammar patterns
many other things I’m just not seeing

Contact me with any interest.

Eric