ANTLR Target for Ruby

arcadio · September 16, 2008, 7:08pm

Hi everyone,

I’ve asked the following question in the ANTLR mailing list, with no
luck. Since it’s quite Ruby-centric, perhaps someone here knows the
answer, given the popularity of ANTLR.

I’m in the process of migrating a YACC grammar to ANTLR.

However, since the rest of the project is written in Ruby, I’d like to
generate the ANTLR code in Ruby, if possible. Currently I’m generating
Java and using JRuby to circumvent this limitation. I’d prefer to use
MRI-Ruby, though

I’ve read the wiki page related to the Ruby Target (http://
www.antlr.org/wiki/display/ANTLR3/Antlr3RubyTarget), and the blog of
its developer (http://split-s.blogspot.com/2005/12/antlr-for-
ruby.html). The development seems to have halted in version 3.0b4.
This version could suit my needs as I’m only using basic Lexer and
Parser generation. No tree parsers.

Does anyone know where to get this version? It’s not listed in the
Download directory (http://www.antlr.org/download/) and I couldn’t
find it in the development repository. Has anyone used it
successfully?

Thanks in advance.

arcadio · September 21, 2008, 3:05pm

arcadio wrote:

I’ve asked the following question in the ANTLR mailing list, with no
luck. Since it’s quite Ruby-centric, perhaps someone here knows the
answer, given the popularity of ANTLR.

I’m in the process of migrating a YACC grammar to ANTLR.

However, since the rest of the project is written in Ruby, I’d like to
generate the ANTLR code in Ruby, if possible.

Have you tried just adding the required options section below your
grammar
statement? It just worked for me, though the generated code is a little
quirky. The Ruby generator has been a standard part of ANTLR for years.

options {
language = Ruby;
output = AST;
}

I found that ANTLR was very poor at reporting situations for which it
couldn’t generate correct code (or rather, the designer has a strange
idea of “correct”), and in particular the lexing behaviour is very
non-intuitive. I was attempting a very difficult grammar that needs
LL(*) behaviour, and as I know now, needs to handle lexing using parse
rules (no lexer at all). After some unhelpful arguments, I gave up and
implemented my grammar using Treetop. Terribly slow, but powerful and
simple to use, once you get the hang of PEG parsing.

Clifford H…

arcadio · September 21, 2008, 4:46pm

On Sun, Sep 21, 2008 at 7:56 AM, Clifford H. [email protected]
wrote:

I found that ANTLR was very poor at reporting situations for which it
couldn’t generate correct code (or rather, the designer has a strange
idea of “correct”), and in particular the lexing behaviour is very
non-intuitive. I was attempting a very difficult grammar that needs
LL(*) behaviour, and as I know now, needs to handle lexing using parse
rules (no lexer at all). After some unhelpful arguments, I gave up and
implemented my grammar using Treetop. Terribly slow, but powerful and
simple to use, once you get the hang of PEG parsing.

Clifford H…

Hey Clifford,

I still don’t know enough about packrat parsing yet. Do you think it
possible for this type of parser to reach the same performance as LL (or
LALR) parsers or is there just extra overhead that you need at parse
time
(as opposed to accounted for at parser generation time)? In my
benchmarking, the 3 packrat parsers I looked at (Treetop, Ghostwheel,
and
Peggy) are 10-100 X slower than pure ruby LL and LALR parsers.

I also was a previous ANTLR user (you might find my C preprocessor in
ANTLR
2.7.6 releases) before writing my own parser. I decided to stick with
LL
parsing for my Grammar project. It does handle lexer-free parsers (but
you
can have a lexer) and handles LL(*) parsing, but with a performance
penalty

backtracking. From my understanding, packrat parsers shine in
backtracking by memoizing to maintain linear performance. I’m wondering
if
I could use this technique and still maintain my non-backtracking
performance. I thinking I could also make an “engine” for Grammar that
did
packrat or even LALR parsing instead of LL parsing. Independently I
should
be able to translate a PEG syntax, Regexp syntax, or BNF (YACC/ANTLR) to
a
Grammar.

Also, any of you have JSON parser for ANTLR with a Ruby target? JSON is
what I’ve been benchmarking with (because of the ruby quiz) and I’d like
to
compare against ANTLR.

Eric

arcadio · September 23, 2008, 11:20am

Eric M. wrote:

I still don’t know enough about packrat parsing yet. Do you think it
possible for this type of parser to reach the same performance as LL (or
LALR) parsers or is there just extra overhead that you need at parse time

Unless you can analyse the heck out of the source grammar, as ANTLR
does,
I think there’s always going to be an overhead. Terence Parr’s
achievement
is to figure out where backtracking might be needed and do it
efficiently
only when needed.

Treetop could be improved with some local optimisations, like using
Regexp,
and perhaps looking at the rest of the rule to reduce memoization for
example,
which might pick up 10x improvement or mor, but I expect there’ll always
be a
a large overhead over using a hand-crafted parser. There’s a suggestion
in the
wind to implement “skip” rules, that produce no syntax nodes, for things
like
whitespace skipping.

Currently, saying “space*” will generate one node for each space -
Regexp
would fix that. So even for a PEG parser, Treetop is unnecessarily slow.

It does handle lexer-free parsers (but you
can have a lexer) and handles LL(*) parsing, but with a performance penalty

backtracking. From my understanding, packrat parsers shine in
backtracking by memoizing to maintain linear performance. I’m wondering if
I could use this technique and still maintain my non-backtracking
performance.

The major part of the cost is the construction of so many objects.
If you provide memoize-when-hinted, I think that’d be best.
If you also provided a sweet metagrammar (I find your earlier Grammar
examples make my eyes bleed), I’d be onto it like a shot. I need the
prospect of non-Ruby code generation however…

Perhaps an option to “memoize everything” could gather statistics based
on actual parser behaviour with real input, and produce the backtracking
hints automatically? That’d be really sweet, because you wouldn’t need
to
be a language wizard to work out where to put them.

I thinking I could also make an “engine” for Grammar that did
packrat or even LALR parsing instead of LL parsing.

Nathan was thinking about implementing Treetop using a VM, Truth is,
if the VM had the right hints, it’d be awesome.

Also, any of you have JSON parser for ANTLR with a Ruby target? JSON is
what I’ve been benchmarking with (because of the ruby quiz) and I’d like to
compare against ANTLR.

Not I, but I’d be surprised if Google doesn’t find one. It’d be pretty
simple to throw one together anyhow.

Have you used ANTRWorks BTW? It’s excellent!

Clifford H…

arcadio · September 23, 2008, 4:08pm

On Sep 23, 2008, at 8:58 AM, Eric M. wrote:

I already found it on the antlr site, but not with a ruby target. I
didn’t want to go relearn antlr (v3 now) and how to get it to
generate ruby.

Hopefully it’s fixed now, but last time I tried using ANTLR for Ruby
output it was very broken. That was a while ago though (shortly after
version three released, I think).

James Edward G. II

arcadio · September 23, 2008, 8:31pm

On Sep 23, 2008, at 07:08 , James G. wrote:

On Sep 23, 2008, at 8:58 AM, Eric M. wrote:

I already found it on the antlr site, but not with a ruby target.
I didn’t want to go relearn antlr (v3 now) and how to get it to
generate ruby.

Hopefully it’s fixed now, but last time I tried using ANTLR for Ruby
output it was very broken. That was a while ago though (shortly
after version three released, I think).

I doubt it is fixed… I tried to submit patches to get the ruby code
to work and Terr was a PITA to no end so I walked away. He later came
to me asking if I wanted to maintain the ruby side, but… too little,
too late.

I just checked perforce and the last change to ruby runtime was:

Change 3846 on 2007/06/26 by [email protected] ‘- fixed
various bugs related…’

My last emails from Terr were 2007-08, so … no, probably not fixed
in the slightest.

arcadio · September 23, 2008, 4:02pm

On Tue, Sep 23, 2008 at 4:20 AM, Clifford H. [email protected]
wrote:

only when needed.
whitespace skipping.

I could use this technique and still maintain my non-backtracking
performance.

The major part of the cost is the construction of so many objects.
If you provide memoize-when-hinted, I think that’d be best.
If you also provided a sweet metagrammar (I find your earlier Grammar
examples make my eyes bleed), I’d be onto it like a shot. I need the
prospect of non-Ruby code generation however…

I’m curious, could you give an example of what makes your eyes bleed? I
would like to improve the usability where possible. The Grammar layer
itself will always be just ruby objects with a ruby DSL. But, I would
like
to provide translators from various formats (BNF, PEG, regex, etc) to
Grammar objects. I’m sure I could even give you a faster replacement
for
Regexp (using ruby2cext). These translators would simply be Grammars
that
build Grammars (or maybe the code, so you could see/edit it).

Also, with Grammar 0.8, I’ve completely separated the parser generation.
Grammar is just a light-weight layer used to hold the description of the
language you are parsing. The real work is in the “engine” that does
parser
generation. There aren’t many bounds as to what an “engine” could do
with a
Grammar. An engine could:

generate a parser in another target language (engine might allow an
“action” to be simply a code string)
directly parse while traversing a Grammar
generate another type of parser - LALR, packrat ???
build a Graphviz diagram of a Grammar tree
“inspect” a Grammar (Grammar#inspect isn’t useful because every
Grammar
level mainly just has a lambda)
translate to another format - regex, BNF, PEG

Right now I’ve implemented only a few engines (some built on each other)

Ruby (generates very flat LL ruby parsers), RubyCall (puts method calls
in
certain places), Ruby0 (parses directly instead of generating a parser),
Ruby2CExt (uses ruby2cext to generate a rubyC parser).

Perhaps an option to “memoize everything” could gather statistics based

on actual parser behaviour with real input, and produce the backtracking
hints automatically? That’d be really sweet, because you wouldn’t need to
be a language wizard to work out where to put them.

Personally, I don’t like the idea of automatic backtracking. Regexp
does
this. The big problem I see with auto-backtracking is that when the
parse
fails, you just get a simple “it failed” response. Or you could
possibly
generate a tree of possible failing locations (and expectations) in the
input. Nothing as simple as “expected this, got that, at line 123”.
Ever
try to debug a large Regexp? If it doesn’t parse your input, it just
returns false and you have nothing to go on. I see the
auto-backtracking
feature of Regexp as the reason it can’t do much more than that.
If/when I
make a Regexp translator, it would generate a Grammar that can backtrack
on
every alternative (except with the (?>…) no backtrack construct) to be
compatible. I guess an “engine” could also backtrack by default.

In my very first incarnation of Grammar (called Syntax on RAA), I did do
auto-backtracking, but found the above problem. I’d rather it be the
exception than the rule.

Does Treetop auto-backtrack? How do you deal with error reporting?

If backtracking isn’t always on, do you memoize only when you might
possibly
backtrack? In Grammar, I’m wondering if it is possible to take to a
memoizing performance hit only when backtracking is allowed.

I thinking I could also make an “engine” for Grammar that did

packrat or even LALR parsing instead of LL parsing.

Nathan was thinking about implementing Treetop using a VM, Truth is,

if the VM had the right hints, it’d be awesome.

I’d also could make a Grammar engine for generating VM bytecode if that
were
possible.

Also, any of you have JSON parser for ANTLR with a Ruby target? JSON is

what I’ve been benchmarking with (because of the ruby quiz) and I’d like
to
compare against ANTLR.

Not I, but I’d be surprised if Google doesn’t find one. It’d be pretty
simple to throw one together anyhow.

I already found it on the antlr site, but not with a ruby target. I
didn’t
want to go relearn antlr (v3 now) and how to get it to generate ruby.

Have you used ANTRWorks BTW? It’s excellent!

Once. I’d like to make something like this for Grammar (at least an
engine
that graphically displays a Grammar).

Eric

arcadio · September 23, 2008, 11:57pm

On Sep 23, 2008, at 12:19 , James G. wrote:

Yeah, I too felt like Ruby was a second class citizen. I was trying
to make the switch to ANTLR as my parsing tool, but their week
support ran me off.

I doubt they even gave it that long… rimshot

arcadio · September 23, 2008, 9:20pm

On Sep 23, 2008, at 1:30 PM, Ryan D. wrote:

Ruby output it was very broken. That was a while ago though
various bugs related…’

My last emails from Terr were 2007-08, so … no, probably not fixed
in the slightest.

Yeah, I too felt like Ruby was a second class citizen. I was trying
to make the switch to ANTLR as my parsing tool, but their week support
ran me off.

James Edward G. II

arcadio · September 24, 2008, 4:05am

Eric M. wrote:

The major part of the cost is the construction of so many objects.
If you provide memoize-when-hinted, I think that’d be best.
If you also provided a sweet metagrammar (I find your earlier Grammar
examples make my eyes bleed), I’d be onto it like a shot. I need the
prospect of non-Ruby code generation however…
I’m curious, could you give an example of what makes your eyes bleed?

I must admit I haven’t looked closely at Grammar for a while, however…

DSLs are fine for folk who don’t know how to make proper grammars :-).
They tend to have a somewhat high signal-to-noise ratio, by which I mean
there are too many extraneous characters/symbols, or symbols that aren’t
intuitively communicative.

Every non-essential symbol in a sentence is a significant barrier
to a newcomer, though experienced users learn to skip them and don’t
understand what the fuss is about. The problem is that the difficulty
of grasping which symbols are significant and which aren’t goes with
the square of the S/N ratio.

XML is a perfect example of how to do this wrongly sendmail.cf
and Perl are even better examples ;-).

I would like
to provide translators from various formats (BNF, PEG, regex, etc) to
Grammar objects.

That’s what I’m talking about, good stuff.

Also, with Grammar 0.8, I’ve completely separated the parser generation.
Grammar is just a light-weight layer used to hold the description of the
language you are parsing.

So you’re already halfway to a VM, where the generated parser doesn’t
have
to be human-readable at all

Personally, I don’t like the idea of automatic backtracking. Regexp does
this. The big problem I see with auto-backtracking is that when the parse
fails, you just get a simple “it failed” response.

Not the case with Treetop. It assumes that the furthest point reached is
the failure point (not often true but an assumption that works,
given…)
and lists the text leading up to that, and enumerates the terminal
tokens
that would have allowed it to get further. This works really well.
Try it - call the “failure_reason” method on a failed parse to see it
in action.

Nothing as simple as “expected this, got that, at line 123”.

It’s exactly as simple as that. The diagnosis might be wrong, but to
the human, it’s a comprehensible mistake.

Does Treetop auto-backtrack?

That’s what memoizing is for - it’s the core of how PEG works.

If backtracking isn’t always on, do you memoize only when you might possibly
backtrack? In Grammar, I’m wondering if it is possible to take to a
memoizing performance hit only when backtracking is allowed.

I don’t think you can guess when backtracking will be needed.
As I said, it requires heavy analysis - Terrence didn’t get his
PhD for nothing!

But if you always allow backtracking ala PEG, but only memoize in
places where experience (aka statistics collected) have shown it’s
needed, you could have the best of both worlds.

I already found it on the antlr site, but not with a ruby target. I didn’t
want to go relearn antlr (v3 now) and how to get it to generate ruby.

You just ask for Ruby in your options block at the top of the grammar.
The Ruby target is quite strange and limited though IIRC. The lexer
in all ANTLR grammars is definitely non-intuitive, it doesn’t choose
the longest possible match like every other lexer in the world. Terr
isn’t concerned about his users, just about big-noting himself. I
wouldn’t say “their weak support ran me off” so much as just “they ran
me off”. When he pisses someone off, that just proves his superiority
more… and I have the emails to prove it. You mighta thought DHH had
an attitude problem, heh, he’s tame!

Have you used ANTRWorks BTW? It’s excellent!
Once. I’d like to make something like this for Grammar (at least an engine
that graphically displays a Grammar).

ANTLRWorks shows railroad diagrams, which I used to produce syntax
documentation for CQL. But the ability to interactively test individual
rules is great. A lot of work though - I’d have thought your skills
were better spent elsewhere.

Clifford H…

arcadio · September 24, 2008, 2:24pm

On Tue, Sep 23, 2008 at 9:04 PM, Clifford H. [email protected]
wrote:

DSLs are fine for folk who don’t know how to make proper grammars :-).

OK. I guess that’s that source of the “bleeding eyes”… I’m sounds
like I
won’t be able to convince you why a DSL is better than a specific
format,
and you vice-versa. A quick summary is: although you lose the exact
syntax
that you may want, you gain in power and flexibility. Using Grammar
(directly), you could template Grammars (method/lambda could return a
Grammar based on Grammar args and other params), use plain ruby for
extra
control when specifying a Grammar tree, or simply embed Grammar objects
in a
ruby program (like Regexp objects can be - although they parse the
grammar
from a string).

I would like

So you’re already halfway to a VM, where the generated parser doesn’t have
to be human-readable at all

Better - it is a compiler. Compiles from a Grammar to a target
language.
Grammar has always done this. Previously, Ruby code generation was done
in
the Grammar class itself. I’ve separated it to add quite a bit of
flexibility.

Try it - call the “failure_reason” method on a failed parse to see it
in action.

OK. I guess it just remembers the furthest point it reached when
backtracking and when all possibilities are exhausted, it reports this
furthest point. Sounds reasonable. I guess I could try something like
this
when I backtrack. Right now, I ignore errors/mismatches that cause a
backtrack.

Does Treetop auto-backtrack?

That’s what memoizing is for - it’s the core of how PEG works.

Don’t you mean packrat instead of PEG? I thought PEG was just a format
for
describing something to parse, like BNF. And packrat refers to the type
of
parser - like LL or LALR. From my understanding packrat is a cousin of
LL,
but adds memoizing to achieve linear backtracking performance.

In Grammar, I don’t try to guess. Backtracking is normally not done.
To
allow for backtracking, I provide a Grammar#backtrack method that
returns a
new Grammar that can backtrack to the beginning of where the Grammar
started
parsing the input when it fails in the middle. I would only apply
memoization for one of these - if this is doable.

wouldn’t say “their weak support ran me off” so much as just “they ran
me off”. When he pisses someone off, that just proves his superiority
more… and I have the emails to prove it. You mighta thought DHH had
an attitude problem, heh, he’s tame!

I didn’t encounter that a few years ago. I mostly just learned a lot
about
LL parsing and wanted to make something better. It does sound like
others
have been run off though…

I was only thinking making a Grammar engine that generated dot files for
Graphviz to get a visualization of a Grammar.

Eric

arcadio · September 24, 2008, 10:50am

I wrote a PEG parser generator a couple of years ago to support my
Walrus object-oriented templating system (inspired by the Cheetah
template engine in Python, and carrying over a lot of its syntax and
directives). Actually, the reason I did this was because I didn’t have
any luck getting ANTLR to do what I wanted it to do.

The parser generator uses a Ruby DSL to specify the grammar,
dynamically constructing the parser on the fly. To get an idea of what
the DSL looks like and what it can do, check out the parser for the
Walrus language itself; it is a fairly complex example because it
demonstrates things like “here documents”, include files, and parses a
subset of Ruby itself in addition to the Walrus markup:

http://git.wincent.com/Walrus.git?a=blob;f=lib/walrus/parser.rb

Like Treetop, it is fairly slow, but it works. The major performance
penalty actually seems to be the memory bloat associated with
memoization. There is also a resource leak in there where a lot of the
memoized data doesn’t appear to be getting garbage-collected when you
run the parser over lots of documents in the same session; never got
to the bottom of that one despite spending quite a few hours on it. My
workaround was to process multiple files in batches, running Walrus
once for each file rather than passing it all the files at once. Not
very elegant but it worked.

Given that it works I haven’t really done any more development on it.
But an academic itch that I’ve wanted to scratch for some time now has
been turning it into a performance powerhouse; either by replacing the
dynamically generated parser with a custom Ragel lexer (packaged as a
C extension for Ruby) or by doing static grammar analysis like ANTLR
does to identify places where prediction could be used to circumvent
unnecessary backtracking and jump straight to the right alternative.

But I think both of those options would require me to lose the Ruby
DSL and switch to a custom, non-Ruby language for specifying grammars.
One of the things I always liked about the system was that you wrote
your grammars in Ruby. Evidently, you can only do static analysis once
you’ve parsed the grammar, and that would be fiendishly difficult (or
impossible?) to do with a dynamically-constructed one built on the fly
by evaluating the Ruby DSL.

Cheers,
Wincent

arcadio · September 24, 2008, 2:55pm

On Sep 24, 2008, at 7:23 AM, Eric M. wrote:

Wincent, do you happen to have something in Walrus that parses JSON
and
generates ruby (tree of hashs and arrays)? I’ve benchmarked a bunch
of
parser generators out there using JSON and would always like to add
more.

Have you tried the JSON parser one built in to 1.9? It’s a native
extension, but I don’t know how its performance compares with other
JSON parsers.

Dave

arcadio · September 24, 2008, 3:34pm

On Sep 24, 2008, at 7:23 AM, Eric M. wrote:

objects in a
ruby program (like Regexp objects can be - although they parse the
grammar
from a string).

When I wrote GhostWheel, I was convinced the DSL was better too. Then
I tried using it and changed my mind.

The DSL adds a lot of Ruby noise that makes it harder to think about
and understand the language being parsed. It keeps you from focusing
on the right things.

For example, compare this JSON parser using GhostWheel’s DSL:

http://ghostwheel.rubyforge.org/svn/trunk/test/example/tc_json_ruby.rb

and this using a grammar:

http://ghostwheel.rubyforge.org/svn/trunk/test/example/tc_json_ghost_wheel.rb

I’m not arguing my grammar is perfect either. I think it could get
even more readable. But have a look at how clear rules like
value_space, value_content, value, and value_with_array_sep play out.

I also think most of the advantages you list can be gained with non-
DSL grammars. I mean, we can always dynamically build up a String in
Ruby too and that’s the dumbest thing I could possibly think of.

Anyway, the two don’t have to be exclusive. As you see, GhostWheel
supports both.

James Edward G. II

arcadio · September 24, 2008, 3:39pm

On Wed, Sep 24, 2008 at 7:54 AM, Dave T. [email protected] wrote:

parsers.

Dave

Yes. This is the same as the “json” gem that Florian F. did.
Florian
used Ragel to generate a pure C extension. The Ragel file is about 20X
larger than what my gramar specification is (and most other ruby
parsers)
because all the actions are C. But it is really, really fast. This
parser
is a key benchmark for me. My Grammar generated parser (using
ruby2cext) is
about half the speed of it now. Still 100X -1000X faster than most
other
ruby parsers out there. Dominik just gave me developer priviledges for
ruby2cext and I have ideas to double the speed again.

arcadio · September 24, 2008, 2:24pm

On Wed, Sep 24, 2008 at 3:49 AM, Wincent C. [email protected]
wrote:

I wrote a PEG parser generator a couple of years ago to support my
Walrus object-oriented templating system (inspired by the Cheetah
template engine in Python, and carrying over a lot of its syntax and
directives). Actually, the reason I did this was because I didn’t have
any luck getting ANTLR to do what I wanted it to do.

It seems like everybody writing parser (generators) is coming from
ANTLR. I
count Clifford H., James G., Ryan D., Wincent, and me.

Wincent, do you happen to have something in Walrus that parses JSON and
generates ruby (tree of hashs and arrays)? I’ve benchmarked a bunch of
parser generators out there using JSON and would always like to add
more.

arcadio · September 24, 2008, 3:47pm

On Wed, Sep 24, 2008 at 8:32 AM, James G.
[email protected]wrote:

a

more readable. But have a look at how clear rules like value_space,
value_content, value, and value_with_array_sep play out.

I also think most of the advantages you list can be gained with non-DSL
grammars. I mean, we can always dynamically build up a String in Ruby too
and that’s the dumbest thing I could possibly think of.

Anyway, the two don’t have to be exclusive. As you see, GhostWheel
supports both.

I think my DSL is still fairly clean because you can simply use variable
assignments for “rules” and use the “+” and “|” operators for sequences
and
alternations. Those help a lot. Ideally I would use a space instead of
“+”
to match BNF, but it doesn’t seem that bad.

But, I think I’ll also have converters (PEG, BNF, regex) to give the
non-DSL
approach.

arcadio · September 24, 2008, 8:25pm

On 24 sep, 14:23, Eric M. [email protected] wrote:

On Tue, Sep 23, 2008 at 9:04 PM, Clifford H. [email protected] wrote:

Does Treetop auto-backtrack?

That’s what memoizing is for - it’s the core of how PEG works.

Don’t you mean packrat instead of PEG? I thought PEG was just a format for
describing something to parse, like BNF. And packrat refers to the type of
parser - like LL or LALR. From my understanding packrat is a cousin of LL,
but adds memoizing to achieve linear backtracking performance.

I think your right about the first part, but not about the second
part; Packrat has nothing to do with LL or LALR parsing.

A PEG is a Parsing Expression Grammar. Its distinguishing features
are:

has the usual repetition (zero or more, one or more) operators,
sequence operators, alternative operators, as well as “and predicates”
and “not predicates”
grammars are non-ambiguous because alternatives are specified as a
set of ordered choices
lends itself to recognition by recursive descent, and maps quite
well onto the way human beings perform recognition in their own minds

A Packrat parser is a memoizing recursive-descent parser that
recognizes input based on a PEG. Its distinguishing feature is the
memoization.

At least, that’s my understanding of it.

Cheers,
Wincent

arcadio · September 24, 2008, 8:51pm

On Sep 24, 2008, at 06:32 , James G. wrote:

For example, compare this JSON parser using GhostWheel’s DSL:

http://ghostwheel.rubyforge.org/svn/trunk/test/example/tc_json_ruby.rb

and this using a grammar:

http://ghostwheel.rubyforge.org/svn/trunk/test/example/tc_json_ghost_wheel.rb

that’s an awesome example of contrast.

arcadio · September 24, 2008, 8:21pm

On 24 sep, 14:23, Eric M. [email protected] wrote:

Wincent, do you happen to have something in Walrus that parses JSON and
generates ruby (tree of hashs and arrays)? I’ve benchmarked a bunch of
parser generators out there using JSON and would always like to add more.

Nope, sorry. Although my cursory knowledge of JSON syntax leads me to
believe that the JSON grammar would be pretty short and easy to whip
up.

Cheers,
Wincent