On Tue, Sep 23, 2008 at 4:20 AM, Clifford H. [email protected]
wrote:
only when needed.
whitespace skipping.
I could use this technique and still maintain my non-backtracking
performance.
The major part of the cost is the construction of so many objects.
If you provide memoize-when-hinted, I think that’d be best.
If you also provided a sweet metagrammar (I find your earlier Grammar
examples make my eyes bleed), I’d be onto it like a shot. I need the
prospect of non-Ruby code generation however…
I’m curious, could you give an example of what makes your eyes bleed? I
would like to improve the usability where possible. The Grammar layer
itself will always be just ruby objects with a ruby DSL. But, I would
like
to provide translators from various formats (BNF, PEG, regex, etc) to
Grammar objects. I’m sure I could even give you a faster replacement
for
Regexp (using ruby2cext). These translators would simply be Grammars
that
build Grammars (or maybe the code, so you could see/edit it).
Also, with Grammar 0.8, I’ve completely separated the parser generation.
Grammar is just a light-weight layer used to hold the description of the
language you are parsing. The real work is in the “engine” that does
parser
generation. There aren’t many bounds as to what an “engine” could do
with a
Grammar. An engine could:
- generate a parser in another target language (engine might allow an
“action” to be simply a code string)
- directly parse while traversing a Grammar
- generate another type of parser - LALR, packrat ???
- build a Graphviz diagram of a Grammar tree
- “inspect” a Grammar (Grammar#inspect isn’t useful because every
Grammar
level mainly just has a lambda)
- translate to another format - regex, BNF, PEG
Right now I’ve implemented only a few engines (some built on each other)
Ruby (generates very flat LL ruby parsers), RubyCall (puts method calls
in
certain places), Ruby0 (parses directly instead of generating a parser),
Ruby2CExt (uses ruby2cext to generate a rubyC parser).
Perhaps an option to “memoize everything” could gather statistics based
on actual parser behaviour with real input, and produce the backtracking
hints automatically? That’d be really sweet, because you wouldn’t need to
be a language wizard to work out where to put them.
Personally, I don’t like the idea of automatic backtracking. Regexp
does
this. The big problem I see with auto-backtracking is that when the
parse
fails, you just get a simple “it failed” response. Or you could
possibly
generate a tree of possible failing locations (and expectations) in the
input. Nothing as simple as “expected this, got that, at line 123”.
Ever
try to debug a large Regexp? If it doesn’t parse your input, it just
returns false and you have nothing to go on. I see the
auto-backtracking
feature of Regexp as the reason it can’t do much more than that.
If/when I
make a Regexp translator, it would generate a Grammar that can backtrack
on
every alternative (except with the (?>…) no backtrack construct) to be
compatible. I guess an “engine” could also backtrack by default.
In my very first incarnation of Grammar (called Syntax on RAA), I did do
auto-backtracking, but found the above problem. I’d rather it be the
exception than the rule.
Does Treetop auto-backtrack? How do you deal with error reporting?
If backtracking isn’t always on, do you memoize only when you might
possibly
backtrack? In Grammar, I’m wondering if it is possible to take to a
memoizing performance hit only when backtracking is allowed.
I thinking I could also make an “engine” for Grammar that did
packrat or even LALR parsing instead of LL parsing.
Nathan was thinking about implementing Treetop using a VM, Truth is,
if the VM had the right hints, it’d be awesome.
I’d also could make a Grammar engine for generating VM bytecode if that
were
possible.
Also, any of you have JSON parser for ANTLR with a Ruby target? JSON is
what I’ve been benchmarking with (because of the ruby quiz) and I’d like
to
compare against ANTLR.
Not I, but I’d be surprised if Google doesn’t find one. It’d be pretty
simple to throw one together anyhow.
I already found it on the antlr site, but not with a ruby target. I
didn’t
want to go relearn antlr (v3 now) and how to get it to generate ruby.
Have you used ANTRWorks BTW? It’s excellent!
Once. I’d like to make something like this for Grammar (at least an
engine
that graphically displays a Grammar).
Eric