On Sep 26, 10:30 pm, Eric M. [email protected] wrote:
On Fri, Sep 26, 2008 at 1:19 AM, [email protected] wrote:
As for packrat, etc.: turn on backtrack=true and ANTLR throttles up
from LL(1) to LL(*) to a PEG as necessary. Memoization only occurs if
you say so, either on the grammar or on a rule. I have plans to have ANTLR
suggest where you should memoize.
This sounds very good. I'd like to see how you did some of this.
There is some really amazing jiggery-pokery in the grammar analyzer.
Sometimes I look at it and say, "holy crap! it actually works!"
My parser
generator does LL(1) and LL(*) where directed.
Just to prevent confusion in the namespace, I assume you mean LL(k),
not LL(*). LL(k) does more than a single symbol of lookahead, but with
a finite maximum determined at parser generation time. LL(*), on the
other hand, can generate cyclic DFA that can look arbitrarily far ahead to
distinguish alternatives. As long as the lookahead language is
regular, LL(*) can handle it. If you have a left common prefix that is
recursive, naturally that means the lookahead language is not regular.
At that point, we must backtrack. That is where we get into predicated
LL(*), which knows how to backtrack and incorporates semantic
predicates of its own. In fact, I implement syntactic predicates
(backtracking) as a special form of semantic predicate. This has
advantages, as you'll see next.
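A small ANTLR-3-style sketch of the two situations (rule and token names
invented for illustration): in the first decision the left prefix is unbounded
but regular, so a cyclic LL(*) DFA can resolve it; in the second the prefix is
recursive, which is the case described above that forces a backtrack, written
here as an explicit syntactic predicate:

grammar LookaheadDemo;   // illustrative sketch only

// Both alternatives start with any number of modifiers.  The lookahead
// needed is unbounded but regular, so a cyclic LL(*) DFA can scan ahead
// to the 'void' or 'int' that decides.
def
    :   modifier* 'void' ID '(' ')' ';'
    |   modifier* 'int'  ID ';'
    ;

modifier : 'public' | 'static' ;

// Here the common left prefix (expr) is recursive -- nested parentheses --
// the case above where the decision must fall back to backtracking;
// the (...)=> syntactic predicate spells that backtrack out.
stat
    :   (expr '=')=> expr '=' expr ';'
    |   expr ';'
    ;

expr :  atom ('+' atom)* ;
atom :  ID | '(' expr ')' ;

ID  :  ('a'..'z')+ ;
WS  :  (' '|'\t'|'\n')+ { $channel = HIDDEN; } ;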
When you set backtrack=true, does it generate backtracking code only where the
grammar really needs it (where a lookahead token/character could match two
alternatives), or for (almost) all alternatives? Now I'm curious to see what
kind of code ANTLR generates, especially with selective memoization.
Yes, ANTLR does the optimal thing. Imagine two alternatives that are
distinguishable with a single symbol of lookahead in 90% of the cases.
But in 9% of the cases it requires three tokens of lookahead, and in 1% of
the cases it requires a full backtrack. ANTLR will generate a parsing
decision that does exactly that. Even within the same parsing decision, it
will never default to using the maximum strength for every input
sequence. Without a lot of fancy-pants grammar analysis and DFA
generation, this can't be done.
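To make that concrete, here is a made-up rule (an illustrative sketch only;
which strength ANTLR actually assigns to each path depends on its analysis)
where one decision mixes all three strengths:

grammar DecisionDemo;   // illustrative sketch, not from any real grammar

stat
    :   'return' expr ';'               // decided by the first token alone
    |   'break' ';'                     // decided by the first token alone
    |   ID ':' stat                     // label: needs ID and then ':'
    |   (expr '=')=> expr '=' expr ';'  // assignment vs. expression statement:
    |   expr ';'                        //   inputs like "a = b;" still resolve
    ;                                   //   with two tokens; only inputs like
                                        //   "(a+b) = c;" pay for the backtrack

expr :  atom ('+' atom)* ;
atom :  ID | '(' expr ')' ;

ID  :  ('a'..'z')+ ;
WS  :  (' '|'\t'|'\n')+ { $channel = HIDDEN; } ;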
Pure packrat parsing is really easy. There is no analysis to be done at
all; you just always backtrack. But PEGs have a number of disadvantages.
For one, they cannot handle infinite streams like socket
protocols. Also, if you are always backtracking, the error message is
essentially "no, your input does not match the syntax". And action
execution is extremely difficult in the face of backtracking,
particularly when you backtrack continuously. Consequently, it is my opinion
that pure PEG-based parser generators are not commercially viable unless you
don't care about invalid input, you don't care about infinite input
streams, and you don't care about action execution (you only need an
AST or parse tree from the parse). For some projects that actually makes
sense, but for most commercial applications you'll need something like
ANTLR's approach, where it tries to avoid backtracking at all costs.
As an ex-ANTLR guy, I've always found you to be nice. I learned so much
about parsing while using ANTLR that I wanted to go off and design my own.
I've counted a bunch of others who did the same. That says a lot. It was
much easier to wrap your head around parsing when you saw the recursive
descent LL code that ANTLR generated compared to the tables of *ACC.
Hooray! Yeah, I applaud everybody's attempt to build their own parser
generator. Besides being fun, it helps to push the state of the art.
Bryan Ford's packrat parsing really helped me improve ANTLR. He and I
are scheduled to write an LL(*) paper sometime this year.
I finally got my parser generator set up for re-targeting. The code
generator is, more generically, a grammar tree traverser. There are a ton of
possibilities for a given traverser: generate code in a specific language,
parse while traversing (no code generation), generate a different type of
parser (LALR?), generate grammar diagrams, etc.
The question in terms of retargetability is whether or not you can
change all of your internal data structures without affecting the code
generation very much.
Be careful you're not cutting and pasting
code-generation logic when you make a new target. If so, it's not
really retargetable.
Thanks for ANTLR!
My pleasure. You know, I just realized it's been exactly 20 years
since I started building yucc, the predecessor of ANTLR! Wow, one of
these days, I had better get another interest!
Ter