On Sep 26, 10:30 pm, Eric M. [email protected] wrote:
On Fri, Sep 26, 2008 at 1:19 AM, [email protected] wrote:
As for packrat, etc.: turn on backtrack=true and ANTLR throttles up
from LL(1) to LL(*) to a PEG as necessary. Memoization only occurs if
you say so, either on the grammar or on a rule. I have plans to have ANTLR
suggest where you should memoize.
This sounds very good. I'd like to see how you did some of this.
There is some really amazing jiggery-pokery in the grammar analyzer.
Sometimes I look at it and say, "holy crap! it actually works!"
My parser
generator does LL(1) and LL(*) where directed.
Just to prevent confusion in the namespace, I assume you mean LL(k),
not LL(*). LL(k) does more than a single symbol of lookahead, but with
a finite maximum determined at parser generation time. LL(*), on the
other hand, can generate cyclic DFA that can look arbitrarily far ahead to
distinguish alternatives. As long as the lookahead language is
regular, LL(*) can handle it. If you have a left common prefix that is
recursive, naturally that means the lookahead language is not regular.
At that point, we must backtrack. That is where we get into predicated
LL(*), which knows how to backtrack and incorporates semantic
predicates of its own. In fact, I implement syntactic predicates
(backtracking) as a special form of semantic predicate. This has
advantages, as you'll see next.
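A small ANTLR-3-style sketch of the two situations (rule and token names
invented for illustration): in the first decision the left prefix is unbounded
but regular, so a cyclic LL(*) DFA can resolve it; in the second the prefix is
recursive, which is the case described above that forces a backtrack, written
here as an explicit syntactic predicate:

grammar LookaheadDemo;   // illustrative sketch only

// Both alternatives start with any number of modifiers.  The lookahead
// needed is unbounded but regular, so a cyclic LL(*) DFA can scan ahead
// to the 'void' or 'int' that decides.
def
    :   modifier* 'void' ID '(' ')' ';'
    |   modifier* 'int'  ID ';'
    ;

modifier : 'public' | 'static' ;

// Here the common left prefix (expr) is recursive -- nested parentheses --
// the case above where the decision must fall back to backtracking;
// the (...)=> syntactic predicate spells that backtrack out.
stat
    :   (expr '=')=> expr '=' expr ';'
    |   expr ';'
    ;

expr :  atom ('+' atom)* ;
atom :  ID | '(' expr ')' ;

ID  :  ('a'..'z')+ ;
WS  :  (' '|'\t'|'\n')+ { $channel = HIDDEN; } ;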
When you set backtrack=true, does it generate backtracking code only where the
grammar really needs it (where a lookahead token/character could match two
alternatives), or for (almost) all alternatives? Now I'm curious to see what
kind of code ANTLR generates, especially with selective memoization.
Yes, ANTLR does the optimal thing. Imagine two alternatives that are
distinguishable with a single symbol of lookahead in 90% of the cases.
But in 9% of the cases it requires three tokens of lookahead, and in 1% of
the cases it requires a full backtrack. ANTLR will generate a parsing
decision that does exactly that. Even within the same parsing decision, it
will never default to using the maximum strength for every input
sequence. Without a lot of fancy-pants grammar analysis and DFA
generation, this can't be done.
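To make that concrete, here is a made-up rule (an illustrative sketch only;
which strength ANTLR actually assigns to each path depends on its analysis)
where one decision mixes all three strengths:

grammar DecisionDemo;   // illustrative sketch, not from any real grammar

stat
    :   'return' expr ';'               // decided by the first token alone
    |   'break' ';'                     // decided by the first token alone
    |   ID ':' stat                     // label: needs ID and then ':'
    |   (expr '=')=> expr '=' expr ';'  // assignment vs. expression statement:
    |   expr ';'                        //   inputs like "a = b;" still resolve
    ;                                   //   with two tokens; only inputs like
                                        //   "(a+b) = c;" pay for the backtrack

expr :  atom ('+' atom)* ;
atom :  ID | '(' expr ')' ;

ID  :  ('a'..'z')+ ;
WS  :  (' '|'\t'|'\n')+ { $channel = HIDDEN; } ;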
Pure packrat parsing is really easy. There is no analysis to be done at
all; you just always backtrack. But PEGs have a number of disadvantages.
For one, they cannot handle infinite streams like socket
protocols. Also, if you are always backtracking, the error message is
essentially "no, your input does not match the syntax". And action
execution is extremely difficult in the face of backtracking,
particularly when you backtrack continuously. Consequently, it is my opinion
that pure PEG-based parser generators are not commercially viable unless you
don't care about invalid input, you don't care about infinite input
streams, and you don't care about action execution (you only need an
AST or parse tree from the parse). For some projects that actually makes
sense, but for most commercial applications you'll need something like
ANTLR's approach, where it tries to avoid backtracking at all costs.
As an ex-ANTLR guy, I've always found you to be nice. I learned so much
about parsing while using ANTLR that I wanted to go off and design my own.
I've counted a bunch of others who did the same. That says a lot. It was
much easier to wrap your head around parsing when you saw the recursive
descent LL code that ANTLR generated compared to the tables of *ACC.
Hooray! Yeah, I applaud everybody's attempt to build their own parser
generator. Besides being fun, it helps to push the state of the art.
Bryan Ford's packrat parsing really helped me improve ANTLR. He and I
are scheduled to write an LL(*) paper sometime this year.
I finally got my parser generator set up for re-targeting. The code
generator is, more generically, a grammar tree traverser. There are a ton of
possibilities for a given traverser: generate code in a specific language,
parse while traversing (no code generation), generate a different type of
parser (LALR?), generate grammar diagrams, etc.
The question in terms of retargetability is whether or not you can
change all of your internal data structures without affecting the code
generation very much.
Be careful you're not cutting and pasting
code-generation logic when you make a new target. If so, it's not
really retargetable.
Thanks for ANTLR!
My pleasure. You know, I just realized it's been exactly 20 years
since I started building yucc, the predecessor of ANTLR! Wow, one of
these days, I had better get another interest!
Ter