A proposal for Parsing Ruby in Ruby

kturner · April 26, 2006, 2:58am

R3: A Ruby native parsing framework ported from Perl6 Rules.

Introduction:
R3 is a Ruby native implementation of Perl6’s Rule engine specification.
Perl6 Rules is a good compiler tool set for dynamic languages due to it
heavy use of regular expressions and language native constructs such as
classes, objects, strings, streams etc. R3 is envisioned as a possible
new language feature to Ruby, not merely an add-on library or external
compiler tool set such as lex and yaxx, or Antlr.
Power and Rational of Rules:

Perl6 Rules are the basis for many features in Perl6 and can be the
basis of corresponding features in other dynamic languages. LISP style
macros for dynamic languages can be defined in rules. Dynamically syntax
can be implemented using rules. In other words the ability to change the
syntax of the language on the fly. Support for first class Domain
Specific Languages and corresponding syntax can be implemented using
rules. Rules can be used to parse any type of stream: strings, IO
streams, network protocol streams.
Proposed work:

* Implement sub rules and nested match objects to support parsing.
* Implement sufficient backtracking support to parse Ruby syntax.
* Implement a Ruby Grammar defined using R3 rules syntax.
* Implement a Grammar/Rules DSL approach where a Grammar is a class
  and Rules are methods in the class.
* Implement inline use of rules as an alternative for Ruby Regexs.

Delimitations:

* The features of Perl6 rules are extensive, R3 will initially only
  support a minimum subset in an attempt to parse Ruby syntax.
* R3, initially being a port or translation of Perl6 ideas, will
  utilize much of the Perl6 syntax for rules. This is a benefit, as
  work commences, since much of the syntax come from regular
  expressions, which is widely shared across languages.
* Although not it final potential, R3 will be initially implemented

as

Background:
The Perl6 Regex/Rules engine interests me and I think it is a powerful
way to build parsers. In order to better understand how it works, I
originally tried to help implement rules in Haskell. Recently, I
attempted to port PGE, another Perl6 Rules implementation to Ruby and
later to Perl5. PGE is written in PIR which is parrot assembly(Perl5
data types, with assembly syntax). Getting from labels and jumps to
Perl5 was doable since Perl5 has gotos. Ruby was a little harder. Gotos
had to be emulated using blocks and exceptions. Each of theses attempts
ended in abandonment, but with each try, I became more familiar with the
design and syntax of rules.
My latest attempt(R3) has been to port Pugs::Compiler::Rules to Ruby and
I’m finally getting some traction. The control structure of
Pugs::Compiler::Rules, continuation passing style(CPS), is easily
implemented in Ruby using Procs. As a result, this port has been much
more straight forward than previous attempts.

R3 mirrors the architecture of Pugs::Compiler::Rules.

* Match.rb => Regex/Rules matches returned by rules, which are
  chained together to form rudimentary parse trees.
* RuntimeRule.rb => Runtime regex primitives needed to implement

rules.
* GrammarRule.rb => Grammar of Rules expressed using primitives from
RuntimeRule.rb.
* RuleCompiler.rb => uses the Grammar in GrammarRule.rb to parse
rules into rule parse trees.
* RubyEmitter.rb => Takes a rule parse tree from RuleCompiler and
emits ruby code to parse the grammar specified by rules.

R3 is just now beginning to parse the Perl6 Rule syntax. All of the
modules above except the RubyEmitter.rb module have primitive
functionality. Work on the RubyEmitter has not yet begun. R3 is is
currently hosted in a private SVN repository. See tewk.com/r3.tgz
http://tewk.com/r3.tgz.

Perl6 Rules Reference Implementations

* PGE Parrot Grammar Engine (Written in PIR, uses coroutines for
  continued parsing)

* https://svn.perl.org/parrot/trunk/compilers/pge

Pugs Rules Engine (Written in Haskell)

* http://svn.openfoundry.org/pugs/src/Text/Parser

Pugs::Compiler::Rules (Written in Perl5, generates Perl5 code, uses

continuation passing style to backtrack)

* http://svn.openfoundry.org/pugs/misc/pX/Common/Pugs-Compiler-Rule

My Previous Attempts

* http://svn.openfoundry.org/pugs/misc/pX/tewk

See also:

* http://dev.perl.org/perl6/doc/design/syn/S05.html
* Apocalypse 5: Pattern Matching
* Exegesis 5: Pattern Matching

kturner · April 28, 2006, 2:58pm

Kevin T. [email protected] writes:

The control structure of
Pugs::Compiler::Rules, continuation passing style(CPS), is easily
implemented in Ruby using Procs.

Does that scale given Ruby does not perform tail call optimization?

kturner · April 28, 2006, 7:01pm

On Apr 28, 2006, at 8:54 AM, Christian N. wrote:

Kevin T. [email protected] writes:

The control structure of
Pugs::Compiler::Rules, continuation passing style(CPS), is easily
implemented in Ruby using Procs.

Does that scale given Ruby does not perform tail call optimization?

Maybe not, but on the other hand, ruby does have real continuations.
Maybe he can try doing CPS with Cs instead of Procs.

I personally think this is kind of a silly idea. Perl6 rules are
cool, but they are very very Perl. Regular expressions on steroids,
not very OO. I think a ruby in ruby project would be much better
served by a parser combinator library. The problem is having a
performant one in ruby.

kturner · May 3, 2006, 7:08pm

Logan C. [email protected] writes:

Maybe not, but on the other hand, ruby does have real continuations.
Maybe he can try doing CPS with Cs instead of Procs.

He certainly can try, but in my experience, they scale even less.
However, he probably can reclaim the stack space with exceptions or
catch/throw.

I personally think this is kind of a silly idea. Perl6 rules are
cool, but they are very very Perl. Regular expressions on steroids,
not very OO. I think a ruby in ruby project would be much better
served by a parser combinator library. The problem is having a
performant one in ruby.

I don’t think it’s a silly idea, and parsing per-se is not a very
object-oriented task. Perl6 rules can be implemented using parser
combinators pretty easily (check the Pugs source for details. :-)), so
if that approach is taken, everyone benefits.

I, for one, would welcome having Perl6 rules or a Rubyish feel-alike
as a Ruby library.

kturner · May 3, 2006, 6:58pm

On Apr 28, 2006, at 5:54 AM, Christian N. wrote:

Kevin T. [email protected] writes:

The control structure of
Pugs::Compiler::Rules, continuation passing style(CPS), is easily
implemented in Ruby using Procs.

Does that scale given Ruby does not perform tail call optimization?

I don’t believe I’ve ever heard of anybody using continuations to
implement continuations. Instead they use CPS and callcc falls right
out for you.

–
Eric H. - [email protected] - http://blog.segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com

kturner · May 3, 2006, 7:09pm

On Apr 29, 2006, at 9:56 AM, Christian N. wrote:

I, for one, would welcome having Perl6 rules or a Rubyish feel-alike
as a Ruby library.

Aha, there’s the key point. The OP was proposing making Perl6 style
rules part of the language. That’s more what I drew exception to,
although I did go off on a tangent of sorts, as regards my feelings
for the syntax.

kturner · May 3, 2006, 7:09pm

On 4/28/06, Logan C. [email protected] wrote:

Maybe not, but on the other hand, ruby does have real continuations.
Maybe he can try doing CPS with Cs instead of Procs.

I personally think this is kind of a silly idea. Perl6 rules are
cool, but they are very very Perl. Regular expressions on steroids,
not very OO. I think a ruby in ruby project would be much better
served by a parser combinator library. The problem is having a
performant one in ruby.

I agree. I looked at perl6 rules before starting to write a parser
library in ruby. You can look for Grammar on rubyforge to see what
I’ve done. I started working on making a “Grammar” for parsing ruby
but didn’t get so far. And I’ve been away from ruby for several
months now because of work. My main problem was converting the LR
yacc grammar to LL. Now that somebody has done this for ANTLR (also
LL), it shouldn’t be too hard.

I solved the performance problems by having the grammar objects
generate there own code (flattening their grammar children also) and
eval it when needed (under the covers - no parser generator step
needed from a usability perspective). I don’t think I checked all of
those enhancements into CVS yet Last I checked, it was faster and
more memory efficient than racc.

Since I’ve been busy with work stuff, I wouldn’t mind some help with
this.