Using Ruby for generic language parsing (or any language-specific parsing libraries out there?)

Hi,

I’ve been looking at the various parser generator options for Ruby on
and off for a while now, and I was wondering if anyone had any
experience/recommendations on existing solutions.

What I’d ideally like to have is a set of pre-defined parser libraries
that defined an event-driven interface for handling productions. This
would theoretically allow you to simply plug in your own business logic
into a robust, maintained and (hopefully) well-tested parser without
actually having to go through the tedium and pain of defining a bunch of
parsers yourself.
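
A minimal sketch of what such an event-driven interface could look like, assuming a toy "SELECT a, b FROM t" grammar. Every name here (ToySelectParser, ColumnCollector, the handler methods) is hypothetical and made up for illustration, not taken from an existing library:

```ruby
# Hypothetical sketch: a tiny parser that reports productions to a handler
# object instead of building a tree. The toy grammar is:
#   select := "SELECT" ident ("," ident)* "FROM" ident
class ToySelectParser
  ParseError = Class.new(StandardError)

  def initialize(handler)
    @handler = handler
  end

  # Parses the text and fires one handler event per production.
  def parse(text)
    tokens = text.scan(/\w+|,/)
    expect(tokens, "SELECT")
    @handler.start_select
    loop do
      @handler.column(identifier(tokens))
      break unless tokens.first == ","
      tokens.shift
    end
    expect(tokens, "FROM")
    @handler.table(identifier(tokens))
    raise ParseError, "unexpected #{tokens.first.inspect}" unless tokens.empty?
    @handler.end_select
  end

  private

  # Consumes one token and checks that it is the given keyword.
  def expect(tokens, keyword)
    tok = tokens.shift
    return if tok && tok.upcase == keyword
    raise ParseError, "expected #{keyword}, got #{tok.inspect}"
  end

  # Consumes one token and checks that it looks like an identifier.
  def identifier(tokens)
    tok = tokens.shift
    raise ParseError, "expected identifier, got #{tok.inspect}" unless tok && tok =~ /\A\w+\z/
    tok
  end
end

# The "business logic" lives entirely in the handler.
class ColumnCollector
  attr_reader :columns, :table_name

  def initialize
    @columns = []
  end

  def start_select; end
  def column(name); @columns << name; end
  def table(name); @table_name = name; end
  def end_select; end
end

collector = ColumnCollector.new
ToySelectParser.new(collector).parse("SELECT id, name FROM users")
```

The parser knows nothing about what the handler does with each event, so the same grammar could drive a pretty-printer, a validator or a query rewriter.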

I’d like to be able to have something like this for SQL-92, Java and C,
with potentially other languages in the future. I’m guessing stuff like
this has been developed for many of the IDEs out there that support
these languages, but I’ve never seen something as complete as what I’m
talking about when searching through Google.

In the past, I’ve written parsers with Lex/Yacc, and I did do some work
a couple of years ago with rex & racc, but for anything as complex as
the above, I think rex/racc would be a) too time consuming/error prone
(I found it incredibly difficult to debug and diagnose problems and
generate reasonable error messages) and b) surely re-inventing the wheel
for the nth time, where n > 10,000,000,000. :)

I looked a bit at rparsec and treetop, but before I go down this path, I
was hoping to tap some of the community’s collective wisdom for pointers
to similar initiatives.

Any information, suggestions or war stories would be most appreciated.

Cheers,

ast

On Tue, Apr 14, 2009 at 01:29:51AM +0900, Andrew S. Townley wrote:

> [...] for anything as complex as
> the above, I think rex/racc would be a) too time consuming/error prone
> (I found it incredibly difficult to debug and diagnose problems and
> generate reasonable error messages) and b) surely re-inventing the wheel
> for the nth time, where n > 10,000,000,000. :)

I’ve been using rex and racc for quite a while, and I’m pretty happy
with both of them. I’ve found racc to be quite good in combination with
existing Yacc grammars. I’ve written a pure Ruby JavaScript parser[1]
and CSS parser[2][3] using racc. I started on a C parser, but decided it
wasn’t worth my time because I didn’t want to deal with preprocessor
statements and we already have CAST[4]. Another non-trivial use of racc is
Ryan D.'s ruby parser[5], which is a Ruby parser written with racc.

> I looked a bit at rparsec and treetop, but before I go down this path, I
> was hoping to tap some of the community’s collective wisdom for pointers
> to similar initiatives.

Treetop looks neat. Last time I looked at it, though, it couldn’t do
error recovery in grammars the way the error token does in Yacc. That may
have changed since then.
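
For context, a Yacc rule such as `stmt : error ';'` lets the parser report a syntax error, discard input up to the next `;`, and carry on with the following statement. Below is a hand-rolled Ruby sketch of that panic-mode idea, assuming a toy grammar of `IDENT = NUMBER ;` statements. It uses neither racc nor Treetop, and parse_statements is a made-up name:

```ruby
# Hypothetical sketch of panic-mode recovery for the toy grammar
#   stmt := IDENT '=' NUMBER ';'
# Returns the statements that parsed plus a list of error messages.
def parse_statements(source)
  tokens = source.scan(/[A-Za-z_]\w*|\d+|\S/)
  statements = []
  errors = []

  until tokens.empty?
    name, eq, value, semi = tokens.first(4)
    if name =~ /\A[A-Za-z_]\w*\z/ && eq == "=" && value =~ /\A\d+\z/ && semi == ";"
      statements << [name, value.to_i]
      tokens.shift(4)
    else
      # Record the problem, then discard tokens up to and including the
      # next ';' so that parsing can resume at the following statement.
      errors << "syntax error near #{tokens.first.inspect}"
      loop do
        skipped = tokens.shift
        break if skipped.nil? || skipped == ";"
      end
    end
  end

  [statements, errors]
end

statements, errors = parse_statements("x = 1; y = = 2; z = 3;")
```

The point of the technique is that one bad statement produces one error message and the rest of the input still gets parsed, rather than the whole parse aborting at the first problem.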

[1] GitHub - tenderlove/recma: Pure ruby javascript parser and interpreter.
[2] GitHub - sparklemotion/csspool: CSSPool is a CSS SAC parser and by default will output a DOM Level 2 style tree.
[3] http://tinyurl.com/dkn8y4
[4] http://cast.rubyforge.org/
[5] http://rubyforge.org/forum/forum.php?forum_id=29842

On Apr 13, 2009, at 12:17 PM, Aaron P. wrote:

> I’ve been using rex and racc for quite a while, and I’m pretty happy
> with both of them. I’ve found racc to be quite good in combination with
> existing Yacc grammars. I’ve written a pure ruby JavaScript parser[1]
> and CSS parser[2][3] using racc. I started on a C parser, but decided it
> wasn’t worth my time because I didn’t want to deal with preprocessor
> statements and we already have CAST[4]. Other non-trivial racc usage is
> Ryan D.'s ruby parser[5] which is a ruby parser written with racc.

I appreciate all of these great examples to look over.

Does anyone know of a good racc tutorial though? I’ve seen the very
basic calculator examples, but I’ve had trouble finding better
references.

James Edward G. II

On Tue, 2009-04-14 at 02:17 +0900, Aaron P. wrote:

On Tue, Apr 14, 2009 at 01:29:51AM +0900, Andrew S. Townley wrote:

[what I was looking for snipped]

> [...] and CSS parser[2][3] using racc. I started on a C parser, but decided it
> wasn’t worth my time because I didn’t want to deal with preprocessor
> statements and we already have CAST[4]. Other non-trivial racc usage is
> Ryan D.'s ruby parser[5] which is a ruby parser written with racc.

Wow. Thanks for all the examples! I’ll have to poke through your
rex/racc usage to see what I was doing wrong trying to get decent error
handling. I finally just gave up and hand-coded a parser for what I was
trying to do. Of course, I guess I should’ve mentioned that my lex/yacc
days were a looooong time ago, and I’m a bit rusty with hard-core
parsing & compiler writing (even though I used to think it was cool and,
dare I say it, even kinda fun). That might’ve had something to do
with my difficulties as well. :)

The CAST example looks very interesting.

The main thing that concerns me about them is that I won’t be able to
easily adapt existing grammars for lex/yacc-based parsers (or even
ANTLR). If I do have to go my own way, I don’t want to write more than
I must.

Cheers for the info.

ast