Ruby Parser Combinator


#1

Hello all,

this is my first post on the ruby-talk list. I’ve been using ruby for a
while though :slight_smile:

Anyway, just horsing around I tried my hand at writing a ruby parser
combinator. It (along with an explanation) is at

http://twelvestone.com/forum_thread/view/29249

(the explanation and link to the gem and source tar are most of the way
down the page (the post with the catapult in it)).

Anyway I’m still a bit of a noob when it comes to fp and parsing stuffs
so if anyone wants to correct me I’d be grateful for the help :slight_smile:

Cheers,
Scott


#2

Scott W. removed_email_address@domain.invalid writes:

(the explanation and link to the gem and source tar are most of the
way down the page (the post with the catapult in it)).

Anyway I’m still a bit of a noob when it comes to fp and parsing
stuffs so if anyone wants to correct me I’d be grateful for the help
:slight_smile:

You may be interested in
http://chneukirchen.org/blog/archive/2005/10/the-rightyear-parsing-library.html


#3

On Jan 20, 2006, at 2:59 AM, Scott W. wrote:

Anyway, just horsing around I tried my hand at writing a ruby
parser combinator. It (along with an explanation) is at

http://twelvestone.com/forum_thread/view/29249

Minor correction your example calculator has a prefix syntax, not an
infix syntax. Otherwise I likes dem der combinators. and cor lets you
do scanner forking, which is always fun. Neat.


#4

You may be interested in
http://chneukirchen.org/blog/archive/2005/10/the-rightyear-parsing-
library.html

Cool! That sure is a lot simpler than mine. Thanks for the link :slight_smile:


#5

On 21/01/2006, at 6:49 AM, Logan C. wrote:

Minor correction your example calculator has a prefix syntax, not an
infix syntax. Otherwise I likes dem der combinators. and cor lets you
do scanner forking, which is always fun. Neat.

ha! thanks for the correction, I have a tendency to do stuff like that.
I’ll update the examples/code :slight_smile:

Cheers


#6

Take a look here:

http://rubyforge.org/projects/grammar/

Nice! It’s a bit different since it seems to be a classic style parser
library and not quite in the same vein but still very cool. I
especially like how you overrode | and + to make expressing production
rules more natural

Hey Christian, I couldn’t find the code for your parser combinator in
that blog entry, could you post a link? I’d love to look at your
implementation because it seems like you’ve done it in a more elegant
way (at least from the examples you gave).

Cheers,
Scott


#7

Scott W. removed_email_address@domain.invalid writes:

Hey Christian, I couldn’t find the code for your parser combinator in
that blog entry, could you post a link? I’d love to look at your
implementation because it seems like you’ve done it in a more elegant
way (at least from the examples you gave).

I attached all I have. It’s not really practical because flow-control
with throw/catch isn’t as fast as I wished… but it could be
interesting nevertheless.


#8

On 1/20/06, Scott W. removed_email_address@domain.invalid wrote:

(the explanation and link to the gem and source tar are most of the way
down the page (the post with the catapult in it)).

Anyway I’m still a bit of a noob when it comes to fp and parsing stuffs
so if anyone wants to correct me I’d be grateful for the help :slight_smile:

Cheers,
Scott

Take a look here:

http://rubyforge.org/projects/grammar/

I think this is the same concept that you are talking about. I just
took it to the next level.

Sorry I haven’t been working on this for a while. I just haven’t had
time with a new job where I’m working too much. I have lots of
updates, but haven’t had time to get back to it.


#9

Cool! Thanks for that, I love reading other people’s code :slight_smile:

I’m really more interested in this as an exercise to get a better
understanding of functional programming techniques by seeing how they
could/can be implemented in Ruby. I love this sort of stuff.


#10

On 1/21/06, Scott W. removed_email_address@domain.invalid wrote:

rules more natural
I think what I did is closer to what your blog talks about than
classic parsers. Basically, you start with leaf “Grammar” objects and
build complex Grammar productions by combining Grammars.

If want to see the very basics of this technique, here is a stripped
down version of it with a calculator example (self contained and
executable):

class SimpleGrammar
def initialize(grammar)
@grammar = grammar
end
def scan(cursor,buffer)
@grammar.scan(cursor,buffer)
end
def |(other); Code.new { |cursor,buffer|
scan(cursor,buffer) || other.scan(cursor,buffer)
} end
def +(other); Code.new { |cursor,buffer|
scan(cursor,buffer) && other.scan(cursor,buffer)
} end
def filter(buf0,&block); Code.new { |cursor,buffer|
buf = buf0.clone
scan(cursor,buf) && buffer.concat(block[buf])
} end
def discard; Code.new { |cursor,buffer|
scan(cursor,[])
} end
class Code < SimpleGrammar
def initialize(&block)
@block = block
end
def scan(cursor,buffer)
@block[cursor,buffer]
end
end
class Recurse < SimpleGrammar
def initialize(&block)
@grammar = block[self]
end
def scan(cursor,buffer)
@grammar.scan(cursor,buffer)
end
end
class Element < SimpleGrammar
def initialize(pattern)
@pattern = pattern
end
def scan(cursor,buffer)
c = cursor.read1after
if @pattern===c
buffer << c
cursor.skip1next
true
end
end
end
# Make methods for our classes that call new for us
constants.each { |klass|
eval("
def #{klass}(*args,&block)
#{klass}.new(*args,&block)
end
def self.#{klass}(*args,&block)
#{klass}.new(*args,&block)
end
")
}
NULL = Code.new { true }
end

class IO
# implement just the methods we need to look like a cursor
def read1after;c=getc;ungetc©;c;end
def skip1next;getc&&true;end
end

class Expression < SimpleGrammar::Recurse
def initialize; super() { |expr|
digit = Element(?0…?9)
int = Recurse { |int| digit+(int|NULL) }
number =
(int + (
Element(?.)+int |
NULL
)).filter("") { |n| [n.to_f] }
primary = Recurse { |primary|
number |
Element(?-).discard + primary + Code { |,b| b[-1]=-b[-1] }
|
Element(?().discard + expr + Element(?)).discard
}
product = Recurse { |product|
primary + (
Element(?*).discard + product + Code { |
,b|
b[-2]*=b[-1];b.pop } |
Element(?/).discard + product + Code { |,b|
b[-2]/=b[-1];b.pop } |
NULL
)
}
sum = Recurse { |sum|
product + (
Element(?+).discard + sum + Code { |
,b|
b[-2]+=b[-1];b.pop } |
Element(?-).discard + sum + Code { |_,b|
b[-2]-=b[-1];b.pop } |
NULL
)
}
} end
end

Expression.new.scan(STDIN,buf=[]) && p(buf[0])

In the grammar package on rubyforge, the API is very similar to above,
but I added a bunch of stuff to improve performance, usability, and
features. Notice in the above, I’m not really tied to using a
character stream (IO). You could be parsing a token stream or
whatever. You just need to implement a subset of the Cursor (another
package of mine) API. This way you can build lexers, parsers,
preprocessors, etc in the same way.


#11

Cool, Thanks for the examples! One of the things I was thinking about
was bringing the example I wrote further inline with Parsec where it
can handle a list of tokens of any sort rather than just an IO stream
as you’ve done with the Cursor library.

My main aim with this was to understand combinators more intimately and
I feel that it’s been quite good for doing that and has certainly given
me a lot of perspective in regards to a Haskell project that I’m
working on.

Cheers,
Scott