Lightweight lexer?


#1

Is there a lightweight lexer for Ruby? I have seen a few lexer
generators out there, but I want something simple that can be used
without generating the lexer first. I really just need the basic
functionality of matching a pattern and associating an action with that
pattern. For example, I was thinking of something like:

class MyLexer < Lexer
  rule /a/ do
    # action for pattern a
  end
  rule /b/ do
    # action for pattern b
  end
  rule /c/ do
    # action for pattern c
  end
end

l = MyLexer.new(input_stream)
l.each_token do |token|
   # do something with token
end

I have created something similar to the above. However, I wanted to
check what is out there first before I invest too much time in it. If
no one knows of anything else, or if anyone is interested in what I
have described, let me know and I will clean it up and publish it.

–Meador


#2

On May 12, 2006, at 8:35 PM, Meador I. wrote:

I have something similar to the above that I have created. However, I
wanted to check and see what is out there first before I invested too
much time in it. If no one knows of anything else and\or is interested
in what I have described let me know and I will clean it up and
publish it.

–Meador

StringScanner comes close if you combine it with a chain of if/elsif tests.
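
As a rough illustration of this suggestion, here is a minimal sketch built on StringScanner from Ruby's standard strscan library. The tokenize method, the token types, and the patterns are invented for the example; none of them come from the posts in this thread.

```ruby
require 'strscan'

# Hypothetical helper: turns a string into [type, text] pairs.
def tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next                                    # whitespace only separates tokens
    elsif (text = scanner.scan(/\d+/))
      tokens << [:number, text]
    elsif (text = scanner.scan(/[A-Za-z_]\w*/))
      tokens << [:ident, text]
    elsif (text = scanner.scan(/[-+*\/=()]/))
      tokens << [:op, text]
    else
      # None of the patterns matched at the current position.
      raise ArgumentError,
            "unexpected #{scanner.peek(1).inspect} at offset #{scanner.pos}"
    end
  end
  tokens
end

p tokenize("x = 42 + y1")
```

Each branch tries one pattern at the current scan position, so the order of the branches decides which pattern wins when more than one could match.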


#3

If you look closely at what traditional tools like lex and flex do, you
could almost take the input to lex and execute it directly in Ruby! It’s
not much more than a case statement built from regular expressions. I’m
not sure what you mean by “lightweight”, but if you mean “fast”: a lot of
people who wrote language compilers back in the day (I’m one of them)
would often write scanners in hand-tuned C rather than using lex.


#4

If you look closely at what traditional tools like lex and flex do, you
could almost take the input to lex and execute it directly in Ruby!

They do look very similar.

It’s not much more than a case statement from regular expressions.

It is a bit more than a case statement on regular expressions :). You
have to do a bit of work to manage the input buffer and ensure that the
longest match is made (i.e., matching is greedy) for each token. Also, if
you want some of the other traditional features of lex and flex, you have
to add things such as start states, putting characters back into the
input stream, rejecting matches, and so on.

I’m not sure what you mean by “lightweight”

I more or less meant simple and easy to use (maybe I should have just
said so :)), without the full range of features of a typical lex/flex
knockoff.
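
To make the longest-match point concrete, here is a minimal sketch of a base class with the rule/each_token interface proposed at the top of the thread. It is only one possible way to do it, not the implementation mentioned in the first post. The internals (the rules class method, the tie-breaking policy, and skipping a token when an action returns nil) are assumptions made for the example.

```ruby
require 'strscan'

# Hypothetical base class; only the names Lexer, rule and each_token
# come from the thread, everything else is an assumption.
class Lexer
  # Each subclass keeps its own rule list in a class-level instance variable.
  def self.rules
    @rules ||= []
  end

  # Register a pattern and the action to run on the matched text.
  def self.rule(pattern, &action)
    rules << [pattern, action]
  end

  def initialize(input)
    @scanner = StringScanner.new(input)
  end

  # Yields whatever each action returns; a nil result emits no token.
  def each_token
    return enum_for(:each_token) unless block_given?

    until @scanner.eos?
      best_length = 0
      best_action = nil
      # Try every rule at the current position and keep the longest match.
      # The strict > means the earliest rule wins a tie, as in lex.
      self.class.rules.each do |pattern, action|
        length = @scanner.match?(pattern)
        if length && length > best_length
          best_length = length
          best_action = action
        end
      end
      raise "no rule matches at offset #{@scanner.pos}" unless best_action

      text = @scanner.peek(best_length)
      @scanner.pos += best_length
      token = best_action.call(text)
      yield token unless token.nil?
    end
  end
end

class MyLexer < Lexer
  rule(/\s+/)    { nil }                      # skip whitespace
  rule(/if/)     { |text| [:keyword, text] }
  rule(/[a-z]+/) { |text| [:ident, text] }
  rule(/\d+/)    { |text| [:number, text] }
end

MyLexer.new("if iffy 42").each_token { |token| p token }
```

With these rules, "if" is a keyword because both patterns match two characters and the earlier rule wins, while "iffy" is an identifier because the identifier pattern matches more text.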


#5

Are you going to give this a try?

I think I just may. I will be sure to post to ruby-talk if I come up
with anything.


#6

Might be fun to integrate it with racc.


#7

Well, yes, that’s what I meant by “not much more.” In my experience
(writing scanners for programming languages) the start states and greedy
matching were the most important features beyond the basic NFA.
Unshifting input seems like more of a job for the parser than the
scanner, but different strokes, I guess. Are you going to give this a try?
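
For readers unfamiliar with start states, here is a small sketch of the idea: the set of active rules depends on a current state, and an action can switch states. The RULES table, the state names, and the convention that an action returns [token, next_state] are all invented for the example and are not how lex or flex spell it.

```ruby
require 'strscan'

# Hypothetical rule table, grouped by start state. Each action returns
# [token_or_nil, next_state_or_nil].
RULES = {
  default: [
    [/\s+/,   ->(_) { [nil, nil] }],          # skip whitespace
    [/\w+/,   ->(t) { [[:word, t], nil] }],
    [/"/,     ->(_) { [nil, :string] }]       # opening quote: enter :string
  ],
  string: [
    [/[^"]+/, ->(t) { [[:text, t], nil] }],   # everything up to the next quote
    [/"/,     ->(_) { [nil, :default] }]      # closing quote: back to :default
  ]
}.freeze

def tokenize_with_states(input)
  scanner = StringScanner.new(input)
  state = :default
  tokens = []
  until scanner.eos?
    matched = false
    # Only the rules of the current state are tried.
    RULES.fetch(state).each do |pattern, action|
      text = scanner.scan(pattern)
      next unless text
      token, next_state = action.call(text)
      tokens << token if token
      state = next_state if next_state
      matched = true
      break
    end
    raise ArgumentError, "no rule in state #{state} at offset #{scanner.pos}" unless matched
  end
  tokens
end

p tokenize_with_states('say "hello world" now')
```

Inside the :string state whitespace is not skipped, so the quoted text comes out as a single token even though the same characters would be split into words in the :default state.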


#8

Meador I. wrote:

Is there lightweight lexer for Ruby?

There’s a library called LittleLexer.

http://littlelexer.rubyforge.org/

Cheers,
Dave


#9

On Sat, May 13, 2006 at 05:19:45 +0900, Dave B. wrote:

Meador I. wrote:

Is there lightweight lexer for Ruby?

There’s a library called LittleLexer.

http://littlelexer.rubyforge.org/

The fact he uses the word “cute” on his site puts me off!

Cheers,
Phil


#10

Phil J. wrote:

On Sat, May 13, 2006 at 05:19:45 +0900, Dave B. wrote:

Meador I. wrote:

Is there lightweight lexer for Ruby?

There’s a library called LittleLexer.

http://littlelexer.rubyforge.org/

The fact he uses the word “cute” on his site puts me off!

But he’s right!

Cheers,
Dave


#11

On 5/13/06, Meador I. wrote:

Are you going to give this a try?

I think I just may. I will be sure to post to ruby-talk if I come up with
anything.

There is rex (http://raa.ruby-lang.org/project/rex/), which takes very
lex-like input and produces a Ruby lexer that can be used with racc.

-Scott