Rex: how to use the Lexer class?

Hello.

I’m trying to get rex to parse my inputs. After reading some of the
sample files provided with rex, I created this simple(?) file:

file: test.rex ------------------------------------------------------------

-- ruby --

##########################################################################

class Lexer
macro
  BLANKS   \s+
  DIGITS   \d+
  LETTERS  [a-zA-Z]+
rule
  {BLANKS}
  {LETTERS}  { puts "ID: '@{text}'"; [ :ID, text ] }
  {DIGITS}   { puts "NUMBER: '@{text}'"; [ :NUMBER, text.to_f ] }
  .|\n       { puts "text: '@{text}'"; [ text, text ] }
inner
end

##########################################################################
lexer = Lexer.new
while 1
  str = $stdin.gets.strip
  puts "str=@{str}"
  lexer.scan_str(str)
  puts "--------------------------------------------------------------------------"
end

end of file: test.rex -----------------------------------------------------

After running 'rex test.rex' and then 'ruby -Ku test.rex.rb', I always
get errors like

test.rex.rb:60:in `scan_evaluate': can not match: '2' (Lexer::ScanError)

when I type input.

Can anybody tell me why? Thanks.

Hey Fabrice,

The problem you’re seeing is due to rex’s assumption that you are
generating a parser in tandem with your lexer. The generated method
Lexer::scan_str looks like this:

def scan_str( str )
  scan_evaluate str
  do_parse
end

While scan_evaluate(str) is the method generated by your token
definitions, do_parse() depends on a racc grammar having been defined
and initialized. The bad news is that the default scan_str() won’t
work for your purposes. The good news is that scan_evaluate() will. If
you examine your generated test.rex.rb file, you’ll see that
scan_evaluate() identifies your tokens and pushes them one by one into
a queue named @rex_tokens. To pull them out of the queue, simply call
next_token(). Here’s a quick replacement for the bottom of your token
definition file:

lexer = Lexer.new
while 1
  str = $stdin.gets.strip
  puts "str=#{str}"

  # Here we're scanning the string for tokens
  lexer.scan_evaluate(str)

  # And then printing each one out to stdout
  while token = lexer.next_token
    p token
  end
  puts "--------------------------------------------------------------------------"
end
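If it helps to see what that loop is relying on, here is a rough,
hand-written sketch of what the generated scan_evaluate()/next_token()
pair does, using plain StringScanner with the patterns from your
test.rex. This is not the actual rex output (the class and variable
names just mirror it), only an illustration of the queue-then-pop
behavior:

```ruby
require 'strscan'

# Sketch of rex's generated lexer: scan_evaluate pushes [type, value]
# pairs onto a queue; next_token pops them off one at a time.
class SketchLexer
  def scan_evaluate(str)
    @rex_tokens = []
    ss = StringScanner.new(str)
    until ss.eos?
      if ss.scan(/\s+/)                        # {BLANKS}: skipped, no token
      elsif (text = ss.scan(/[a-zA-Z]+/))      # {LETTERS}
        @rex_tokens << [:ID, text]
      elsif (text = ss.scan(/\d+/))            # {DIGITS}
        @rex_tokens << [:NUMBER, text.to_f]
      elsif (text = ss.scan(/./m))             # catch-all: .|\n
        @rex_tokens << [text, text]
      end
    end
  end

  def next_token
    @rex_tokens.shift    # nil once the queue is empty
  end
end

lexer = SketchLexer.new
lexer.scan_evaluate("abc 2")
p lexer.next_token   # => [:ID, "abc"]
p lexer.next_token   # => [:NUMBER, 2.0]
p lexer.next_token   # => nil
```

The nil from an empty queue is what terminates the `while token =
lexer.next_token` loop above.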

The only other minor change I made was to "@{str}". The Ruby string
interpolation syntax is actually "#{ }". Let us know if you have more
questions. Happy lexing!
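A quick way to see the difference, e.g. in irb (just an illustrative
snippet, any variable name works):

```ruby
str = "42"
puts "str=#{str}"   # double quotes interpolate: prints str=42
puts "str=@{str}"   # @{} means nothing special: prints str=@{str}
puts 'str=#{str}'   # single quotes never interpolate: prints str=#{str}
```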

-Nick