Hello List
I am relatively new to Ruby. I have set myself the problem of writing
a lexical analyzer in Ruby to learn some of its capabilities. I have
pasted the code for that class and for the calling test harness
below. I beg the list's indulgence in several ways:
- Has this problem already been solved in a "gem"? I'd love to see
how a more sophisticated Rubyist solves it.
- There is object manipulation with which I'm still not comfortable.
In particular, the buffer manipulation code in the method analyze
makes me unhappy, and I'd be happy to receive instruction in a better
way to do it.
- Every language has its idioms. I'm not at all sure that I'm using
the best or most "Ruby-like" way of doing certain things. Again, I
welcome suggestions.
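On the buffer point, one idiom worth noting — offered as a sketch, not as the established answer, and not wired into the class below — is String#slice!, which removes and returns a leading substring in one step:

```ruby
# Sketch: consuming a matched lexeme from the front of a buffer with
# String#slice! instead of re-slicing and reassigning the buffer.
buff = "foo bar"
lexeme = buff.slice!(0, 3)  # destructively removes the first 3 characters
# lexeme is now "foo" and buff is " bar"
```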
Thanks in advance
Collins
code snippet 1
class Rule
  attr_reader :tokID, :re
  def initialize(_tokID, _re)
    @tokID = _tokID
    @re = _re
  end
  def to_s
    self.class.name + ": " + @tokID.to_s + " ::= " + @re.to_s
  end
end
class Match
  attr_reader :rule, :lexeme
  def initialize(_r, _s)
    @rule = _r
    @lexeme = _s
  end
  def to_s
    self.class.name + ": " + @rule.to_s + "\nmatches: " + @lexeme.to_s
  end
end
class Lexer
  attr_reader :errString
  # Keep a collection of regular expressions and values to return as
  # token types, then match text to the longest substring yet seen.
  def initialize
    @rules = Array.new
    @buff = String.new
    @aFile = nil
    @errString = nil
  end
  def addToken(tokID, re)
    if re.class.name == "String"
      @rules << Rule.new(tokID, Regexp.new(re))
    elsif re.class.name == "Regexp"
      @rules << Rule.new(tokID, re)
    else
      print "unsupported type in addToken: ", re.class.name, "\n"
    end
  end
  def findMatch
    maxLexeme, maxMatch = String.new, nil
    matchCount, rule2 = 0, nil
    @rules.each { |rule|
      # loop invariant:
      # maxLexeme contains the longest matching prefix of @buff found so far,
      # matchCount contains the number of rules that have matched maxLexeme,
      # maxMatch contains the proposed return value,
      # rule2 contains a subsequent rule that matches maxLexeme
      #
      # if rule matches from beginning of @buff AND
      # does not match all of @buff AND
      # match is longer than previous longest match
      # then update maxMatch and maxLexeme and matchCount and rule2
      #
      # but... we have to avoid matching and keep looking if we make it to the
      # end of @buff with a match active (it could still collect more
      # characters) OR if more than one match is still active. If the end of
      # the @buff is also the end of the file then it's ok to match to the end
      #
      # TODO: think about prepending an anchor to the regexp to eliminate the
      # false matches (those not to the beginning of the @buff)
      #
      md = rule.re.match(@buff)
      if !md.nil? && md.pre_match.length == 0
        if md[0].length == @buff.length && !@aFile.eof?
          # @buff is potentially ambiguous and there is more file to parse
          return nil
        elsif md[0].length > maxLexeme.length
          # either matching less than whole buffer or at eof AND
          # match is longer than any prior match
          matchCount, rule2 = 1, nil
          maxLexeme, maxMatch = md[0], Match.new(rule, md[0])
        elsif md[0].length == maxLexeme.length
          # a subsequent match of equal length has been found
          matchCount += 1
          rule2 = rule
        else
          # short match... skip
        end
      else
        # either rule did not match @buff OR
        # rule did not match the start of @buff
      end
    }
    if !maxMatch.nil? && matchCount == 1
      # return an unambiguous match
      return maxMatch
    elsif !maxMatch.nil? && matchCount > 1
      print "ambiguous: ", maxLexeme, " : ", maxMatch.rule.to_s, " : ",
            rule2.to_s, "\n"
      return nil
    else
      # no match was found
      return nil
    end
  end
  def analyze
    aMatch = findMatch
    if !aMatch.nil?
      # remove matched text from buff
      oldBuff = String.new(@buff)
      newBuff = @buff[aMatch.lexeme.length, @buff.length - 1]
      if oldBuff != aMatch.lexeme + newBuff
        puts oldBuff
        puts "compare failure!"
        puts aMatch.lexeme + newBuff
      end
      @buff = newBuff
    end
    return aMatch
  end
  def parseFile(_name)
    @fileName = _name
    @aFile = File.new(@fileName, "r")
    @aFile.each { |line|
      # add lines from file to @buff... after each addition yield as many
      # tokens as possible
      @buff += line
      # consume all the tokens from @buff that can be found... when no more
      # can be found analyze will return nil... so we'll get another line
      aMatch = analyze
      while !aMatch.nil?
        # deliver one <token, lexeme> pair at a time to caller...
        # by convention a nil tokID is one about which the caller does not
        # care to hear...
        yield aMatch.rule.tokID, aMatch.lexeme if !aMatch.rule.tokID.nil?
        aMatch = analyze
      end
    }
    # @buff contains the earliest unmatched text... if @buff is not empty when
    # we finish with the file, this is an error
    if !@buff.empty?
      @errString = "error: unmatched text:\n" + @buff[0, [80, @buff.length].min]
      return false
    else
      @errString = "no errors detected\n"
      return true
    end
  end
end
code snippet 2
WhiteSpaceToken = 0
CommentToken = 1
QuotedStringToken = 2
WordToken = 3

require "lexer"

l = Lexer.new
l.addToken(nil, Regexp.new('\s+', Regexp::MULTILINE))
l.addToken(nil, Regexp.new('#.*[\n\r]+'))
#l.addToken(QuotedStringToken, Regexp.new('["][^"]*["]', Regexp::MULTILINE))
l.addToken(QuotedStringToken, '".*"')
l.addToken(WordToken, Regexp.new('\w+'))
foo = l.parseFile("testFile1") { |token, lexeme|
  print token.to_s + ":" + lexeme.to_s + "\n"
}
if foo
  print "pass!\n"
else
  print "fail: " + l.errString + "\n"
end