Re: Bytecode Compiler (#100)

This is my first post to ruby-talk and my first stab at a ruby quiz!
Hooray!
I used rparsec because I was too lazy to write a simple parser. And I
used some cute monkey patching that I shouldn’t have.
But it works!

require ‘rparsec/rparsec’

class Fixnum def to_bytes
if self >= -32768 && self <= 32767
a = [0x01]
a << ((self & 0x0000FF00) >> 8) a << ((self &
0x000000FF)) else
a = [0x02] a << ((self &
0xFF000000) >> 24) a << ((self & 0x00FF0000) >> 16)
a << ((self & 0x0000FF00) >> 8) a << ((self &
0x000000FF)) end
end end
class Symbol
def to_bytes
case self
when :+
[0x0a]
when :-
[0x0b]
when :*
[0x0c]
when :**
[0x0d]
when :confused:
[0x0e]
when :%
[0x0f]
end
end

def to_proc
proc { |x| x.send self }
end
end

class Array
alias to_bytes to_a
end

class Compiler
include Parsers
include Functors
def self.compile str
new.parser.parse str
end

def func sym
proc { |x, y| [x,y,sym].map(&:to_bytes).flatten }
end

def parser
ops = OperatorTable.new do |t|
t.infixl(char(?+) >> func(:+), 20)
t.infixl(char(?-) >> func(:-), 20)
t.infixl(char(?) >> func(:), 40)
t.infixl(char(?/) >> func(:/), 40)
t.infixl(char(?%) >> func(:%), 40)
t.infixl(string(’’) >> func(:), 60)
t.prefix(char(?-) >> Neg, 80)
end
expr = nil
term = integer.map(&To_i) | char(’(’) >> lazy{expr} << char(’)’)
delim = whitespace.many_
expr = delim >> Expressions.build(term, ops, delim)
end
end

Aaaargh formatting explosion! Here’s that first bit all nice and
unexploded.

class Fixnum
def to_bytes
if self >= -32768 && self <= 32767
a = [0x01]
a << ((self & 0x0000FF00) >> 8)
a << ((self & 0x000000FF))
else
a = [0x02]
a << ((self & 0xFF000000) >> 24)
a << ((self & 0x00FF0000) >> 16)
a << ((self & 0x0000FF00) >> 8)
a << ((self & 0x000000FF))
end
end
end

class Symbol
def to_bytes
case self
when :+
[0x0a]
when :-
[0x0b]
when :*
[0x0c]
when :**
[0x0d]
when :confused:
[0x0e]
when :%
[0x0f]
end
end

def to_proc
    proc { |x| x.send self }
end

end

Here is my solution. I tried to keep it short and simple.
Unfortunately, it got a little more complicated when I added code to
pass Marcel’s test cases. But now it can handle ‘–+±+±2±2’…

module Compiler

The whole thing wrapped up into one function…

The first half tokenizes into single chars or fixnums: it uses gsub

to convert ‘**’ to ‘^’, binary ‘-’’ to ‘~’, and to remove pointless

'+'s.

Digits and most remaining '-'s are collected and converted using

to_i.

A few pesky unary '-'s are converted to the token ‘n’

The second half uses the shunting yard algorithm to build the RPN,

and does the token->bytecode translation inline.

The Tokens array contains 4 items: the character token, the

bytecode,

and two precedence values - one for a token on the stack,

and another for a token in the stream. The values are only different

if the operator is right-associative - this simplifies the

conditional

for deciding when to pop operators.

The ‘n’ token is expanded to [0,swap,sub]

Tokens =
[[’+’,10,3,3],[’~’,11,3,3],[’*’,12,2,2],[’/’,14,2,2],[’%’,15,2,2],
[’^’,13,1,0],[‘n’,[1,0,0,0xa0,11],-1,-2],[nil,’(’,9,9]]

def self.compile expr
num,tokens,output,stack=’’,[],[],[]

expr.gsub(/(\d|))-/,’\1~’).gsub(/(^|[^\d)])++/,’\1’).gsub("**","^").each_byte{|b|
if /\d|-/ =~ c=b.chr
num+=c
num.gsub!(’–’,’’)
else
tokens << (/\d/=~num ? num.to_i : ‘n’) and num=’’ unless
num.empty?
tokens << c
end
}
tokens << num.to_i unless num.empty?

tokens.each{|token|
case token
when Integer
output += (-215…215)===token ?
[1]+[token].pack(‘n’).unpack(‘C*’) :
[2]+[token].pack(‘N’).unpack(‘C*’)
when ‘(’
stack << token
when ‘)’
output << stack.pop until stack.last == ‘(’
stack.pop
else
tokdef = Tokens.assoc(token)
output << stack.pop while t=stack.last and Tokens.rassoc(t)[2]
<= tokdef[3]
stack << tokdef[1]
end
}
(output+stack.reverse).flatten
end
end

–Adam