On Feb 1, 2008 7:55 AM, Ruby Q. [email protected] wrote:
In honor of that, this week’s Ruby Q. is to write a parser for JSON.
Here is another solution of mine:
http://pastie.caboo.se/147505
In this one, I just made a fast hand-built recursive-descent/LL(1)
parser.
This is the kind of parser that I’m trying to get my ‘grammar’ package
to
approach (using lots of optimizations). It uses no Regexp or ruby eval
(both of which have compiled C to help speed). And yet, it is the
fastest
pure-ruby JSON parser we’ve seen (see the recursive descent line below):
ch/s author/gem
3214 Justin E. (RE lexer + ruby eval, fixed number parsing)
4054 Eric M. (Grammar0, no lexer, no parser generation)
4078 Eric I (Treetop, unicode broken)
6534 oksteev (Treetop, mismatches in benchmark)
8313 Clifford H. (Treetop, had to remove handling of “/”)
17320 Alexander Stedile (RE)
54586 Eric M. (Grammar, no lexer, v0.5)
137989 Paolo B. (RE)
166041 Thomas Link (RE lexer + ruby eval, ruby 1.9 results)
220289 json
223486 Eric M. (Grammar, no lexer, unreleased)
224823 fjson (uses C extensions)
333368 Thomas Link & Paolo B. (RE + eval, unicode broken)
388670 Eric M. (hand-built recursive descent)
553081 Eric M. (Grammar, no lexer, unreleased, w/ ruby2cext)
1522250 json (w/ C extensions)
JSON hand-built recursive descent/LL(1) parser, by Eric M.
require ‘stringio’
class JSONParser
def parse(s)
@next = (@io=StringIO.new(s)).getc
ws
value(out=[])
ws
raise("EOF expected") if @next
raise(out.inspect) unless out.length==1
out[0]
end
def error(expected, found)
raise("expected #{expected}, found #{found ? ("'"<<found<<?\') :
‘EOF’}")
end
def value(out)
if ?\[.equal?(@next)
# array
@[email protected]
ws
a = []
unless ?\].equal?(@next)
value(a)
ws
until ?\].equal?(@next)
?\,.equal?(@next) ? (@[email protected]) : error("','",
@next)
ws
value(a)
ws
end
end
@next = @io.getc
out << a
elsif ?{.equal?(@next)
# object
@[email protected]
ws
h = {}
unless ?}.equal?(@next)
?".equal?(@next) ? string(kv=[]) : error(“a string”,
@next)
ws
?:.equal?(@next) ? (@[email protected]) : error(“‘:’”,
@next)
ws
value(kv)
ws
h[kv[0]] = kv[1]
until ?}.equal?(@next)
?,.equal?(@next) ? (@[email protected]) : error(“‘,’”,
@next)
ws
?".equal?(@next) ? string(kv.clear) : error(“a
string”,
@next)
ws
?:.equal?(@next) ? (@[email protected]) : error(“‘:’”,
@next)
ws
value(kv)
ws
h[kv[0]] = kv[1]
end
end
@next = @io.getc
out << h
elsif (?a…?z)===(@next)
# boolean
(s=“”)<<@next
@next = @io.getc
while (?a…?z)===(@next)
s<<@next;@[email protected]
end
out << case s
when “true” then true
when “false” then false
when “null” then nil
else error(“‘true’ or ‘false’ or ‘null’”, s)
end
elsif ?".equal?(@next)
string(out)
else
# number
n = “”
(n<<@next;@[email protected]) if ?-.equal?(@next)
?0.equal?(@next) ? (n<<@next;@[email protected]) : digits(n)
(?..equal?(@next) ?
(n<<@next;@[email protected];digits(n);exp(n);true) :
exp(n)) ?
(out << n.to_f) :
(out << n.to_i)
end
end
# Flattening any of the methods below will improve performance
further
def ws
@next = @io.getc while (case @next;when
?\s,?\t,?\n,?\r;true;end)
end
def digits(out)
(?0..?9)===@next ? (out<<@next;@[email protected]) : error("a
digit",
@next)
while (?0…?9)===@next; (out<<@next;@[email protected]); end
true
end
def exp(out)
(case @next;when ?e,?E;true;end) ? (out<<@next;@[email protected]) :
return
(out<<@next;@[email protected]) if (case @next;when ?-,?+;true;end)
digits(out)
end
def string(out)
# we've already verified the starting "
@[email protected]
s = ""
until ?\".equal?(@next)
if ?\\.equal?(@next)
@next = @io.getc
case @next
when ?\",?\\,?\/ then (s<<@next;@[email protected])
when ?b then (s<<?\b;@[email protected])
when ?f then (s<<?\f;@[email protected])
when ?n then (s<<?\n;@[email protected])
when ?r then (s<<?\r;@[email protected])
when ?t then (s<<?\t;@[email protected])
when ?u
@next = @io.getc
u = ""
4.times {
case @next
when ?0..?9, ?a..?f, ?A..?F
u<<@next;@[email protected]
else
error("a hex character", @next)
end
}
s << u.to_i(16)
else
error("a valid escape", @next)
end
else
error("a character", @next) unless @next
s<<@next;@[email protected]
end
end
@next = @io.getc
out << s
end
end