Parsing JSON (#155)

The three rules of Ruby Q.:

  1. Please do not post any solutions or spoiler discussion for this quiz
     until 48 hours have passed from the time on this message.

  2. Support Ruby Q. by submitting ideas as often as you can:

http://www.rubyquiz.com/

  3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby T. follow the discussion. Please reply to the original quiz message,
if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

There has been a lot of talk recently about parsing with Ruby. We’re seeing
some parser generator libraries pop up that make the task that much easier and
they’ve been stirring up interest.

In honor of that, this week’s Ruby Q. is to write a parser for JSON.

JSON turns out to be a great little example for writing parsers for
two reasons. First, it’s pretty easy stuff. You can hand-roll a JSON parser in
under 100 lines of Ruby. The second advantage is that the data format is
wonderfully documented:

http://json.org/

Since JSON is just a data format and Ruby supports all of the data types, I vote
we just use Ruby itself as the abstract syntax tree produced by the parse.

Feel free to show off your favorite parser generator, if you don’t want to roll
your own. Anything goes.
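
To make the "under 100 lines" claim concrete, here is a minimal sketch of a hand-rolled recursive-descent parser built on the standard library's StringScanner. It is not one of the quiz submissions: the class name SketchJSONParser and all of its helper methods are made up for illustration, it maps JSON values straight onto Ruby objects as suggested above, and it assumes a Ruby new enough for the \h regexp shorthand (1.9 or later). Surrogate pairs in \u escapes are not handled.

```ruby
require "strscan"

# Hypothetical name, chosen so it does not clash with the quiz's JSONParser.
class SketchJSONParser
  ESCAPES = { "b" => "\b", "f" => "\f", "n" => "\n", "r" => "\r", "t" => "\t" }
  NUMBER  = /-?(?:0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?/

  def parse(input)
    @s = StringScanner.new(input)
    value = parse_value
    skip_space
    error("trailing characters") unless @s.eos?
    value
  end

  private

  # A plain raise with a message produces a RuntimeError.
  def error(message)
    raise "#{message} at offset #{@s.pos}"
  end

  def skip_space
    @s.skip(/[ \t\r\n]+/)
  end

  def parse_value
    skip_space
    if    @s.scan(/true\b/)  then true
    elsif @s.scan(/false\b/) then false
    elsif @s.scan(/null\b/)  then nil
    elsif @s.scan(NUMBER)
      # A fraction or an exponent makes it a Float, otherwise an Integer.
      (@s[1] || @s[2]) ? @s.matched.to_f : @s.matched.to_i
    elsif @s.scan(/"/)  then parse_string
    elsif @s.scan(/\[/) then parse_array
    elsif @s.scan(/\{/) then parse_object
    else error("unexpected input")
    end
  end

  # Called after the opening quote has been consumed.
  def parse_string
    out = "".dup
    until @s.scan(/"/)
      if    @s.scan(/[^"\\\x00-\x1f]+/) then out << @s.matched
      elsif @s.scan(/\\u(\h{4})/)       then out << [@s[1].hex].pack("U")
      elsif @s.scan(/\\(["\\\/])/)      then out << @s[1]
      elsif @s.scan(/\\([bfnrt])/)      then out << ESCAPES[@s[1]]
      else error("bad or unterminated string")
      end
    end
    out
  end

  # Called after the opening bracket has been consumed.
  def parse_array
    items = []
    skip_space
    return items if @s.scan(/\]/)
    loop do
      items << parse_value
      skip_space
      return items if @s.scan(/\]/)
      error("expected , or ]") unless @s.scan(/,/)
    end
  end

  # Called after the opening brace has been consumed.
  def parse_object
    pairs = {}
    skip_space
    return pairs if @s.scan(/\}/)
    loop do
      skip_space
      error("expected a string key") unless @s.scan(/"/)
      key = parse_string
      skip_space
      error("expected :") unless @s.scan(/:/)
      pairs[key] = parse_value
      skip_space
      return pairs if @s.scan(/\}/)
      error("expected , or }") unless @s.scan(/,/)
    end
  end
end
```

Calling SketchJSONParser.new.parse('{"Array": [1, 2, 3], "ok": true}') returns an ordinary Ruby Hash with String keys, and malformed input such as "[1,,2]" raises a RuntimeError.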

Here are a few tests to get you started:

require "test/unit"

class TestJSONParser < Test::Unit::TestCase
def setup
  @parser = JSONParser.new
end

def test_keyword_parsing
  assert_equal(true,  @parser.parse("true"))
  assert_equal(false, @parser.parse("false"))
  assert_equal(nil,   @parser.parse("null"))
end

def test_number_parsing
  assert_equal(42,      @parser.parse("42"))
  assert_equal(-13,     @parser.parse("-13"))
  assert_equal(3.1415,  @parser.parse("3.1415"))
  assert_equal(-0.01,   @parser.parse("-0.01"))

  assert_equal(0.2e1,   @parser.parse("0.2e1"))
  assert_equal(0.2e+1,  @parser.parse("0.2e+1"))
  assert_equal(0.2e-1,  @parser.parse("0.2e-1"))
  assert_equal(0.2E1,   @parser.parse("0.2E1"))
end

def test_string_parsing
  assert_equal(String.new,          @parser.parse(%Q{""}))
  assert_equal("JSON",              @parser.parse(%Q{"JSON"}))

  assert_equal( %Q{nested "quotes"},
                @parser.parse('"nested \"quotes\""') )
  assert_equal("\n",                @parser.parse(%Q{"\\n"}))
  assert_equal( "a",
                @parser.parse(%Q{"\\u#{"%04X" % ?a}"}) )
end

def test_array_parsing
  assert_equal(Array.new, @parser.parse(%Q{[]}))
  assert_equal( ["JSON", 3.1415, true],
                @parser.parse(%Q{["JSON", 3.1415, true]}) )
  assert_equal([1, [2, [3]]], @parser.parse(%Q{[1, [2, [3]]]}))
end

def test_object_parsing
  assert_equal(Hash.new, @parser.parse(%Q{{}}))
  assert_equal( {"JSON" => 3.1415, "data" => true},
                @parser.parse(%Q{{"JSON": 3.1415, "data": true}}) )
  assert_equal( { "Array"  => [1, 2, 3],
                  "Object" => {"nested" => "objects"} },
                @parser.parse(<<-END_OBJECT) )
  {"Array": [1, 2, 3], "Object": {"nested": "objects"}}
  END_OBJECT
end

def test_parse_errors
  assert_raise(RuntimeError) { @parser.parse("{") }
  assert_raise(RuntimeError) { @parser.parse(%q{{"key": true false}}) }

  assert_raise(RuntimeError) { @parser.parse("[") }
  assert_raise(RuntimeError) { @parser.parse("[1,,2]") }

  assert_raise(RuntimeError) { @parser.parse(%Q{"}) }
  assert_raise(RuntimeError) { @parser.parse(%Q{"\\i"}) }

  assert_raise(RuntimeError) { @parser.parse("$1,000") }
  assert_raise(RuntimeError) { @parser.parse("1_000") }
  assert_raise(RuntimeError) { @parser.parse("1K") }

  assert_raise(RuntimeError) { @parser.parse("unknown") }
end

end

On Feb 1, 2008 7:55 AM, Ruby Q. wrote:

I definitely want to find time to do this one. What would be nice to have
is a performance benchmark to compare parsers. Maybe just have a little ruby
script that generates a stream of repeatable random (but valid) JSON.

Eric
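
One way such a repeatable generator could look is sketched below. It assumes a Ruby with the Random class (1.9.2 or later) and the json library for serialization only; the names seeded_json, random_value and random_string are invented for this illustration. The same seed yields the same document on a given Ruby, which is what makes benchmark runs comparable.

```ruby
require "json"

# Hypothetical helper: returns a JSON array of `count` random values.
# All randomness comes from one seeded generator, so output is repeatable.
def seeded_json(seed, count = 5, max_depth = 3)
  rng = Random.new(seed)
  JSON.generate(Array.new(count) { random_value(rng, max_depth) })
end

# Picks a scalar, or a container while nesting depth remains.
def random_value(rng, depth)
  kinds = depth > 0 ? 6 : 4
  case rng.rand(kinds)
  when 0 then [true, false, nil][rng.rand(3)]
  when 1 then rng.rand(-1000..1000)
  when 2 then rng.rand.round(4)
  when 3 then random_string(rng)
  when 4 then Array.new(rng.rand(4)) { random_value(rng, depth - 1) }
  else
    pairs = {}
    rng.rand(4).times { pairs[random_string(rng)] = random_value(rng, depth - 1) }
    pairs
  end
end

# Short lowercase ASCII strings keep the output easy to read.
def random_string(rng)
  Array.new(rng.rand(1..8)) { (97 + rng.rand(26)).chr }.join
end
```

Feeding seeded_json(42) to each parser under test gives every one of them identical input; changing the seed gives a different but equally repeatable document.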

On Feb 1, 2008, at 10:09 AM, Eric M. wrote:

Maybe just have a little ruby
script that generates a stream of repeatable random (but valid) JSON.

Neat idea.

Just FYI though, I’m probably going to focus more on the parsing in
the summary than the speed.

James Edward G. II

On Feb 1, 2008, at 10:23 AM, Trans wrote:

A bit aside, but it seems a good place to plug the thought: JSON is so
close to valid Ruby syntax. It would be great if Ruby could support
the syntax 100%.

Conversion is pretty easy and definitely one way to solve this quiz.

James Edward G. II

A bit aside, but it seems a good place to plug the thought: JSON is so
close to valid Ruby syntax. It would be great if Ruby could support
the syntax 100%. Then a parse would be as simple as,

data = eval(json)

Or, safety levels withstanding, we could conceive a safe_eval(json).

T.
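
A small experiment shows where plain eval and JSON part ways. This is only an illustrative sketch: the helper name eval_as_ruby is made up, and eval runs arbitrary code, so it should only ever see trusted literals like the ones here.

```ruby
# Hypothetical helper: returns what Ruby's eval makes of a JSON snippet,
# or the class of the error it raises.
def eval_as_ruby(json)
  eval(json)
rescue SyntaxError, NameError => e
  e.class
end

# Arrays, numbers and booleans happen to be valid Ruby as written.
eval_as_ruby("[1, 2.5, true]")

# null is not a Ruby keyword; Ruby spells it nil, so this is a NameError.
eval_as_ruby("null")

# Older Rubies reject the colon after a quoted key with a SyntaxError;
# Ruby 2.2 and later accept it but build a Symbol key, not a String key.
eval_as_ruby('{"key": "value"}')
```

Either way the result for an object differs from the String-keyed Hash the quiz tests expect, so some conversion step is still needed before eval.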

On Feb 1, 2008 10:12 AM, James G. wrote:

What would be nice to have is performance benchmark to compare parsers.
Maybe just have a little ruby
script that generates a stream of repeatable random (but valid) JSON.

Neat idea.

Just FYI though, I’m probably going to focus more on the parsing in
the summary than the speed.

James Edward G. II

OK. Once I figure out exactly what JSON is, I’ll probably make a random
JSON generator.

Hopefully this will make me do another release of my parser package, which
is long overdue :). At least check in my local code into CVS.

On Feb 1, 2008, at 9:28 AM, James G. wrote:

Conversion is pretty easy and definitely one way to solve this quiz.

not to mention installing one of the three gem json parsers out
there :wink:

a @ http://codeforpeople.com/

On Feb 1, 2008, at 11:20 AM, ara howard wrote:

On Feb 1, 2008, at 9:28 AM, James G. wrote:

Conversion is pretty easy and definitely one way to solve this quiz.

not to mention installing one of the three gem json parsers out
there :wink:

That’s so cheating Ara. :smiley:

James Edward G. II

On Feb 1, 2008, at 9:09 AM, Eric M. wrote:

Maybe just have a little ruby
script that generates a stream of repeatable random (but valid) JSON.

cfp2:~ > cat a.rb
require 'rubygems'
require 'json'

def random_json
  case rand
    when 0 ... 1/3.0
      top = Hash.new
      add = lambda{|obj| top[obj] = obj}
    when 1/3.0 ... 2/3.0
      top = Array.new
      add = lambda{|obj| top.push obj}
    when 2/3.0 ... 1
      top = String.new
      add = lambda{|obj| top += obj}
  end
  10.times{ add[rand.to_s] }
  top.to_json
end

puts random_json

cfp2:~ > for i in `seq 1 3`;do ruby a.rb ;done
"0.3786779826911330.2475380034343990.7052927081471540.2056530009384740.1367079874315110.6433874613518640.5329060341883540.8932613322492760.9233991888762390.561470121133217"
{"0.758942077040095":"0.758942077040095","0.740998718448961":"0.740998718448961","0.581975309640819":"0.581975309640819","0.471066491788047":"0.471066491788047","0.150752108985123":"0.150752108985123","0.679712508205116":"0.679712508205116","0.265444532310993":"0.265444532310993","0.43229805237576":"0.43229805237576","0.880407977937905":"0.880407977937905","0.91896885679168":"0.91896885679168"}
["0.140526101058637","0.647296447390116","0.419874655921874","0.67320818546074","0.847043108967541","0.479385904117001","0.378678170026127","0.707315391952609","0.26064520446906","0.460184583302929"]

a @ http://codeforpeople.com/

A bit aside, but it seems a good place to plug the thought: JSON is so
close to valid Ruby syntax. It would be great if Ruby could support
the syntax 100%.

I hoped ruby19 would already support this use of colons as in {"key":
"value"} but unfortunately not.

tho_mica_l wrote:

I hoped ruby19 would already support this use of colons as in {"key":
"value"} but unfortunately not.

It’s key: value

>> {key: "value"}
=> {:key => "value"}

HTH,
Sebastian

On Feb 1, 2008, at 5:23 PM, Trans wrote:

A bit aside, but it seems a good place to plug the thought: JSON is so
close to valid Ruby syntax. It would be great if Ruby could support
the syntax 100%. Then a parse would be as simple as,

data = eval(json)

Brilliant!

But perhaps the other way around: bridge the JSON syntax discrepancies
to valid Ruby syntax, e.g.:

eval( to_ruby( json ) )

Cheers,

PA.

It’s key: value

>> {key: "value"}
=> {:key => "value"}

I see. Thanks for pointing this out.

Still some conversion needed for the quiz.

BTW, the ruby19 json library, which Ara mentioned, also has a parse
method which I suppose you, uhm, know of? I think the use of this
should be … well, officially discouraged in the context of this
quiz maybe. Since Ara has already mentioned this library, I hope this
isn’t news for you.

On Feb 1, 2008, at 1:09 PM, tho_mica_l wrote:

BTW, the ruby19 json library, which Ara mentioned, also has a parse
method which I suppose you, uhm, know of?

I suspect the number of us using 1.9 exclusively is still pretty
small, so I don’t focus too much on it when writing quizzes. I knew
it had a JSON library though, yes.

I think the use of this should be … well, officially discouraged
in the context of this quiz maybe.

I agree. It’s cheating too. :slight_smile:

James Edward G. II

I suspect the number of us using 1.9 exclusively is still pretty
small, so I don’t focus too much on it when writing quizzes. I knew
it had a JSON library though, yes.

I think the use of this should be … well, officially discouraged
in the context of this quiz maybe.

I agree. It’s cheating too. :slight_smile:

yeah - obviously cheating. just to clarify though, for people who
might actually want to use json that this is not a 1.9 thing: it’s on
rubyforge:

cfp2:~ > gem list --remote|grep json
fjson (0.1.2, 0.1.1, 0.1.0, 0.0.9, 0.0.8, 0.0.7, 0.0.6, 0.0.5, 0.0.4,
0.0.3, 0.0.2, 0.0.1)
json (1.1.2, 1.1.1, 1.1.0, 1.0.4, 1.0.3, 1.0.2, 1.0.1, 1.0.0, 0.4.3,
0.4.2, 0.4.1, 0.4.0)
json_pure (1.1.2, 1.1.1, 1.1.0, 1.0.4, 1.0.3, 1.0.2, 1.0.1, 1.0.0)
Orbjson (0.0.4, 0.0.3, 0.0.2, 0.0.1)
ruby-json (1.1.2, 1.1.1)

and activesupport also includes a parser

regards.

On Feb 1, 2:37 pm, -a wrote:

might actually want to use json that this is not a 1.9 thing: it’s on

and activesupport also includes a parser

as does blow.

T.

I do not know why I have the impression that you want to make it easier
for James to leave ;).

The idea of giving parsing some show (never really done, AFAIR, in a
Ruby Q.) is a nice one!!!

Now a translation that would eval is indeed an interesting idea,
especially as I am almost sure that it would be parsing too.
AFAIK JSON can be read by JavaScript natively (surprisingly). What
about implementing JavaScript in Ruby :wink:

Cheers
Robert

On Feb 1, 2008, at 4:33 PM, Robert D. wrote:

I do not know why I have the impression that you want make it easier
for James to leave ;).

doh - didn’t want to give that impression!

a @ http://drawohara.com/

Here is my solution. I do a first pass to tokenize the input and perform
basic syntax checks. Then the expression is fully converted into ruby syntax
and eval is used to load it into Ruby. It passes each of the test cases,
although some improvements could still be made.

class JSONParser

  # Parse a given JSON expression
  def parse(expr)
    # Tokenize the input
    tokens = lex(expr)

    # Load the expression into ruby
    # Takes advantage of the fact ruby syntax is so close to that of JSON.
    # However, it would be nice to have a safe_eval to prevent against
    # potential injection attacks
    begin
      eval(ruby_convert(tokens))
    rescue SyntaxError, NameError
      raise RuntimeError
    end
  end

  # Converts tokens into a single ruby expression
  def ruby_convert(tokens)
    expr = ""
    for token in tokens
      token = "=>" if token == ":" # Ruby hash syntax
      token = "nil" if token == "null"
      expr += token
    end
    expr
  end

  # Parses the input expression into a series of tokens
  # Performs some limited forms of conversion where necessary
  def lex(expr)
    tokens = []
    i = -1
    while i < expr.size - 1
      tok ||= ""
      i += 1

      case expr[i].chr
        when '[', ']', '{', '}', ':', ','
          tokens << tok if tok.size > 0
          tokens << expr[i].chr
          tok = ""
        # String processing
        when '"'
          raise "Unexpected quote" if tok.size > 0
          len = 1
          escaped = false
          while (len + i) < expr.size
            break if expr[len + i].chr == '"' and not escaped
            if escaped
              case expr[len + i].chr
                when '"', '/', '\\', 'b', 'f', 'n', 'r', 't', 'u'
                else
                  raise "Unable to escape #{expr[len + i].chr}"
                end
            end
            escaped = expr[len + i].chr == "\\"
            len += 1
          end
          raise "No matching endquote for string" if (len + i) > expr.size
          tokens << convert_unicode(expr.slice(i, len+1))
          i += len
        # Number processing
        when '-', /[0-9]/
          len = 0
          while (len + i) < expr.size and /[0-9eE+\-.]/.match(expr[len + i].chr) != nil
            len += 1
          end
          num = expr.slice(i, len)

          # Verify syntax of the number using the JSON state machine
          raise "Invalid number #{num}" if /[-]?([1-9]|(0\.))[0-9]*[eE]?[+-]?[0-9]*/.match(num) == nil

          tokens << num
          i += len - 1
        # Skip whitespace
        when ' ', "\t"
        else
          tok << expr[i].chr
      end
    end
    tokens << tok if tok.size > 0
    tokens
  end

  # Convert unicode characters from hex (currently only handles ASCII set)
  def convert_unicode(str)
    while true
      u_idx = str.index(/\\u[0-9a-fA-F]{4}/)
      break if u_idx == nil

      u_str = str.slice(u_idx, 6)
      str.sub!(u_str, u_str[2..5].hex.chr)
    end
    str
  end
end

Thanks,

Justin

Hey guys

This is my first parser. I used Nathan Sobo’s Treetop parsing library
(http://treetop.rubyforge.org/, gem install treetop):

http://pastie.caboo.se/146906

require 'treetop'

File.open("json.treetop", "w") {|f| f.write GRAMMAR }

Treetop.load "json"

parser = JsonParser.new

pp parser.parse(STDIN.read).value if $0 == __FILE__

BEGIN {

GRAMMAR = %q{

grammar Json
  rule json
    space json_value space { def value; json_value.value; end }
  end

  rule json_value
    string / numeric / keyword / object / array
  end

  rule string
    '"' chars:char* '"' {
      def value
        chars.elements.map {|e| e.value }.join
      end
    }
  end

  rule char
    !'"' ('\\\\' ( ( [nbfrt"] / '\\\\' / '/' ) / 'u' hex hex hex hex ) / !'\\\\' .) {
      def value
        if text_value[0..0] == '\\\\'
          case c = text_value[1..1]
          when /[nbfrt]/
            {'n' => "\n", 'b' => "\b", 'f' => "\f", 'r' => "\r", 't' => "\t"}[c]
          when 'u'
            [text_value[2,4].to_i(16)].pack("L").gsub(/\0*$/,'')
          else
            c
          end
        else
          text_value
        end
      end
    }
  end

  rule hex
    [0-9a-fA-F]
  end

  rule numeric
    exp / float / integer
  end

  rule exp
    (float / integer) ('e' / 'E') ('+' / '-')? integer { def value; text_value.to_f; end }
  end

  rule float
    integer '.' [0-9]+ { def value; text_value.to_f; end }
  end

  rule integer
    '-'? ('0' / [1-9] [0-9]*) { def value; text_value.to_i; end }
  end

  rule keyword
    ('true' / 'false' / 'null') {
      def value
        { 'true' => true, 'false' => false, 'null' => nil }[text_value]
      end
    }
  end

  rule object
    '{' space pairs:pair* space '}' {
      def value
        pairs.elements.map {|p| p.value }.inject({}) {|h,p| h.merge p }
      end
    }
  end

  rule pair
    space string space ':' space json_value space (',' &pair / !pair) {
      def value
        { string.value => json_value.value }
      end
    }
  end

  rule array
    '[' space array_values:array_value* space ']' {
      def value
        array_values.elements.map {|e| e.value }
      end
    }
  end

  rule array_value
    space json_value space (',' &array_value / !array_value) {
      def value
        json_value.value
      end
    }
  end

  rule space
    [ \t\r\n]*
  end

end

}

}

- steve