ruby_parser 3.3.0 Released

ruby_parser version 3.3.0 has been released!

ruby_parser (RP) is a ruby parser written in pure ruby (utilizing
racc, which by default uses a C extension). RP’s output is
the same as ParseTree’s output: s-expressions built from ruby’s arrays
and base types.

As an example:

def conditional1 arg1
  return 1 if arg1 == 0
  return 0
end

becomes:

s(:defn, :conditional1, s(:args, :arg1),
  s(:if,
    s(:call, s(:lvar, :arg1), :==, s(:lit, 0)),
    s(:return, s(:lit, 1)),
    nil),
  s(:return, s(:lit, 0)))
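Because the output is made of plain arrays, ordinary Ruby is enough to inspect it. The sketch below rebuilds the tree above by hand and pulls out every node of a given type. It assumes a minimal stand-in for the Sexp class and s() helper (the real ones come from the sexp_processor gem and do considerably more). collect_nodes is a hypothetical helper written for this example and is not part of either gem.

```ruby
# Minimal stand-in for the Sexp class from the sexp_processor gem: here it is
# just an Array subclass, which is enough to show the shape of RP's output.
class Sexp < Array
end

# Build a Sexp from its elements, mirroring the s(...) notation above.
def s(*elements)
  Sexp.new(elements)
end

# Hypothetical helper: depth-first walk returning every node whose head
# (first element) is +type+, in the order the nodes appear in the tree.
def collect_nodes(node, type, found = [])
  return found unless node.is_a?(Array)
  found << node if node.first == type
  node.each { |child| collect_nodes(child, type, found) }
  found
end

tree = s(:defn, :conditional1, s(:args, :arg1),
  s(:if,
    s(:call, s(:lvar, :arg1), :==, s(:lit, 0)),
    s(:return, s(:lit, 1)),
    nil),
  s(:return, s(:lit, 0)))

puts "method name: #{tree[1]}"
collect_nodes(tree, :return).each { |ret| puts "returns #{ret[1][1]}" }
```

This prints the method name followed by the two literal return values, 1 and 0, in source order.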

Tested against 801,039 files from the latest versions of all rubygems
(as of 2013-05):

  • 1.8 parser is at 99.9739% accuracy, 3.651 sigma
  • 1.9 parser is at 99.9940% accuracy, 4.013 sigma
  • 2.0 parser is at 99.9939% accuracy, 4.008 sigma
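The sigma figures are consistent with treating the failure rate as a two-sided tail of the standard normal distribution, that is, choosing z so that erfc(z / sqrt(2)) equals 1 - accuracy. That reading is an assumption, not something stated in the announcement, but the sketch below, which solves for z by bisection using Ruby's built-in Math.erfc, reproduces the three values quoted above closely. The method names sigma_for and accuracy_for are made up for this example.

```ruby
# Assumption: "N sigma" means the two-sided normal quantile z for which
# P(|Z| > z) = erfc(z / sqrt(2)) equals the observed failure rate.
def sigma_for(accuracy)
  failure_rate = 1.0 - accuracy
  lo, hi = 0.0, 10.0
  # erfc is strictly decreasing in z, so bisection converges on the root.
  100.times do
    mid = (lo + hi) / 2
    if Math.erfc(mid / Math.sqrt(2)) > failure_rate
      lo = mid # tail probability still too large: z must grow
    else
      hi = mid
    end
  end
  (lo + hi) / 2
end

# Fraction of files that parsed successfully.
def accuracy_for(failures, total)
  1.0 - failures.fdiv(total)
end

{ "1.8" => 0.999739, "1.9" => 0.999940, "2.0" => 0.999939 }.each do |version, accuracy|
  printf("%s parser: %.4f%% accuracy, %.3f sigma\n",
         version, accuracy * 100, sigma_for(accuracy))
end
```

The same arithmetic covers the 3.3.0 notes: sigma_for(accuracy_for(39, 834_000)) comes out at roughly 4.07.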

Changes:

3.3.0 / 2014-01-14

  • Notes:

    • 39 files failed to parse out of ~834k files, which puts this release
      at 99.9953% accuracy, or 4.07 sigma.

  • 15 minor enhancements:

    • 2.0: Parse kwarg as lvars. (chastell)
    • Added RubyLexer#beginning_of_line?, #check(re), #end_of_stream?
    • Added RubyLexer#process_token_keyword.
    • Added RubyLexer#scan, #matched, #beginning_of_line? and others to
      decouple from internals.
    • Added lexing of \u### and \u{###}.
    • Added optimizations for simple quoted symbols.
    • Aliased Lexer#src to ss (since that is what it is).
    • Allow for 20 in parser class name.
    • Modified the parser’s line number calculations for defn nodes.
    • Removed Env#dynamic, #dynamic?, #use, #used?
    • Removed RubyLexer#tern. Introduced and disused during 3.0 alpha.
      (whitequark)
    • Removed unused RubyLexer#warnings.
    • Renamed *_RE consts to just * (IDENT_CHAR, ESC, etc.).
    • new_defn now sets the arg node’s line number directly.
    • Zero bytes are allowed in symbols for 1.9 / 2.0.
  • 11 bug fixes:

    • 2.0: Fixed paren-less kwargs in defn.
    • Don’t bother with regexp encoding options on 1.9+ to avoid warnings.
    • Fix constant re-build on ruby 2.0 + rake 10.
    • Fix lexing of %i with extra whitespace. (flori)
    • Fixed RubyParserStuff#new_body to deal with nonsensical code better
      (begin-empty+else). (snatchev)
    • Fixed bug lexing h[k]=begin … end. Use your space bars, people!
    • Fixed env scoping in new lambdas.
    • Fixed handling of single array arg in attrasgn.
    • Fixed test to call RubyLexer#reset between assertions.
    • No longer assigning ivar/cvars to env. Only locals should be in env.
    • Refactored initialize and reset to more properly re-initialize as
      needed.
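Several of the entries above concern specific pieces of Ruby syntax. The following plain Ruby 2.0+ snippet exercises a few of them: \u escapes, a paren-less defn with a keyword argument, h[k]=begin … end written without spaces, %i with extra whitespace, and a quoted symbol containing a zero byte. It only runs these forms through the Ruby interpreter and does not call ruby_parser, so it shows the kind of input the lexer and parser have to cope with, not the s-expressions they produce. Variable and method names are illustrative.

```ruby
# Plain Ruby 2.0+; nothing here calls ruby_parser. Names are illustrative.

# \u#### takes exactly four hex digits; \u{...} takes one or more
# space-separated code points.
accented = "caf\u00e9"
initials = "\u{48 49}"

# Paren-less defn with a keyword argument. Inside the body the keyword
# argument is read like any other local variable.
def greet name, salutation: "hello"
  "#{salutation}, #{name}"
end

# h[k]=begin ... end written with no spaces around "=".
h = {}
k = :answer
h[k]=begin
  6 * 7
end

# %i with extra whitespace inside the delimiters.
syms = %i[  alpha   beta  ]

# A quoted symbol containing a zero byte (allowed on 1.9 / 2.0).
nul_sym = :"a\0b"

puts accented, initials, greet("Ryan"), h[k], syms.inspect, nul_sym.inspect
```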