Forum: Ruby parsing literals

ako... (Guest)
on 2005-12-14 03:29
(Received via mailing list)
hello,

i need to write a function that would parse a string literal in another
language. a string literal in this language is:

STRING = "CHAR*"
CHAR = any character except for " and \
  | \"
  | \\
  | \/
  | \u four hexadecimal digits

the \u sequence specifies a character in UTF-16 encoding.

for example: "abc", "", "a\"bc", "a\\b", "a\u12bfc"

below is the code that i wrote. is this Ruby enough? can someone
suggest improvements? a better style?

thanks
konstantin

def parselit(s)
  r = %r{\\"|\\/|\\\\|\\u[\da-f][\da-f][\da-f][\da-f]}i
  s =~ /^"((?:[^"\\]|#{r})*)"$/ && $1.gsub(r) { |x|
    x =~ /\\u(.*)/ ? [$1.hex].pack('U*') : x[1..-1]
  }
end

puts parselit('"\u004e\"a"')
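
The examples listed with the grammar make a quick check for any implementation. The sketch below is not from the original post: `ESCAPE`, `LITERAL` and `literal?` are hypothetical names, and it only validates a literal without decoding it. It assumes `\u` must be lowercase (the grammar does not say either way) and anchors with `\A` and `\z`, because in Ruby `^` and `$` match at line boundaries, so a pattern anchored with them also accepts a literal followed by a newline.

```ruby
# Hypothetical validator for the grammar above; all names are illustrative.
ESCAPE  = /\\["\\\/]|\\u\h{4}/                # \" \\ \/ or \u + 4 hex digits
LITERAL = /\A"(?:[^"\\]|#{ESCAPE})*"\z/       # whole string, no line anchors

def literal?(s)
  LITERAL.match?(s)
end

# The examples from the post, written as single-quoted Ruby strings so the
# backslashes reach the regex unchanged ('\\\\' is two backslashes).
['"abc"', '""', '"a\"bc"', '"a\\\\b"', '"a\u12bfc"'].each do |s|
  puts "#{s} => #{literal?(s)}"
end

puts literal?("\"abc\"\n")   # false: \z rejects the trailing newline
```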
William James (Guest)
on 2005-12-14 12:14
(Received via mailing list)
ako... wrote:
>   | \u four hexadecimal digits
>
> def parselit(s)
>   r = %r{\\"|\\/|\\\\|\\u[\da-f][\da-f][\da-f][\da-f]}i
>   s =~ /^"((?:[^"\\]|#{r})*)"$/ && $1.gsub(r) { |x| x =~ /\\u(.*)/ ?
> [$1.hex].pack('U*') : x[1..-1] }
> end
>
> puts parselit('"\u004e\"a"')

def parselit(s)

  re = %r{
           \\"
        |  \\/
        |  \\\\
        |  \\u [\da-f] {4}
  }xoi

  return nil   if s !~ /^".*"$/

  out = ""

  s[1..-2].scan( /\G (?: ( [^"\\]+ ) | ( #{re} ) )/x ){ |x|
    out <<
      if !x.last
        x.first
      else
        if x.last[0,2] == '\u'
          [x.last[2..-1].hex].pack('U*')
        else
          x.last[1..-1]
        end
      end

  }

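  # Fail if whole string didn't match.
  # $~ is nil when scan matched nothing at all (e.g. for ""),
  # so fall back to the unscanned body in that case.
  rest = $~ ? $~.post_match : s[1..-2]
  rest == "" ? out : nil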


end

puts parselit('"\u004e\"a"')
puts parselit('"\u004e\""a"')
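
The same left-to-right scan can also be written with `StringScanner` from the standard library, which tracks the position itself, so the "did the whole string match" check does not depend on `$~`. This is only a sketch under assumptions: `parselit_scanner` is a hypothetical name, it accepts only a lowercase `\u`, and like the versions above it treats each `\u` escape as a standalone code point (it does not combine UTF-16 surrogate pairs).

```ruby
require 'strscan'

# Hypothetical alternative to the scan-based version above.
# Returns the decoded string, or nil if s is not a valid literal.
def parselit_scanner(s)
  return nil unless s.length >= 2 && s.start_with?('"') && s.end_with?('"')

  ss  = StringScanner.new(s[1..-2])   # body without the surrounding quotes
  out = +""

  until ss.eos?
    if ss.scan(/[^"\\]+/)             # run of ordinary characters
      out << ss.matched
    elsif ss.scan(/\\u(\h{4})/)       # \u followed by four hex digits
      out << [ss[1].hex].pack('U')
    elsif ss.scan(/\\(["\\\/])/)      # \" \\ \/
      out << ss[1]
    else
      return nil                      # bare quote or unknown escape
    end
  end

  out
end

puts parselit_scanner('"\u004e\"a"')    # N"a
p    parselit_scanner('"\u004e\""a"')   # nil
p    parselit_scanner('""')             # ""
```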