Parsing literals

akoSS · December 14, 2005, 3:29am

Hello,

I need to write a function that parses a string literal in another
language. A string literal in this language is:

STRING = "CHAR*"
CHAR   = any character except for " and \
       | \"
       | \\
       | \/
       | \u four hexadecimal digits

The \u sequence specifies a character in UTF-16 encoding.
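If \u really carries a UTF-16 code unit, then a character outside the Basic Multilingual Plane is written as two escapes (a high and a low surrogate), for example \ud83d\ude00. Decoding each escape on its own with pack('U*'), as both versions of parselit in this thread do, does not combine such a pair into one character. A minimal sketch of decoding a run of units together, assuming the hex digits have already been turned into integers; the helper name utf16_units_to_utf8 is hypothetical:

```ruby
# Hypothetical helper: decode a run of UTF-16 code units (the integer
# values of consecutive \uXXXX escapes) into a UTF-8 string.
def utf16_units_to_utf8(units)
  # pack('n*') writes each unit as a 16-bit big-endian integer; relabeling
  # the bytes as UTF-16BE lets Ruby's transcoder join surrogate pairs.
  # A lone surrogate makes encode raise (an EncodingError subclass).
  units.pack('n*').force_encoding('UTF-16BE').encode('UTF-8')
end

puts utf16_units_to_utf8([0x004e])                 # prints N
puts utf16_units_to_utf8([0xd83d, 0xde00]).length  # prints 1 (U+1F600)
```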

For example: "abc", "", "a\"bc", "a\\b", "a\u12bfc"
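The grammar translates almost one-to-one into a single Ruby regexp, which gives a quick way to check the examples above. This is a sketch, not part of the original post: STRING_RE is a hypothetical name, and the last four strings are made-up invalid inputs. It anchors with \A and \z, which match only at the start and end of the whole string, whereas ^ and $ in Ruby match at every line boundary.

```ruby
# STRING = "CHAR*", with CHAR as defined above, transcribed into one
# anchored regexp. \h is Ruby's hexadecimal-digit class, so \h{4} is
# "four hexadecimal digits".
STRING_RE = /\A"(?:[^"\\]|\\["\\\/]|\\u\h{4})*"\z/

# The post's examples as single-quoted Ruby strings ('\\' is one backslash),
# followed by hypothetical counter-examples: bad escape, bare quote,
# non-hex \u, missing closing quote.
examples = ['"abc"', '""', '"a\\"bc"', '"a\\\\b"', '"a\\u12bfc"',
            '"a\\xb"', '"a"b"', '"\\u12g4"', '"abc']

examples.each do |s|
  puts "#{s.ljust(12)} #{STRING_RE.match?(s) ? 'valid' : 'invalid'}"
end
```

Run as is, the first five examples print valid and the last four print invalid.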

Below is the code that I wrote. Is this Ruby enough? Can someone
suggest improvements? A better style?

Thanks,
Konstantin

def parselit(s)
  r = %r{\\"|\\/|\\\\|\\u[\da-f][\da-f][\da-f][\da-f]}i
  s =~ /^"((?:[^"\\]|#{r})*)"$/ && $1.gsub(r) { |x| x =~ /\\u(.*)/ ?
    [$1.hex].pack('U*') : x[1..-1] }
end

puts parselit('"\u004e\"a"')   # prints N"a
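On the style question, here is one possible tightening of this one-liner, offered as a sketch rather than the answer. It gives the two regexps names, validates with whole-string anchors, and lets a single gsub block pick apart each escape through its capture groups. The names LITERAL, ESCAPE and parse_literal are hypothetical.

```ruby
# Hypothetical restyling of parselit; the names are illustrative.
# LITERAL validates the whole string (\A and \z anchor to the whole string,
# while ^ and $ match at any line boundary); ESCAPE matches one escape.
LITERAL = /\A"((?:[^"\\]|\\["\\\/]|\\u\h{4})*)"\z/
ESCAPE  = /\\(?:u(\h{4})|(["\\\/]))/

def parse_literal(s)
  m = LITERAL.match(s) or return nil
  m[1].gsub(ESCAPE) do
    # Inside the block, $1 holds the hex digits of a \u escape and $2 the
    # character after any other backslash. As in the original, each \u is
    # decoded on its own, so surrogate pairs are not combined.
    $1 ? [$1.hex].pack('U') : $2
  end
end

puts parse_literal('"\\u004e\\"a"')   # prints N"a
p parse_literal('"a"b"')              # prints nil
```

Because LITERAL has already rejected anything off-grammar, every backslash left in the body starts a valid escape, so ESCAPE never has to handle an error case.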

akoSS · December 14, 2005, 12:14pm

def parselit(s)

  re = %r{
      \\"
    | \\/
    | \\\\
    | \\u [\da-f]{4}
  }xoi

  return nil if s !~ /^".*"$/

  out = ""

  s[1..-2].scan(/\G (?: ( [^"\\]+ ) | ( #{re} ) )/x) { |x|
    out <<
      if !x.last
        x.first
      else
        if x.last[0, 2] == '\\u'
          [x.last[2..-1].hex].pack('U*')
        else
          x.last[1..-1]
        end
      end
  }

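  # Fail if the whole string didn't match. (After a scan with no match at
  # all, $~ is nil; that is only valid for the empty literal "".)
  if $~ ? $~.post_match != "" : !s[1..-2].empty?
    nil
  else
    out
  end

end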

puts parselit('"\u004e\"a"')    # prints N"a
puts parselit('"\u004e\""a"')   # parselit returns nil: bare " inside the literal
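For a loop that walks the literal piece by piece, like the second version, Ruby's standard strscan library keeps track of the position, so neither \G nor the post_match check is needed. A minimal sketch under the same grammar: parse_literal_scan is a hypothetical name, and buffering consecutive \u escapes so that surrogate pairs combine (with a lone surrogate treated as an error) is an addition that the original code does not attempt.

```ruby
require 'strscan'

# Hypothetical scanner-based variant of parselit. Returns the decoded
# string, or nil if s is not a well-formed literal.
def parse_literal_scan(s)
  ss = StringScanner.new(s)
  return nil unless ss.skip(/"/)   # scan/skip are anchored at the pointer

  out   = +""   # mutable accumulator
  units = []    # pending \uXXXX code units, decoded together
  flush = lambda do
    unless units.empty?
      # Decoding the run as UTF-16BE combines surrogate pairs.
      out << units.pack('n*').force_encoding('UTF-16BE').encode('UTF-8')
      units.clear
    end
  end

  until ss.eos?
    if ss.scan(/\\u(\h{4})/)
      units << ss[1].hex
    elsif ss.scan(/\\(["\\\/])/)
      flush.call
      out << ss[1]
    elsif ss.scan(/[^"\\]+/)
      flush.call
      out << ss.matched
    elsif ss.skip(/"\z/)
      flush.call
      return out
    else
      return nil   # bad escape, or a " before the end
    end
  end
  nil              # ran out of input without a closing quote
rescue EncodingError
  nil              # a lone surrogate cannot be decoded
end

p parse_literal_scan('"\\u004e\\"a"')          # prints "N\"a"
p parse_literal_scan('"\\ud83d\\ude00"').size  # prints 1
p parse_literal_scan('"\\u004e\\""a"')         # prints nil
```

Each branch consumes input starting at the scan pointer, so any failure (a stray quote, an unknown escape, a missing closing quote) falls through to nil without a separate whole-string check.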