I have a string, '\u041f\u0440\u0438\u0432\u0435\u0442!', and I need to
convert it to a string such as 'Привет!'.
I can convert the string to '041f 0440 0438 0432 0435 0442', then convert
each code to decimal, and finally convert each code to a character with:
str.scan(/[0-9]+/).each {|x| result_str << x.to_i}
but I don't think that is the most rational way.
On 06/27/2010 05:33 AM, born in USSR wrote:
I have string: '\u041f\u0440\u0438\u0432\u0435\u0442!' and i need to
convert it to string such as 'привет!'.
irb(main):001:0> RUBY_VERSION
=> "1.9.1"
irb(main):002:0> puts '\u041f\u0440\u0438\u0432\u0435\u0442!'
\u041f\u0440\u0438\u0432\u0435\u0442!
=> nil
irb(main):003:0> puts "\u041f\u0440\u0438\u0432\u0435\u0442!"
Привет!
=> nil
Note the difference in single quotes versus double quotes.
-Justin
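To make the quoting difference concrete, here is a small sketch (the variable names are only illustrative). In a single-quoted literal the backslash and the "u" stay in the string as ordinary characters, while in a double-quoted literal the parser replaces the escape with the code point it names:

```ruby
single = '\u041f'   # six characters: backslash, "u", "0", "4", "1", "f"
double = "\u041f"   # one character: U+041F, decoded by the parser

puts single.length  # prints 6
puts double.length  # prints 1
```

This is why a string that arrives at run time (from a file or a socket, say) with literal backslash-u sequences still needs an explicit decoding step.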
On Jun 27, 2010, at 8:33 AM, born in USSR wrote:
I have string: '\u041f\u0440\u0438\u0432\u0435\u0442!' and i need to
convert it to string such as 'привет!'.
If I understand you correctly you can leverage Ruby’s parser to
interpret your string literal:
irb> x = '\u041f\u0440\u0438\u0432\u0435\u0442!'
=> "\\u041f\\u0440\\u0438\\u0432\\u0435\\u0442!"
irb> eval("\"#{x}\"")
=> "Привет!"
Be careful with eval, though: make sure the string to be evaluated
doesn't contain any untrusted code.
Gary W.
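One way to act on that warning is to check the input before handing it to eval. The following is only a sketch: safe_unescape and SAFE_LITERAL are hypothetical names, not anything from the thread or from Ruby itself, and the whitelist is deliberately strict. It accepts \uXXXX escapes plus any character that is not a backslash, a double quote, or a "#", so the input can neither close the string literal nor start an interpolation.

```ruby
# Hypothetical helper: allow only \uXXXX escapes and characters that
# cannot end the literal or start an interpolation.
SAFE_LITERAL = /\A(?:\\u\h{4}|[^\\"#])*\z/

def safe_unescape(str)
  raise ArgumentError, "unsafe input: #{str.inspect}" unless str =~ SAFE_LITERAL
  # Wrap the checked text in double quotes and let the parser decode it.
  eval(%("#{str}"))
end

puts safe_unescape('\u041f\u0440\u0438\u0432\u0435\u0442!')
```

Input such as '#{1 + 1}' is rejected with ArgumentError instead of being evaluated. An escape that does not name a valid code point may still make eval raise, so the approaches that avoid eval altogether remain the safer choice.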
On 28 June 2010 07:39, Markus S. wrote:
IMHO better than eval
str = '\u041f\u0440\u0438\u0432\u0435\u0442!'
p str.gsub(/\\u(\h{4})/) {
  $1.to_i(16).chr('UTF-8')
}
What do you say to this?
Well, I was looking for something along the lines of String#unpack, like
p str.gsub(/\\u(\h{4})/) {
  [$1.to_i(16)].pack('U')
}
but since we are scanning the escapes one by one, it is not very
interesting, and it needs an extra array, as with JSON (though it is 1.8 compatible).
B.D.
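For reference, the gsub approach can be written as a self-contained sketch. The method name decode_bmp_escapes is made up for illustration. Note that the regex needs a doubled backslash to match the literal backslash in the input, and that this version only handles four-digit escapes:

```ruby
# Hypothetical helper: replace each literal \uXXXX with its character.
def decode_bmp_escapes(str)
  str.gsub(/\\u(\h{4})/) { [$1.hex].pack('U') }
end

str = '\u041f\u0440\u0438\u0432\u0435\u0442!'
puts decode_bmp_escapes(str)

# Both conversions from the posts above give the same UTF-8 character.
puts 0x041f.chr('UTF-8') == [0x041f].pack('U')  # prints true
```

Text outside the escapes, such as the trailing "!", passes through unchanged because gsub only touches what the pattern matches.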
I think the JSON parser is able to decode these Unicode escapes
correctly!
The JSON parser will not decode a bare string, so you have to wrap the
string in array syntax and extract it after parsing:
mbj@mbj ~ $ irb
irb(main):001:0> require 'json'
=> true
irb(main):002:0> x = '\u041f\u0440\u0438\u0432\u0435\u0442!'
=> "\\u041f\\u0440\\u0438\\u0432\\u0435\\u0442!"
irb(main):003:0> JSON.parse('["' + x + '"]')[0]
=> "Привет!"
irb(main):004:0>
IMHO better than eval 
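The same idea as a small script, in case it helps. The method name decode_via_json is hypothetical, and the sketch assumes the input contains no unescaped double quotes or raw control characters, since either would make the wrapped text invalid JSON:

```ruby
require 'json'

# Hypothetical helper: wrap the text in a one-element JSON array,
# parse it, and take the decoded string back out.
def decode_via_json(str)
  JSON.parse(%(["#{str}"])).first
end

puts decode_via_json('\u041f\u0440\u0438\u0432\u0435\u0442!')
```

Because the text is parsed as a JSON string, the other JSON escapes apply too: a doubled backslash in the input comes out as a single backslash, and malformed input raises JSON::ParserError rather than executing anything.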
On 28.06.2010 15:24, Benoit D. wrote:
str = '\u041f\u0440\u0438\u0432\u0435\u0442!'
p str.gsub(/\\u(\h{4})/) {
  $1.to_i(16).chr('UTF-8')
}
Don't forget that Unicode code points are not limited to the BMP; they can
be up to 6 hex digits long
(see the Wikipedia article on Unicode).
What do you do if the string contains escaped backslashes, as in
str = '\u041f\u0440'? Does it contain surrogates?
– Matthias
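A rough sketch of a decoder that takes these points into account follows. Everything here is an assumption about the input format rather than something stated in the thread: it treats a doubled backslash as an escaped backslash, combines a high and low UTF-16 surrogate pair into one code point, and accepts the brace form with one to six hex digits for code points beyond the BMP. The names ESCAPE, valid_code_point? and decode_escapes are made up for illustration.

```ruby
# Alternatives, tried in order: escaped backslash, surrogate pair,
# brace form with 1 to 6 hex digits, plain four-digit form.
ESCAPE = /\\\\|\\u([dD][89abAB]\h{2})\\u([dD][c-fC-F]\h{2})|\\u\{(\h{1,6})\}|\\u(\h{4})/

# True for code points that can stand alone in a UTF-8 string.
def valid_code_point?(cp)
  cp <= 0x10FFFF && !(0xD800..0xDFFF).cover?(cp)
end

def decode_escapes(str)
  str.gsub(ESCAPE) do
    if $1
      # Combine the high and low surrogate into one code point.
      [0x10000 + (($1.hex - 0xD800) << 10) + ($2.hex - 0xDC00)].pack('U')
    elsif $3 || $4
      cp = ($3 || $4).hex
      # Leave lone surrogates and out-of-range values untouched.
      valid_code_point?(cp) ? [cp].pack('U') : $&
    else
      '\\' # an escaped backslash becomes a single backslash
    end
  end
end

puts decode_escapes('\u041f\u0440\u0438\u0432\u0435\u0442!')
```

Matching the escaped backslash first is what keeps the "u041f" after a doubled backslash from being decoded. Whether a lone surrogate should be left as-is, replaced, or rejected is a design choice; this sketch simply leaves it in place.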