Ruby Forum Ruby-core > Astral Plane Characters and Ruby 1.9

Posted by Sam Ruby (rubys)
on 28.12.2007 21:46
(Received via mailing list)
As near as I can tell, Ruby 1.9 handles Unicode astral plane characters
correctly (example: 0x10464.chr(Encoding::UTF_8)), but doesn't seem to
provide any way to include them directly in string literals, not even
using surrogates (popular with JSON).

Python, for example, supports this with an uppercase U followed by 8 hex
digits.

   u'\U00010346'

Python also recognizes surrogate pairs when encoding to and decoding from UTF-8:

   u"\ud800\udf46".encode('utf-8').decode('utf-8')
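
For comparison, here is a minimal Ruby sketch of the same code point, assuming Ruby 1.9 or later. It builds the character with Integer#chr and, separately, combines the surrogate pair by hand using the standard UTF-16 arithmetic. The helper name combine_surrogates is hypothetical, not something Ruby provides.

```ruby
# Build an astral-plane character directly from its code point.
astral = 0x10346.chr(Encoding::UTF_8)

# Hypothetical helper (not part of Ruby): combine a UTF-16 surrogate
# pair into the single code point it represents.
def combine_surrogates(high, low)
  unless (0xD800..0xDBFF).cover?(high) && (0xDC00..0xDFFF).cover?(low)
    raise ArgumentError, "not a surrogate pair"
  end
  0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
end

# The pair from the Python example above maps to the same character.
from_pair = combine_surrogates(0xD800, 0xDF46).chr(Encoding::UTF_8)

puts astral == from_pair
puts astral.length    # characters
puts astral.bytesize  # bytes in UTF-8
```

Both routes should yield one character that occupies four bytes in UTF-8.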

- Sam Ruby
Posted by David Flanagan (Guest)
on 28.12.2007 22:46
(Received via mailing list)
Sam,

The syntax you want uses curly braces: \u{10464}.  It works for short
codepoints, too: \u{A3}.  And even for space-separated sequences of
codepoints: \u{a3 a5 20ac} => pound, yen, and euro signs
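
A short sketch of the three forms above, assuming Ruby 1.9 or later and double-quoted string literals (single-quoted strings do not process \u escapes). The variable names are only for illustration.

```ruby
# Astral-plane code point, written directly in a string literal.
astral = "\u{10464}"

# The braces also accept short code points.
pound = "\u{A3}"

# Several space-separated code points in one escape.
money = "\u{a3 a5 20ac}"

# Single quotes leave the escape as six literal characters.
literal = '\u{A3}'

puts astral.ord.to_s(16)
puts money.length
puts astral == 0x10464.chr(Encoding::UTF_8)
```

The last line compares the literal against the Integer#chr form from the original question; the two should be equal.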

  David
Posted by Sam Ruby (rubys)
on 28.12.2007 23:28
(Received via mailing list)
David Flanagan wrote:
> Sam,
> 
> The syntax you want uses curly braces: \u{10464}.  It works for short
> codepoints, too: \u{A3}.  And even for space-separated sequences of
> codepoints: \u{a3 a5 20ac} => pound, yen, and euro signs

Thanks!

- Sam Ruby