Individual char values in a Unicode string

tbobker · September 2, 2006, 6:47am

I’m trying to figure out how to use [] String or jconv or something
to figure out the actual code-point values in a Unicode/UTF-8
string. For example, how can I write f such that

f('tö中') ==> [ 0x74, 0xf6, 0x4e2d ]

(hex just for clarity of course, I want numbers).

-Tim

tbobker · September 2, 2006, 7:13am

Tim B. wrote:

I’m trying to figure out how to use [] String or jconv or something
to figure out the actual code-point values in a Unicode/UTF-8
string. For example, how can I write f such that

f('tö中') ==> [ 0x74, 0xf6, 0x4e2d ]

(hex just for clarity of course, I want numbers).

Hex numbers are numbers.

To answer your question, you can extract bytes from a string:

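#!/usr/bin/ruby

s = "this is a test"

# each_byte yields every byte as an Integer (a number, not a character).
# Indexing with s[i] also gave numbers in Ruby 1.8, but it returns
# one-character strings in Ruby 1.9 and later.
s.each_byte do |b|
  puts b
end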

But I don’t think Ruby recognizes characters, Unicode or otherwise. So it
may not be able to interpret a mixture of Unicode and UTF-8 without
explicit code from the programmer.
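
For the string in the question, the bytes and the code points are not the same list, which is why walking the bytes does not give the requested result on its own. A minimal sketch of the difference, assuming Ruby 1.9 or later (where `String#codepoints` exists) and a UTF-8 encoded string; the variable names are only for illustration:

```ruby
# "tö中" written with \u escapes so the example does not depend on
# the encoding of the source file.
s = "t\u00f6\u4e2d"

# Raw UTF-8 bytes: "t" takes 1 byte, "ö" takes 2, "中" takes 3.
bytes = s.bytes.to_a

# One Integer per character (Unicode code point).
codepoints = s.codepoints.to_a

puts bytes.inspect       # [116, 195, 182, 228, 184, 173]
puts codepoints.inspect  # [116, 246, 20013]
```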

tbobker · September 2, 2006, 7:43am

Tim B. wrote:

'tö中'.unpack("U*") => [116, 246, 20013]

Regards,

Dan
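
The `unpack("U*")` call can be wrapped to get the `f` from the question. This is only a sketch: `f` is the hypothetical name used in the original post, and the input is assumed to be a valid UTF-8 string.

```ruby
# Hypothetical helper named after the f in the question.
# "U*" tells String#unpack to decode the string as a sequence of
# UTF-8 characters and return their code points as Integers.
def f(str)
  str.unpack("U*")
end

# "tö中" written with \u escapes to avoid source-encoding issues.
result = f("t\u00f6\u4e2d")

puts result.inspect                                    # [116, 246, 20013]
puts result.map { |n| format("0x%x", n) }.join(", ")   # 0x74, 0xf6, 0x4e2d
```

The hex form in the second line is only a display choice; the values returned by `f` are plain Integers either way.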