[Greg H. [email protected], 2007-02-24 20.00 CET]
=> [342, 202, 254]
In fact, I could have sworn that things used to work this way… Am I
going crazy? The following seems to confirm that the string is indeed
using a UTF-8 representation internally.
I get exactly the same results whether $KCODE is set to 'NONE' or 'u'.
The Unicode codepoint for the euro sign is 8364. In your string you have
that number encoded as the sequence of bytes [226, 130, 172]. That
encoding is known as UTF-8. #unpack decodes that sequence of bytes and
gives you back the number 8364.
As an analogy, imagine you had the string "\254 \000\000" and called
#unpack("I") on it. The sequence of bytes [172, 32, 0, 0] also represents
the number 8364 (172 + 32 * 256), but this time encoded in the internal
format my computer uses for integers (32 bits, little-endian).
#unpack retrieves that number. The fact that UTF-8 is used for encoding
Unicode codepoints is incidental to this.
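
A minimal sketch of the two decodings side by side. The variable names are only illustrative. It assumes the "U*" directive for the UTF-8 case, and it uses "V" (explicit little-endian 32-bit) instead of "I" so the result does not depend on the machine's native byte order:

```ruby
# The three UTF-8 bytes of the euro sign, written as octal escapes.
euro_utf8 = "\342\202\254"

# "U*" reads the bytes as UTF-8 and returns the decoded codepoints.
codepoints = euro_utf8.unpack("U*")

# The same number stored as four bytes of a little-endian 32-bit integer.
euro_int32 = [172, 32, 0, 0].pack("C*")

# "V" reads four bytes as an unsigned little-endian 32-bit integer.
numbers = euro_int32.unpack("V")

p codepoints  # [8364]
p numbers     # [8364]
```

Both calls return the same number. Only the byte-level encoding that #unpack is told to decode differs.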
To get the individual bytes of a string, use #unpack("C*").
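
For instance, a short sketch with illustrative variable names. The octal conversion is added only to show how these bytes relate to the \342\202\254 escapes:

```ruby
# The euro sign as raw UTF-8 bytes.
euro = "\342\202\254"

# "C*" returns every byte as an unsigned 8-bit integer, with no decoding.
bytes = euro.unpack("C*")

# The same bytes printed in octal match the escape sequence digits.
octal = bytes.map { |b| b.to_s(8) }

p bytes  # [226, 130, 172]
p octal  # ["342", "202", "254"]
```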