On Tue, May 11, 2010 at 03:09:47AM +0900, Roger P. wrote:
Is this expected?
Does your shell support UTF-8?
Also shouldn’t the following code fail?
"asdfМикимаує".force_encoding "IBM437"
(it doesn’t)
Why would it fail? You’re just associating (by force) the encoding
IBM437 with some bag of bytes. Because you’re using force_encoding(),
whether or not those bytes represent valid characters in IBM437 is up
to you.
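To make the "bag of bytes" point concrete, here is a small sketch (the variable names utf8 and tagged are mine, not from the original post). It only shows that force_encoding changes the label on the string and how the bytes are grouped into characters, not the bytes themselves:

```ruby
utf8   = "asdfМикимаує"                       # UTF-8 literal (default source encoding)
tagged = utf8.dup.force_encoding("IBM437")    # same bytes, different encoding label

puts utf8.encoding       # encoding of the original literal
puts tagged.encoding     # encoding after re-tagging

# The underlying bytes are untouched by force_encoding.
puts utf8.bytes == tagged.bytes

# IBM437 is a single-byte encoding, so every byte now counts as one
# character; the multi-byte Cyrillic letters are split apart.
puts utf8.length
puts tagged.length
```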
One of the (many arbitrary) rules of ruby 1.9 encodings is that although
there is an external_encoding set on STDIN, there is no external
encoding set on STDOUT.
So when you write a string to STDOUT, by default the sequence of bytes
is transferred as-is to the terminal.
If you want strings to be transcoded on write, try putting this at the
start of your program:
STDOUT.set_encoding Encoding.locale_charmap
or
STDOUT.set_encoding Encoding.default_external
or
STDOUT.set_encoding "IBM437"
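If you want to see the effect without depending on your terminal, here is a rough sketch that uses a pipe as a stand-in for STDOUT (the pipe and the names plain_bytes and ibm_bytes are my own illustration). It assumes the write end of a fresh pipe behaves like STDOUT in having no external encoding until you set one:

```ruby
text = "\u00E9"   # "é" as a UTF-8 string

# 1. No external encoding on the writer: bytes go out as-is.
reader, writer = IO.pipe
reader.binmode                 # read raw bytes, no decoding
writer.write(text)
writer.close
plain_bytes = reader.read.bytes
reader.close

# 2. External encoding set on the writer: the string is transcoded on write.
reader, writer = IO.pipe
reader.binmode
writer.set_encoding("IBM437")  # same call the post suggests for STDOUT
writer.write(text)
writer.close
ibm_bytes = reader.read.bytes
reader.close

p plain_bytes   # the two UTF-8 bytes of "é"
p ibm_bytes     # the single code page 437 byte for "é"
```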
Also shouldn’t the following code fail?
"asdfМикимаує".force_encoding "IBM437"
(it doesn’t)
No, it shouldn’t; you’re just tagging the string with a different
encoding. In any case, notice that str.valid_encoding? returns true.
This means that the sequence of bytes in str happens to be
a valid sequence of characters in IBM437 (just not the characters you
originally thought of).
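A sketch of that check, assuming the method meant here is String#valid_encoding? and that str is the force-encoded string from the question. The second string (bad) is a contrast case I added to show what a failing check looks like:

```ruby
str = "asdfМикимаує".dup.force_encoding("IBM437")
ibm_ok = str.valid_encoding?     # any byte value is acceptable in a single-byte code page

# Contrast: 0xFF can never appear in well-formed UTF-8,
# so tagging it as UTF-8 yields an invalid string.
bad = "\xFF".b.force_encoding("UTF-8")
utf8_ok = bad.valid_encoding?

puts ibm_ok
puts utf8_ok
```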
However, the following does give an error, because you’re trying to
transcode each character in the source string to an equivalent character
in IBM437:
"asdfМикимаує".encode "IBM437"
Encoding::UndefinedConversionError: U+041C from UTF-8 to IBM437
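For comparison, a short sketch of handling that error, plus one way to get a lossy conversion instead of an exception. The undef: and replace: options are standard String#encode options; the choice of "?" as the substitute and the variable names are just for illustration:

```ruby
text = "asdfМикимаує"

# Plain encode raises, because IBM437 has no Cyrillic characters.
begin
  text.encode("IBM437")
  error = nil
rescue Encoding::UndefinedConversionError => e
  error = e
end
puts error.class
puts error.message

# Lossy alternative: substitute a placeholder for each character
# that has no equivalent in the target encoding.
replaced = text.encode("IBM437", undef: :replace, replace: "?")
puts replaced.encoding
puts replaced.bytes.inspect
```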
If you want to understand how String encodings work in ruby 1.9 - and I
warn you in advance that you probably don’t - you could try reading