Puts bug?

With this file:

# coding: UTF-8

puts "asdfМикимаус"

output in windows is:

C:\dev\digitalarchive_trunk>ruby -v go.rb
ruby 1.9.3dev (2010-05-11 trunk 27724) [i386-mingw32]
asdfМикимаус

Is this expected?

Also shouldn’t the following code fail?

"asdfМикимаус".force_encoding "IBM437"

(it doesn’t)

Thanks!
-rp

On Tue, May 11, 2010 at 03:09:47AM +0900, Roger P. wrote:

Is this expected?

Does your shell support UTF-8?

Also shouldn’t the following code fail?

"asdfМикимаус".force_encoding "IBM437"

(it doesn’t)

Why would it fail? You’re just associating (by force) the encoding
IBM437 with some bag of bytes. Because you’re using force_encoding(),
whether or not those bytes represent valid characters in IBM437 is up
to you.
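To illustrate the point about force_encoding only re-tagging the bytes, here is a minimal sketch. The variable names are mine, and the `dup` is only there so the literal itself is left untouched:

```ruby
# encoding: UTF-8

original = "asdfМикимаус"
# force_encoding changes the encoding tag only; no bytes are converted.
retagged = original.dup.force_encoding("IBM437")

puts original.encoding                 # UTF-8
puts retagged.encoding                 # IBM437
puts original.bytes == retagged.bytes  # true
```

Same bytes, different label: whether the label makes sense is left to the caller.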

Roger P. wrote:

With this file:

# coding: UTF-8

puts "asdfМикимаус"

output in windows is:

C:\dev\digitalarchive_trunk>ruby -v go.rb
ruby 1.9.3dev (2010-05-11 trunk 27724) [i386-mingw32]
asdfМикимаус

Is this expected?

Yes.

One of the (many arbitrary) rules of ruby 1.9 encodings is that although
there is an external_encoding set on STDIN, there is no external
encoding set on STDOUT.

So when you write a string to STDOUT, by default the sequence of bytes
is transferred as-is to the terminal.

If you want strings to be transcoded on write, try putting this at the
start of your program:

STDOUT.set_encoding Encoding.locale_charmap

or

STDOUT.set_encoding Encoding.default_external

or

STDOUT.set_encoding "IBM437"
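A rough way to see the difference without touching your real STDOUT is to use a pipe as a stand-in. This is only a sketch: `bytes_written` is a hypothetical helper of mine, and "café" is used because every character in it exists in IBM437.

```ruby
# encoding: UTF-8

# Hypothetical helper: write text to one end of a pipe and return the raw
# bytes that arrive at the other end. If external_encoding is given, it is
# set on the writing end first, the same way you would set it on STDOUT.
def bytes_written(text, external_encoding = nil)
  reader, writer = IO.pipe
  reader.binmode                       # read raw bytes, no conversion
  writer.set_encoding(external_encoding) if external_encoding
  writer.write(text)
  writer.close
  reader.read.bytes
ensure
  reader.close if reader && !reader.closed?
  writer.close if writer && !writer.closed?
end

text = "café"
raw        = bytes_written(text)            # no external encoding set
transcoded = bytes_written(text, "IBM437")  # external encoding set

# Without an external encoding the UTF-8 bytes pass through as-is.
p raw == text.bytes
# With one, the string is transcoded on write.
p transcoded == text.encode("IBM437").bytes
```

The same applies to STDOUT: once an external encoding is set, strings are transcoded to it as they are written.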

Also shouldn’t the following code fail?

"asdfМикимаус".force_encoding "IBM437"

(it doesn’t)

No, it shouldn’t; you’re just tagging the string with a different
encoding. In any case, notice that

str = "asdfМикимаус".force_encoding "IBM437"
str.valid_encoding?

returns true. This means that the sequence of bytes in str happens to be
a valid sequence of characters in IBM437 (just not the characters you
originally thought of)
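A small sketch of what "valid, but not the characters you thought of" looks like in practice. The variable names are mine, and the last line is an extra contrast I added: a byte sequence that is not valid in the encoding it is tagged with.

```ruby
# encoding: UTF-8

utf8 = "asdfМикимаус"
ibm  = utf8.dup.force_encoding("IBM437")

# In UTF-8 each Cyrillic letter takes two bytes; in a single-byte
# encoding such as IBM437 every byte counts as its own character.
p utf8.length          # 12
p ibm.length           # 20
p ibm.valid_encoding?  # true

# For contrast: a lone 0xFF byte is never valid UTF-8.
p "\xFF".dup.force_encoding("UTF-8").valid_encoding?  # false
```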

However, the following does give an error, because you’re trying to
transcode each character in the source string to an equivalent character
in IBM437:

"asdfМикимаус".encode "IBM437"

Encoding::UndefinedConversionError: U+041C from UTF-8 to IBM437
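If you want to handle that case rather than crash, something along these lines should work. It is only a sketch: the `undef:` and `replace:` options to String#encode are standard, but choosing "?" as the substitute is my own assumption about what you would want.

```ruby
# encoding: UTF-8

text = "asdfМикимаус"

# Strict transcoding raises, because IBM437 has no Cyrillic letters.
begin
  text.encode("IBM437")
rescue Encoding::UndefinedConversionError => e
  error_message = e.message
end
puts error_message

# Lenient transcoding: substitute "?" for anything IBM437 cannot represent.
replaced = text.encode("IBM437", undef: :replace, replace: "?")
puts replaced.encode("UTF-8")  # asdf????????
```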

If you want to understand how String encodings work in ruby 1.9 - and I
warn you in advance that you probably don’t - you could try reading
http://github.com/candlerb/string19/blob/master/string19.rb

Then cry.

Regards,

Brian.

2010/5/11 Roger P.:

If you want to understand how String encodings work in ruby 1.9 - and I
warn you in advance that you probably don’t - you could try reading
http://github.com/candlerb/string19/blob/master/string19.rb

Then cry.

Thanks much :)

There’s also James’ excellent article:

http://blog.grayproductions.net/articles/miscellaneous_m17n_details

Cheers

robert

If you want to understand how String encodings work in ruby 1.9 - and I
warn you in advance that you probably don’t - you could try reading
http://github.com/candlerb/string19/blob/master/string19.rb

Then cry.

Thanks much :)
-rp