Unicode in irb on windows (respectively script/console in in

Hi everyone!

I have a problem with Unicode in irb on Windows. I noticed it when
trying to save an attribute of an ActiveRecord model containing an umlaut
(for example “ü”) in script/console. If the database connection is
encoded in utf8, everything after the umlaut gets truncated; with the
default encoding I get garbled characters back. It doesn’t matter whether
$KCODE is set to UTF8 or NONE, the character count stays the same
(also in plain irb)!

Does anyone have a hint on how to solve this? Of course I could try
things such as Cygwin, but I am trying to find an elegant solution for
Windows users, which could eventually be merged into the next
InstantRails release, if Curt agrees.

Thanks a lot,

Michael
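
A minimal sketch of what is probably happening here, assuming the console delivers “ü” in a single-byte Windows codepage rather than in UTF-8. It relies on String#encode and String#force_encoding, which exist only in Ruby 1.9 and later (the Ruby 1.8 of this thread had $KCODE instead). The Windows-1252 codepage and the sample name are illustrative assumptions, not details from the post.

```ruby
# Sketch only: needs Ruby 1.9 or later.
umlaut = "\u00FC"                                   # "ü"

utf8_bytes = umlaut.bytes                           # UTF-8 uses two bytes
ansi_bytes = umlaut.encode("Windows-1252").bytes    # the ANSI codepage uses one

# The same name as raw single-byte console input, relabelled as UTF-8
# the way a utf8 connection would interpret it (assumed scenario).
typed = "M\xFCller".b.force_encoding("UTF-8")

puts utf8_bytes.inspect      # [195, 188]
puts ansi_bytes.inspect      # [252]
puts typed.valid_encoding?   # false
```

A lone 0xFC byte is not valid UTF-8, which would be consistent with a utf8 connection cutting the value off at the umlaut.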

On 11/7/06, [email protected] [email protected] wrote:

> I have a problem with Unicode in irb on Windows. I noticed it when
> trying to save an attribute of an ActiveRecord model containing an umlaut
> (for example “ü”) in script/console. If the database connection is
> encoded in utf8, everything after the umlaut gets truncated; with the
> default encoding I get garbled characters back. It doesn’t matter whether
> $KCODE is set to UTF8 or NONE, the character count stays the same
> (also in plain irb)!

The Windows console – also used by Cygwin – doesn’t recognise UTF-8.
(That is, it’s not possible to properly display UTF-8 in cmd.exe, at
least so far as I can tell.)

-austin

On 11/7/06, Austin Z. [email protected] wrote:

> least so far as I can tell.)

Ack, my bad. I had forgotten: you can specify the UTF-8 codepage
(CP_UTF8) with:

chcp 65001

There are some caveats, of course:

http://blogs.msdn.com/michkap/archive/2006/03/06/544251.aspx

-austin
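
To confirm from inside Ruby which codepage the console is actually using, the output of chcp can be parsed. This is a minimal sketch: active_codepage is a hypothetical helper, and the sample string assumes the wording chcp prints on an English-language Windows.

```ruby
# Hypothetical helper: pull the codepage number out of `chcp` output.
# Returns nil when no number is found.
def active_codepage(chcp_output)
  number = chcp_output[/\d+/]
  number && number.to_i
end

# On Windows one might call: active_codepage(`chcp`)
# A fixed sample string is used here so the sketch runs anywhere.
puts active_codepage("Active code page: 65001")   # prints 65001
```

Matching only the digits keeps the helper independent of the surrounding wording, which differs between localized Windows versions.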

Austin Z. wrote:

> Ack, my bad. I had forgotten: you can specify the UTF-8 codepage
> (CP_UTF8) with:
>
> chcp 65001
>
> There are some caveats, of course:
>
> http://blogs.msdn.com/michkap/archive/2006/03/06/544251.aspx

Also the good old combo of “mode con codepage select=65001”.

lists pretty much all the numbers you can use. (The pain of navigating
to that on the MSDN website.)

Amusingly enough, none of those are even present anymore on WinXP Pro
x64. For yet more hilarity, the console is by default set to the DOS OEM
codepage of the given locale, instead of the newer ANSI ones that are
ISO extensions, which causes great fun when trying to use software
that’s ever so smart and autodetects my locale as my preferred language
(Postgres, assorted GNU stuff being too clever by half) instead of using
the OS language version.

And “there are some caveats” is an understatement, the UTF-8 support in
the console is a sham - I couldn’t get a trivial C program using
arbitrary combinations of tchar.h, wchar.h, -DUNICODE, cmd.exe, the
Windows console, a Cygwin and an MSYS rxvt to do something as daunting
as input random characters that aren’t shared between Latin1 and Latin2
codepages, store them as multibyte internally, and then write them out
to a text file and to the console successfully without one step
breaking. The fact that the whole of CMD broke down in tears from changing that
setting is also worth noting - IIRC, I had problems doing output
redirection to a file and whatnot (I can’t play around with this without
setting up a virtual machine with a 32bit XP). Basically, the Path Less
Annoying is to only use the console for working in your “native”
codepage, and use a non-console tool for everything else.

end # of rant

David V.
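
One way to follow the “native codepage” advice and still store UTF-8 is to leave the console alone and transcode its input inside Ruby. This is a minimal sketch, assuming Ruby 1.9 or later for String#encode (Ruby 1.8 would need a library such as Iconv). console_to_utf8 is a hypothetical helper, and Windows-1252 stands in for whatever codepage the console really uses.

```ruby
# Hypothetical helper: label raw console bytes with the codepage they
# were typed in, then transcode them to UTF-8.
def console_to_utf8(raw, codepage = "Windows-1252")
  raw.dup.force_encoding(codepage).encode("UTF-8")
end

typed = "M\xFCller".b            # raw bytes as an assumed ANSI console sends them
name  = console_to_utf8(typed)

puts name.bytes.inspect          # [77, 195, 188, 108, 108, 101, 114]
puts name.valid_encoding?        # true
```

The codepage argument should match what chcp reports for the console; with the wrong one the bytes still convert, but to the wrong characters.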

> Ack, my bad. I had forgotten: you can specify the UTF-8 codepage (CP_UTF8) with:
>
> chcp 65001

Thank you, Austin, for the nice hint!

The problem is that as soon as I switch the codepage, irb (and also
script/console) stops working: it doesn’t even start anymore, it just
quits immediately without an error message.

Michael

On 11/8/06, [email protected] [email protected] wrote:

>> Ack, my bad. I had forgotten: you can specify the UTF-8 codepage (CP_UTF8) with:
>>
>> chcp 65001
>
> Thank you, Austin, for the nice hint!
>
> The problem is that as soon as I switch the codepage, irb (and also
> script/console) stops working: it doesn’t even start anymore, it just
> quits immediately without an error message.

That’s one of the caveats mentioned: batch files no longer work.
I don’t know why. However, if you have Ruby installed in C:\Ruby, you
can do:

copy C:\Ruby\bin\irb C:\Ruby\bin\irb.rb
irb.rb

Or:

ruby C:\Ruby\bin\irb

And you’ll get a working irb.

-austin