Yukihiro M. wrote:
> You said Tcl has Unicode support that works well with you. So that I
> think treating all of them in UTF-8 is OK for you.
It’s not really about treating everything as UTF-8. Tcl just unifies
all of its strings so that any string can hold the full variety of
characters.
> Then how can it determine which should be in the current code page,
> or in Unicode? Or using Win32 API ending with W could allow you
> living in the Unicode?
Well, currently (I just downloaded the latest CVS sources) Ruby uses the
ANSI versions of the CreateFile and FindFirstFile/FindNextFile APIs, so
even if I set, for example, KCODE to UTF-8 (I’m not sure how you can
currently make Ruby work with UTF-8), the ANSI versions of the APIs are
still called, and that means that:
- If there are filenames with characters that fall outside the current
code page, I receive ‘?’ in place of those characters when I enumerate
directory contents.
- I receive filenames in the current code page, not in UTF-8.
- There is no way for me to open a file whose name contains such
characters using the standard Ruby classes.
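
To make these points concrete, here is a minimal sketch of the lossy trip a filename takes through an ANSI API. It assumes a Ruby that has String#encode (1.9 or later, so not the CVS sources discussed here), and ansi_round_trip is a hypothetical helper, not something Ruby or Win32 provides. The real Windows conversion may also apply best-fit mappings instead of always producing ‘?’, so treat this only as an approximation.

```ruby
# Hypothetical helper (not part of Ruby or the Win32 API): imitates what
# happens to a Unicode filename when it is squeezed through an ANSI code
# page and handed back to the script. Assumes Ruby 1.9+ for String#encode.
def ansi_round_trip(name, code_page)
  # Characters with no mapping in the code page are replaced with '?'.
  ansi = name.encode(code_page, invalid: :replace, undef: :replace, replace: "?")
  # Convert back to UTF-8 so the result can be printed and compared.
  ansi.encode("UTF-8")
end

# Pretend the current code page is the Russian one (Windows-1251).
names = ["отчёт.txt", "日本語.txt", "αβγ-отчёт.txt"]
names.each do |name|
  puts "#{name} -> #{ansi_round_trip(name, 'Windows-1251')}"
end
```

With Windows-1251 as the assumed code page, the Russian name survives, while the Japanese and Greek characters come back as ‘?’, and the original name cannot be recovered from the result.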
The same goes for the win32ole extension: I can see a lot of
ole_wc2mb/ole_mb2wc calls there, which break things horribly when
interoperating with, for example, Excel and working with Russian, Greek,
Japanese and other languages all on the same sheet (after I process the
sheet, modifying all of the cells, it strips every language except
Russian from it).
On *nix systems you can just change your locale to *.UTF-8 and you’re
fine, because everything you receive when enumerating a directory is
UTF-8, and File.open will expect UTF-8. Unfortunately, on Windows that
is not possible: MS already provides ‘wide’ versions of the APIs for
those who need them, and there is no UTF-8 ANSI code page you can set as
the default (because the UTF-8 code page in Windows is somewhat
‘virtual’, for conversion purposes only).
In Tcl all of your strings are in UTF-8, and when Tcl interoperates
with the rest of the world, it converts strings appropriately (for
example, on Win9x there are mostly no ‘wide’ APIs, so it converts
strings to the current code page and uses the ANSI APIs, but on WinNT it
converts them to Unicode and uses the ‘wide’ APIs). What I was thinking
of is a way to set a “current codepage” for Ruby on win32 (including
the possibility of setting it to UTF-8), so that when Ruby works with
the outside world it would use the ‘wide’ APIs when possible, converting
to and from this codepage. Unlike Tcl, where the internal encoding is
hard-coded to UTF-8, there would be a choice. This matters because
there is no other way for a user to do that on Windows (the user can’t
set the current codepage to UTF-8).
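
As a rough sketch of the idea, assuming a modern Ruby with String#encode: the ScriptCodepage module and its methods below are made up for illustration and are not an existing Ruby or Win32 interface. Strings are treated as bytes in the script’s “current codepage” and are converted to UTF-16LE only at the boundary where a ‘wide’ API would be called.

```ruby
# Hypothetical sketch: one script-level "current codepage", with
# conversion to UTF-16LE only at the point where a wide ("W") API would
# be called. Nothing here is an existing Ruby or Win32 interface.
module ScriptCodepage
  class << self
    attr_writer :current

    # Encoding the script's strings are assumed to be in (UTF-8 by default).
    def current
      @current ||= Encoding::UTF_8
    end

    # Script string -> NUL-terminated UTF-16LE buffer, as a "W" API expects.
    def to_wide(str)
      wide = str.dup.force_encoding(current).encode(Encoding::UTF_16LE)
      wide + "\0".encode(Encoding::UTF_16LE)
    end

    # UTF-16LE buffer from a "W" API -> string in the script codepage.
    def from_wide(buf)
      bytes = buf.b
      # Drop one trailing UTF-16 NUL terminator (two zero bytes), if present.
      bytes = bytes[0...-2] if bytes.end_with?("\0\0".b)
      bytes.force_encoding(Encoding::UTF_16LE).encode(current)
    end
  end
end

# A script that chooses the Russian code page as its "current codepage".
ScriptCodepage.current = "Windows-1251"
name = "отчёт".encode("Windows-1251")
wide = ScriptCodepage.to_wide(name)
puts wide.bytesize                            # five UTF-16 code units plus the terminator
puts ScriptCodepage.from_wide(wide) == name
```

In this sketch from_wide raises an error when the ‘wide’ data contains a character the chosen codepage cannot represent. A real implementation would need a policy for that case, which is the main reason to allow UTF-8 as the codepage: it makes the round trip lossless.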