Ruby 1.9 still cannot list all files on Vista or XP?


#1

I just tried using Ruby 1.9 and it seemed that it still cannot list all
files in a folder on XP or Vista when the filenames contain Chinese
characters, Japanese characters, or any foreign characters other than
English.

These two methods are used: entries and glob

files = Dir.new(basedir).entries

Dir.chdir(basedir)
files = Dir.glob("*");

both methods show ???.txt when the filename has foreign
characters. Can Ruby 1.9 readily handle this task rather than resorting
to Win32API? Thanks.


#2

On Apr 8, 7:59 am, SpringFlowers AutumnMoon removed_email_address@domain.invalid
wrote:

files = Dir.glob("*");

both methods show ???.txt when the filename has foreign
characters. Can Ruby 1.9 readily handle this task rather than resorting
to Win32API? Thanks.

What is the code page of your command prompt? When ???.txt is shown in
the console on Windows, it is usually a problem with the code page settings.


#3

On Apr 7, 2009, at 22:59 , SpringFlowers AutumnMoon wrote:

both methods show ???.txt when the filename has foreign
characters. Can Ruby 1.9 readily handle this task rather than resorting
to Win32API? Thanks.

show where? on what? what text encodings does it handle? what text
encodings did you set ruby up for?

On OSX:

% cd x
% touch ☃
% ls
☃
% ruby -e 'p Dir["*"]'
["\342\230\203"]
% ruby -KU -e 'p Dir["*"]'
["☃"]
% ~/.multiruby/install/1.9.1-p0/bin/ruby -e 'p Dir["*"]'
["☃"]

You’ve got 2 sides to this equation, ruby’s encodings, and your
environment’s encodings.


#4

On Apr 8, 2009, at 00:58 , Heesob P. wrote:

This is a Windows-specific issue.
Refer to the OP’s original posting http://www.ruby-forum.com/topic/163681

As far as I know, this issue is not fixed in ruby 1.9.1

then his email doesn’t belong here, it should go to ruby-core@


#5

Ryan D. wrote:

On Apr 8, 2009, at 00:58 , Heesob P. wrote:

This is a Windows-specific issue.
Refer to the OP’s original posting http://www.ruby-forum.com/topic/163681

As far as I know, this issue is not fixed in ruby 1.9.1

then his email doesn’t belong here, it should go to ruby-core@

then can somebody file it in ruby-core… maybe as a bug or improvement?
for my love of Ruby… i’d like to see it work fine on Windows XP or
Vista… it is the year 2009… and we are a long way into unicode and
i18n issues… if Ruby cannot handle listing of files properly in its
latest version for Windows which is probably the most popular OS…
then… please can it be made to work well?


#6

From: “Heesob P.” removed_email_address@domain.invalid

As far as I know, this issue is not fixed in ruby 1.9.1

Hmm. If I have correctly understood matz in [ruby-core:20110] ,
Unicode path support for windows was supposed to be fixed:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/20110

“In short, if you’re using UTF-8 for your program encoding, you
should not see any problem (if you do, it’s a bug).”

Regards,

Bill


#7

Bill K. wrote:

From: “Heesob P.” removed_email_address@domain.invalid

As far as I know, this issue is not fixed in ruby 1.9.1

Hmm. If I have correctly understood matz in [ruby-core:20110] ,
Unicode path support for windows was supposed to be fixed:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/20110

“In short, if you’re using UTF-8 for your program encoding, you
should not see any problem (if you do, it’s a bug).”

is it by

# coding: utf-8

or

# encoding: utf-8

? are those for specifying that the current program file is in UTF-8?
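
On the magic comment question: as far as I know, Ruby 1.9 recognizes both spellings, because the parser looks for "coding:" in a comment on the first line of the file (or the second line, after a shebang). The comment only declares the encoding of the source text itself. The sketch below is a minimal illustration, and the variable name snowman is made up for the example.

```ruby
# encoding: utf-8
# The line above is the magic comment. "# coding: utf-8" and
# "# -*- coding: utf-8 -*-" are recognized the same way.

snowman = "☃"          # hypothetical sample literal
p __ENCODING__         # encoding the parser assumed for this source file
p snowman.encoding     # string literals take the source encoding
p snowman.bytes.to_a   # the UTF-8 bytes of the literal
```

This only affects how literals in the script are read; it does not by itself change what Dir.glob or Dir.entries return.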


#8

SpringFlowers AutumnMoon wrote:

Bill K. wrote:

From: “Heesob P.” removed_email_address@domain.invalid

As far as I know, this issue is not fixed in ruby 1.9.1

Hmm. If I have correctly understood matz in [ruby-core:20110] ,
Unicode path support for windows was supposed to be fixed:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/20110

“In short, if you’re using UTF-8 for your program encoding, you
should not see any problem (if you do, it’s a bug).”

is it by

# coding: utf-8

or

# encoding: utf-8

? are those for specifying that the current program file is in UTF-8?

does someone know how to solve this? to make some file have
international characters, it is really simple: can go to Google News
and look at news from China or Taiwan or Hong Kong, and then copy and
paste the text into a filename on Windows XP or Vista. thanks.


#9

2009/4/8 Ryan D. removed_email_address@domain.invalid:

On OSX:
[“☃”]

You’ve got 2 sides to this equation, ruby’s encodings, and your
environment’s encodings.

This is a Windows-specific issue.
Refer to the OP’s original posting
http://www.ruby-forum.com/topic/163681

As far as I know, this issue is not fixed in ruby 1.9.1

Regards,

Park Heesbob


#10

Bill K. wrote:

But anyway… As I understand it, before you paste characters into
your editor, you’ll need to make sure your editor is using UTF-8
encoding for the file you’re editing. And put the #encoding: UTF-8
tag at the top of the file.

ah it is not really about using UTF-8 in my program file… it is about
getting UTF-8 file listing on Vista and XP.


#11

From: “SpringFlowers AutumnMoon” removed_email_address@domain.invalid

and look at news from China or Taiwan or Hong Kong, and then copy and
paste the text into a filename on Windows XP or Vista. thanks.

Sorry, I haven’t made the time to experiment with ruby1.9 much yet.
(Even though I am interested in this feature.)

Here are a couple threads from ruby-core that show examples of
using the # encoding: UTF-8 tag.

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/23119

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/22784

(Warning: It looks like the ruby-core mailing list archive software
itself doesn’t handle the encoding, and so the messages are displayed
with bogus remnants of the quoted-printable syntax left over like =3D
and =20.)

But anyway… As I understand it, before you paste characters into
your editor, you’ll need to make sure your editor is using UTF-8
encoding for the file you’re editing. And put the #encoding: UTF-8
tag at the top of the file.

Hope this helps,

Bill


#12

From: “SpringFlowers AutumnMoon” removed_email_address@domain.invalid

ah it is not really about using UTF-8 in my program file… it is about
getting UTF-8 file listing on Vista and XP.

Oh. When you wrote:

to make some file have
international characters, it is really simple: can go to Google News
and look at news from China or Taiwan or Hong Kong, and then copy and
paste the text into a filename on Windows XP or Vista.

…I misunderstood and thought you meant pasting the characters into
your ruby source file. (I see now you were talking about a filename.)

Well OK - so I built the latest from the ruby 1.9.1 branch in
subversion, and attempted to have ruby read a directory containing a
filename with Chinese characters, and then open and read the contents
of the file…

My script was:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ (win32_unicode.rb) ~~~

# encoding: UTF-8

files = Dir["T:/zz/*.txt"]

x = files.first
p x, x.encoding

dat = open(x, "r:UTF-8") {|f| f.read}

p dat, dat.encoding


The result was:

ruby19 win32_unicode.rb
"T:/zz/???????.txt"
#<Encoding:UTF-8>
win32_unicode.rb:8:in `initialize': Invalid argument - T:/zz/???????.txt (Errno::EINVAL)
        from win32_unicode.rb:8:in `open'
        from win32_unicode.rb:8:in `<main>'

I also tried with the -U flag and -E UTF-8 flag:

ruby19 -E UTF-8 win32_unicode.rb
"T:/zz/???????.txt"
#<Encoding:UTF-8>
win32_unicode.rb:8:in `initialize': Invalid argument - T:/zz/???????.txt (Errno::EINVAL)
        from win32_unicode.rb:8:in `open'
        from win32_unicode.rb:8:in `<main>'

ruby19 -U win32_unicode.rb
"T:/zz/???????.txt"
#<Encoding:UTF-8>
win32_unicode.rb:8:in `initialize': Invalid argument - T:/zz/???????.txt (Errno::EINVAL)
        from win32_unicode.rb:8:in `open'
        from win32_unicode.rb:8:in `<main>'

ruby19 -v
ruby 1.9.1p0 (2009-03-04) [i386-mswin32_71]


Note, what bothers me is not that the filename displays as ???????.txt
in the command window, but that ruby seems unable to open a filename it
just obtained via Dir[].

So, unless I have bungled my test somehow, it seems likely there is a
problem.

If so, as Ryan pointed out, we should move this to the ruby-core list.


Regards,

Bill

#13

SpringFlowers AutumnMoon wrote:

Bill K. wrote:

Note, it doesn’t bother me that the filename displays as ???.txt in
the command window, but rather the issue that ruby seems unable to open
a filename it just obtained via Dir[].

So, unless I have bungled my test somehow, it seems likely there is a
problem.

If so, as Ryan pointed out, we should move this to the ruby-core list.

yeah, it looks like the file name is actually stored as ???.txt, not
just when printed out.

Mr. Park Heesbob had a solution but it involved using Win32. If Ruby
1.9 can handle it without using Win32 that’d be great.

actually… the solution that Park posted involved

files = `cmd /u /c dir /b`.split("\r\000\n\000")
which is to execute a system exe… gee…

the complete solution at:
http://www.ruby-forum.com/topic/163681

(need to use the line above and some Win32API calls)
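
For readers wondering why that split uses "\r\000\n\000": with the /u switch, cmd writes the output of internal commands such as dir as UTF-16LE when it goes to a pipe, so every CR and LF is followed by a zero byte. A minimal sketch of turning such raw output into UTF-8 names, assuming the bytes really are UTF-16LE; the helper name names_from_utf16le and the sample data are made up for illustration and stand in for the real command output.

```ruby
# Hypothetical helper: convert raw UTF-16LE bytes (as produced by
# `cmd /u /c dir /b` on Windows) into an array of UTF-8 file names.
def names_from_utf16le(raw)
  raw.dup.force_encoding("UTF-16LE").encode("UTF-8").split("\r\n")
end

# Stand-in for the command output, so the sketch runs on any platform.
sample = "\u2603.txt\r\nreadme.txt\r\n".encode("UTF-16LE").force_encoding("BINARY")

p names_from_utf16le(sample)
```

On Windows the sample would be replaced by the backtick call itself. Whether the resulting UTF-8 names can then be passed back to File.open on 1.9.1 is a separate question, which is where the Win32API calls come in.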


#14

I wonder, if people use Ruby in Japan or France, how are the characters
handled on Win XP or Vista?

For example, to write a script that will look at all files and, if the
file name contains a word in Japanese or French, back it up to
another hard disk. Even this task is not possible?


#15

Bill K. wrote:

Note, it doesn’t bother me that the filename displays as ???.txt in
the command window, but rather the issue that ruby seems unable to open
a filename it just obtained via Dir[].

So, unless I have bungled my test somehow, it seems likely there is a
problem.

If so, as Ryan pointed out, we should move this to the ruby-core list.

yeah, it looks like the file name is actually stored as ???.txt, not
just when printed out.

Mr. Park Heesbob had a solution but it involved using Win32. If Ruby
1.9 can handle it without using Win32 that’d be great.


#16

Bosko I. wrote:

What is the code page of your command prompt? When ???.txt is shown in
the console on Windows, it is usually a problem with the code page settings.

i actually used each_byte to dump out the bytes in the file name…
they actually show the ASCII of the question mark… so the string got
back really is question mark, not related to the command prompt.
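
A check along those lines can be sketched as follows. The helper name hex_bytes is made up for illustration; the point is that a literal question mark is the single byte 3F, while a UTF-8 encoded character shows up as a multi-byte sequence.

```ruby
# Hypothetical helper: return the bytes of a file name as hex strings.
def hex_bytes(name)
  name.each_byte.map { |b| format("%02X", b) }
end

# A name that really contains question marks: each one is byte 3F.
p hex_bytes("???.txt")

# A UTF-8 encoded name: the snowman is the three bytes E2 98 83.
p hex_bytes("\u2603.txt")
```

If the names coming back from Dir.glob print as 3F bytes, the characters were already lost before the string reached the console, so the code page of the command prompt is not the cause.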


#17

Hi,

2009/4/14 SpringFlowers AutumnMoon removed_email_address@domain.invalid:

Mr. Park Heesbob, I wonder if you are actually changing Ruby 1.9 so that
it will handle it? I see you filed a bug on Ruby Core related to this.
thanks.

I want to change it if I could, but it is beyond my ability.

According to
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/17759 ,
It will be handled in Ruby 1.9.2.

Regards,

Park H.


#18

Mr. Park Heesbob, I wonder if you are actually changing Ruby 1.9 so that
it will handle it? I see you filed a bug on Ruby Core related to this.
thanks.