Ruby 1.9 still cannot list all files on Vista or XP?

winter heat (winterheat)
on 2009-04-08 07:59
I just tried Ruby 1.9, and it still seems unable to list all the files
in a folder on XP or Vista when the filenames contain Chinese, Japanese,
or other non-ASCII characters.

I used two methods, entries and glob:

files = Dir.new(basedir).entries

Dir.chdir(basedir)
files = Dir.glob("*")

both methods show ????????.txt   when the filename has foreign
characters.  Can Ruby 1.9 readily handle this task rather than resorting
to Win32API?  Thanks.
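
For comparison, here is a minimal sketch of the same listing on a current Ruby, asking for the names in UTF-8 explicitly. The helper name list_names and the sample file name are made up for illustration, and the encoding: option is an assumption about the Ruby in use (older releases may not accept it). It is not a fix for the 1.9.1 behaviour described above, only a way to check what a given Ruby returns.

```ruby
require "tmpdir"

# Hypothetical helper: returns the names in +dir+ (without "." and ".."),
# asking Ruby to tag them as UTF-8 rather than the default filesystem encoding.
def list_names(dir)
  Dir.entries(dir, encoding: "UTF-8") - %w[. ..]
end

if __FILE__ == $PROGRAM_NAME
  Dir.mktmpdir do |dir|
    # Assumes the filesystem accepts this non-ASCII name.
    File.write(File.join(dir, "雪☃.txt"), "hello")

    list_names(dir).each do |name|
      # A damaged name would contain literal "?" characters here.
      puts "#{name.inspect} (#{name.encoding}, #{name.bytesize} bytes)"
    end
  end
end
```

If the listing is healthy, the name prints with its original characters and a byte count larger than its character count.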
Bosko Ivanisevic (Guest)
on 2009-04-08 09:36
(Received via mailing list)
On Apr 8, 7:59 am, SpringFlowers AutumnMoon <summercooln...@gmail.com>
wrote:
> files = Dir.glob("*");
>
> both methods show ????????.txt   when the filename has foreign
> characters.  Can Ruby 1.9 readily handle this task rather than resorting
> to Win32API?  Thanks.

What is the code page of your command prompt? When ??????.txt is shown
in the console on Windows, it is usually a problem with the code page
settings.
Ryan Davis (Guest)
on 2009-04-08 09:36
(Received via mailing list)
On Apr 7, 2009, at 22:59 , SpringFlowers AutumnMoon wrote:

> both methods show ????????.txt   when the filename has foreign
> characters.  Can Ruby 1.9 readily handle this task rather than
> resorting
> to Win32API?  Thanks.

show where? on what? what text encodings does it handle? what text
encodings did you set ruby up for?

On OSX:

% cd x
% touch ☃
% ls
☃
% ruby -e 'p Dir["*"]'
["\342\230\203"]
% ruby -KU -e 'p Dir["*"]'
["☃"]
% ~/.multiruby/install/1.9.1-p0/bin/ruby -e 'p Dir["*"]'
["☃"]

You've got 2 sides to this equation, ruby's encodings, and your
environment's encodings.
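
To see the Ruby side of that equation, a small sketch like the following prints the encodings the interpreter is currently working with. The method name encoding_report is made up for illustration, and the "locale" and "filesystem" aliases are assumed to be available in the Ruby in use.

```ruby
# Collects the encodings Ruby is currently working with.
def encoding_report
  {
    source:           __ENCODING__,                # encoding of this source file
    default_external: Encoding.default_external,   # assumed for data read from outside
    default_internal: Encoding.default_internal,   # transcoding target, often nil
    locale:           Encoding.find("locale"),     # derived from the environment
    filesystem:       Encoding.find("filesystem")  # used for file names
  }
end

encoding_report.each { |key, enc| puts "#{key}: #{enc.inspect}" }
```

Comparing this output with the console's code page (or the terminal's locale) shows whether the two sides agree.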
Heesob Park (phasis)
on 2009-04-08 09:59
(Received via mailing list)
2009/4/8 Ryan Davis <ryand-ruby@zenspider.com>:
> On OSX:
> ["☃"]
>
> You've got 2 sides to this equation, ruby's encodings, and your
> environment's encodings.
>
This is a Windows-specific issue.
Refer to the OP's original posting:
http://www.ruby-forum.com/topic/163681

As far as I know, this issue is not fixed in Ruby 1.9.1.

Regards,

Park Heesob
Ryan Davis (Guest)
on 2009-04-08 10:18
(Received via mailing list)
On Apr 8, 2009, at 00:58 , Heesob Park wrote:

> This is Windows specific issue.
> Refer to the OP's original posting http://www.ruby-forum.com/topic/163681
>
> As far as I know, this issue is not fixed in ruby 1.9.1

then his email doesn't belong here, it should go to ruby-core@
winter heat (winterheat)
on 2009-04-08 11:46
Ryan Davis wrote:
> On Apr 8, 2009, at 00:58 , Heesob Park wrote:
>
>> This is Windows specific issue.
>> Refer to the OP's original posting http://www.ruby-forum.com/topic/163681
>>
>> As far as I know, this issue is not fixed in ruby 1.9.1
>
> then his email doesn't belong here, it should go to ruby-core@

Then can somebody file it with ruby-core, maybe as a bug or an
improvement request? For my love of Ruby, I'd like to see it work well
on Windows XP and Vista. It is 2009, and we are a long way into Unicode
and i18n. If the latest version of Ruby cannot list files properly on
Windows, which is probably the most popular OS, can it please be made
to work?
Bill Kelly (Guest)
on 2009-04-08 12:38
(Received via mailing list)
From: "Heesob Park" <phasis@gmail.com>
>
> As far as I know, this issue is not fixed in ruby 1.9.1

Hmm.  If I have correctly understood matz in [ruby-core:20110] ,
Unicode path support for windows was supposed to be fixed:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...

"In short, if you're using UTF-8 for your program encoding, you
should not see any problem  (if you do, it's a bug)."


Regards,

Bill
winter heat (winterheat)
on 2009-04-08 12:52
Bill Kelly wrote:
> From: "Heesob Park" <phasis@gmail.com>
>>
>> As far as I know, this issue is not fixed in ruby 1.9.1
>
> Hmm.  If I have correctly understood matz in [ruby-core:20110] ,
> Unicode path support for windows was supposed to be fixed:
>
> http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...
>
> "In short, if you're using UTF-8 for your program encoding, you
> should not see any problem  (if you do, it's a bug)."
>

Is it done with
# coding: utf-8
or
# encoding: utf-8
?  Do those specify that the current program file is in UTF-8?
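
As far as I know, Ruby's parser looks for "coding:" inside a comment on the first line, so both spellings should select the source encoding. The sketch below checks this by evaluating a tiny snippet that starts with each comment. The helper name source_encoding_for is made up, and ISO-8859-1 is used only so that the result visibly differs from the usual default.

```ruby
# Evaluates a tiny snippet that starts with the given magic comment and
# returns the source encoding Ruby assigned to that snippet.
def source_encoding_for(magic_comment)
  eval("#{magic_comment}\n__ENCODING__")
end

["# coding: iso-8859-1", "# encoding: iso-8859-1"].each do |comment|
  puts "#{comment.inspect} => #{source_encoding_for(comment)}"
end
```

Note that the magic comment only describes the bytes of the program file itself. It does not, by itself, change how file names returned by Dir are encoded.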
winter heat (winterheat)
on 2009-04-08 21:43
SpringFlowers AutumnMoon wrote:
> Bill Kelly wrote:
>> From: "Heesob Park" <phasis@gmail.com>
>>>
>>> As far as I know, this issue is not fixed in ruby 1.9.1
>>
>> Hmm.  If I have correctly understood matz in [ruby-core:20110] ,
>> Unicode path support for windows was supposed to be fixed:
>>
>> http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...
>>
>> "In short, if you're using UTF-8 for your program encoding, you
>> should not see any problem  (if you do, it's a bug)."
>>
>
> is it by
> # coding: utf-8
> or
> # encoding: utf-8
>
> ?  are those for specifying that the current program file is in UTF8 ?

Does someone know how to solve this?  Creating a file with
international characters in its name is simple: go to Google News, look
at news from China, Taiwan, or Hong Kong, and copy and paste some text
into a filename on Windows XP or Vista.  Thanks.
Bill Kelly (Guest)
on 2009-04-09 00:01
(Received via mailing list)
From: "SpringFlowers AutumnMoon" <summercoolness@gmail.com>
> and look at news from China or Taiwan or Hong Kong, and then copy and
> paste the text into a filename on Windows XP or Vista.   thanks.

Sorry, I haven't made the time to experiment with ruby1.9 much yet.
(Even though I am interested in this feature.)

Here are a couple threads from ruby-core that show examples of
using the # encoding: UTF-8 tag.

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...


(Warning: It looks like the ruby-core mailing list archive software
itself doesn't handle the encoding, so the messages are displayed with
bogus remnants of the quoted-printable syntax left over, like =3D and
=20.)


But anyway... As I understand it, before you paste characters into
your editor, you'll need to make sure your editor is using UTF-8
encoding for the file you're editing.  And put the #encoding: UTF-8
tag at the top of the file.


Hope this helps,

Bill
winter heat (winterheat)
on 2009-04-09 00:44
Bill Kelly wrote:

>
> But anyway... As I understand it, before you paste characters into
> your editor, you'll need to make sure your editor is using UTF-8
> encoding for the file you're editing.  And put the #encoding: UTF-8
> tag at the top of the file.

Ah, it is not really about using UTF-8 in my program file. It is about
getting a UTF-8 file listing on Vista and XP.
Bill Kelly (Guest)
on 2009-04-09 04:39
(Received via mailing list)
From: "SpringFlowers AutumnMoon" <summercoolness@gmail.com>
>
> ah it is not really about using UTF-8 in my program file... it is about
> getting UTF-8 file listing on Vista and XP.

Oh.  When you wrote:

>    to make some file have
> international characters, it is really simple:  can go to Google News
> and look at news from China or Taiwan or Hong Kong, and then copy and
> paste the text into a filename on Windows XP or Vista.

...I misunderstood and thought you meant pasting the characters into
your ruby source file.  (I see now you were talking about a filename.)


Well OK, so I built the latest from the ruby 1.9.1 branch in subversion
and attempted to have ruby read a directory containing a filename with
Chinese characters, and then open and read the contents of the file...

My script was:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ (win32_unicode.rb) ~~~
# encoding: UTF-8

files = Dir["T:/zz/*.txt"]

x = files.first
p x, x.encoding

dat = open(x, "r:UTF-8") {|f| f.read}

p dat, dat.encoding
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The result was:

ruby19 win32_unicode.rb
"T:/zz/???????.txt"
#<Encoding:UTF-8>
win32_unicode.rb:8:in `initialize': Invalid argument - T:/zz/???????.txt (Errno::EINVAL)
        from win32_unicode.rb:8:in `open'
        from win32_unicode.rb:8:in `<main>'

I also tried with the -U flag and -E UTF-8 flag:

ruby19 -E UTF-8 win32_unicode.rb
"T:/zz/???????.txt"
#<Encoding:UTF-8>
win32_unicode.rb:8:in `initialize': Invalid argument - T:/zz/???????.txt (Errno::EINVAL)
        from win32_unicode.rb:8:in `open'
        from win32_unicode.rb:8:in `<main>'

ruby19 -U win32_unicode.rb
"T:/zz/???????.txt"
#<Encoding:UTF-8>
win32_unicode.rb:8:in `initialize': Invalid argument - T:/zz/???????.txt (Errno::EINVAL)
        from win32_unicode.rb:8:in `open'
        from win32_unicode.rb:8:in `<main>'

ruby19 -v
ruby 1.9.1p0 (2009-03-04) [i386-mswin32_71]


Note, it doesn't bother me that the filename displays as ???????.txt in
the command window. The issue is that ruby seems unable to open a
filename it just obtained via Dir[].

So, unless I have bungled my test somehow, it seems likely there is a
problem.

If so, as Ryan pointed out, we should move this to the ruby-core list.


Regards,

Bill
winter heat (winterheat)
on 2009-04-09 04:45
Bill Kelly wrote:

> Note, it doesn't bother me that the filename displays as ???????.txt in
> the
> command window, but rather the issue that ruby seems unable to open
> a filename it just obtained via Dir[].
>
> So, unless I have bungled my test somehow, it seems likely there is a
> problem.
>
> If so, as Ryan pointed out, we should move this to the ruby-core list.


Yeah, it looks like the string Ruby returns actually contains
???????.txt; it is not just a matter of how it is printed.

Mr. Park Heesob had a solution, but it involved using Win32.  If Ruby
1.9 can handle it without using Win32, that'd be great.
winter heat (winterheat)
on 2009-04-09 07:47
SpringFlowers AutumnMoon wrote:
> Bill Kelly wrote:
>
>> Note, it doesn't bother me that the filename displays as ???????.txt in
>> the
>> command window, but rather the issue that ruby seems unable to open
>> a filename it just obtained via Dir[].
>>
>> So, unless I have bungled my test somehow, it seems likely there is a
>> problem.
>>
>> If so, as Ryan pointed out, we should move this to the ruby-core list.
>
>
> yeah, it looks like the file name is actually stored as ???????.txt, not
> just when printed out.
>
> Mr. Park Heesbob had a solution but it involved using Win32.  If Ruby
> 1.9 can handle it without using Win32 that'd be great.


Actually, the solution that Park posted involved

files = `cmd /u /c dir /b `.split("\r\000\n\000")
which means running an external system command...  gee...

the complete solution at:
http://www.ruby-forum.com/topic/163681

(need to use the line above and some Win32API calls)
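
The odd-looking separator in that line is just "\r\n" in UTF-16LE, which is what cmd /u writes. A minimal sketch of decoding such output into UTF-8 names follows. The helper name decode_dir_output is made up, the bytes are simulated so the snippet runs without Windows, and the Win32API part of the linked solution is not reproduced here.

```ruby
# Decodes the raw bytes printed by `cmd /u /c dir /b` (UTF-16LE text with
# CRLF line endings) into an array of UTF-8 file names.
def decode_dir_output(raw_bytes)
  raw_bytes.dup
           .force_encoding(Encoding::UTF_16LE)
           .encode(Encoding::UTF_8)
           .split("\r\n")
end

# Simulated command output (an assumption, not captured from a real run).
sample = "雪.txt\r\nreadme.txt\r\n".encode(Encoding::UTF_16LE).b

p decode_dir_output(sample)
p "\r\n".encode(Encoding::UTF_16LE).bytes # the bytes of "\r\000\n\000"
```

On Windows, the argument would be the backtick output itself. Input with an odd number of bytes would make encode raise, so real code may want to handle that case.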
winter heat (winterheat)
on 2009-04-09 23:22
I wonder, when people use Ruby in Japan or France, how are the
characters handled on Windows XP or Vista?

For example, consider a script that looks at all files and, if a file
name contains a Japanese or French word, backs the file up to another
hard disk.  Is even this task not possible?
winter heat (winterheat)
on 2009-04-09 23:37
Bosko Ivanisevic wrote:

> What is code page of your command prompt? When ??????.txt is shown in
> the console on Windows it is usually problem of code page settings.

I actually used each_byte to dump the bytes of the file name, and they
are the ASCII code of the question mark.  So the string I get back
really contains question marks; it is unrelated to the command prompt.
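
The check described above can be sketched like this. The helper names hex_bytes and literal_question_marks? are made up for illustration. The idea is that a byte of 0x3F inside a UTF-8 string can only be a real "?", because multi-byte UTF-8 sequences never contain that byte.

```ruby
# Returns the bytes of +name+ as space-separated uppercase hex.
def hex_bytes(name)
  name.each_byte.map { |byte| format("%02X", byte) }.join(" ")
end

# True when the string contains at least one literal "?" byte (0x3F).
def literal_question_marks?(name)
  name.each_byte.include?(0x3F)
end

puts hex_bytes("?.txt")              # 3F 2E 74 78 74
puts hex_bytes("☃")                  # E2 98 83
p literal_question_marks?("?.txt")   # true
p literal_question_marks?("☃.txt")   # false
```

If the dump of a listed name shows 3F where the foreign characters should be, the damage happened before the string reached the console.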
winter heat (winterheat)
on 2009-04-14 08:07
Mr. Park Heesob, I wonder if you are actually changing Ruby 1.9 so that
it will handle this?  I see you filed a bug on ruby-core related to it.
Thanks.
Heesob Park (phasis)
on 2009-04-14 08:33
(Received via mailing list)
Hi,

2009/4/14 SpringFlowers AutumnMoon <summercoolness@gmail.com>:
> Mr. Park Heesbob, I wonder if you are actually changing Ruby 1.9 so that
> it will handle it?  I see you file a bug on Ruby Core related to this.
> thanks.
>
I would change it if I could, but it is beyond my ability.

According to
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/... ,
it will be handled in Ruby 1.9.2.

Regards,

Park Heesob