Dir.glob over directories with accented names?

On Mac OS X, a user may have named directories using accented
characters like:

“Défaut” (for Default)

When I run Dir.glob over its parent directory, I get:

De’faut

Is there a way to get the “right” name in UTF-8 instead of ASCII?

I ask because, after getting those directories, I store the directory name
as a label (“Défaut”) and rename the directory to a web-safe string, “defaut”
(no accented characters).

7stud wrote:

That is your display device’s
interpretation of the string.

fine, thanks :wink:

In fact this is due to my text editor (TextMate). I did the following
experiment:

Make a folder on the desktop with an accented character (e), then run this
script:

puts "Defaut" # here the e is accented, as in the folder name on the desktop

Dir.glob("/Users/yt/Desktop/*").each { |f| puts f }

In “RubyMate” (the TextMate “console”) I get the right thing, but when I copy
from “RubyMate” into a ruby utf-8 encoded file (still within TextMate) I
get e’ (the e and the accent apart)…

Une Bévue wrote:

On Mac OS X, a user may have named directories using accented
characters like:

“D�faut” (for Default)

When I run Dir.glob over its parent directory, I get:

Is there a way to get the “right” name in UTF-8 instead of ASCII?

Ruby has the “right” name–it’s just that your display device is unable
to display it. For instance, when I use Dir.glob to read a directory
that contains a file named cafe.txt, where the ‘e’ has an accent, this
is the output:

/TestData/cafe__314__201.txt #underscores added by me

which is the UTF-8 character:

_x_cc_x_81 #underscores added by me

in octal format. You can prove that to yourself by doing this:

puts “cafe__xcc__x81” #remove the underscores

When you say you get:

De’faut

that is completely meaningless. That is your display device’s
interpretation of the string. As for what display device you are using
and what encodings it can display, e.g. ASCII, UTF-8, or even what
character your filename actually contains, who knows? The forum
software and/or my browser are interpreting your problematic character
as a black square with a question mark inside it.
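The byte values mentioned above can be checked directly in Ruby. This is a minimal sketch, assuming the filename holds a plain e followed by U+0301 (COMBINING ACUTE ACCENT), which is the character the UTF-8 bytes 0xCC 0x81 encode. The helper names octal_escapes and hex_escapes are hypothetical and exist only for this illustration.

```ruby
# Assumed filename in decomposed form: "e" followed by U+0301.
decomposed = "cafe\u0301.txt"

# Hypothetical helper: show every non-ASCII byte as an octal escape.
def octal_escapes(str)
  str.bytes.map { |b| b < 128 ? b.chr : format("\\%03o", b) }.join
end

# Hypothetical helper: show every non-ASCII byte as a hex escape.
def hex_escapes(str)
  str.bytes.map { |b| b < 128 ? b.chr : format("\\x%02x", b) }.join
end

puts octal_escapes(decomposed)  # cafe\314\201.txt
puts hex_escapes(decomposed)    # cafe\xcc\x81.txt
puts decomposed.length          # 9 characters (the accent counts as one)
puts decomposed.bytesize        # 10 bytes (the accent takes two)
```

Both escaped forms describe the same two bytes, so the string Ruby holds is valid UTF-8; whether it is drawn as one accented letter or as an e with a loose accent depends on the program displaying it.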

7stud wrote:

What is a ‘ruby utf-8 encoded file’?

Nothing more than a UTF-8 encoded text file…

As an aside, how do you like TextMate? You might try running your
program in Terminal to see if things make more sense.

From Terminal I get:

De##faut (using ls -al)

Running the script from Terminal gave me:

Defaut, with the right accent, for the line puts "Defaut"

and “/Users/yt/Desktop/De’faut”

with the accent apart in the Terminal window, but when I paste it into
TextEdit I get the right thing…

Is that the reason I was confused about that?

Une Bévue wrote:

7stud wrote:

That is your display device’s
interpretation of the string.

fine, thanks :wink:

In fact this is due to my text editor (TextMate). I did the following
experiment:

Make a folder on the desktop with an accented character (e), then run this
script:

puts "Defaut" # here the e is accented, as in the folder name on the desktop

Dir.glob("/Users/yt/Desktop/*").each { |f| puts f }

In “RubyMate” (the TextMate “console”) I get the right thing, but when I copy
from “RubyMate” into a ruby utf-8 encoded file (still within TextMate) I
get e’ (the e and the accent apart)…

What is a ‘ruby utf-8 encoded file’?

As an aside, how do you like TextMate? You might try running your
program in Terminal to see if things make more sense.

Une Bévue wrote:

7stud wrote:

What is a ‘ruby utf-8 encoded file’?

Nothing more than a UTF-8 encoded text file…

As an aside, how do you like TextMate? You might try running your
program in Terminal to see if things make more sense.

From Terminal I get:

De##faut (using ls -al)

ls is most likely an ancient program written in C, and it doesn’t know
how to read or output UTF-8 characters. However, if you run your Ruby
program in Terminal, Terminal is capable of displaying a UTF-8 character
if given a UTF-8 character. It doesn’t matter whether the UTF-8
character is part of a string that is a single word or whether the UTF-8
character is part of a string that is a path.
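For the original goal in this thread (keep “Défaut” as a label and rename the directory to “defaut”), one possible approach is Unicode normalization. This is only a sketch, assuming a Ruby version that provides String#unicode_normalize (2.2 or later) and that the name arrives as UTF-8 in decomposed form, as the output above suggests. The helper names display_label and web_safe are hypothetical.

```ruby
# Hypothetical helper: compose "e" + combining accent into a single "é"
# (NFC), so the label displays as one character everywhere.
def display_label(name)
  name.unicode_normalize(:nfc)
end

# Hypothetical helper: decompose (NFD), drop the combining marks, then
# keep only lowercase ASCII letters and digits, joined by hyphens.
def web_safe(name)
  name.unicode_normalize(:nfd)
      .gsub(/\p{Mn}/, "")
      .downcase
      .gsub(/[^a-z0-9]+/, "-")
      .gsub(/\A-+|-+\z/, "")
end

raw = "De\u0301faut"     # decomposed form: "e" followed by U+0301
puts display_label(raw)  # Défaut (single composed é)
puts web_safe(raw)       # defaut
```

The value returned by web_safe could then be used as the new directory name, for example with File.rename, while the value from display_label is stored as the label.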