Downloading a non UTF filename


#1

Hi,

I’m using Rails 1.2.2 and my app can upload files with special
characters like ãçáéíóú.
I’m converting the filename (the physical name, not its content!)
from UTF-8 to ISO and it’s working fine. Otherwise all filenames would
have UTF-8 chars (it would work but would confuse the client).

The problem is I can’t download the file, because when I try to get
the name, e.g. fileção.jpg, Rails will convert it to UTF-8:
fileçÃ£o2.jpg and throw a Routing Error or 404 Not Found.

Any idea how I could convert this URL to ISO so it can find the
filenames?

Thanks a lot.

Peter.


#2

On 3/2/07, Peter removed_email_address@domain.invalid wrote:

Hi,

I’m using Rails 1.2.2 and my app can upload files with special
characters like ãçáéíóú.

I’m converting the filename (the physical name, not its content!)
from UTF-8 to ISO and it’s working fine. Otherwise all filenames would
have UTF-8 chars (it would work but would confuse the client).

I presume you mean ISO-latin-1 ?

The problem is I can’t download the file, because when I try to get
the name, e.g. fileção.jpg, Rails will convert it to UTF-8:
fileçÃ£o2.jpg and throw a Routing Error or 404 Not Found.

Scratching my head here, but aren’t URLs meant to be US-ASCII only?
Non-ascii characters and reserved characters must be escaped.

http://gbiv.com/protocols/uri/rfc/rfc1738.txt
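
As a rough illustration of the escaping described above, here is a minimal sketch that percent-encodes a UTF-8 filename so the resulting URL segment is plain ASCII. It assumes a modern Ruby where string literals are UTF-8 (not the Ruby 1.8 that shipped alongside Rails 1.2.2), and escape_filename is a hypothetical helper name, not something from Rails.

```ruby
require 'erb'

# Hypothetical helper: percent-encode every byte outside the
# unreserved ASCII set, so the result is safe to place in a URL path.
def escape_filename(name)
  ERB::Util.url_encode(name)
end

# "ç" and "ã" are two bytes each in UTF-8, so each becomes two %XX escapes.
escaped = escape_filename("fileção.jpg")
puts escaped
```

The link in the page would then carry the escaped form, and the server side has to undo the escaping before it looks for the file.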

Any idea how I could convert this URL to ISO so it can find the
filenames?

What you need to do is convert it to US-ASCII. Or autogenerate a
unique name.

Most places which support file uploads of arbitrary files will discard
the original name anyway and save the file under an ID instead, or
mangle the name for various reasons (XSS attacks being one of them, but
name collisions are the primary reason).
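
The "save under an ID" idea could be sketched as follows. This is only an assumption about how one might do it: storage_name is a hypothetical helper, and the original name would be kept separately (for example in a database column) purely for display.

```ruby
require 'securerandom'

# Hypothetical helper: build an ASCII-only name for the file on disk.
# The random part avoids collisions; only a plain alphanumeric
# extension is carried over from the uploaded name.
def storage_name(original)
  ext = File.extname(original)
  ext = "" unless ext =~ /\A\.[A-Za-z0-9]+\z/
  SecureRandom.hex(16) + ext.downcase
end

stored = storage_name("fileção.jpg")
puts stored  # 32 random hex characters followed by ".jpg"
```

With this approach the download URL never contains the accented name, so no conversion is needed when the request comes in.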


#3

Hi Richard,

Thanks a lot for your help.

Yes I meant ISO-latin-1 or ISO-8859-1.

Thanks for your ideas. I thought about maybe having a unique name for
the files. I’m just frustrated that I can’t do something as simple as
downloading a file with accented characters!

I tried to convert to US-ASCII using Iconv.new('US-ASCII',
'UTF-8').iconv(self.filename) and the filename shows correctly in the
browser, but only because the page is in UTF-8. When I click it, the
download fails: the file on disk has the name I converted to ISO, but
the logs show Rails looking for the name with UTF-8 characters, so it
will never find it!

Any other suggestions?

Thanks,

Peter.


#4

On 3/7/07, Peter removed_email_address@domain.invalid wrote:

Hi Richard,

Thanks a lot for your help.

Yes I meant ISO-latin-1 or ISO-8859-1.

Thanks for your ideas. I thought about maybe having a unique name for
the files. I’m just frustrated that I can’t do something as simple as
downloading a file with accented characters!

This isn’t a fault of Ruby or anything. Read the RFC. A URL cannot
contain accented characters; in fact it can only contain a very limited
subset of the ASCII characters (alphanumeric and some punctuation
symbols).

Even though your operating system will let you use accented latin-1
and UTF characters in filenames, those filenames cannot be part
of a URL. Your webserver may be able to interpret escaped characters
and find the filename, but accented characters cannot be present in
the URL to begin with.
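
If the application itself has to map an escaped request back to a file stored under an ISO-8859-1 name, the steps might look like the sketch below. It assumes Ruby 1.9 or later, where String#encode exists (on Ruby 1.8, Iconv would play the same role), and it assumes the browser sent the name as percent-escaped UTF-8. unescape_utf8 and disk_name are hypothetical helper names.

```ruby
# Hypothetical helper: turn %XX escapes back into raw bytes,
# then label the resulting byte string as UTF-8.
def unescape_utf8(escaped)
  raw = escaped.b.gsub(/%([0-9A-Fa-f]{2})/) { [$1.hex].pack("C") }
  raw.force_encoding(Encoding::UTF_8)
end

# Hypothetical helper: transcode the UTF-8 name to the ISO-8859-1
# bytes that were used when the file was written to disk.
# Raises Encoding::UndefinedConversionError for characters that
# have no ISO-8859-1 equivalent.
def disk_name(escaped)
  unescape_utf8(escaped).encode(Encoding::ISO_8859_1)
end

name = disk_name("file%C3%A7%C3%A3o.jpg")
p name.bytes  # "ç" and "ã" are now single bytes (0xE7 and 0xE3)
```

The resulting string can then be joined to the upload directory to locate the file.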


#5

Hi Richard, who said it can’t have accented characters? Of course it
can and even domain names now can have accented characters!!!

http://developer.mozilla.org/en/docs/Internationalized_Domain_Names_(IDN)_Support_in_Mozilla_Browsers

I have no problem doing this in any other language; only RoR has lots
of weirdness when using Unicode, despite being a great framework.

Tks,