Danish Characters in Ruby Uri

Hi,

There would appear to be a deficiency in the common.rb
URI module of uri when processing non-standard
characters

e.g.

c:/ruby/lib/ruby/1.8/uri/common.rb:432:in `split': bad URI(is not URI?): http://fakebase/twiki/bin/view/Main/Østermark (URI::InvalidURIError)
from c:/ruby/lib/ruby/1.8/uri/common.rb:481:in `parse'
from c:/ruby/lib/ruby/1.8/open-uri.rb:85:in `open'
from testbed.rb:15

I have chcped till I am blue in the face, gsubbed the
string, etc, etc, but the uri parser won’t have some
Danish characters in any shape or form. Anyone have
any idea how I can persuade uri to take them, or some
other viable method of getting a page back with Danish
chars in it, or am I going to have to hack my local
copy of common.rb apart to make this happen?

rgds

Steve C.

Steve C. wrote:

There would appear to be a deficiency in the common.rb
URI module of uri when processing non-standard
characters

[snip]

Hmm… what is your $KCODE set to, and are you
using the jcode lib?

I’m not certain uri honors those, but I would
certainly assume so…

Or maybe it’s an issue where certain chars need
to be escaped…

Sorry, I’m not being very helpful, am I?

Hal

I have a workaround now (which will do for the
moment) which involves a Unicode gsub of the string
for DK characters, e.g.

str.gsub!(/Ø/, '&#216') # Ø is character 216
str.gsub!(/Å/, '&#197') # Å is character 197

etc…

Fortunately uri seems to be happy with this.

Steve C. wrote:

fakebase/twiki/bin/view/Main/Østermark
chars in it, or am I going to have to hack my local
copy of common.rb apart to make this happen?

Just apply URI.encode first. That will change the URI to
http://fakebase/twiki/bin/view/Main/%C3%98stermark (or something
different if you are not using UTF-8), which is valid.
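
A minimal sketch of that suggestion on a current Ruby. URI.encode is the call from the 1.8-era library discussed in this thread and was removed in Ruby 3.0, so this sketch assumes URI::RFC2396_Parser#escape as a stand-in; the host name is the placeholder from the original post.

```ruby
require 'uri'

# "\u00D8" is Ø, written as an escape so the example does not depend on
# the encoding of the source file.
raw = "http://fakebase/twiki/bin/view/Main/\u00D8stermark"

# The unescaped string is rejected by the parser.
begin
  URI.parse(raw)
  raw_rejected = false
rescue URI::InvalidURIError
  raw_rejected = true
end

# Percent-encode every byte that is not allowed in a URI.
# Assumption: RFC2396_Parser#escape stands in for URI.encode here.
escaped = URI::RFC2396_Parser.new.escape(raw)

# The escaped form parses without complaint.
uri = URI.parse(escaped)

puts raw_rejected
puts escaped
puts uri.path
```

With a UTF-8 string the Ø becomes the two bytes C3 98, so the last path segment should come out as %C3%98stermark. A string in another encoding would give different bytes.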

Good luck.

ah cool, thanks very much, that looks a much sweeter
fix :)

rgds

Steve

On 07/07/06, Steve C. wrote:

ah cool, thanks very much, that looks a much sweeter
fix :)

It’s not so much a fix as the right way to do it. ‘Ø’ isn’t an allowed
character in URLs:

http://www.ietf.org/rfc/rfc1738.txt

‘Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.’
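
As a rough illustration of that rule, the sketch below lists the characters in a URL that fall outside the quoted set. The helper name is hypothetical, the reserved characters are taken to be ;/?:@=& as in RFC 1738, and "%" is assumed acceptable because it introduces an escape sequence.

```ruby
# Hypothetical helper: return the characters that would need encoding
# under the RFC 1738 rule quoted above.
def chars_needing_encoding(url)
  # Alphanumerics, the specials $-_.+!*'(), and the reserved ;/?:@=&
  # Assumption: "%" is let through because it starts an escape sequence.
  allowed = %r{\A[A-Za-z0-9$\-_.+!*'(),;/?:@=&%]\z}
  url.chars.reject { |c| c.match?(allowed) }.uniq
end

# "\u00D8" is Ø.
p chars_needing_encoding("http://fakebase/twiki/bin/view/Main/\u00D8stermark")
p chars_needing_encoding("http://fakebase/twiki/bin/view/Main/%C3%98stermark")
```

The first call should report only the Ø, and the second, already percent-encoded, should report nothing.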

Paul.

On 07/07/06, Matthew S. wrote:

While this doesn’t quite apply to the original question, does Ruby
perhaps need to add support for IDNA?

A couple of relevant libraries already exist, so it should be a
straightforward task.

http://sourceforge.jp/projects/ruexli/
http://idn.rubyforge.org/

Paul.

On Jul 7, 2006, at 9:15, Paul B. wrote:

reserved characters used for their reserved purposes may be used
unencoded within a URL.’

While this doesn’t quite apply to the original question, does Ruby
perhaps need to add support for IDNA?

matthew smillie.

They may not necessarily be ‘legal’ nevertheless they
will turn up from time to time, particularly in
camelcased wiki & twiki links in Denmark, which is why
I need to be able to get them :) But you’re right, it
isn’t a fix, it is indeed the correct way of doing it,
but I was so happy to be able to kiss the problem
goodbye that I allowed a little terminological
inexactitude to creep in in my enthusiasm for the
resolution :)

rgds

Steve

Hi Hal,

Thanks for your swift reply, don’t feel $KCODE is the
issue but will play with it on the offchance, been
down the escape road though with no success. My
reading of the situation is the regexp parser in
common.rb can’t see the chars and throws the URL out
on a (misguided) safety first basis.

rgds

Steve
