Found a ruby bug in the URI class, what do I do?

victorp · August 27, 2009, 3:59pm

Hi,

I have to open a URI with open-uri, but URI raises an error saying the
URL is an invalid URI:

p URI.split("http://xpto.com/index.asp?action=showproduct&id={D3E21D33-6DFF-4355-9324-AE1395CEB247}")
URI::InvalidURIError: bad URI(is not URI?): http://xpto.com/index.asp?action=showproduct&id={D3E21D33-6DFF-4355-9324-AE1395CEB247}
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/uri/common.rb:436:in `split'
    from (irb):8
    from :0

That's a strange ID, but it's valid. I even checked RFC 2396, and {} is
not a reserved character.

Best regards,

VP

victorp · August 27, 2009, 4:32pm

On Thu, Aug 27, 2009 at 4:00 PM, Victor Pereira wrote:

`split'
    from (irb):8
    from :0

That’s a strange ID but valid. I even checked the RFC 2396 and {} is not
a reserved character.

Best regards,

Just file a bug at http://redmine.ruby-lang.org/ so it won't be
forgotten. If you also supply a patch against the uri library, you get
bonus points.

victorp · August 27, 2009, 6:08pm

On Aug 27, 2009, at 10:31 AM, Michael F. wrote:

URI::InvalidURIError: bad URI(is not URI?):
That's a strange ID but valid. I even checked the RFC 2396 and {} is not
a reserved character.

But curly braces are part of the 'unwise' set defined by RFC 2396 in:
“2.4.3. Excluded US-ASCII Characters
Although they are disallowed within the URI syntax, we include
here a description of those US-ASCII characters that have been
excluded and the reasons for their exclusion.”
http://www.faqs.org/rfcs/rfc2396.html

Just because a URI works in any (or even all) browser(s) doesn’t mean
that it conforms to the standard for URIs.

I think that URI.split is doing the right thing.
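For reference, the RFC 2396 'unwise' set is small enough to check by hand. The sketch below lists which of those characters appear in the URL from the original post; `UNWISE` and `unwise_chars` are hypothetical names for illustration, not part of Ruby's uri library.

```ruby
# RFC 2396, section 2.4.3: the "unwise" characters excluded from URI syntax.
# (UNWISE and unwise_chars are illustrative names, not part of Ruby's uri library.)
UNWISE = ["{", "}", "|", "\\", "^", "[", "]", "`"].freeze

# Return the distinct unwise characters in +str+, in order of first appearance.
def unwise_chars(str)
  str.each_char.select { |c| UNWISE.include?(c) }.uniq
end

url = "http://xpto.com/index.asp?action=showproduct&id={D3E21D33-6DFF-4355-9324-AE1395CEB247}"
p unwise_chars(url) # prints ["{", "}"]
```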

URI.escape() might be a workaround for you:

irb> u="http://xpto.com/index.asp?action=showproduct&id={D3E21D33-6DFF-4355-9324-AE1395CEB247}"
=> "http://xpto.com/index.asp?action=showproduct&id={D3E21D33-6DFF-4355-9324-AE1395CEB247}"
irb> URI.escape(u)
=> "http://xpto.com/index.asp?action=showproduct&id=%7BD3E21D33-6DFF-4355-9324-AE1395CEB247%7D"
irb> u1=URI.parse(URI.escape(u))
=> #<URI::HTTP:0x38cbd8 URL:http://xpto.com/index.asp?action=showproduct&id=%7BD3E21D33-6DFF-4355-9324-AE1395CEB247%7D>
irb> u1.query
=> "action=showproduct&id=%7BD3E21D33-6DFF-4355-9324-AE1395CEB247%7D"
irb> URI.unescape(u1.query)
=> "action=showproduct&id={D3E21D33-6DFF-4355-9324-AE1395CEB247}"
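`URI.escape` and `URI.unescape` were later deprecated and removed in Ruby 3.0, so the session above won't run on current Ruby. Here is a minimal sketch of the same workaround without them, assuming you only need to encode the RFC 2396 'unwise' characters and that the input isn't already partially percent-encoded. `escape_unwise` is a hypothetical helper; `URI.decode_www_form` stands in for `URI.unescape` when reading the query back.

```ruby
require "uri"

# Hypothetical helper: percent-encode only the RFC 2396 "unwise" characters,
# leaving reserved characters such as ? & = alone so the URL keeps its structure.
UNWISE_RE = /[{}|\\^\[\]`]/

def escape_unwise(str)
  str.gsub(UNWISE_RE) { |c| format("%%%02X", c.ord) }
end

raw = "http://xpto.com/index.asp?action=showproduct&id={D3E21D33-6DFF-4355-9324-AE1395CEB247}"
uri = URI.parse(escape_unwise(raw))

puts uri.query
# action=showproduct&id=%7BD3E21D33-6DFF-4355-9324-AE1395CEB247%7D

# Decode the query back into key/value pairs (in place of URI.unescape).
params = URI.decode_www_form(uri.query).to_h
puts params["id"]
# {D3E21D33-6DFF-4355-9324-AE1395CEB247}
```

Unlike `URI.escape`, this leaves spaces and the other excluded characters untouched, so it is deliberately narrower than the original call.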

-Rob

–
Michael F.
CTO, The Rubyists, LLC
972-996-5199

Rob B. http://agileconsultingllc.com

victorp · August 28, 2009, 5:40pm

Rob,

being in the 'unwise' set doesn't mean that it doesn't conform to the
standard for URIs.

I'm escaping and it works, but in my opinion the library could handle
it.

VP

victorp · August 28, 2009, 6:12pm

Sorry, Victor, but yes it does.

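Quoting from the end of RFC 2396 section 2.4.3:

   unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"

   Data corresponding to excluded characters must be escaped in order
   to be properly represented within a URI.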

So that last sentence pretty much nails it. Curly braces “…must be
escaped in order to be properly represented within a URI.”

So you’ll have to continue to escape them yourself. It wouldn’t be
right for the URI class to depart from the standard here.
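Escaping is easier to get right if it happens when the URL is built rather than patched afterwards. A sketch assuming you control the query parameters: `URI.encode_www_form` percent-encodes each value (including `{` and `}`), and `URI::HTTP.build` assembles the rest of the URL.

```ruby
require "uri"

# Build the query from its parts so each value is encoded on the way in.
# encode_www_form percent-encodes everything outside *-.0-9A-Z_a-z
# (spaces become "+"), so "{" and "}" turn into %7B and %7D.
query = URI.encode_www_form(
  [["action", "showproduct"], ["id", "{D3E21D33-6DFF-4355-9324-AE1395CEB247}"]]
)

uri = URI::HTTP.build(host: "xpto.com", path: "/index.asp", query: query)
puts uri
# http://xpto.com/index.asp?action=showproduct&id=%7BD3E21D33-6DFF-4355-9324-AE1395CEB247%7D
```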

-Rob

On Aug 28, 2009, at 11:40 AM, Victor Pereira wrote:


–
Posted via http://www.ruby-forum.com/.

Rob B. http://agileconsultingllc.com
+1 513-295-4739
Skype: rob.biedenharn