URI parsing barfs on '^'

regularfry · May 6, 2006, 11:57am

Hi there,

I’ve got a URL that appears valid, but URI.parse breaks on it. I’m
not sure what’s correct here. This URI parses, but is wrong:

Symbol lookup from Yahoo Finance

This URI doesn’t parse, and is correct:

FTSE 100 (^FTSE) components – Yahoo Finance

The question is, should ‘^’ need to be quoted? It’s not named as a
reserved character in RFC 3986, which I think is the most recent URI
definition, so shouldn’t URI.parse be able to handle it?
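
One way to check how a given Ruby install treats a string is to wrap URI.parse in a rescue. This is only a sketch: `parses?` is a made-up helper name, and the example.com URLs are hypothetical stand-ins rather than the Yahoo URLs above. Whether the raw '^' is accepted may well depend on the version of the uri library in use.

```ruby
require 'uri'

# Returns true if URI.parse accepts the string, false if it raises
# URI::InvalidURIError. (Hypothetical helper, not part of the URI library.)
def parses?(str)
  URI.parse(str)
  true
rescue URI::InvalidURIError
  false
end

# Hypothetical stand-in URLs.
puts parses?('http://example.com/q?s=%5EFTSE') # '^' percent-encoded
puts parses?('http://example.com/q?s=^FTSE')   # raw '^'; result may vary by version
```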

regularfry · May 6, 2006, 1:04pm

On May 06, 2006, at 11:55, Alex Y. wrote:

The question is, should ‘^’ need to be quoted? It’s not named as a
reserved character in RFC 3986, which I think is the most recent URI
definition, so shouldn’t URI.parse be able to handle it?

My guess is that Ruby’s URI parser takes its inspiration from RFC 2396,
which uri/common.rb refers to:

2.4.3. Excluded US-ASCII Characters

unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"

Data corresponding to excluded characters must be escaped in order to
be properly represented within a URI.

http://www.ietf.org/rfc/rfc2396.txt
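
In practice that means percent-encoding the character before handing the string to URI.parse. A minimal sketch, assuming a hypothetical example.com URL; `escape_unwise` is an illustrative name, not something Ruby's URI library provides:

```ruby
require 'uri'

# Percent-encode each character from the RFC 2396 "unwise" set
# ({ } | \ ^ [ ] `) as %XX with uppercase hex digits.
# (Hypothetical helper for illustration.)
def escape_unwise(str)
  str.gsub(/[{}|\\^\[\]`]/) { |ch| format('%%%02X', ch.ord) }
end

# Hypothetical URL with a raw '^' in the query string.
raw  = 'http://example.com/q?s=^FTSE'
safe = escape_unwise(raw)

uri = URI.parse(safe)
puts safe      # http://example.com/q?s=%5EFTSE
puts uri.query # s=%5EFTSE
```

This only touches the unwise characters and leaves everything else in the string alone, so it assumes the rest of the URL is already properly escaped.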

Cheers

regularfry · May 6, 2006, 1:07pm

PA wrote:

2.4.3. Excluded US-ASCII Characters

unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"

Data corresponding to excluded characters must be escaped in order to
be properly represented within a URI.

http://www.ietf.org/rfc/rfc2396.txt

Gotcha. Thought it’d be something obvious…