Hpricot & mechanize fail to parse page after redirect

Hi everyone,
My quest with mechanize/Hpricot continues :)
Something extremely strange happened today: some simple working code
broke, and I can't figure out why.

I am trying to access a thepiratebay.org search page, which redirects
to a relative URL like this:
original link:
http://thepiratebay.org/s/?page=0&orderby=3&q=football+manager+2008&searchTitle=on

redirects to:
/search/football manager 2008/0/3/0

Now, this all worked fine up until yesterday. The page was redirected,
and mechanize even handled the cookie that was sent back from the site.
But today, I am getting this strange error:
“URI::InvalidURIError: bad URI(is not URI?): /search/football manager
2008/0/3/0”
from Hpricot. Mechanize gives a different one, but I'm sure it's
caused by Hpricot's problem with getting the page.
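
A minimal sketch of where that exception can come from, assuming the redirect target is handed unescaped to Ruby's standard URI library (the variable names here are just for illustration, and the exact message text may differ between Ruby versions):

```ruby
require 'uri'

# The relative redirect target from above, with its unescaped spaces.
location = "/search/football manager 2008/0/3/0"

parse_error = nil
begin
  # Spaces are not legal in a URI, so the parser rejects the string.
  URI.parse(location)
rescue URI::InvalidURIError => e
  parse_error = e
end

puts parse_error.class
puts parse_error.message
```

If this raises for you as well, the problem is the unescaped spaces in the redirect target rather than the page content itself.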

I have tested this on 2 different machines, and they both break down.
Can someone please give it a go and see if they can figure it out?
I would be very, very thankful :)

Thanks,
Ehud

PS - I am using Hpricot 0.6, and the redirected page is parsed correctly
when accessed directly.

On Nov 14, 2007, at 2:17 PM, Ehud R. wrote:

> from Hpricot. Mechanize gives a different one, but i’m sure it’s
> [...]
> correctly
> when accessed directly

If the redirect is a 302 whose Location: header is just the relative path:
“/search/football manager 2008/0/3/0”

it’s probably similar to the issue I had using HTTPClient. The
relevant bit of code from HTTPClient is:
    def default_redirect_uri_callback(uri, res)
      newuri = URI.parse(res.header['location'][0])
      unless newuri.is_a?(URI::HTTP)
        newuri = URI.join(uri, newuri)
        STDERR.puts(
          "could be a relative URI in location header which is not recommended")
        STDERR.puts(
          "'The field value consists of a single absolute URI' in HTTP spec")
      end
      puts "Redirect to: #{newuri}" if $DEBUG
      newuri
    end

Note the line: URI.join(uri, newuri) which takes the (presumed)
relative newuri and interprets it with respect to the original uri.
(Note also that I’ve recently sent the author of httpclient a patch
that fixed this line.)
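
To illustrate the idea, here is a rough sketch of resolving a relative Location value against the original URI after escaping the spaces. The helper name `resolve_redirect` is hypothetical (it is not part of HTTPClient, mechanize, or Hpricot), and it only handles spaces, not other illegal characters:

```ruby
require 'uri'

# Hypothetical helper, for illustration only: escape the spaces in a
# Location header value, then resolve it relative to the original URI.
def resolve_redirect(base, location)
  # Assumption: spaces are the only characters that need escaping here.
  escaped = location.gsub(' ', '%20')
  URI.join(base, escaped)
end

base = "http://thepiratebay.org/s/?page=0&orderby=3&q=football+manager+2008&searchTitle=on"
target = resolve_redirect(base, "/search/football manager 2008/0/3/0")

puts target
```

With the URLs from this thread, that should give http://thepiratebay.org/search/football%20manager%202008/0/3/0, which URI.parse accepts.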

-Rob

Rob B. http://agileconsultingllc.com

That is probably the case when using Hpricot, but mechanize handles
this: it has a method that takes a relative URL redirect and creates a
fully qualified one.
Also, it worked for me yesterday with the exact same code (I know that
sounds crazy! :)

Thanks for the quick and thorough reply, Bob!