Hello,

I am writing a crawler in Ruby. One of the sites I crawl is very picky about headers, so I am mimicking my Firefox browser as closely as possible. One of the GETs I make to this site results in a redirect response. I take the 'location' field from the redirect header and request that URL. When Firefox sends its GET to this location, it gets a 200 OK response. My crawler, however, gets redirected every time.

Here is what Firefox is sending:

    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: 184.108.40.206) Gecko/20070515 Firefox/220.127.116.11
    Keep-Alive: 300
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Accept-Language: en-us,en;q=0.5
    Cookie: sessionid=6d7dd6277ec64983bf642760d7d77d6a
    Connection: keep-alive
    Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
    Host: <hostname here>

And here is how the server responds to Firefox:

    HTTP/1.x 200 OK
    Date: Tue, 12 Jun 2007 17:30:20 GMT
    Server: Microsoft-IIS/6.0
    MicrosoftOfficeWebServer: 5.0_Pub
    X-Powered-By: ASP.NET
    X-AspNet-Version: 1.1.4322
    Cache-Control: private
    Expires: Tue, 12 Jun 2007 17:29:18 GMT
    Content-Type: text/html; charset=utf-8
    Content-Length: 81118

I am sending the same headers using Ruby's Net::HTTP#get method:

    server = Net::HTTP.new(uri.host, uri.port)
    response,data = server.get(uri.request_uri, headers)

where headers is a hash with the exact same keys and values as the Firefox headers above (the cookie value differs, of course, as that is retrieved and stored dynamically). But I always get redirected to the exact same URL that I just requested.

This is the response I get:

    RESPONSE: #<Net::HTTPFound:0x300c604>
    Printing Response:
    cache-control: private
    expires: Tue, 12 Jun 2007 18:17:26 GMT
    x-aspnet-version: 1.1.4322
    content-type: text/html; charset=utf-8
    x-powered-by: ASP.NET
    date: Tue, 12 Jun 2007 18:18:26 GMT
    microsoftofficewebserver: 5.0_Pub
    server: Microsoft-IIS/6.0
    content-length: 200
    location: <exact same URL I just GETed>

Can anyone enlighten me as to what I am doing differently that makes the site redirect me to the same place? I can't tell if it's something I'm doing wrong or something Ruby is doing that differs from what Firefox does.

Thanks.
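One way to check whether Ruby is sending something other than what Firefox sends is to build the request object without sending it and compare its headers against the browser capture. The following is only a minimal sketch, assuming a reasonably current Ruby; `build_get` and `header_diff` are hypothetical helper names, not part of Net::HTTP, and the URL and header values in any call are placeholders.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build the GET request object without sending it.
# Net::HTTP fills in some defaults (for example Accept) when the hash
# passed here leaves them out.
def build_get(uri, headers)
  Net::HTTP::Get.new(uri.request_uri, headers)
end

# Hypothetical helper: compare a browser capture (name => value hash)
# with the headers held by the request object. Returns a hash of
# lowercased header name => [browser_value, ruby_value] for every
# header that differs or exists on only one side.
# Assumption: Host is usually added only when the request is actually
# sent, so it may show up here as missing on the Ruby side.
def header_diff(browser_headers, request)
  wanted = browser_headers.map { |name, value| [name.downcase, value] }.to_h
  actual = {}
  request.each_header { |name, value| actual[name] = value }

  (wanted.keys | actual.keys).sort.each_with_object({}) do |name, diff|
    diff[name] = [wanted[name], actual[name]] unless wanted[name] == actual[name]
  end
end
```

Calling `header_diff(firefox_headers, build_get(uri, headers))` with the two hashes from the post would list any header whose value differs, which may help rule the request headers in or out before looking at cookies or redirect handling.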
on 2007-06-12 22:41
on 2007-06-12 23:05
> Accept: text/xml,application/xml,application/xhtml+xml,text/
> html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
> Host: <hostname here>

Presumably this is from LiveHTTPHeaders? I note that the Referer header is not included herein, but Firefox does send those data by default. Perhaps that's the substantive difference between the Firefox request and the Net::HTTP request? Just a thought.

- donald
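For anyone wanting to try the Referer suggestion, here is a rough sketch of preparing the follow-up request after a redirect. Both helper names are hypothetical, and whether this particular site looks at Referer at all is an assumption, not something the thread establishes.

```ruby
require 'uri'

# Hypothetical helper: copy the header hash and add a Referer that
# points at the URL which produced the redirect.
def headers_after_redirect(headers, previous_uri)
  headers.merge('Referer' => previous_uri.to_s)
end

# Hypothetical helper: resolve the Location value against the URL that
# was just requested, since servers sometimes send a relative Location.
def redirect_target(previous_uri, location)
  URI.join(previous_uri.to_s, location)
end
```

The returned hash can be passed as the second argument to `server.get`, and `redirect_target(...).request_uri` gives the path and query for the first argument.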
on 2007-06-12 23:13
On Jun 12, 1:03 pm, "Ball, Donald A Jr (Library)" <email@example.com> wrote:
> > Accept: text/xml,application/xml,application/xhtml+xml,text/
> > html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
> > Host: <hostname here>
>
> Presumably this is from LiveHTTPHeaders? I note that the Referer header
> is not included herein, but Firefox does send those data by default.
> Perhaps that's the substantive difference between the Firefox request
> and the Net::HTTP request? Just a thought.
>
> - donald

Donald,

Good thought. The GET that Firefox sent right before this one did have the Referer field, but this one did not, so I removed it from my request as well. Any other ideas?

Matt