Open-uri fetches outdated content vs. curl

Try running the following program:

================
require 'open-uri'

feed_url = "http://www.slate.com/rss/"  # Slate Magazine RSS feed (placeholder URL)

result1 = open(feed_url).read
puts "Saving result1.xml:"
File.open("result1.xml", "w") {|f| f.write(result1)}

result2 = `curl -L #{feed_url}`  # backticks shell out; -L follows redirects
puts "Saving result2.xml:"
File.open("result2.xml", "w") {|f| f.write(result2)}

command = "diff result1.xml result2.xml"
puts system(command)  # diff exits 0 only when the files match, so this prints true or false
================

result1 should be identical to result2, but it turns out that the feed
that open-uri fetches is outdated (by over a month), while the feed
that curl fetches is up-to-date. Can anyone please explain what
is going on?

Thanks!

On 18.09.2008 02:05, Daniel C. wrote:

feed that curl fetches is up-to-date. Can anyone please explain what
is going on?

Reasons I can think of:

i) The two approaches take different paths to the server, i.e. through
a different (or no) proxy.

ii) There is something in the request that makes the server send
different data.

Can you try to obtain the HTTP headers from both approaches? That might
clear up a few things. Also, on Unix-type systems, check for environment
variables and ~/.xyzrc files that might affect proxy settings.
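
Something along these lines should dump the headers from both paths (an
untested sketch; the feed URL is a placeholder, as in your original
post):

================
require 'open-uri'

feed_url = "http://www.slate.com/rss/"  # placeholder feed URL

# open-uri honors the http_proxy environment variable; see what is set.
puts "http_proxy=#{ENV['http_proxy'].inspect}"

# The object returned by open() is extended with OpenURI::Meta, which
# exposes the response status line and the header fields.
open(feed_url) do |f|
  puts "open-uri status: #{f.status.join(' ')}"
  f.meta.each {|k, v| puts "#{k}: #{v}"}
end

# curl: -s silences progress output, -I asks for headers only, and -L
# follows redirects like the original command. Note that -I sends a
# HEAD request, which some servers treat differently from GET.
puts `curl -s -I -L #{feed_url}`
================

If headers like Via differ between the two responses, that points to a
proxy in the path.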

Another good idea might be to try a different tool, e.g. a web browser,
to see what that turns up.

Kind regards

robert

On Sep 18, 2:26 am, Robert K. [email protected] wrote:

Thanks for these suggestions. The problem actually just cleared itself
up, after several days during which the open-uri fetch was getting
outdated content. I think it was a problem with upstream proxies. I'll
try to look at the headers out of curiosity.

On 24.09.2008 02:11, Daniel C. wrote:

I didn't know that servers redirected requests to bad or good proxies
depending on what the User-Agent header is. But this seems to be the
case here.

Daniel, thanks for the update! This is interesting stuff. The
distinction is probably not so much between "bad" and "good" proxies as
between proxies tailored to a particular browser version. Maybe it's a
bug and you should show it to your IT department. It could be that they
changed firewall rules in the past and the "bad" proxy never gets
updated for lack of connectivity. :-)

Cheers

robert

I used net/http to do the same thing, but this time I printed out the
redirect locations. The result is very interesting. If I don't set
the "User-Agent" header, I get redirected to one proxy – the one
with the outdated content. If I set the "User-Agent" header to
"Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/XX (KHTML,
like Gecko) Safari/YY" (faking Apple Safari), I get redirected to
another proxy, with the up-to-date content.

I didn't know that servers redirected requests to bad or good proxies
depending on what the User-Agent header is. But this seems to be the
case here.
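
For anyone who wants to try this, a minimal redirect tracer could look
something like this (a sketch; the feed URL is a placeholder, and XX/YY
stand in for the real version numbers):

================
require 'net/http'
require 'uri'

feed_url = "http://www.slate.com/rss/"  # placeholder feed URL

# Fetch a URL over plain HTTP, printing every redirect Location on the
# way (assumes absolute Location headers and no HTTPS).
def trace(url, headers, limit = 10)
  raise "too many redirects" if limit == 0
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port) do |http|
    http.get(uri.request_uri, headers)
  end
  if response.is_a?(Net::HTTPRedirection)
    puts "redirected to: #{response['location']}"
    trace(response['location'], headers, limit - 1)
  else
    puts "final status: #{response.code}"
  end
end

puts "--- without User-Agent ---"
trace(feed_url, {})

puts "--- faking Safari ---"
safari = "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) " \
         "AppleWebKit/XX (KHTML, like Gecko) Safari/YY"
trace(feed_url, {"User-Agent" => safari})
================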