OpenURI open method problem

The code I am referring to looks like this:

require 'open-uri'

def open_url(url)
  url_object = nil
  begin
    url_object = open(url)
  rescue
    puts "Unable to open url: " + url
  end
  return url_object
end

I was wondering why the open method is unable to open the url when the
urls are of this form:

http://www.anrdoezrs.net/click-5329913-10569016?url=http%3A%2F%2Fwww.fashion58.com%2Fitemdetail.asp%3Fmod%3DEH5BG213DFSK&cjsku=F58-EH5BG213DFSK

http://www.jdoqocy.com/click-5329913-10538037?url=http://www.6pm.com/rock-n-roll-cowgirl-woven-tunic-red

http://www.kqzyfj.com/click-5329913-10285745?url=http%3A%2F%2Fwww.qvc.com%2Fscripts%2Freference.pl%3Fitem%3DJ142488%26ref%3DCJ4%26tpl%3Ddetail&cjsku=J142488

even though they clearly work if you put them into your browser. I’ve read
that open-uri follows redirects by default, so what is the problem
here?

Thanks

Who says those pages are doing redirects? What if the first web page
parses the query string attached to the URL, then uses JavaScript to
load a different page?

If that’s the case, is there a way I can follow the link all the way
through so I can get what I want?

If the first page is redirecting with client-side JavaScript, perhaps you
should consider using PhantomJS through the phantomjs.rb gem or some
equivalent means.

To check beforehand how the redirection actually happens, I’d try a
browser plugin like Live HTTP Headers or good old Wireshark.

Marvan

Reza Marvan Spagnolo wrote in post #1074957:

To check beforehand how the redirection actually happens, I’d try a
browser plugin like Live HTTP Headers or good old Wireshark.

Marvan

Or good old curl.

$ curl -v 'http://www.anrdoezrs.net/click-5329913-10569016?url=http%3A%2F%2Fwww.fashion58.com%2Fitemdetail.asp%3Fmod%3DEH5BG213DFSK&cjsku=F58-EH5BG213DFSK'

> GET /click-5329913-10569016?url=http%3A%2F%2Fwww.fashion58.com%2Fitemdetail.asp%3Fmod%3DEH5BG213DFSK&cjsku=F58-EH5BG213DFSK HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: www.anrdoezrs.net
> Accept: */*

< HTTP/1.1 302 Found
< Server: Resin/3.1.8
< P3P: policyref="http://www.anrdoezrs.net/w3c/p3p.xml", CP="ALL BUS LEG DSP COR ADM CUR DEV PSA OUR NAV INT"
< Cache-control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Pragma: no-cache
< Expires: Thu, 06 Sep 2012 19:27:18 GMT
< Location: [...]<<iuuq%3A%2F%2Fxxx.bosepf0st.ofu%3A91%2Fdmjdl-643AA24-2167A127<<H<<
< Content-Type: text/html
< Cneonction: close
< Transfer-Encoding: chunked
< Date: Thu, 06 Sep 2012 19:27:18 GMT
<

The URL has moved here

That’s a bog-standard 302 redirect. However, curl -Lv shows that there
is a chain of three redirects before the final page is reached.
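
For anyone wondering what following a chain like that involves, here is a rough sketch of the loop. It is not how open-uri or curl implement it; follow_redirects is a made-up name, the block is a stand-in for a real HTTP request (it takes a URL and returns a status code plus Location value), and the hostnames are placeholders. It also assumes every Location is an absolute URL.

```ruby
# Follows a chain of redirects. The block is a stand-in for a real HTTP
# request: it receives a URL and returns [status_code, location_or_nil].
def follow_redirects(url, limit = 5)
  hops = [url]
  limit.times do
    status, location = yield(hops.last)
    # Stop as soon as the response is not a redirect with a target.
    return hops unless (300..399).cover?(status) && location
    hops << location
  end
  raise "too many redirects (limit #{limit})"
end

# Simulated chain of three redirects; no network access is needed.
chain = {
  "http://a.example/" => [302, "http://b.example/"],
  "http://b.example/" => [302, "http://c.example/"],
  "http://c.example/" => [302, "http://d.example/"],
  "http://d.example/" => [200, nil]
}
hops = follow_redirects("http://a.example/") { |u| chain.fetch(u) }
puts hops.join(" -> ")
```

The limit is there so that a redirect loop ends in an error instead of spinning forever.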

Or good old curl.

$ curl -v

Neat.

Still, the op’s question remains unanswered: why doesn’t open-uri follow
all those redirects?

Looking at the curl -Lv output, there are cookies being set. Maybe a
server side script kicks you out if the requests for the redirect urls
do not include those cookies? But then again, curl was able to follow
all the redirects without employing the --cookie-jar option.

Another option is to try Mechanize and see if it can follow all the
redirects.

And for anyone that clicked on those nasty links, feel free to remove
the tracking cookies with these domain names:

anrdoezrs
apmebf
emjcd
fashion58

+1 for curl obviously … sorry was in overkill mode … :)

Thanks guys,

I just used Mechanize and it works great.

On Sep 6, 2012, at 17:21, 7stud wrote:

Still, the op’s question remains unanswered: why doesn’t open-uri follow
all those redirects?

it can:

% ri OpenURI::OpenRead#open | grep -A8 :redirect:
:redirect:
Synopsis:
:redirect=>bool

:redirect=>false is used to disable HTTP redirects at all.
OpenURI::HTTPRedirect exception raised on redirection. It is true by
default.
The true means redirections between http and ftp is permitted.

Still, the op’s question remains unanswered: why doesn’t open-uri follow
all those redirects?

it can:

The docs don’t make a lot of sense to me (but I’m not surprised anymore
by how crappy the ruby docs are). open-uri follows redirects by
default, and throws an exception on redirect? How are both of those
true? In any case, I don’t get an exception–I get:

$ ruby 1.rb
Unable to open url: http://www.anrdoezrs.net/click-5329913-10569016?url=http%3A%2F%2Fwww.fashion58.com%2Fitemdetail.asp%3Fmod%3DEH5BG213DFSK&cjsku=F58-EH5BG213DFSK
$
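
A side note on why no exception shows up in that output: the bare rescue in open_url catches whatever open raises and prints only the generic message, so the actual cause stays hidden. A minimal sketch that reports the exception instead; describe_failure is a made-up helper name, and the raise inside the block is only a stand-in for a failing fetch so that it runs without network access:

```ruby
# Hypothetical helper: runs the block and, if it fails, returns a string
# naming the exception class and message instead of hiding them.
def describe_failure
  yield
  nil
rescue StandardError => e
  "#{e.class}: #{e.message}"
end

# Stand-in for a failing fetch; a real call would go inside the block.
report = describe_failure { raise ArgumentError, "bad redirect target" }
puts report  # prints "ArgumentError: bad redirect target"
```

Swapping the block body for the real open call should show which error open-uri is raising for these URLs.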

On Sep 7, 2012, at 21:26, 7stud wrote:

Still, the op’s question remains unanswered: why doesn’t open-uri follow
all those redirects?

it can:

According to the docs, open-uri follows redirects by default. So
according to the docs, open-uri can follow redirects, but the fact
remains it doesn’t in this case. Why?

Because it redirects with invalid URIs:

Last login: Fri Sep 7 23:45:55 on ttys008
10000 % ruby -ropen-uri -e 'URI.parse(ARGV.shift).read' 'http://www.jdoqocy.com/click-5329913-10538037?url=http://www.6pm.com/rock-n-roll-cowgirl-woven-tunic-red'
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/uri/common.rb:436:in `split': bad URI(is not URI?): http://www.apmebf.com/l8122ar-zH/ry2/GFKINFIM/KIHOOGI/F/F/F?b=k1ys%3Do00w%25AH%259M%259M333.Dwt.jvt%259Myvjr-u-yvss-jv3npys-3v2lu-01upj-ylk<<o00w%3A%2F%2F333.qkvxvj5.jvt%3AF7%2Fjspjr-CA9GG8A-87CAF7AE<<N<< (URI::InvalidURIError)
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/uri/common.rb:485:in `parse'
        from -e:1

10001 % curl -I !$
curl -I 'http://www.jdoqocy.com/click-5329913-10538037?url=http://www.6pm.com/rock-n-roll-cowgirl-woven-tunic-red'
HTTP/1.1 302 Found
Server: Resin/3.1.8
P3P: policyref="http://www.jdoqocy.com/w3c/p3p.xml", CP="ALL BUS LEG DSP COR ADM CUR DEV PSA OUR NAV INT"
Cache-control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Expires: Sat, 08 Sep 2012 06:48:03 GMT
Location: [...]<<v773%3A%2F%2FAAA.xr242qC.q20%3AME%2Fqzwqy-JHGNNFH-FEJHMEHL<<U<<
Content-Type: text/html
Connection: close
Date: Sat, 08 Sep 2012 06:48:03 GMT
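
To check a redirect target without touching the network, you can hand the Location value straight to URI.parse and see whether it raises. A small sketch, assuming a made-up helper name parseable_uri? and placeholder URLs rather than the ones from this thread. Parser strictness has varied between Ruby versions, so a string rejected by one version may be accepted by another.

```ruby
require 'uri'

# Returns true if Ruby's URI parser accepts the string, false if it
# raises URI::InvalidURIError (the error shown in the trace above).
def parseable_uri?(str)
  URI.parse(str)
  true
rescue URI::InvalidURIError
  false
end

# Placeholder URLs: the second contains a raw space in the path.
["http://example.com/page", "http://example.com/bad path"].each do |candidate|
  puts "#{candidate.inspect}: #{parseable_uri?(candidate)}"
end
```

If the target fails this check, a client that builds a URI object from the Location header will likely fail the same way, which matches the error above; curl copes with it because it does not validate the URL that strictly.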