Image scraping from behind a proxy

abhishek-gupta · June 4, 2008, 12:48pm

Hi,

I was looking at this post in the forum for downloading image files from
the www:
http://www.ruby-forum.com/topic/133833

But it doesn't work for me, apparently because I am behind a proxy. For
the code in that thread I get errors like the following:

c:/ruby/lib/ruby/1.8/net/http.rb:564:in `initialize': No connection could be made because the target machine actively refused it. - connect(2) (Errno::ECONNREFUSED)
from c:/ruby/lib/ruby/1.8/net/http.rb:564:in `open'
from c:/ruby/lib/ruby/1.8/net/http.rb:564:in `connect'
from c:/ruby/lib/ruby/1.8/timeout.rb:48:in `timeout'
from c:/ruby/lib/ruby/1.8/timeout.rb:76:in `timeout'
from c:/ruby/lib/ruby/1.8/net/http.rb:564:in `connect'
from c:/ruby/lib/ruby/1.8/net/http.rb:557:in `do_start'
from c:/ruby/lib/ruby/1.8/net/http.rb:546:in `start'
from c:/ruby/lib/ruby/1.8/open-uri.rb:243:in `open_http'
... 7 levels...
from test.rb:48:in `write_images'
from test.rb:45:in `each'
from test.rb:45:in `write_images'
from test.rb:76

I ran into similar problems when I tried to obtain an HTTP
response. Back then I started doing this (which works perfectly for me):

require 'net/http'
require 'uri'

$proxy_addr = 'proxyservername'
$proxy_port = 8080
$proxy = Net::HTTP::Proxy($proxy_addr, $proxy_port)

http_query = "http://www.yahoo.com"
url = URI.parse(http_query)
http_response = $proxy.get_response(url)

Is there something similar I can do for obtaining image files? I did
tweak the above code to put the HTTP location of an image file in
http_query and store http_response.body in a normal file. Though
that didn't give me any errors, my JPEG is unreadable.

abhishek-gupta · June 4, 2008, 1:11pm

While I was writing my query I figured out what I am supposed to do.
Sorry for the thread. I hope it helps other visitors to the forum.

Here’s how it works now:

require 'net/http'

$proxy_addr = 'proxyservername'
$proxy_port = 8080

Net::HTTP::Proxy($proxy_addr, $proxy_port).start("static.flickr.com") { |http|
  resp = http.get("/92/218926700_ecedc5fef7_o.jpg")
  # "wb" opens the file in binary mode so the image bytes are written unchanged
  open("fun.jpg", "wb") { |file|
    file.write(resp.body)
  }
}

The above is a tweaked version of the example available here:
http://www.rubynoob.com/articles/2006/8/21/how-to-download-files-with-a-ruby-script

It just uses Net::HTTP::Proxy instead of Net::HTTP.
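
For anyone who wants to reuse this for more than one image, here is a rough sketch of the same idea wrapped in two small helpers. The names fetch_via_proxy and save_binary are made up for illustration and are not part of Net::HTTP. The sketch assumes a plain http:// URL and a proxy that needs no authentication. It passes the proxy address and port as the third and fourth arguments to Net::HTTP.new, which should behave the same as going through Net::HTTP::Proxy. The earlier unreadable JPEG was probably caused by writing the file in text mode: on Windows that rewrites line-ending bytes inside the image, which is why the "wb" mode matters.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: fetch a URL through an HTTP proxy and return the raw body.
# Assumes a plain http:// URL and a proxy without authentication.
def fetch_via_proxy(url, proxy_addr, proxy_port)
  uri = URI.parse(url)
  # The 3rd and 4th arguments of Net::HTTP.new are the proxy address and port.
  http = Net::HTTP.new(uri.host, uri.port, proxy_addr, proxy_port)
  response = http.get(uri.request_uri)
  # Fail loudly on anything other than a 2xx response, so an error page
  # from the proxy is never saved as if it were an image.
  unless response.is_a?(Net::HTTPSuccess)
    raise "Download failed: #{response.code} #{response.message}"
  end
  response.body
end

# Hypothetical helper: write the bytes to disk in binary mode ("wb"),
# so nothing is translated on the way out.
def save_binary(path, data)
  File.open(path, 'wb') { |file| file.write(data) }
end

# Usage (placeholder URL, proxy name and port):
#   body = fetch_via_proxy('http://example.com/picture.jpg', 'proxyservername', 8080)
#   save_binary('picture.jpg', body)
```

Checking the response class before writing is a design choice, not something the original example does. It only keeps a proxy error page from silently ending up in a .jpg file.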