Image scraping from behind a proxy


I was looking at this post in the forum about downloading image files from
the web:

But it doesn't work for me, apparently because I am behind a proxy. With
the code from that post I get errors like the following:

c:/ruby/lib/ruby/1.8/net/http.rb:564:in `initialize': No connection could be made because the target machine actively refused it. - connect(2) (Errno::ECONNREFUSED)
        from c:/ruby/lib/ruby/1.8/net/http.rb:564:in `open'
        from c:/ruby/lib/ruby/1.8/net/http.rb:564:in `connect'
        from c:/ruby/lib/ruby/1.8/timeout.rb:48:in `timeout'
        from c:/ruby/lib/ruby/1.8/timeout.rb:76:in `timeout'
        from c:/ruby/lib/ruby/1.8/net/http.rb:564:in `connect'
        from c:/ruby/lib/ruby/1.8/net/http.rb:557:in `do_start'
        from c:/ruby/lib/ruby/1.8/net/http.rb:546:in `start'
        from c:/ruby/lib/ruby/1.8/open-uri.rb:243:in `open_http'
         ... 7 levels...
        from test.rb:48:in `write_images'
        from test.rb:45:in `each'
        from test.rb:45:in `write_images'
        from test.rb:76

I ran into similar problems earlier when I tried to obtain an HTTP
response. Back then I started doing this (which works perfectly for me):

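require 'net/http'

$proxy_addr = 'proxyservername'
$proxy_port = 8080
# Net::HTTP::Proxy returns a Net::HTTP subclass that routes requests through the proxy
$proxy = Net::HTTP::Proxy($proxy_addr, $proxy_port)

# http_query holds the URL to fetch, as a string
url = URI.parse(http_query)
http_response = $proxy.get_response(url)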

Is there something similar I can do for obtaining image files? I did
tweak the above code so that http_query holds the HTTP location of an
image file, and stored http_response.body in a normal file. Though
that didn't give me any errors, my JPEG is unreadable. :(
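
A likely reason for the unreadable JPEG (an assumption, since the tweaked code is not shown) is that the file was opened in text mode. On Windows, text mode translates newline bytes on write, which corrupts binary data. The sketch below uses a hypothetical helper, `save_binary`, and a few made-up sample bytes in place of http_response.body to show a binary-mode write:

```ruby
require "tmpdir"

# Hypothetical helper (not from the original post): write raw bytes to disk.
# "wb" opens the file in binary mode, so Windows does not translate newlines.
def save_binary(path, bytes)
  File.open(path, "wb") { |file| file.write(bytes) }
end

# Stand-in for http_response.body: a JPEG start marker followed by a 0x0A byte,
# which text mode on Windows would expand to 0x0D 0x0A.
sample = [0xFF, 0xD8, 0xFF, 0xE0, 0x0A, 0x00].pack("C*")
target = File.join(Dir.tmpdir, "binary_mode_sample.jpg")
save_binary(target, sample)
puts File.size(target)
```

If the size printed matches the number of bytes in the response body, the file was written unchanged.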

While I was writing my question I figured out what I am supposed to do. :)
Sorry for the thread. I hope it helps other visitors to the forum.

Here’s how it works now:

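require 'net/http'

$proxy_addr = 'proxyservername'
$proxy_port = 8080

# start() takes the host name of the image server
Net::HTTP::Proxy($proxy_addr, $proxy_port).start("") { |http|
  resp = http.get("/92/218926700_ecedc5fef7_o.jpg")
  open("fun.jpg", "wb") { |file|   # "wb": binary mode, so the bytes are written unchanged
    file.write(resp.body)
  }
}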

The above is a tweaked version of the example available here:

It just uses Net::HTTP::Proxy instead of Net::HTTP.
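
For anyone who wants to reuse this, here is a minimal sketch of the same idea wrapped in a method. The name `fetch_via_proxy`, its parameters, and the status check are my own additions for illustration and are not part of the example linked above. The host and path in the usage comment are placeholders.

```ruby
require "net/http"

# Hypothetical helper: download one file through an HTTP proxy and save it
# in binary mode. All five arguments are supplied by the caller.
def fetch_via_proxy(proxy_addr, proxy_port, host, path, dest)
  Net::HTTP::Proxy(proxy_addr, proxy_port).start(host) do |http|
    resp = http.get(path)
    # Write the file only when the server answered with a 2xx status.
    unless resp.is_a?(Net::HTTPSuccess)
      raise "download failed: #{resp.code} #{resp.message}"
    end
    File.open(dest, "wb") { |file| file.write(resp.body) }
  end
  dest
end

# Usage (needs network access and a reachable proxy, so it is left commented out):
# fetch_via_proxy("proxyservername", 8080, "images.example.com", "/photo.jpg", "fun.jpg")
```

Checking the response class before writing avoids saving a proxy error page under a .jpg name, which would be another way to end up with an unreadable image.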