Net::HTTP.get_response hangs with a particular URL

This seems bizarre: the first get_response() call, to apple.com, goes
fine, but the second hangs indefinitely. It hangs on two different
machines in different network segments.

require 'net/http'
require 'uri'

response = Net::HTTP.get_response( URI.parse('http://www.apple.com') )
puts response.class # Net::HTTPOK
puts response.code # 200

response = Net::HTTP.get_response(
  URI.parse('http://content.digitalwell.washington.edu/isilon/1/8/45/459ffc6f-3f91-46f5-af58-155791dad3b4.mp3')
)
puts response.class
puts response.code

(I have no affiliation with the hanging URL; it’s just one I came
across in my server logs while processing RSS feed enclosures.)

Pressing ctrl-c gives:

/opt/ruby-1.8.4/lib/ruby/1.8/net/protocol.rb:133:in `sysread': Interrupt
        from /opt/ruby-1.8.4/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill'
        from /opt/ruby-1.8.4/lib/ruby/1.8/timeout.rb:56:in `timeout'
        from /opt/ruby-1.8.4/lib/ruby/1.8/timeout.rb:76:in `timeout'
        from /opt/ruby-1.8.4/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill'
        from /opt/ruby-1.8.4/lib/ruby/1.8/net/protocol.rb:86:in `read'
        from /opt/ruby-1.8.4/lib/ruby/1.8/net/http.rb:2180:in `read_body_0'
        from /opt/ruby-1.8.4/lib/ruby/1.8/net/http.rb:2141:in `read_body'
        from /opt/ruby-1.8.4/lib/ruby/1.8/net/http.rb:2166:in `body'
        from /opt/ruby-1.8.4/lib/ruby/1.8/net/http.rb:2105:in `reading_body'
        from /opt/ruby-1.8.4/lib/ruby/1.8/net/http.rb:1048:in `request'
        from /opt/ruby-1.8.4/lib/ruby/1.8/net/http.rb:944:in `request_get'
        from /opt/ruby-1.8.4/lib/ruby/1.8/net/http.rb:380:in `get_response'
        from /opt/ruby-1.8.4/lib/ruby/1.8/net/http.rb:545:in `start'
        from /opt/ruby-1.8.4/lib/ruby/1.8/net/http.rb:379:in `get_response'
        from test.rb:8

Anyone?

I’m not positive, but I would point out that the URL in question is an
82MB mp3. I was under the impression that getting an HTTPResponse
actually downloads the body of the requested page/file. So perhaps the
response is just taking a long time to arrive? My connection would
need more than 10 minutes for this to return (though I admit my error is
different when I press control-c).

Dan
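
If the body download is indeed what takes so long, one way to look at the status and headers without fetching the 82MB body is a HEAD request with explicit timeouts. This is only a minimal sketch: head_info and its timeout_seconds parameter are made-up names for illustration, and it assumes the server answers HEAD requests sensibly, which not every server does.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not from the original post): ask for the status
# line and headers only, so the response body is never downloaded.
def head_info(url, timeout_seconds = 10)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.open_timeout = timeout_seconds   # limit on establishing the connection
  http.read_timeout = timeout_seconds   # limit on each individual read
  response = http.start { |conn| conn.head(uri.request_uri) }
  # Content-Length is returned as a String, or nil if the header is absent.
  [response.code, response['content-length']]
end
```

Calling head_info with the mp3 URL above should come back as soon as the headers arrive, rather than after the whole file has been transferred.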

Hi Dan,

My connection to this file seems to be much worse than yours: I just
tried it with curl and it took 50 minutes! That’s why get_response()
appeared to hang.

By the way, I came across this issue while looking for a clean and
simple way to (1) catch redirects and (2) test whether an object is
available for download on a certain location.

Here is a snippet that does just that. Maybe it helps other folks who
are looking for this solution and stumble on long waits for
get_response() to finish:

require 'open-uri'
require 'timeout'

# Raised from :progress_proc to abort the download once data starts arriving.
class HadEnoughException < Exception
end

# Returns [content_length, bytes_read]; either value may be nil.
def test( _url )
  begin
    content_length = nil
    bytes_read = nil
    Timeout::timeout( 10 ) do
      # On Ruby 3.0 and later, call URI.open here instead of open.
      open( _url, "User-Agent" => "test",
            :content_length_proc => lambda { |cl| content_length = cl },
            :progress_proc => lambda { |br|
              bytes_read = br
              raise HadEnoughException if bytes_read > 0 } )
    end
  rescue HadEnoughException
    puts "HadEnoughException"
  rescue Timeout::Error
    puts "Timed out on [#{_url}]"
  rescue OpenURI::HTTPError
    puts "Cannot open [#{_url}]"
  rescue Exception => e
    puts "Exception #{e.message}"
  end
  return content_length, bytes_read
end

content_length, bytes_read = test('http://apple.com')
puts content_length, bytes_read

Outputs: HadEnoughException 31805 784 (the values will differ on your
computer)

content_length, bytes_read = test('http://content.digitalwell.washington.edu/isilon/1/8/45/459ffc6f-3f91-46f5-af58-155791dad3b4.mp3')
puts content_length, bytes_read

Outputs: HadEnoughException 86136581 769 (the values will differ on
your computer)

Note that not all web servers return the content length in the HTTP
header, so this value may be nil.
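
For the two goals mentioned earlier (catching redirects and checking whether an object is available), a Net::HTTP-only variant might look like the sketch below. It is an assumption-laden illustration rather than a tested replacement: resolve, max_redirects and timeout_seconds are invented names, it uses HEAD requests (which some servers handle poorly), and it returns nil for the length when the server sends no Content-Length header.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (names are illustrative): follow redirects with HEAD
# requests and report [final_url, status_code, content_length_or_nil].
def resolve(url, max_redirects = 5, timeout_seconds = 10)
  uri = URI.parse(url)
  (max_redirects + 1).times do
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = (uri.scheme == 'https')
    http.open_timeout = timeout_seconds
    http.read_timeout = timeout_seconds
    response = http.start { |conn| conn.head(uri.request_uri) }

    location = response['location']
    if response.is_a?(Net::HTTPRedirection) && location
      # URI#merge resolves a relative Location against the current URL.
      uri = uri.merge(location)
    else
      # content_length is an Integer, or nil when the header is missing.
      return [uri.to_s, response.code, response.content_length]
    end
  end
  raise "Too many redirects for [#{url}]"
end
```

A call such as resolve('http://apple.com') would then give the final URL after any redirects, the status code as a string, and the length if the server reports one.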