Timeout not obeyed when trying to open a bad URL

Here is something I've never seen before:

I have a list of URLs fed into Mechanize (which uses net/http to grab
pages).

I have it set up like this:
require 'mechanize'
require 'timeout'

agent = WWW::Mechanize.new
error_count = 0

begin
  Timeout::timeout(2) {
    @tracked_page = agent.get("http://#{site_url}")
  }
rescue Timeout::Error => timeout_error
  puts "I TIMED OUT AFTER 2 SECS BUT I'M TRYING AGAIN: #{timeout_error}"
  error_count += 1
  if error_count < 5
    puts "ATTEMPT NUMBER #{error_count}, QUITTING AFTER 4 TRIES"
    retry
  end
end

This is all well and good: it works fine and catches any timeout
exceptions, except when it's trying to deal with one particular URL
(www.webdevking.com).

This URL is not currently resolving to any host; it returns "unknown
host" when you try to connect to it.

When I run this code against www.webdevking.com, it hangs for upwards
of 30 seconds and fails to obey the two-second Timeout. I've tried
setting Mechanize's own timeouts (agent.open_timeout and
agent.read_timeout), but they aren't obeyed either. The timeout is
honored for every URL except that one.

I would like to get to the root of this problem, because a 30-second
slowdown when processing a batch of URLs is unacceptable.
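
If it helps narrow things down, here is a stripped-down sketch of what
I think is the same hang, with Mechanize taken out of the picture
entirely (an untested guess that the delay is in the name lookup
itself):

require 'socket'
require 'timeout'

begin
  Timeout::timeout(2) {
    # the name lookup inside TCPSocket.open seems to be where the time goes
    TCPSocket.open("www.webdevking.com", 80)
  }
rescue Timeout::Error => e
  puts "timed out: #{e}"
rescue SocketError => e
  puts "name lookup failed: #{e}"   # raised quickly if DNS answers with "unknown host"
end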

Any ideas?!?!

Thanks in advance.

On 17 May 2008, at 00:16, histrionics wrote:

Here is something I've never seen before:

I've seen this before. The bottom line is that Ruby's threading sucks:
when running C code (extensions, or bits of the stdlib that call
through to C code), you can block the entire Ruby interpreter. Things
that can do this include MySQL queries, some parts of name resolution,
and, it would appear, some other bits of the networking libraries.

Fred
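
For what it's worth, one workaround along the lines Fred describes: the
stdlib's resolv-replace patches TCPSocket and friends to do their name
lookups through the pure-Ruby Resolv library, which the thread
scheduler can interrupt, so the Timeout block should fire even while
the lookup is still in flight. A rough sketch, not verified against
this exact Mechanize version (site_url as in the original post):

require 'resolv-replace'   # stdlib: route socket name lookups through pure-Ruby Resolv
require 'mechanize'
require 'timeout'

agent = WWW::Mechanize.new

begin
  Timeout::timeout(2) {
    @tracked_page = agent.get("http://#{site_url}")
  }
rescue Timeout::Error => e
  puts "GAVE UP AFTER 2 SECS: #{e}"
rescue SocketError, Resolv::ResolvError => e
  puts "HOST DID NOT RESOLVE: #{e}"
end

Alternatively, each host could be pre-checked with Resolv.getaddress
inside its own short Timeout before the URL is handed to Mechanize, so
names that never resolve are skipped up front.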
