Rescuing net/http Bad File Descriptor error (EBADF)

I have a little web spider that scrapes several web pages. Sometimes the
script hits a Bad File Descriptor error and bails out.

As far as I can understand this is an OS (Windows XP) error and there is
nothing Ruby can do to avoid it (maybe XP cannot handle so many HTTP
connections opened so rapidly). But I cannot find a way to recover from
the error… I don't want the script to bail out, but simply continue with
the next page.

I have tried to rescue and try/catch the error with all imaginable
exception classes, yet the script always bails out when this error
occurs. I know the error comes from Ruby's net/http library, since I
have used Mechanize, http-access2 and http-access, and all of them
suffer from it.

Here are the details of the error:

c:/ruby/lib/ruby/1.8/net/http.rb:562:in `initialize': Bad file descriptor - connect(2) (Errno::EBADF)
        from c:/ruby/lib/ruby/1.8/net/http.rb:562:in `connect'
        from c:/ruby/lib/ruby/1.8/timeout.rb:48:in `timeout'
        from c:/ruby/lib/ruby/1.8/timeout.rb:76:in `timeout'
        from c:/ruby/lib/ruby/1.8/net/http.rb:562:in `connect'
        from c:/ruby/lib/ruby/1.8/net/http.rb:555:in `do_start'
        from c:/ruby/lib/ruby/1.8/net/http.rb:544:in `start'
        from c:/ruby/lib/ruby/gems/1.8/gems/mechanize-0.4.7/lib/mechanize.rb:279:in `fetch_page'
        from c:/ruby/lib/ruby/gems/1.8/gems/mechanize-0.4.7/lib/mechanize.rb:138:in `get'
        from c:/ruby/lib/ruby/gems/1.8/gems/mechanize-0.4.7/lib/mechanize.rb:193:in `submit'
        from ./urls.rb:1001
        from ./urls.rb:998
        from ./spider.rb:540:in `get_pages'
        from ./spider.rb:470:in `start'
        from ./spider.rb:733:in `process_link'
        from ./spider.rb:677:in `start'
        from ./spider.rb:670:in `start'
        from main2.rb:4

I repeat that this error is very sporadic… it occurs at different times,
on different pages, and with Mechanize, http-access2 or plain open-uri
alike.

The code I use (Mechanize case) is simply:

require 'rubygems'
require 'mechanize'

def get_page
  @agent = WWW::Mechanize.new
  @page  = @agent.get @url
end

This code is executed for each page (roughly 100 of them). Adding a
rescue or try/catch anywhere inside the get_page method, or around the
call, does not catch the error… the script always stops when the error
occurs. And since the error is so sporadic I cannot reproduce it, which
makes it very difficult to debug.
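
For reference, this is roughly the kind of wrapper I have been trying; a
minimal sketch, assuming the @url from get_page above, and with a retry
count and sleep that are arbitrary guesses on my part. SystemCallError
is the parent class of every Errno::* exception, so in theory it should
cover EBADF too:

require 'rubygems'
require 'mechanize'

def get_page
  tries = 3
  begin
    @agent = WWW::Mechanize.new
    @page  = @agent.get @url
  rescue SystemCallError, Timeout::Error => e
    tries -= 1
    if tries > 0
      sleep 1   # give XP a moment to release the socket
      retry     # re-runs the begin block
    end
    warn "giving up on #{@url}: #{e.class}: #{e.message}"
    @page = nil # continue with the next page instead of bailing out
  end
end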

Any tips are much appreciated.

Horacio

Horacio S. wrote:

> I have a little web spider that scrapes several web pages. Sometimes
> the script hits a Bad File Descriptor error and bails out.
>
> As far as I can understand this is an OS (Windows XP) error and there
> is nothing Ruby can do to avoid it (maybe XP cannot handle so many
> HTTP connections opened so rapidly). But I cannot find a way to
> recover from the error… I don't want the script to bail out, but
> simply continue with the next page.
>
> I have tried to rescue and try/catch the error with all imaginable
> exception classes, yet the script always bails out when this error
> occurs. I know the error comes from Ruby's net/http library, since I
> have used Mechanize, http-access2 and http-access, and all of them
> suffer from it.

Did you try this?

begin
  # read stuff, etc.
rescue Errno::EBADF
  # whatever recovery you want
end
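
If that still does not catch it, it would be worth finding out exactly
which exception class is actually escaping. As a temporary diagnostic
(not something to leave in), a rescue of Exception at the top level
catches everything rescuable and lets you print the real class; a
sketch, calling your get_page:

begin
  get_page
rescue Exception => e
  # Exception is the root of the hierarchy, so this catches everything
  # rescuable and shows the exact class that slips past narrower rescues.
  puts "caught #{e.class}: #{e.message}"
  puts e.backtrace.join("\n")
end

Once you know the real class, you can rescue it (or its nearest sensible
ancestor) narrowly.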


I think I did try that… but I will try again. Now I have to wait until
the error occurs again…

Thanks.