Too many open files cause Mechanize/Net::HTTP to time out?

I’m experiencing an issue where get/submit calls on my Mechanize agent
raise timeout exceptions when the Ruby script has too many open file
descriptors (> 1000). However, nothing about a limit violation is
logged to syslog, and no error message is printed to stdout. The process
has so many open files because it’s executed by PHP from an Apache
install with > 500 vhosts (two logs per vhost), and all of those open
FDs are inherited by the Ruby process (as reported by lsof). I can get
the script working by commenting out several of the logs in my Apache
install, but I’d prefer to get to the heart of what wall
Ruby/Mechanize/Net::HTTP is running into (and not logging/reporting).

Since the cause of this is too many open files, I’ve tried the
following:

  • Setting the hard and soft ulimits in /etc/limits and
    /etc/security/limits.conf (although if a limit had been exceeded,
    grsecurity should have logged it to syslog/dmesg)
  • Checking the system-wide limit: cat /proc/sys/fs/file-max reports
    205126
  • Recompiling Ruby (ruby 1.8.6 (2007-09-24 patchlevel 111) [i686-linux])
    after updating the ulimits (this was recommended in a similar
    situation with the Squid proxy, although I didn’t see anything
    mentioned about FD_SETSIZE or similar)
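
One way to confirm which per-process limit the script actually runs under (as opposed to the system-wide file-max value) is to read it from inside Ruby. This is only a minimal sketch, assuming a Ruby build that provides Process.getrlimit; the helper name nofile_limits is made up for illustration:

```ruby
# Returns [soft, hard]: the RLIMIT_NOFILE values this process runs with.
# These are per-process limits, unlike /proc/sys/fs/file-max, which is
# a system-wide ceiling.
def nofile_limits
  Process.getrlimit(Process::RLIMIT_NOFILE)
end

soft, hard = nofile_limits
puts "RLIMIT_NOFILE soft=#{soft} hard=#{hard}"
```

Running this from the PHP-launched script (rather than from a login shell) shows the limits inherited through Apache, which may differ from what the limits files suggest.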

My guess is that the default FD_SETSIZE defined in the Linux headers is
the limit being hit, although I’ve heard that modifying those headers to
increase the value can lead to instability in some services.
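
To test the FD_SETSIZE guess, it may help to look at descriptor numbers rather than counts: select() can only watch descriptors numbered below FD_SETSIZE, and a newly opened socket gets the lowest free number. If the interpreter waits on sockets with select(), a socket numbered above that limit could plausibly never be reported as readable. A minimal sketch, assuming a system that exposes /proc/self/fd or /dev/fd, and assuming FD_SETSIZE is 1024 (the usual glibc value); all helper names are hypothetical:

```ruby
# Assumption: the usual glibc value. select() can only watch descriptors
# numbered 0 .. FD_SETSIZE - 1.
ASSUMED_FD_SETSIZE = 1024

# Directory listing this process's open descriptors, or nil if the
# platform offers neither location.
def fd_directory
  ['/proc/self/fd', '/dev/fd'].find { |dir| File.directory?(dir) }
end

# Numbers of the descriptors currently open in this process. The listing
# may include the descriptor used to read the directory itself.
def open_fd_numbers
  dir = fd_directory
  return [] if dir.nil?
  Dir.entries(dir).grep(/\A\d+\z/).map { |name| name.to_i }.sort
end

# Descriptor number the next open (for example a new socket) would get:
# the kernel hands out the lowest free number.
def next_fd_number
  probe = File.open('/dev/null')
  probe.fileno
ensure
  probe.close if probe
end

# True if select() could watch this descriptor under the assumed limit.
def selectable?(fd)
  fd < ASSUMED_FD_SETSIZE
end

fd = next_fd_number
puts "open descriptors: #{open_fd_numbers.size}"
puts "next descriptor number: #{fd} (selectable: #{selectable?(fd)})"
```

If the next descriptor number comes out at 1024 or higher when the script is launched from PHP, that would be consistent with the FD_SETSIZE theory, though it would not prove it.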

I’m hoping that someone can tell me:

  • Which limit the process is hitting, and how to raise that limit
  • Whether this is potentially a bug in Ruby that should go upstream
  • Whether there’s any sane way for the script to discard all the open
    FDs it inherits
  • (A long-shot, non-Ruby PHP question) whether there’s any way to have
    PHP execute the Ruby script without it inheriting the FDs
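
For the third point, one approach I can imagine is having the script close everything above stderr before it opens any sockets. This is only a sketch under assumptions: the helper name close_fds is hypothetical, and it relies on IO.for_fd raising Errno::EBADF for numbers that are not open (newer interpreters may also raise ArgumentError for descriptors they reserve internally):

```ruby
# Closes each given descriptor number above stderr (2) and returns how
# many were actually closed. Numbers that are not open, or that the
# interpreter reserves for itself, are skipped.
def close_fds(fd_numbers)
  closed = 0
  fd_numbers.each do |fd|
    next if fd < 3 # keep stdin, stdout and stderr
    begin
      IO.for_fd(fd).close
      closed += 1
    rescue Errno::EBADF, ArgumentError
      # Not an open descriptor, or not accessible from Ruby: nothing to do.
    end
  end
  closed
end
```

Calling something like close_fds(3...1024) at the very top of run.rb, before any library opens a file or socket, should free the low descriptor numbers again. The upper bound of 1024 is an assumption; inherited descriptors above it would need a wider range.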

== References ==

Sample PHP driver that executes the Ruby script:

<?php
$id = 108; // hard-coded test
$command = 'ruby '
    . escapeshellarg('/www/CLIENT/htdocs/include/script/nysif_scrape/run.rb')
    . ' ' . escapeshellarg($id)
    . ' >/dev/null &'; // /dev/null redir is needed to keep program in the background
system($command);
?>

Here’s the backtrace on the exception that gets thrown on the timeout:
/usr/lib/ruby/1.8/timeout.rb:54:in `rbuf_fill'
/usr/lib/ruby/1.8/timeout.rb:56:in `timeout'
/usr/lib/ruby/1.8/timeout.rb:76:in `timeout'
/usr/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill'
/usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
/usr/lib/ruby/1.8/net/protocol.rb:126:in `readline'
/usr/lib/ruby/1.8/net/http.rb:2029:in `read_status_line'
/usr/lib/ruby/1.8/net/http.rb:2018:in `read_new'
/usr/lib/ruby/1.8/net/http.rb:1059:in `request'
/usr/lib/ruby/gems/1.8/gems/mechanize-0.6.10/lib/mechanize.rb:514:in `fetch_page'
/usr/lib/ruby/gems/1.8/gems/mechanize-0.6.10/lib/mechanize.rb:185:in `get'
/www/CLIENT/htdocs/include/script/nysif_scrape/lib/AuthedAgent.rb:18:in `initialize'
/usr/lib/ruby/1.8/singleton.rb:95:in `new'
/usr/lib/ruby/1.8/singleton.rb:95:in `instance'
/www/CLIENT/htdocs/include/script/nysif_scrape/run.rb:20

For reference, here’s my AuthedAgent class definition. It’s a Singleton
providing access to an Agent that has already logged into the Login form
on the site, and $import is just an ActiveRecord row to allow for ajax-y
progress bar updates on the web-side. It times out on either the ‘get’
or the ‘submit’ method calls.

class AuthedAgent < WWW::Mechanize
  include Singleton

  def initialize
    super

    # Fetch the login page; this 'get' is one of the calls that times out.
    $import.status = 'connecting'
    $import.save!
    login_page = get($config[:urls][:login])
    raise SiteChangedException, "Login page is 404" if
      (!login_page.title.nil? && login_page.title.include?("The page cannot be found"))

    # Fill in and submit the login form.
    $import.status = 'authenticating'
    $import.save!
    form = login_page.form('frmLogin0')
    raise SiteChangedException, "Could not locate form 'frmLogin0' on login page" if form.nil?
    form.fields.name("LOGIN").value = $config[:username]
    form.fields.name("PWD").value = $config[:password]
    post_login = submit(form)
    raise SiteChangedException, "Username and password do not appear valid. Check credentials, or possible site change." if !authenticated?
  end

  # True when the site has set a LOGIN cookie carrying our username.
  def authenticated?
    cookie = cookies.find { |c| c.name == "LOGIN" }
    raise SiteChangedException, "'LOGIN' cookie is no longer being set" if cookie.nil?
    cookie.value.include? "TAG3=" + $config[:username]
  end
end
