Collecting thread: Too many open files (Rails+SCGI crash)

I have a problem with running Rails 0.13.1 with scgi_rails 0.4.3 and
Apache. After running for some time (more than a day), the SCGI
process stops with the following error (from scgi.log):


[ERR][24189] Collecting thread: Too many open files
[ERR][24189]
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:188:in `accept'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:205:in `join'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:205:in `collect_thread'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:195:in `listen'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:412:in `run'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/bin/scgi_service:61
[INF][24189] Exited accept loop. Shutdown complete.

The SCGI process shuts down and the application is unavailable. That
has happened two times now.

The first time, I figured it might have something to do with opening
files (and StringIOs) without closing them. So I replaced every
File.new and StringIO.new line with File.open and StringIO.open
followed by a block. From the manual, I understand that this will
close the file handle when the block is finished; is that correct?
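
For reference, the difference between the two forms looks like this (a
minimal sketch; the file name is only an example):

# Block form: the handle is closed automatically when the block
# returns, even if an exception is raised inside it.
File.open("log/import.log", "a") do |f|
  f.puts "imported"
end

# Non-block form: the handle stays open until you close it yourself,
# which is easy to forget (and is skipped if an exception hits first).
f = File.new("log/import.log", "a")
f.puts "imported"
f.close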

The problem is: now I’m closing every file after opening it, but the
problem still exists. What can I do to solve that?

Extra information: in the scgi.log file, there are also some lines like
this:


[ERR][24189] Handling client: cannot convert nil into String
[ERR][24189]
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:244:in `<<'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:244:in `handle_client'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:234:in `initialize'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:234:in `new'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:234:in `handle_client'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:188:in `listen'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:186:in `initialize'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:186:in `new'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:186:in `listen'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:412:in `run'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/bin/scgi_service:61

and the same error but with a different backtrace:


[ERR][24189] Handling client: cannot convert nil into String
[ERR][24189]
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:89:in `initialize'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:89:in `new'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:89:in `initialize'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/bin/scgi_service:19:in `new'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/bin/scgi_service:19:in `process_request'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:291:in `read_header'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:253:in `handle_client'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:234:in `initialize'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:234:in `new'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:234:in `handle_client'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:188:in `listen'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:186:in `initialize'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:186:in `new'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:186:in `listen'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:412:in `run'
/usr/local/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/bin/scgi_service:61

I do not know exactly what those errors mean and whether they have
anything to do with my problem. They occurred about six times between
the last crash and the restart before that.

Does anyone know what I can do to stop my application from crashing?

Thanks,
Gijs

Hello Gijs,

It could be entirely possible that you have run out of file
descriptors. You may want to look at increasing this value. On Solaris
there are some parameters you can set in /etc/system, I believe; I
don’t remember what they are offhand, but I have seen success with
about 8192.

Cheers,

Aly.



Aly Dharshi
[email protected]

      "A good speech is like a good dress
       that's short enough to be interesting
       and long enough to cover the subject"

Hey Gijs,

You can get this “too many open files” error for many reasons. First
up, even sockets are considered open files on most systems. Second,
you’ll need to check what your limits are for that user. Depending on
the system, you might have a piddly setting for max open files. I know
some Linux systems give you 1024, and OS X was notorious for only
giving you a measly 256 or some such nonsense.
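
As a quick sketch of checking those limits from inside Ruby (note:
Process.getrlimit is not available in every Ruby build of this
vintage; from a shell, ulimit -n reports the same soft limit):

# Print this process's open-file limits. The soft limit is what the
# process actually hits as EMFILE ("Too many open files").
soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
puts "open-file soft limit: #{soft}, hard limit: #{hard}"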

So, from the error message below, what seems to be going on is that
the accept call is trying to accept a client, and the number of open
files is too great, so it aborts with an error. I’ve run millions of
requests through several applications, so it’s not a socket leak. Most
likely you’re either blasting too many clients at it and it just runs
out, or you’re opening a ton of files and not closing them.

A very quick way to tell is to use the lsof command. Simply do:

lsof | grep ruby

And it should print out all the files, sockets, and other stuff that
your ruby app has open. Run it as root for more info.

Zed

On Tue, 15 Nov 2005 23:23:02 +0100

Hi Zed and Aly,

You can get this “too many open files” error for many reasons. First
up, even sockets are considered open files on most systems. Second,
you’ll need to check what your limits are for that user.

My limit was 1024 open files. I followed Aly’s suggestion and changed
that to 8192 open files.

I’ve also tried running lsof. That gives me a fairly constant 182
open files for four running Rails apps (one of which was the one that
crashed). A day of using the application does not lead to an increased
number of open files. The files that are open are libraries, the log
files, some sockets and so on; nothing that was opened by my code. My
application did not crash during the time I monitored lsof, so perhaps
the number is higher when it does crash. I’m now storing the lsof
output to see if I can catch it when it crashes again. (But of course
I hope it doesn’t.)
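
A crude watcher along these lines can capture that evidence
automatically (a sketch; the PID and log path are only examples):

# Append an lsof snapshot of the SCGI process every minute, so there
# is something on disk to inspect if it crashes again.
pid = 24189   # example PID; use the real scgi_service PID
loop do
  File.open("log/lsof_watch.log", "a") do |f|
    f.puts "--- #{Time.now} ---"
    f.puts `lsof -p #{pid}`
  end
  sleep 60
end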

Thanks for the suggestions,

Gijs

Have you monitored the memory usage of the process? Have you
experienced memory leaks?


Aníbal Rojas
[email protected]

Hi Aníbal and Zed,

Have you monitored the memory usage of the process? Have you
experienced memory leaks?

No, I didn’t monitor that. But I’ve been monitoring it for a while
now, and I think that’s my problem. The memory usage starts at
~29,000 kB, and after a few days of running it has climbed to more
than 120,000 kB. I guess it crashes when that becomes too large.

Now my problem is: why does it leak memory? I used ApacheBench to do a
few tests, but I couldn’t get it to leak memory. How does one find
memory leaks in Ruby?
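
One blunt instrument that works in Ruby 1.8 is ObjectSpace: snapshot
live object counts per class between batches of requests and watch
which classes keep growing (a generic sketch, not specific to SCGI or
this app):

# Count live objects per class; diffing two snapshots taken around a
# batch of requests shows which classes keep accumulating.
def object_counts
  counts = Hash.new(0)
  ObjectSpace.each_object { |obj| counts[obj.class] += 1 }
  counts
end

before = object_counts
# ... replay some traffic here, e.g. with ApacheBench ...
GC.start   # collect garbage first so only real survivors are counted
after = object_counts

(before.keys | after.keys).sort_by { |k| after[k] - before[k] }.reverse[0, 20].each do |klass|
  puts "#{klass}: #{before[klass]} -> #{after[klass]}"
end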

Perhaps there’s a clue in the SCGI-ctrl status (memory was ~120 MB at
that time; it hadn’t crashed yet, but I restarted it after copying
this status):

Status as of Wed Nov 23 00:25:07 CET 2005:
PID: 20214
Started: Sat Nov 19 22:22:12 CET 2005
Environment: production
Connected Requests: 18844
Conns/Second: Not Set
Total Conns: 379632
Max Simultaneous Conns: 1073741823
Shutdown Started: true
Processing Time: 7989.97 247.34 0.05 0.07
Current Settings:
env: production
config: config/scgi.yaml
host: 127.0.0.1
control_url: druby://127.0.0.1:10101
password: JHrFufg3Hk0dk
port: 10001
logfile: log/scgi.log

I’m not sure why it shows 18,844 Connected Requests. In my Apache log
file everything was quiet; the last visitor had left a few minutes
earlier (and that visitor had only a few hits). The scgi.log didn’t
show many errors either: a few ‘Handling client: can’t convert nil
into String’ and one ‘Handling client: bad content body’ over all
these 379,632 requests. Is it possible that these 18,844 connections
were filling my memory?

Thanks,

Gijs

Hi Zed,

If you can, please run lsof and tell me whether it’s all open
sockets or other files as well.

Ah yes, I should’ve done that. Unfortunately I restarted (killed) the
application after copying the scgi-status, so I can’t give you an lsof
list. It has now had 186,173 connections, but none of them stay
connected. Memory usage is still normal. I’m currently logging ps,
lsof and scgi-status; I’ll send them to you as soon as there’s
something to see.

Incidentally, are you doing anything in particular in your
application? Opening external resources other than databases? Hitting
an LDAP? Opening other sockets? Any info will help.

No, no other external connections than the MySQL database (using
ActiveRecord). I’m opening and closing a few StringIOs and a few
Iconvs, and occasionally the application runs curl (as a shell
command, using IO.popen). It also uses ActiveRecordStore for sessions
of around 5 kB per row.

Thanks for your help,

Gijs

Our problem was the MySQL libs; just changing them took away the
memory leak (it was happening only on Windows). I am pretty new to
Ruby and Rails and would not know how to find the culprit of the
memory leak, but I would check the versions of everything and try to
keep them as current as possible.


Aníbal Rojas
http://www.lacaraoscura.com
[email protected]

Bingo! This looks like somehow the connections aren’t being closed
properly, so the threads are staying around. I’ll look at this right
away and see what’s going on. If you can, please run lsof and tell me
whether it’s all open sockets or other files as well. I have a hunch
what’s going on.

Incidentally, are you doing anything in particular in your
application? Opening external resources other than databases? Hitting
an LDAP? Opening other sockets? Any info will help.

Thanks for keeping on this Gijs.

Zed A. Shaw

On Wed, 23 Nov 2005 00:39:33 +0100

Our problem was the MySQL libs; just changing them took away the
memory leak (it was happening only on Windows). I am pretty new to Ruby and

Okay. This was on Red Hat Linux, with Rails 0.14.3, Ruby 1.8.3 and
the newest MySQL gem (everything installed only a few days ago).

I’ve also found a strange problem with the handle_client method of
scgi.rb, in this part (starting at line 237):

# we only read 10 bytes of the length. any request longer than this is invalid
while len.length <= 10
  c = socket.read(1)
  if c == ':'
    # found the terminal, len now has a length in it so read the payload
    break
  else
    len << c
  end
end

The len << c sometimes gives a ‘Handling client: can’t convert nil
into String’ error. That happens, I think, when there are fewer than
10 bytes in the socket. Apache then shows an ‘scgi: Unknown error 500:
error sending request body’ in the error_log.
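
A defensive rewrite of that loop would treat a nil read as a closed
socket and abort the request cleanly instead of crashing on len << nil
(my sketch, not the shipped scgi.rb):

# socket.read(1) returns nil once the client has closed the
# connection, so bail out instead of appending nil to the length.
len = ''
while len.length <= 10
  c = socket.read(1)
  if c.nil?
    raise IOError, "client closed connection before completing the length"
  elsif c == ':'
    break  # terminal found; len now holds the payload length
  else
    len << c
  end
end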

This also seems to have caused a lot of retries: today’s log shows one
client requesting the same page ~400 times within 13 seconds, every
time resulting in that error 500 message. The browser was Internet
Explorer 6.0, but I suspect the problem is somewhere in Apache. (Note:
this problem didn’t lead to higher memory usage or new hanging
requests.)

Regards,

Gijs

Uh, if the socket has fewer than 10 bytes in total (before being
closed) then it’s an entirely invalid SCGI request. No valid request
can be 10 bytes and still have all the required HTTP headers to
process it.

The comment in the code mentioning “10 bytes” means that it will not
try to handle a request larger than 999,999,999 bytes. If you’re
seriously trying to post/upload a request larger than 999 MB, then
you’ve really got to rethink how you’re doing things. (Rails is a
very poor FTP server.)
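
To make the framing concrete: an SCGI request wraps its headers in a
netstring, <length>:<headers>, with the length sent as ASCII digits,
and those digits are what the loop above reads (the header values here
are made up):

# Build the netstring an SCGI client such as mod_scgi would send.
# Header names and values are NUL-separated; the values are examples.
headers = "CONTENT_LENGTH\00027\000SCGI\0001\000"
request = "#{headers.length}:#{headers},"
# headers.length is 25 here, so the wire form starts with "25:"; ten
# digits of length already describes a multi-gigabyte request.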

Otherwise, can you shoot me the stack trace? I can throw in some
additional checks to handle it. My first thought is that maybe there’s
something up with your web server.

I’m also going to try sending you a special “debugging” version of the
scgi.rb library to run, which logs some additional stuff for me.

Zed

On Wed, 23 Nov 2005 22:59:19 +0100

Gijs,

Using popen and curl actually could cause this very problem. Make
doubly sure that you’re handling the popen correctly and that you’re
not leaking processes or failing to collect the popen results.
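
For what it’s worth, the block form of IO.popen closes the pipe
automatically when the block exits, much like File.open (a sketch; the
curl command line is illustrative):

# Block form: the pipe is closed when the block returns, so
# descriptors don't accumulate. The command line is only an example.
body = IO.popen("curl -s http://example.com/resource") { |pipe| pipe.read }

# Non-block form leaks one pipe (an open descriptor) per call whenever
# the close is skipped, e.g. by an exception:
pipe = IO.popen("curl -s http://example.com/resource")
body = pipe.read
pipe.close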

The use of popen is something that I’ve not done in a Rails app, so
this would be the first thing that stands out as possibly different.

If you can, send me the code you use for doing the popen off-list and
I’ll debug it a bit.

Zed

On Wed, 23 Nov 2005 22:28:37 +0100