Mongrel hangs with 100% CPU / EBADF (Bad file descriptor)

I posted this at the mongrel forum, but got no response.,
I believe the problems isn’t in mongrel itself, so maybe somebody here
has some info on this.
Anyway, here is the message:

We have a server with 10 running mongrel_cluster instances with apache
in front of them, and every now and then one or some of them hang.
No activity is seen in the database (we’re using activerecord sessions).
Mysql with innodb tables. show innodb status shows no locks. show
processlist shows nothing.
The server is linux debian 4.0
Ruby is: ruby 1.8.6 (2008-03-03 patchlevel 114) [i486-linux]
Rails is: Rails 1.1.2 (yes, quite old)
We’re using the native mysql connector (gem install mysql)

“strace -p PID” gives the following in a loop for the hung mongrel
process:

gettimeofday({1219834026, 235289}, NULL) = 0
select(4, [3], [0], [], {0, 905241}) = -1 EBADF (Bad file descriptor)
gettimeofday({1219834026, 235477}, NULL) = 0
select(4, [3], [0], [], {0, 905053}) = -1 EBADF (Bad file descriptor)
gettimeofday({1219834026, 235654}, NULL) = 0
select(4, [3], [0], [], {0, 904875}) = -1 EBADF (Bad file descriptor)
gettimeofday({1219834026, 235829}, NULL) = 0
select(4, [3], [0], [], {0, 904700}) = -1 EBADF (Bad file descriptor)
gettimeofday({1219834026, 236017}, NULL) = 0
select(4, [3], [0], [], {0, 904513}) = -1 EBADF (Bad file descriptor)
gettimeofday({1219834026, 236192}, NULL) = 0
select(4, [3], [0], [], {0, 904338}) = -1 EBADF (Bad file descriptor)
gettimeofday({1219834026, 236367}, NULL) = 0

I used lsof and found that the process used 67 file descriptors (lsof -p
PID |wc -l)

Is there any other way I can debug this, so that I could for example
determine which file descriptor is “bad”?
Any other info or suggestions? Anybody else seen this?

The site is fairly used, but not overly so, load averages usually around
0.3.


Some additional info. I installed mongrel_proctitle to show what the
hung processes were doing, and it seems they are hanging on a method
that displays images using file_column / images from the database /
rmagick to resize and make the images greyscale. Not conclusive the
problem is here, but it is a suspicion.
Is there something obviously wrong with the following? The method
displays a static image if the order doesn’t contain an image, else the
image resized from the order. The cache stuff is so that the image gets
updated in the browser every time. The image is inserted in the page
with a normal image tag.

code:
def preview_image
@order = session[:order]
if @order.image.nil?
@headers[‘Pragma’] = ‘no-cache’
@headers[‘Cache-Control’] = ‘no-cache, must-revalidate’
send_data(EMPTY_PIC.to_blob, :filename => “img.jpg”, :type =>
“image/jpeg”, :disposition => “inline”)
else
@pic = Image.read(@order.image)[0]
if (@order.crop)
@pic.crop!(@order.crop[:x1].to_i, @order.crop[:y1].to_i,
@order.crop[:width].to_i, @order.crop[:height].to_i, true)
end
@pic.resize!(103,130)
@pic = @pic.quantize(256, Magick::GRAYColorspace)
@headers[‘Pragma’] = ‘no-cache’
@headers[‘Cache-Control’] = ‘no-cache, must-revalidate’
send_data(@pic.to_blob, :filename => “img.jpg”, :type =>
“image/jpeg”, :disposition => “inline”)
end
end

i had a simmilar problem recently. as it turned out, the server was
running without swap space configured. maybe you run out of memory?

cheers

gerold

Thanks, but that’s not it:

         total       used       free     shared    buffers 

cached
Mem: 2073760 1866560 207200 0 30696
1286868
-/+ buffers/cache: 548996 1524764
Swap: 1999992 108 1999884

I had some problems also when /tmp got full (mainly the db didn’t like
it), but fixed those.

Hello,

I am facing exactly same problem and was wondering if you found the
solution for it.

Thanks
Vins

On Aug 31, 9:15 pm, Front L. [email protected]

Here is the lsof output if anybody can find any problems in it. Don’t
know how it will format in this message…

lsof: WARNING: can’t stat() ext3 file system /dev/.static/dev
Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
mongrel_r 11628 username cwd DIR 9,2 4096 1870688
/home/domains/example.com/usernameOrder/releases/20080831121802
mongrel_r 11628 username rtd DIR 9,1 4096 2 /
mongrel_r 11628 username txt REG 9,1 3564 167172
/usr/bin/ruby1.8
mongrel_r 11628 username mem REG 0,0 0
[heap] (stat: No such file or directory)
mongrel_r 11628 username DEL REG 0,8 15560245
/dev/zero
mongrel_r 11628 username DEL REG 0,8 15560242
/dev/zero
mongrel_r 11628 username DEL REG 0,8 15560602
/dev/zero
mongrel_r 11628 username DEL REG 0,8 15560601
/dev/zero
mongrel_r 11628 username DEL REG 0,8 15560684
/dev/zero
mongrel_r 11628 username DEL REG 0,8 15560683
/dev/zero
mongrel_r 11628 username DEL REG 0,8 15560685
/dev/zero
mongrel_r 11628 username DEL REG 0,8 15560568
/dev/zero
mongrel_r 11628 username DEL REG 0,8 15560607
/dev/zero
mongrel_r 11628 username DEL REG 0,8 15560569
/dev/zero
mongrel_r 11628 username mem REG 9,1 1933648 456972
/usr/lib/libmysqlclient.so.15.0.0
mongrel_r 11628 username DEL REG 0,8 15442414
/dev/zero
mongrel_r 11628 username DEL REG 0,8 15560546
/dev/zero
mongrel_r 11628 username mem REG 9,1 67408 457393
/lib/i686/cmov/libresolv-2.7.so
mongrel_r 11628 username mem REG 9,1 17884 457386
/lib/i686/cmov/libnss_dns-2.7.so
mongrel_r 11628 username DEL REG 0,8 15560541
/dev/zero
mongrel_r 11628 username DEL REG 0,8 15560246
/dev/zero
mongrel_r 11628 username DEL REG 0,8 15560693
/dev/zero
mongrel_r 11628 username DEL REG 0,8 15560608
/dev/zero
mongrel_r 11628 username mem REG 9,1 25700 164963
/usr/lib/gconv/gconv-modules.cache
mongrel_r 11628 username mem REG 9,1 83708 457384
/lib/i686/cmov/libnsl-2.7.so
mongrel_r 11628 username mem REG 9,1 140602 506903
/var/lib/gems/1.8/gems/mysql-2.7/lib/mysql.so
mongrel_r 11628 username mem REG 9,1 1282816 180935
/usr/lib/locale/locale-archive
mongrel_r 11628 username mem REG 9,1 16672 167438
/usr/lib/libXdmcp.so.6.0.0
mongrel_r 11628 username mem REG 9,1 7220 167436
/usr/lib/libXau.so.6.0.0
mongrel_r 11628 username mem REG 9,1 434840 167477
/usr/lib/libfreetype.so.6.3.10
mongrel_r 11628 username mem REG 9,1 1147196 167011
/usr/lib/libxml2.so.2.6.27
mongrel_r 11628 username mem REG 9,1 66200 457138
/lib/libbz2.so.1.0.3
mongrel_r 11628 username mem REG 9,1 965952 167440
/usr/lib/libX11.so.6.2.0
mongrel_r 11628 username mem REG 9,1 85060 168009
/usr/lib/libICE.so.6.3.0
mongrel_r 11628 username mem REG 9,1 32144 168011
/usr/lib/libSM.so.6.0.0
mongrel_r 11628 username mem REG 9,1 320344 168013
/usr/lib/libXt.so.6.0.0
mongrel_r 11628 username mem REG 9,1 52848 168023
/usr/lib/libXext.so.6.4.0
mongrel_r 11628 username mem REG 9,1 137760 167484
/usr/lib/libpng12.so.0.15.0
mongrel_r 11628 username mem REG 9,1 127488 167481
/usr/lib/libjpeg.so.62.0.0
mongrel_r 11628 username mem REG 9,1 288776 168015
/usr/lib/libjasper-1.701.so.1.0.0
mongrel_r 11628 username mem REG 9,1 343712 168021
/usr/lib/libtiff.so.4.2.1
mongrel_r 11628 username mem REG 9,1 197880 168017
/usr/lib/liblcms.so.1.0.15
mongrel_r 11628 username mem REG 9,1 3637020 168025
/usr/lib/libMagick.so.9.0.0
mongrel_r 11628 username DEL REG 0,8 15560547
/dev/zero
mongrel_r 11628 username mem REG 9,1 252348 217396
/usr/lib/ruby/1.8/i486-linux/RMagick.so
mongrel_r 11628 username mem REG 9,1 38412 457387
/lib/i686/cmov/libnss_files-2.7.so
mongrel_r 11628 username mem REG 9,1 195564 457011
/lib/libncurses.so.5.6
mongrel_r 11628 username mem REG 9,1 188760 457143
/lib/libreadline.so.5.2
mongrel_r 11628 username DEL REG 0,8 15560542
/dev/zero
mongrel_r 11628 username mem REG 9,1 15312 214202
/usr/lib/ruby/1.8/i486-linux/readline.so
mongrel_r 11628 username mem REG 9,1 12192 213239
/usr/lib/ruby/1.8/i486-linux/digest/sha1.so
mongrel_r 11628 username mem REG 9,1 12868 213153
/usr/lib/ruby/1.8/i486-linux/iconv.so
mongrel_r 11628 username mem REG 9,1 11744 213164
/usr/lib/ruby/1.8/i486-linux/racc/cparse.so
mongrel_r 11628 username mem REG 9,1 236540 213151
/usr/lib/ruby/1.8/i486-linux/nkf.so
mongrel_r 11628 username mem REG 9,1 6688 213237
/usr/lib/ruby/1.8/i486-linux/digest/md5.so
mongrel_r 11628 username mem REG 9,1 15552 213160
/usr/lib/ruby/1.8/i486-linux/strscan.so
mongrel_r 11628 username mem REG 9,1 31312 213245
/usr/lib/ruby/1.8/i486-linux/zlib.so
mongrel_r 11628 username mem REG 9,1 12608 213243
/usr/lib/ruby/1.8/i486-linux/thread.so
mongrel_r 11628 username mem REG 9,1 36325 360155
/var/lib/gems/1.8/gems/fastthread-1.0.1/lib/fastthread.so
mongrel_r 11628 username mem REG 9,1 44423 347866
/var/lib/gems/1.8/gems/mongrel-1.1.5/lib/http11.so
mongrel_r 11628 username mem REG 9,1 38844 213246
/usr/lib/ruby/1.8/i486-linux/socket.so
mongrel_r 11628 username mem REG 9,1 7840 213249
/usr/lib/ruby/1.8/i486-linux/etc.so
mongrel_r 11628 username mem REG 9,1 109364 213241
/usr/lib/ruby/1.8/i486-linux/syck.so
mongrel_r 11628 username mem REG 9,1 15888 213244
/usr/lib/ruby/1.8/i486-linux/stringio.so
mongrel_r 11628 username mem REG 9,1 78500 163229
/usr/lib/libz.so.1.2.3
mongrel_r 11628 username mem REG 9,1 1338692 182762
/usr/lib/i686/cmov/libcrypto.so.0.9.8
mongrel_r 11628 username mem REG 9,1 278628 182763
/usr/lib/i686/cmov/libssl.so.0.9.8
mongrel_r 11628 username mem REG 9,1 3820 213247
/usr/lib/ruby/1.8/i486-linux/fcntl.so
mongrel_r 11628 username mem REG 9,1 224716 212332
/usr/lib/ruby/1.8/i486-linux/openssl.so
mongrel_r 11628 username mem REG 9,1 1360292 457376
/lib/i686/cmov/libc-2.7.so
mongrel_r 11628 username mem REG 9,1 149328 457382
/lib/i686/cmov/libm-2.7.so
mongrel_r 11628 username mem REG 9,1 38300 457378
/lib/i686/cmov/libcrypt-2.7.so
mongrel_r 11628 username mem REG 9,1 9684 457379
/lib/i686/cmov/libdl-2.7.so
mongrel_r 11628 username mem REG 9,1 112354 457392
/lib/i686/cmov/libpthread-2.7.so
mongrel_r 11628 username mem REG 9,1 783564 164396
/usr/lib/libruby1.8.so.1.8.6
mongrel_r 11628 username mem REG 9,1 10228 213152
/usr/lib/ruby/1.8/i486-linux/digest.so
mongrel_r 11628 username mem REG 9,1 109152 457054
/lib/ld-2.7.so
mongrel_r 11628 username 1w REG 9,2 462923 1575329
/home/domains/example.com/usernameOrder/shared/log/mongrel.8001.log
mongrel_r 11628 username 2w REG 9,2 462923 1575329
/home/domains/example.com/usernameOrder/shared/log/mongrel.8001.log
mongrel_r 11628 username 3u IPv4 15442350 TCP
localhost:8001 (LISTEN)
mongrel_r 11628 username 4w REG 9,2 118943548 1575355
/home/domains/example.com/usernameOrder/shared/log/production.log
mongrel_r 11628 username 5u REG 9,1 145306 234226
/tmp/mongrel.11628.0 (deleted)
mongrel_r 11628 username 7u unix 0xc3c12480 15442417
socket
mongrel_r 11628 username 11u REG 9,1 50 234180
/tmp/CGI.11628.2
mongrel_r 11628 username 12u REG 9,1 26228 234227
/tmp/CGI.11628.3

Same problem here…CPU up to 99% – app is unresponsive.

$ strace -p :

sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0

Over and over…

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ PPID COMMAND
5393 ettinger 21 0 821m 810m 2680 R 97.9 40.1 2:51.87 1
/usr/bin/ruby1.8 /usr/bin/mongrel_rails start -C /h

I have had the same issue. It does not always reproduce, but I have a
machine that runs into it at least once a day.

ruby 1.9.2p136

gettimeofday is being called as fast as possible, using up 99% of a CPU.

Has anyone found a solution?

Vins wrote:

Hello,

I am facing exactly same problem and was wondering if you found the
solution for it.

Thanks
Vins

On Aug 31, 9:15�pm, Front L. [email protected]

Nope, not yet! Trying to reproduce it on a devel-server, quite
impossible…
But it always hangs in code that handles image resizing etc.

Sorry, I didn’t mean to hurt your feelings.

On Tue, Aug 27, 2013 at 3:48 PM, Daniel W. [email protected]
wrote:

ruby 1.9.2p136

gettimeofday is being called as fast as possible, using up 99% of a CPU.

Something’s probably trying desperately to catch up to the present.
( Currently: late summer, 2013. )

Seriously, you’re responding to a 5-year-old post to tell us about a
problem you’re having with an obsolete version of Ruby??

And with absolutely no useful details about your platform?


Hassan S. ------------------------ [email protected]

twitter: @hassan

you guys got my hopes up. thought this was the 2013 Revival Tour for my
post.