Forum: Mongrel Error: Mongrel timed out this thread: too many open files

Emmett Shear (emmett)
on 2008-05-29 22:08
(Received via mailing list)
I just switched to Mongrel, and it's been working much better than my
previous lighttpd/fastcgi setup. So thanks for the awesomeness.

My current problem: once or twice an hour, I get the following error in
production:

Mongrel timed out this thread: too many open files

I never get it in testing or on our staging server. Any ideas what would
cause that? It doesn't *appear* particularly correlated with load to me,
but I'm only receiving notifications after the fact so I can't be sure.

Thanks,
Emmett
Zed A. Shaw (Guest)
on 2008-05-29 23:09
(Received via mailing list)
On Thu, 29 May 2008 13:07:27 -0700
"Emmett Shear" <emmett@justin.tv> wrote:

> I'm only receiving notifications after the fact so I can't be sure.
A couple of things cause this.  One is that the mongrel is overloaded
with too many connections so it can't accept any more.

If there isn't that much load on the server, then it's more likely
that you are leaking an open file here or there.  If you are doing code
like this:

a = open("blah.txt", "w")
a.write("hi")
a.close()

Then you are probably leaking files.  Look for that, and then translate
to the block form:

open("blah.txt", "w") {|a| a.write("hi") }

That's probably the #1 mistake people make from other languages.
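
To tell the two causes apart, it can help to compare how many
descriptors a process holds against its limit. What follows is only a
minimal sketch, not anything Mongrel provides: it assumes Linux (it
reads /proc/self/fd), and the helper name open_fd_count is made up for
this example.

```ruby
# Sketch only: assumes Linux, where /proc/self/fd lists this process's
# open descriptors. On other systems the count is skipped.

# Soft and hard per-process limits on open file descriptors.
soft_limit, hard_limit = Process.getrlimit(Process::RLIMIT_NOFILE)

# Number of descriptors currently open, or nil if /proc is unavailable.
def open_fd_count
  return nil unless File.directory?("/proc/self/fd")
  Dir.children("/proc/self/fd").size
end

before = open_fd_count
leaked = File.open(File::NULL)   # opened without a block, not yet closed
after  = open_fd_count

puts "descriptor limit: soft=#{soft_limit} hard=#{hard_limit}"
puts "open descriptors: #{before.inspect} before, #{after.inspect} after"

leaked.close
```

If a count like this were logged periodically, a number that keeps
climbing under steady traffic would point toward a leak, while one that
only spikes with traffic would point toward too many connections.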

--
Zed A. Shaw
- Hate: http://savingtheinternetwithhate.com/
- Good: http://www.zedshaw.com/
- Evil: http://yearofevil.com/
Brian Weaver (Guest)
on 2008-06-01 04:12
(Received via mailing list)
Emmett,

Contrary to what Zed's message seems to imply, there is nothing
inherently wrong with coding like:

a = File.open("blah.txt", "w")
a.write("hi!")
a.close()

You simply need to understand that if any error occurs during
a.write(...) or a similar call then a.close will not be invoked. If
you use error handling like

a = File.open("blah.txt", "w")
begin
  a.write("hi!")
ensure
  a.close()
end

then you will ensure that the file is actually closed regardless of an
exception. Of course a block like that is kind of ugly, so it's better
to do what Zed suggested and actually associate a code block with the
open call. This means that even if the block faults the file is
closed; it's just a cleaner syntax.
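
The difference can be checked directly. This is a small sketch rather
than code from anyone's application: the scratch file comes from
Tempfile and the IOError is raised by hand to stand in for a failed
write.

```ruby
require "tempfile"

# Scratch file used only for this demonstration.
tmp  = Tempfile.new("blah")
path = tmp.path

# Manual form: the exception skips the close call, so the handle stays open.
manual = nil
begin
  manual = File.open(path, "w")
  raise IOError, "simulated failure during write"
  manual.close                     # never reached
rescue IOError
  # swallow the simulated error so the script can continue
end
manual_closed = manual.closed?

# Block form: File.open closes the handle even though the block raises.
block_handle = nil
begin
  File.open(path, "w") do |a|
    block_handle = a
    raise IOError, "simulated failure during write"
  end
rescue IOError
  # swallow the simulated error so the script can continue
end
block_closed = block_handle.closed?

puts "manual form closed? #{manual_closed}"   # false
puts "block form closed?  #{block_closed}"    # true

# Clean up what the demonstration left behind.
manual.close
tmp.close!
```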

Here are some links that kind of explain it too:

  http://www.meshplex.org/wiki/Ruby/File_handling_Input_Output
  http://www.math.hokudai.ac.jp/~gotoken/ruby/ruby-u...

-- Brian

--

/* insert witty comment here */
Emmett Shear (emmett)
on 2008-06-01 06:05
(Received via mailing list)
Looks like I was overloading the mongrels with connections... I took
down the number of connections allowed in HAProxy and it looks like the
problem went away. So, thanks!

This has uncovered a new problem though, one that's truly baffling me:

- Start up mongrel instances. Everything is awesome. Site is fast, life
  is good.
- Wait 30-40 minutes.
- Observe that updates and inserts in the database (postgres) are
  becoming slow. And by slow, I mean 30-40 seconds for a simple insert
  or update where it previously took less than 0.1 seconds. Load on DB
  server itself remains nominal; less than 2 on an 8 core box. No error
  messages of importance that I can see. Inserts and updates from other
  sources (script/console, psql) are fast.

This started happening just after switching from fcgi to mongrels. Could
it be something is different about how it handles database connections?
Was I relying on some kind of bug before?

E
Zed A. Shaw (Guest)
on 2008-06-01 08:14
(Received via mailing list)
On Sat, 31 May 2008 21:03:21 -0700
"Emmett Shear" <emmett@justin.tv> wrote:

> - Observe that updates and inserts in the database (postgres) are becoming
> slow. And by slow, I mean 30-40 seconds for a simple insert or update where
> it previously took less than 0.1 seconds. Load on DB server itself remains
> nominal; less than 2 on an 8 core box. No error messages of importance that
> I can see. Inserts and updates from other sources (script/console, psql) are
> fast.

Well, it sounds like your site already has some traffic.  Without
getting into a remote debugging session, have you checked your indexes
to make sure you're adding the right ones to the right columns?

If you were, say, entering a ton of strings into a DB and then querying
for them with insane LIKE clauses, you'd see this kind of behavior.  As
you added more rows your app would get slower and slower.
Emmett Shear (emmett)
on 2008-06-01 23:24
(Received via mailing list)
At first, I thought I'd messed up something in the database too. But
running the *exact* same updates and inserts against the production
database, through the console, yields normal, fast results. The *only*
place I see these 30-40 second updates/inserts is from mongrels that
have been under load for a while; I don't see the slowness when running
the exact same things from console, or from the old FCGI setup.

What could be different about doing the database queries in Mongrel
that could cause this? I'm not too clear on exactly how Mongrel differs
from FCGI, other than being faster and not using FCGI (the protocol).
Could it be possible that the database connections are longer lived, or
somehow shared between multiple threads, or something like that? I
start with the assumption that Mongrel does things the right way, and
that I've made some mistake in configuring my application, but I'm at a
loss as to where to start looking.

Thanks,
Emmett
Tikhon Bernstam (Guest)
on 2008-06-03 08:15
(Received via mailing list)
Hi Emmett,


I think I've seen the problem you've described when using
acts_as_ferret + ferret DRb server (though I'll assume you aren't
actually using ferret -- as the inimitable Engine Yard guys pointed out
this weekend during one of their talks, ferret is a common cause of
problems for their users.  I haven't played with ferret in months btw,
so this example might be outdated, but it illustrates a more general
problem, I think).

In this ferret case, the problem, I believe, is that when you have some
model Foo that uses acts_as_ferret and you call foo.save, the COMMIT on
the save transaction occurs *after* the ferret
after_create/after_update hooks.  So the COMMIT occurs *after* the call
to the ferret DRb server.  Normally this is ok, but if you are indexing
large amounts of text (e.g.) or the DRb server gets busy for whatever
reason, we saw that the save transactions can suddenly take a long
time.


The example above illustrates a more general point, I think -- be
careful with what you're doing in your AR hooks.  Again, the problem is
that when you save your AR object, that save is wrapped in a
transaction, and the commit on that transaction occurs after the AR
hooks like after_create.


To verify this, here's a simple example:

# script/generate model foo && rake db:migrate

class Foo < ActiveRecord::Base
  after_create { sleep 10 }
end

# then from script/console

foo = Foo.create

# now watch your database -- the transaction begins, but the COMMIT
# doesn't occur until after the 10 seconds of sleep.
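
The same ordering can be seen without a database. The following is a
plain-Ruby simulation, not ActiveRecord itself: FakeRecord and its
event log are hypothetical stand-ins that only mimic the sequence
described above.

```ruby
# Plain-Ruby simulation of the ordering described above. This is NOT
# ActiveRecord; FakeRecord and its event log are hypothetical stand-ins.
class FakeRecord
  attr_reader :events

  def initialize(&after_create)
    @after_create = after_create
    @events = []
  end

  # Mimics a save: the hook runs inside the transaction, before COMMIT.
  def save
    @events << "BEGIN"
    @events << "INSERT"
    @after_create.call(self) if @after_create
    @events << "COMMIT"
    true
  end
end

record = FakeRecord.new do |r|
  r.events << "after_create (slow work would run here)"
end
record.save

puts record.events.join(" -> ")
# BEGIN -> INSERT -> after_create (slow work would run here) -> COMMIT
```

Whatever time the hook takes is time the transaction stays open, which
would be consistent with simple inserts appearing to take many seconds
when the hook talks to a busy external service.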


So what plugins are you using?  And are you using any interesting AR
hooks that could potentially take a long time (like talking to a DRb
server or uploading files to s3 as an after_create, for example)?


Best,


Tikhon Bernstam

Co-founder, Scribd.com
Nicolas Escobar (nejrb)
on 2008-06-03 16:17
Tikhon Bernstam wrote:

>  I've think I've seen the problem you've described when using
> acts_as_ferret
> + ferret DRb server

I have exactly the same problem.

Initially I was running acts_as_ferret with a DRb server in a
mongrel_cluster and it was working ok. Then, I changed a field in a
table and restarted the mongrel_cluster. It was then when it stopped
working (same error as posted). I used the backup version and dropped
the field that I created but the same happens.

In the 'development' environment with a single instance of mongrel it
works though, using acts_as_ferret and DRb server.