We are hitting this error again when running a load test, and it is
starting to create some concern around the scalability of our app.
Every time we throw a lot of concurrent load at it, Tomcat and the
JRuby app just seem to fall over without properly queueing requests
and handling threads.
We are getting the invalidateCacheDescendants error with JRuby 1.2 on
Tomcat 5.5 and Solaris 10 x86. I have linked the pastie of the error
below; any help here would be greatly appreciated.
http://pastie.org/445077
Well we definitely need to fix it. The problem here seems to be that one
of these maps is being modified while we’re trying to walk it. Do you
have any idea what the other thread might be doing at this point?
I’ll have a quick look at the code right now and see if I can come up
with anything simple to eliminate this problem. If you have some code we
can use to trigger this problem it would be a big help, or if you want
to stop by IRC we can talk about it more.
Don’t fear…it shall be fixed.
Ok, I see a couple of simple fixes:
1. Have new classes always create a new subclasses set, so the set
   itself is never directly mutated. This allows iteration to happen
   safely without synchronization.
2. Have iteration construct a copy of the set before iterating. This
   also makes it safe, but potentially creates large amounts of useless
   objects during invalidation.
3. Synchronize all use of the subclasses set against a global lock.
I’ve implemented the first scenario since I think it represents the
least impact to the system and allows invalidation to be completely
lock-free. There could be a small performance impact when creating new
classes, especially lots of new classes when there’s lots of siblings
(new subclasses set creation will be O(n) on number of siblings). For
the moment, though, I haven’t seen any perf impact.
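To make the first option concrete, here is a minimal Ruby sketch of the copy-on-write idea. It is only an illustration, not the actual JRuby patch (which lives in JRuby's Java code): the `Node` class, its method names, and the writer-side `Mutex` are all assumptions made for this example.

```ruby
require "set"

# Hypothetical stand-in for a class in the hierarchy; not JRuby's real code.
class Node
  attr_reader :name

  def initialize(name)
    @name = name
    @subclasses = Set.new.freeze # a published set is never mutated
    @write_lock = Mutex.new      # assumption: serialize writers only
  end

  # Build a brand-new set (O(n) in the number of siblings) and swap the
  # reference, so any iteration already in progress keeps its old snapshot.
  def add_subclass(node)
    @write_lock.synchronize do
      @subclasses = Set.new(@subclasses).add(node).freeze
    end
    node
  end

  # Readers take the current reference once and walk it without locking.
  def each_descendant(&block)
    snapshot = @subclasses
    snapshot.each do |child|
      block.call(child)
      child.each_descendant(&block)
    end
  end
end
```

Readers never take a lock here; the cost moves to `add_subclass`, which copies the sibling set on every new subclass.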
Here’s a test case that seems to blow up almost immediately without my
fix and runs forever with my fix:
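class Foo; end
# Thread 1 keeps reopening Foo and redefining a method on it
t1 = Thread.new { loop { class Foo; def bar; end; end } }
# Thread 2 keeps creating new subclasses of Foo
t2 = Thread.new { loop { Class.new(Foo) } }
# Wait until one of the two threads dies
sleep 0.1 while t1.alive? and t2.alive?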
Give the patch a try and let me know how it feels to you. And if you can
come up with a good case, we would still like to have it.
I went ahead and committed this fix to master, along with an identical
fix for included module hierarchies (which I found could run into the
same issue). It still needs a test, probably based on my quick version.
Thanks, Charles. I will try to get a sample of code that produces the
error. It only seems to come up when we run our routine stress tests,
which right now are not looking too great since this error gets thrown
very often once the concurrency gets bumped up.
Let me see if I can pare down the stress test to one controller in our
Rails app.
Thanks for looking into this so quickly. Would it be worthwhile to
try this with config.threadsafe! on and off? Is it the thread safety
that's killing us here?
Adam
On Mon, Apr 13, 2009 at 7:32 PM, Charles Oliver N. [email protected] wrote:
Are there any known “issues” in a Rails controller that would cause
this to happen? This might help us easily identify where we
have some non-threadsafe methods in our controllers causing some
havoc.
Maybe something to do with storing sessions in the DB?
Adam
On Mon, Apr 13, 2009 at 9:05 PM, Charles Oliver N. [email protected] wrote:
Leaking IO channels and eventually failing to open a new one? Error
would get lost as well because of this. You could prstat (or equivalent
on OS X/linux) the process to see if it has scads of open files or
something.
It could help, since it would mean only one thread is active in a given
JRuby runtime at a given moment. The problem largely stems from one
thread adding/updating methods in a given hierarchy while another thread
is creating new classes somewhere downstream in that hierarchy.
Non-threadsafe mode would presumably prevent that from ever happening.
Could the use of a potentially non-thread-safe memcached have caused
this? Is there something we can do in environment.rb to force-require
all classes, or something else that can minimize this risk?
On Mon, Apr 13, 2009 at 11:53 PM, Charles Oliver N. [email protected] wrote:
Grasping at straws:
For this thread, it’s possible there’s something in Rails doing this,
like lazy library loading/requiring, autoloads, and the like, but it’s
also possible this is simply a singleton class getting defined in one
thread and a method being defined in another. There’s potentially a lot
of normally benign cases that could break as a result of this problem.
It's OK, I am just trying to figure out how best to backtrack here. I
can try to send a QUIT to the Java process when it happens, but I'm not
sure if this is entirely possible (or if I am guaranteed to get the
active thread). Is there anything I can do to help get a dump, when
this happens, of where it got the exception? Any way to put in a catch
for this error and log a dump at that time?
I agree it would be most helpful if we could find which lib threw it
out of whack, but I'm not sure how best to do that. I appreciate all
the help here.
Adam
On Wed, Apr 15, 2009 at 2:09 PM, Charles Oliver N. [email protected] wrote:
Yes, that could certainly cause it too. In general the problem is
systemic in Ruby…autoload and require are simply not safe across
threads, and for years people have been using them and getting lucky
that more stuff hasn’t broken.
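One way to reduce the exposure, sketched here only as an assumption rather than a confirmed fix, is to require everything up front on a single thread before the app starts serving requests. The helper name `eager_require` and the example directories are hypothetical.

```ruby
# Hypothetical helper: require every .rb file under the given directories,
# in a stable (sorted) order, on the calling thread.
def eager_require(*dirs)
  dirs.flat_map { |dir| Dir.glob(File.join(dir, "**", "*.rb")).sort }
      .each { |file| require File.expand_path(file) }
end

# Example call at the end of boot, before any request threads exist
# (the paths are assumptions about the app layout):
#   eager_require("app/models", "app/controllers", "lib")
```

This only narrows the window. Any library that defines classes or methods lazily at request time can still run into the same race.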
If you can determine which library is causing the blow-up, we could
possibly try to find a workaround. I know it’s not simple to do,
however, since the thread causing the blow-up (the one creating a new
class) continues happily running. You may be able to get a thread dump
of all threads at that moment, if you’re watching the server right then,
by issuing a QUIT signal to the process (or pressing Ctrl+\ in the
terminal containing the server).
I’m sorry I don’t have a better answer. I know this is frustrating,
especially when there’s a fix but no release yet.
The QUIT dump should dump the current stack of all threads.
As far as catching: Yes, you should be able to catch Java exceptions in
Ruby code. If it’s not coming from a nice wrapped-up Java Integration
call (as in this case) you’ll want to rescue the actual exception name.
So in this case:
begin
  # code that seems to die because of the error
rescue java.util.ConcurrentModificationException
  # do something to figure out where other threads are
end
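As a rough sketch of what the "figure out where other threads are" step could look like, the helper below logs the backtrace of every live Ruby thread when the given exception is raised, then re-raises it. The name `with_thread_dump` is hypothetical, and the sketch assumes a Ruby that provides `Thread#backtrace` (1.9 and later), so it may not work as-is on an older 1.8-compatible runtime.

```ruby
# Hypothetical helper: run the block and, if error_class is raised, write the
# backtrace of every live Ruby thread to io before re-raising the error.
def with_thread_dump(error_class, io = $stderr)
  yield
rescue error_class => e
  io.puts "Caught #{e.inspect}"
  Thread.list.each do |thread|
    io.puts "Thread #{thread.inspect}"
    # Thread#backtrace returns nil for a thread that has already finished
    (thread.backtrace || []).each { |line| io.puts "    #{line}" }
  end
  raise
end
```

Under JRuby the call would presumably look like `with_thread_dump(java.util.ConcurrentModificationException) { ... }`. The dump shows where the other threads are shortly after the failure, which is not necessarily where they were at the exact moment of the concurrent modification.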
Not that I know of, but perhaps we should start a wiki page and begin
gathering those best practices…
Yeah, I will fire one up; I think this is a pretty big deal. My biggest
concern is actually being able to track it down. But some best
practices, known non-thread-safe gems, etc. will be very helpful.
On Wed, Apr 15, 2009 at 3:15 PM, Charles Oliver N. [email protected] wrote:
OK Charles, here is a link to the xml.builder that we are using in a
view. It's a bit crazy, but maybe there is something in there that is
painfully obvious.
So right now we took out some of the recursion and this seems to have
helped, but we are still trying to nail down where this could be
happening. We are also seeing it in other parts of the app.