invalidateCacheDescendants Error


#1

Hello,

We are hitting this error again when running a load test, and it is
starting to create some concern about the scalability of our app.
Every time we throw a lot of concurrent load at it, Tomcat and the
JRuby app just seem to fall over without properly queueing requests
or handling threads.

We are getting the invalidateCacheDescendants error with JRuby 1.2 on
Tomcat 5.5 and Solaris 10 x86. I have linked the error on pastie below;
any help here would be greatly appreciated.

http://pastie.org/445077

Thanks
Adam


To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

#2

AD wrote:

http://pastie.org/445077
Well we definitely need to fix it. The problem here seems to be that one
of these maps is being modified while we’re trying to walk it. Do you
have any idea what the other thread might be doing at this point?
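For readers unfamiliar with this failure mode, here is a single-threaded, plain-Ruby analogue of a map being modified while something is walking it. It is only an illustration: the helper name walk_while_mutating is hypothetical, and inside JRuby the collections involved are Java collections, whose fail-fast iterators throw java.util.ConcurrentModificationException (on a best-effort basis, which fits failures that only show up under load), whereas a Ruby Hash raises a RuntimeError.

```ruby
# Walk a Hash and add a key during the walk. Ruby detects the structural
# change and raises instead of continuing with a stale iterator.
def walk_while_mutating(map)
  map.each_key do |name|
    map["#{name}Child"] = []   # structural modification mid-iteration
  end
  nil
rescue RuntimeError => e
  e   # return the error so the caller can inspect it
end

err = walk_while_mutating({ "Foo" => [] })
puts "#{err.class}: #{err.message}" if err
```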

I’ll have a quick look at the code right now and see if I can come up
with anything simple to eliminate this problem. If you have some code we
can use to trigger this problem it would be a big help, or if you want
to stop by IRC we can talk about it more.

Don’t fear…it shall be fixed.

  • Charlie

#3

Charles Oliver N. wrote:

Well we definitely need to fix it. The problem here seems to be that one
of these maps is being modified while we’re trying to walk it. Do you
have any idea what the other thread might be doing at this point?

I’ll have a quick look at the code right now and see if I can come up
with anything simple to eliminate this problem. If you have some code we
can use to trigger this problem it would be a big help, or if you want
to stop by IRC we can talk about it more.

Don’t fear…it shall be fixed.

OK, I see a few simple fixes:

  • Have new classes always create a new subclasses set, so the set itself
    is never directly mutated. This allows iteration to happen safely
    without synchronization.
  • Have iteration construct a copy of the set before iterating. This also
    makes it safe, but potentially creates large numbers of useless objects
    during invalidation.
  • Synchronize all use of the subclasses set against a global lock.

I’ve implemented the first option, since I think it has the least
impact on the system and allows invalidation to be completely
lock-free. There could be a small performance impact when creating new
classes, especially when creating many new classes that have many
siblings (creating the new subclasses set is O(n) in the number of
siblings). For the moment, though, I haven’t seen any perf impact.
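The first option can be sketched in plain Ruby as follows. This is an illustration under stated assumptions, not the actual patch (which is Java code inside JRuby and may differ): HierarchyNode, add_subclass and invalidate_descendants are hypothetical names, and the writer-side Mutex is an added assumption so that two threads registering subclasses at the same moment cannot overwrite each other's copy. Readers never take the lock.

```ruby
require 'set'

# Hypothetical stand-in for a class in the hierarchy (not JRuby's internal
# RubyModule). Its subclasses set is copy-on-write: it is replaced, never
# mutated in place, so readers can iterate it without taking any lock.
class HierarchyNode
  attr_reader :name

  def initialize(name)
    @name = name
    @subclasses = Set.new.freeze
    @write_lock = Mutex.new # assumption: serializes writers only
  end

  # Writer path: copy the current set (O(n) in the number of siblings),
  # add the child to the copy, then publish the copy by swapping the reference.
  def add_subclass(child)
    @write_lock.synchronize do
      @subclasses = Set.new(@subclasses).add(child).freeze
    end
    child
  end

  # Reader path: read the reference once and walk that snapshot. A concurrent
  # add_subclass publishes a new set but never touches the one being walked.
  def invalidate_descendants(&visitor)
    snapshot = @subclasses
    snapshot.each do |child|
      visitor.call(child)
      child.invalidate_descendants(&visitor)
    end
  end
end
```

By comparison, the second option would copy inside invalidate_descendants on every walk, and the third would put every read and write behind one global lock.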

The patch for this is here: http://gist.github.com/94810

Someone else also reported this bug here:
http://jira.codehaus.org/browse/JRUBY-3551

Give the patch a try and let me know how it feels to you. And if you can
come up with a good test case, we would still like to have it.

  • Charlie

#4

AD wrote:

We are getting the invalidateCacheDescendants error with JRuby 1.2 on
Tomcat 5.5 and Solaris 10 x86. I have linked the error on pastie below;
any help here would be greatly appreciated.

Here’s a test case that seems to blow up almost immediately without my
fix and runs forever with my fix:

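    class Foo; end
    # Thread 1: keeps (re)defining a method on Foo, invalidating caches down its subclasses
    t1 = Thread.new { loop { class Foo; def bar; end; end } }
    # Thread 2: keeps creating new subclasses of Foo at the same time
    t2 = Thread.new { loop { Class.new(Foo) } }
    # Wait until one thread dies (almost immediately without the fix)
    sleep 0.1 while t1.alive? and t2.alive?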

  • Charlie

#5

Charles Oliver N. wrote:

Give the patch a try and let me know how it feels to you. And if you can
come up with a good test case, we would still like to have it.

I went ahead and committed this fix to master, along with an identical
fix for included module hierarchies (which I found could run into the
same issue). It still needs a test, probably based on my quick version.

  • Charlie

#6

Thanks, Charles. I will try to get a sample of code that produces the
error. It only seems to come up when we run our routine stress tests,
which right now are not looking too great, since this error gets thrown
very often once the concurrency gets bumped up.

Let me see if I can pare down the stress test to one controller in our
Rails app.

Thanks for looking into this so quickly. Would it be worthwhile to
try this with config.threadsafe! on and off? Is it the thread safety
that's killing us here?

Adam

On Mon, Apr 13, 2009 at 7:32 PM, Charles Oliver N. wrote:

#7

Are there any known “issues” in a Rails controller that would cause
this to happen? This might help us easily identify where we have some
non-thread-safe methods in our controllers causing havoc.

Maybe something to do with storing sessions in the DB?

Adam

On Mon, Apr 13, 2009 at 9:05 PM, Charles Oliver N. wrote:

#8

Grasping at straws:

Could it be leaking IO channels and eventually failing to open a new one?
The error would get lost as well because of this. You could run prstat
(or the equivalent on OS X/Linux) on the process to see if it has scads
of open files or something.

  • Charlie

AD wrote:

#9

AD wrote:

Thanks, Charles. I will try to get a sample of code that produces the
error. It only seems to come up when we run our routine stress tests,
which right now are not looking too great, since this error gets thrown
very often once the concurrency gets bumped up.

Let me see if I can pare down the stress test to one controller in our Rails app.

Thanks for looking into this so quickly. Would it be worthwhile to
try this with config.threadsafe! on and off? Is it the thread safety
that's killing us here?

Turning it off could help, since it would mean only one thread is active
in a given JRuby runtime at a given moment. The problem largely stems from one
thread adding/updating methods in a given hierarchy while another thread
is creating new classes somewhere downstream in that hierarchy.
Non-threadsafe mode would presumably prevent that from ever happening.

  • Charlie

#10

Could the use of a potentially non-thread-safe memcached have caused this?
Is there something we can do in environment.rb to force-require all
classes, or something else that can minimize this risk?

On Mon, Apr 13, 2009 at 11:53 PM, Charles Oliver N. wrote:

#11

Oops, this was meant for the logging thread.

For this thread, it’s possible there’s something in Rails doing this,
like lazy library loading/requiring, autoloads, and the like, but it’s
also possible this is simply a singleton class getting defined in one
thread and a method being defined in another. There are potentially a lot
of normally benign cases that could break as a result of this problem.

Charles Oliver N. wrote:

#12

It's OK, I am just trying to figure out how best to backtrack here. I
can try to send a QUIT to the Java process when it happens, but I'm not
sure if this is entirely possible (or if I am guaranteed to get the
active thread). Is there anything I can do to get a dump of where the
exception was raised when this happens? Any way to put in a catch
for this error and log a dump at that moment?

I agree it would be most helpful if we could find which lib threw it
out of whack, but I'm not sure how best to do that. I appreciate all the
help here.

Adam

On Wed, Apr 15, 2009 at 2:09 PM, Charles Oliver N. wrote:

#13

AD wrote:

Could the use of a potentially non-thread-safe memcached have caused this?
Is there something we can do in environment.rb to force-require all
classes, or something else that can minimize this risk?

Yes, that could certainly cause it too. In general the problem is
systemic in Ruby…autoload and require are simply not safe across
threads, and for years people have been using them and getting lucky
that more stuff hasn’t broken.
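On the environment.rb question: one way to keep Ruby's autoload out of the request path is to trigger every pending autoload once at boot, on a single thread, before requests arrive. The sketch below assumes a hypothetical helper named force_autoloads and an application namespace such as MyApp; neither is a Rails or JRuby API. It only covers constants registered with `autoload`; Rails' own const_missing-based loading of application classes is a separate mechanism it does not touch, and it does nothing about explicit require calls made later at runtime.

```ruby
require 'set'

# Hypothetical boot-time helper (not a Rails or JRuby API): walk a namespace
# and resolve every constant in it, recursing into nested modules/classes.
# Resolving a constant that is still registered via `autoload` performs the
# pending require right now, on the calling (boot) thread.
def force_autoloads(namespace, seen = Set.new)
  return if seen.include?(namespace)
  seen << namespace

  namespace.constants.each do |name|
    value = namespace.const_get(name)  # triggers any pending autoload
    force_autoloads(value, seen) if value.is_a?(Module)
  end
end
```

For example, a call such as `force_autoloads(MyApp)` at the end of environment.rb, after the application has booted, would load anything MyApp still has registered with autoload before the first request thread runs.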

If you can determine which library is causing the blow-up, we could
possibly try to find a workaround. I know it’s not simple to do,
however, since the thread causing the blow-up (the one creating a new
class) continues happily running. You may be able to get a thread dump
of all threads at that moment, if you’re watching the server right then,
by issuing a QUIT signal to the process (or pressing Ctrl+\ in the
terminal containing the server).

I’m sorry I don’t have a better answer :( I know this is frustrating,
especially when there’s a fix but no release yet.

  • Charlie

#14

AD wrote:

It's OK, I am just trying to figure out how best to backtrack here. I
can try to send a QUIT to the Java process when it happens, but I'm not
sure if this is entirely possible (or if I am guaranteed to get the
active thread). Is there anything I can do to get a dump of where the
exception was raised when this happens? Any way to put in a catch
for this error and log a dump at that moment?

The QUIT dump should dump the current stack of all threads.

As far as catching: Yes, you should be able to catch Java exceptions in
Ruby code. If it’s not coming from a nice wrapped-up Java Integration
call (as in this case) you’ll want to rescue the actual exception name.
So in this case:

    require 'java'

    begin
      # code that seems to die because of the error
    rescue java.util.ConcurrentModificationException => e
      e.print_stack_trace  # Java's printStackTrace, via JRuby's snake_case name
      # do something to figure out where other threads are
    end
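As one possible "do something" for that rescue clause, a Ruby-level dump of every thread's stack can be logged at the moment of the failure. This is a minimal sketch: dump_all_thread_backtraces is a hypothetical helper name, and it assumes a Ruby that provides Thread#backtrace (modern MRI does; older Rubies such as the 1.8 line lack it, in which case the QUIT dump remains the fallback).

```ruby
# Hypothetical diagnostic helper (not part of JRuby or Rails): write the
# current backtrace of every live Ruby thread to the given IO.
def dump_all_thread_backtraces(io = $stderr)
  Thread.list.each do |thread|
    io.puts "--- #{thread.inspect} (status: #{thread.status.inspect})"
    # Thread#backtrace returns nil for a thread that finished meanwhile
    (thread.backtrace || []).each { |frame| io.puts "    #{frame}" }
  end
end
```

Called from the rescue clause before re-raising, it would record where every other thread was when the exception surfaced, while the request itself still fails visibly.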

I agree it would be most helpful if we could find which lib threw it
out of whack, but I'm not sure how best to do that. I appreciate all the
help here.

The above may help narrow it down.

  • Charlie

#15

Are there any “best practices” we can follow when turning on
config.threadsafe! in Rails that can help guide us here?

On Wed, Apr 15, 2009 at 2:27 PM, Charles Oliver N. wrote:

#16

Not that I know of, but perhaps we should start a wiki page and begin
gathering those best practices…

AD wrote:

#17

Yeah, I will fire one up; I think this is a pretty big deal. My biggest
concern is actually being able to track it down. But some best
practices, known non-thread-safe gems, etc. will be very helpful.

On Wed, Apr 15, 2009 at 3:15 PM, Charles Oliver N. wrote:

#18

OK Charles, here is a link to the xml.builder template that we are using
in a view. It's a bit crazy, but maybe there is something in there that
is painfully obvious.

http://pastie.org/449091

Adam

On Wed, Apr 15, 2009 at 3:34 PM, AD wrote:

#19

So far we have taken out some of the recursion, and this seems to have
helped, but we are still trying to nail down where this could be
happening. We are also seeing it in other parts of the app.

On Thu, Apr 16, 2009 at 4:53 PM, AD wrote: