Forum: JRuby invalidateCacheDescendants Error

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
AD (Guest)
on 2009-04-13 18:14
(Received via mailing list)
Hello,

We are hitting this error again when running a load test, and it is
starting to create some concern around the scalability of our app.
Every time we throw a lot of concurrent load at it, Tomcat and the
JRuby app just seem to fall over without properly queueing requests
or handling threads.

We are getting the invalidateCacheDescendants error with JRuby 1.2 on
Tomcat 5.5 and Solaris 10 x86. I have linked the pastie of the error below;
any help here would be greatly appreciated.

http://pastie.org/445077

Thanks
Adam

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email
Charles Oliver Nutter (Guest)
on 2009-04-13 23:59
(Received via mailing list)
AD wrote:
> http://pastie.org/445077
Well we definitely need to fix it. The problem here seems to be that one
of these maps is being modified while we're trying to walk it. Do you
have any idea what the *other* thread might be doing at this point?
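To illustrate the failure mode Charlie describes, here is a minimal single-threaded sketch, assuming MRI/CRuby semantics: Ruby's own Hash refuses to take a new key while it is being walked and raises RuntimeError. Java's HashMap and HashSet iterators report the same situation (on a best-effort basis) with java.util.ConcurrentModificationException, the exception Charlie rescues later in this thread. The method name add_key_while_iterating is hypothetical.

```ruby
# Hypothetical demo (MRI/CRuby): mutating a Hash while iterating over it.
# One thread is enough to trip the check; with two threads the same
# collision happens when one walks the map while another adds to it.
def add_key_while_iterating
  registry = { "Foo" => :class }
  registry.each_key do |name|
    registry["#{name}Sub"] = :subclass # adds a new key mid-walk
  end
  :no_error
rescue RuntimeError => e
  e.message
end

puts add_key_while_iterating
```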

I'll have a quick look at the code right now and see if I can come up
with anything simple to eliminate this problem. If you have some code we
can use to trigger this problem it would be a big help, or if you want
to stop by IRC we can talk about it more.

Don't fear...it shall be fixed.

- Charlie

Charles Oliver Nutter (Guest)
on 2009-04-14 01:05
(Received via mailing list)
Charles Oliver Nutter wrote:
> Well we definitely need to fix it. The problem here seems to be that one
> of these maps is being modified while we're trying to walk it. Do you
> have any idea what the *other* thread might be doing at this point?
>
> I'll have a quick look at the code right now and see if I can come up
> with anything simple to eliminate this problem. If you have some code we
> can use to trigger this problem it would be a big help, or if you want
> to stop by IRC we can talk about it more.
>
> Don't fear...it shall be fixed.

OK, I see a few simple fixes:

* Have new classes always create a new subclasses set, so the set itself
is never directly mutated. This allows iteration to happen safely
without synchronization.
* Have iteration construct a copy of the set before iteration. This also
makes it safe, but causes potentially large amounts of throwaway objects
during invalidation.
* Synchronize all use of subclasses set against a global lock.
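The second and third options in the list above can be sketched in Ruby for comparison. This is a hypothetical registry (the class and method names are illustrative, not JRuby's), assuming every writer goes through one Mutex: each_via_copy walks a private copy taken under the lock, while each_under_lock holds the lock for the entire walk.

```ruby
require 'set'

# Hypothetical registry contrasting option 2 (copy, then iterate)
# with option 3 (hold a global lock for the whole walk).
class LockedSubclassSet
  def initialize
    @lock = Mutex.new
    @subclasses = Set.new
  end

  def add(klass)
    @lock.synchronize { @subclasses << klass }
  end

  # Option 2: copy under the lock so the copy itself is consistent,
  # then walk the copy without holding the lock.
  # Every call allocates a throwaway Set.
  def each_via_copy(&block)
    snapshot = @lock.synchronize { @subclasses.dup }
    snapshot.each(&block)
  end

  # Option 3: writers wait until the walk finishes. Mutex is not
  # reentrant, so calling #add from the block raises ThreadError.
  def each_under_lock(&block)
    @lock.synchronize { @subclasses.each(&block) }
  end
end
```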

I've implemented the first option since I think it has the
least impact on the system and allows invalidation to be completely
lock-free. There could be a small performance impact when creating new
classes, especially when creating many new classes that have many siblings
(creating the new subclasses set will be O(n) in the number of siblings). For
the moment, though, I haven't seen any perf impact.
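A Ruby sketch of the chosen first option, copy-on-write, under stated assumptions: the actual patch is Java code in JRuby (linked below), and CopyOnWriteSubclassSet is a hypothetical name. Readers iterate without any lock because a published set is never mutated; each add copies the whole set, which is the O(n)-per-new-class cost mentioned above. The writer Mutex is an added assumption so that two threads creating subclasses at the same moment cannot overwrite each other's new set.

```ruby
require 'set'

# Hypothetical copy-on-write registry: the set a reader holds is never
# mutated; adding a subclass publishes a brand-new frozen set.
class CopyOnWriteSubclassSet
  def initialize
    @subclasses = Set.new.freeze
    # Assumption: writers are serialized so no concurrent add is lost.
    # Readers never take this lock.
    @write_lock = Mutex.new
  end

  # O(n) in the number of existing siblings: the whole set is copied.
  def add(klass)
    @write_lock.synchronize do
      @subclasses = (@subclasses | [klass]).freeze
    end
  end

  # Lock-free: walks whichever set was current when the walk began.
  def each(&block)
    @subclasses.each(&block)
  end
end
```

A walk that is already in progress keeps seeing the old set, so a subclass added mid-walk is simply picked up by the next walk.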

The patch for this is here: http://gist.github.com/94810

Someone else also reported this bug here:
http://jira.codehaus.org/browse/JRUBY-3551

Give the patch a try and let me know how it feels to you. And if you can
come up with a good case, we would still like to have it.

- Charlie

Charles Oliver Nutter (Guest)
on 2009-04-14 01:10
(Received via mailing list)
AD wrote:
> We are getting the invalidateCacheDescendants error with JRuby 1.2 on
> Tomcat 5.5 and Solaris 10 x86. I have linked the pastie of the error below;
> any help here would be greatly appreciated.

Here's a test case that seems to blow up almost immediately without my
fix and runs forever with my fix:

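class Foo; end
# Thread 1: keeps reopening Foo and (re)defining a method on it
t1 = Thread.new { loop { class Foo; def bar; end; end } }
# Thread 2: keeps creating new anonymous subclasses of Foo
t2 = Thread.new { loop { Class.new(Foo) } }
# Wait until either thread dies (without the fix, one blows up quickly)
sleep 0.1 while t1.alive? and t2.alive?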

- Charlie

Charles Oliver Nutter (Guest)
on 2009-04-14 01:32
(Received via mailing list)
Charles Oliver Nutter wrote:
> Give the patch a try and let me know how it feels to you. And if you can
> come up with a good case, we would still like to have it.

I went ahead and committed this fix to master, along with an identical
fix for included module hierarchies (which I found could run into the
same issue). It still needs a test, probably based on my quick version.
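As a starting point for that test, here is a hedged sketch of a bounded version of the reproduction above, so it terminates instead of running forever. hierarchy_race_survives? is a hypothetical helper; it relies on Thread#join re-raising whatever killed a thread. On old JRuby a raw Java exception may need to be rescued by its Java name, as Charlie describes further down the thread, so rescuing Exception here is an assumption.

```ruby
# Hypothetical bounded reproduction: one thread keeps defining a method
# on a base class while another keeps creating subclasses of it.
# Returns true if neither thread died with an error.
def hierarchy_race_survives?(iterations = 20_000)
  base = Class.new
  redefiner  = Thread.new { iterations.times { base.class_eval { def bar; end } } }
  subclasser = Thread.new { iterations.times { Class.new(base) } }

  # Thread#join re-raises the exception that killed a thread, so a
  # failure in either loop becomes false. Join both before deciding.
  results = [redefiner, subclasser].map do |t|
    begin
      t.join
      true
    rescue Exception
      false
    end
  end
  results.all?
end

puts(hierarchy_race_survives? ? "survived" : "blew up")
```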

- Charlie

AD (Guest)
on 2009-04-14 02:51
(Received via mailing list)
Thanks Charles. I will try to get a sample of code that produces the
error. It only seems to appear when we run our routine stress tests,
which right now are not looking too great since this error gets thrown
very often once the concurrency gets bumped up.

Let me see if I can pare down the stress test to one controller in our
Rails app.

Thanks for looking into this so quickly. Would it be worthwhile to
try this with config.threadsafe! on and off? Is it the thread safety
that's killing us here?

Adam

On Mon, Apr 13, 2009 at 7:32 PM, Charles Oliver Nutter
<charles.nutter@sun.com> wrote:

Charles Oliver Nutter (Guest)
on 2009-04-14 03:06
(Received via mailing list)
AD wrote:
> Thanks Charles. I will try to get a sample of code that produces the
> error. It only seems to appear when we run our routine stress tests,
> which right now are not looking too great since this error gets thrown
> very often once the concurrency gets bumped up.
>
> Let me see if I can pare down the stress test to one controller in our Rails app.
>
> Thanks for looking into this so quickly. Would it be worthwhile to
> try this with config.threadsafe! on and off? Is it the thread safety
> that's killing us here?

Turning it off could help, since that would mean only one thread is
active in a given JRuby runtime at a given moment. The problem largely
stems from one thread adding/updating methods in a given hierarchy while
another thread is creating new classes somewhere downstream in that hierarchy.
Non-threadsafe mode would presumably prevent that from ever happening.
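What non-threadsafe mode buys can be sketched as one lock around every request, so at most one thread runs application code (and therefore defines classes or methods) at a time. This is an assumption-laden illustration, not Rails' actual dispatcher code; DISPATCH_LOCK and dispatch_serially are hypothetical names.

```ruby
# Hypothetical stand-in for a non-threadsafe dispatcher: every request
# runs while holding one process-wide lock, so no two requests can be
# reopening classes or creating subclasses at the same time.
DISPATCH_LOCK = Mutex.new

def dispatch_serially(request)
  DISPATCH_LOCK.synchronize { yield request }
end

# Usage: each "request" block runs alone, even if called from many threads.
dispatch_serially(:index) { |req| puts "handling #{req}" }
```

The trade-off is throughput: requests queue on the lock instead of running in parallel, which is exactly what makes the race above impossible.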

- Charlie

AD (Guest)
on 2009-04-14 04:46
(Received via mailing list)
Are there any known "issues" in a Rails controller that would cause
this to happen? This might help us easily identify where we
have some non-threadsafe methods in our controllers causing
havoc.

Maybe something to do with storing sessions in the DB?

Adam

On Mon, Apr 13, 2009 at 9:05 PM, Charles Oliver Nutter
<charles.nutter@sun.com> wrote:

Charles Oliver Nutter (Guest)
on 2009-04-14 05:49
(Received via mailing list)
Grasping at straws:

Leaking IO channels and eventually failing to open a new one? The error
would also get lost because of this. You could prstat (or the equivalent
on OS X/Linux) the process to see if it has scads of open files or
something.

- Charlie

Charles Oliver Nutter (Guest)
on 2009-04-14 05:53
(Received via mailing list)
Oops, this was meant for the logging thread.

For this thread, it's possible there's something in Rails doing this,
like lazy library loading/requiring, autoloads, and the like, but it's
also possible this is simply a singleton class getting defined in one
thread and a method being defined in another. There are potentially many
normally benign cases that could break as a result of this problem.

AD (Guest)
on 2009-04-15 19:55
(Received via mailing list)
Could the use of a potentially non-thread-safe memcached have caused this?
Is there something we can do in environment.rb to force-require all
classes, or something else that can minimize this risk?



On Mon, Apr 13, 2009 at 11:53 PM, Charles Oliver Nutter
<charles.nutter@sun.com> wrote:

Charles Oliver Nutter (Guest)
on 2009-04-15 20:09
(Received via mailing list)
AD wrote:
> Could the use of a potentially non-thread-safe memcached have caused this?
> Is there something we can do in environment.rb to force-require all
> classes, or something else that can minimize this risk?

Yes, that could certainly cause it too. In general the problem is
systemic in Ruby...autoload and require are simply not safe across
threads, and for years people have been using them and getting lucky
that more stuff hasn't broken.
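One way to act on the environment.rb question is to require application files eagerly at boot, before request threads start, so nothing is lazily required or autoloaded later from several threads at once. A minimal sketch, assuming the files can be loaded in sorted order; eager_require_all is a hypothetical helper, not a Rails API, and it does nothing about autoload or require calls made inside third-party libraries.

```ruby
# Hypothetical boot-time helper: require every .rb file under +dir+ once,
# in a stable (sorted) order, before any request threads are started.
# require is a no-op for files that are already loaded.
def eager_require_all(dir)
  Dir.glob(File.join(dir, "**", "*.rb")).sort.each do |path|
    require File.expand_path(path)
  end
end
```

It might be called from environment.rb with a directory such as the app's models folder (the path is illustrative). Files that depend on each other at load time may still need explicit requires, and plain require can interact with Rails' own lazy constant loading, so this is something to test rather than a guaranteed fix.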

If you can determine which library is causing the blow-up, we could
possibly try to find a workaround. I know it's not simple to do,
however, since the thread causing the blow-up (the one creating a new
class) continues happily running. You may be able to get a thread dump
of all threads at that moment, if you're watching the server right then,
by sending a QUIT signal to the process (or pressing Ctrl+\ in the
terminal containing the server).

I'm sorry I don't have a better answer :( I know this is frustrating,
especially when there's a fix but no release yet.

- Charlie

AD (Guest)
on 2009-04-15 20:20
(Received via mailing list)
It's OK, I am just trying to figure out how best to backtrack here. I
can *try* to send a QUIT to the Java process when it happens, but I'm not
sure if this is entirely possible (or if I am guaranteed to get the
active thread). Is there anything I can do to get a dump of where it
got the exception when this happens? Any way to put in a catch
for this error and log a dump at that time?

I agree it would be most helpful if we could find which lib threw it
out of whack, but I'm not sure how best to do that. I appreciate all the
help here.

Adam

On Wed, Apr 15, 2009 at 2:09 PM, Charles Oliver Nutter
<charles.nutter@sun.com> wrote:

Charles Oliver Nutter (Guest)
on 2009-04-15 20:28
(Received via mailing list)
AD wrote:
> It's OK, I am just trying to figure out how best to backtrack here. I
> can *try* to send a QUIT to the Java process when it happens, but I'm not
> sure if this is entirely possible (or if I am guaranteed to get the
> active thread). Is there anything I can do to get a dump of where it
> got the exception when this happens? Any way to put in a catch
> for this error and log a dump at that time?

The QUIT signal should dump the current stack of all threads.

As far as catching: Yes, you should be able to catch Java exceptions in
Ruby code. If it's not coming from a nice wrapped-up Java Integration
call (as in this case) you'll want to rescue the actual exception name.
So in this case:

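require 'java'   # JRuby: makes java.util.* reachable from Ruby

begin
  # ... code that seems to die because of the error ...
rescue java.util.ConcurrentModificationException => e
  # Do something to figure out where the other threads are,
  # e.g. log every thread's stack, then re-raise the original error.
  raise
end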

> I agree it would be most helpful if we could find which lib threw it
> out of whack, but I'm not sure how best to do that. I appreciate all the
> help here.

The above may help narrow it down.
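For the "figure out where other threads are" step, a hypothetical helper could write each live thread's status and current Ruby backtrace to a log. It assumes Thread#backtrace, which current Ruby and JRuby provide but which may be missing from the 1.8-compatible JRuby 1.2 used here; Java-level frames still need the QUIT dump.

```ruby
# Hypothetical helper: writes every live thread's id, status and current
# Ruby backtrace to +io+ (stderr by default).
def dump_all_threads(io = $stderr)
  Thread.list.each do |t|
    io.puts "Thread #{t.object_id} status=#{t.status.inspect}"
    # backtrace can be nil (e.g. for a thread that has just finished)
    (t.backtrace || []).each { |line| io.puts "    #{line}" }
  end
end
```

It could be called from the rescue clause in the snippet above, followed by raise so the original error still propagates.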

- Charlie

AD (Guest)
on 2009-04-15 20:28
(Received via mailing list)
Are there any best practices we can follow when turning on
config.threadsafe! in Rails that can help guide us here?

On Wed, Apr 15, 2009 at 2:27 PM, Charles Oliver Nutter
<charles.nutter@sun.com> wrote:

Charles Oliver Nutter (Guest)
on 2009-04-15 21:16
(Received via mailing list)
Not that I know of, but perhaps we should start a wiki page and begin
gathering those best practices...

AD (Guest)
on 2009-04-15 21:35
(Received via mailing list)
Yeah, I will start one; I think this is a pretty big deal. My biggest
concern is actually being able to track it down. But some best
practices, known non-thread-safe gems, etc. will be very helpful.

On Wed, Apr 15, 2009 at 3:15 PM, Charles Oliver Nutter
<charles.nutter@sun.com> wrote:

AD (Guest)
on 2009-04-16 22:54
(Received via mailing list)
OK Charles, here is a link to the xml.builder template that we are using
in a view. It's a bit crazy, but maybe there is something in there that is
painfully obvious.

http://pastie.org/449091

Adam

On Wed, Apr 15, 2009 at 3:34 PM, AD <straightflush@gmail.com> wrote:

AD (Guest)
on 2009-04-17 20:05
(Received via mailing list)
So right now we took out some of the recursion and this seems to have
helped, but we are still trying to nail down where this could be happening.
We are also seeing it in other parts of the app.

On Thu, Apr 16, 2009 at 4:53 PM, AD <straightflush@gmail.com> wrote: