Debugging stuck process?

I have an MRI 2.1.3 process in production which is completely
unresponsive to signals and appears to be dead. Debugging it seems to
indicate that all threads are blocked on something but I can’t tell
what.

Here’s the GitHub issue with detailed info. Any tips on how to
determine which threads are deadlocked and/or what lock/mutex they are
waiting on?

Mike

Disclaimer: I have never worked with Sidekiq.

Mike P. wrote in post #1170793:

I have an MRI 2.1.3 process in production which is completely
unresponsive to signals and appears to be dead. Debugging it seems to
indicate that all threads are blocked on something but I can’t tell
what.

If there is no work in any queue and nobody is currently trying to add
work to a queue, that state (all threads blocked) would be expected.
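
To illustrate why "all threads blocked" can be perfectly healthy, here is
a minimal sketch of a worker pool on a plain Ruby Queue. It is not
Sidekiq's implementation; the names (queue, workers, results) are made up
for the example. While the queue is empty, every worker should report the
status "sleep":

```ruby
require "thread" # provides Queue on older rubies such as 2.1

queue = Queue.new

# Three hypothetical workers that block in Queue#pop until work arrives.
workers = 3.times.map do
  Thread.new do
    while (job = queue.pop) != :stop
      job.call
    end
  end
end

# Wait until every worker has parked itself on the empty queue.
sleep 0.01 until workers.all? { |t| t.status == "sleep" }
idle_statuses = workers.map(&:status)
puts idle_statuses.inspect

# Adding work wakes a worker up again; :stop ends each worker loop.
results = Queue.new
queue << -> { results << :done }
workers.size.times { queue << :stop }
workers.each(&:join)
```

An idle pool and a deadlocked pool look the same from the outside; the
difference only shows once new work is added and nobody picks it up.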

Here’s the GitHub issue with detailed info. Any tips on how to
determine which threads are deadlocked and/or what lock/mutex they are
waiting on?

It may be more informative to capture the strace or attach the debugger
while an attempt is being made to add another work item.
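
If the VM can still execute Ruby code, a thread dump taken from inside
the process can complement strace. The sketch below uses only
Thread.list, Thread#status and Thread#backtrace; the helper name
thread_dump and the demo with a locked Mutex are assumptions for
illustration. It will not help if every thread is stuck inside native
code, which may well be the case here.

```ruby
require "thread"

# Returns one text block per live thread: its inspect string (which
# includes the status) followed by its current backtrace.
def thread_dump
  Thread.list.map do |t|
    frames = t.backtrace || []
    "#{t.inspect}\n  " + frames.join("\n  ")
  end
end

# Demo: a thread that waits on a Mutex held by the main thread.
lock = Mutex.new
lock.lock
waiter = Thread.new { lock.synchronize { :got_the_lock } }
sleep 0.01 until waiter.status == "sleep"

dump = thread_dump
puts dump.join("\n\n")

# Release the lock so the waiting thread can finish.
lock.unlock
waiter.join
```

In the dump, the top frames of a sleeping thread show what it is
waiting in, which is usually enough to tell an idle worker from one
stuck on a lock.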

I notice method names like “rr::Locker::setupLockAndCall”; apparently
these come from therubyracer. Maybe there are some unhealthy
interactions between that and Sidekiq.

Additional hint: with thread or process pools of limited size, a subtle
form of deadlock can arise that is not caused by flawed application
logic. I first encountered it with Oracle Shared Server (see [1] for
details), which serves all connections from a limited-size pool of
worker processes. Here is a deadlock you can create just by switching
from Dedicated Server (the default) to Shared Server: connection A
starts a transaction and locks a resource X. Other connections also
start transactions and want to do something that needs X unlocked;
their requests are picked up by pool workers and block. If there are at
least as many of these blocked requests as the pool has processes, the
system starves, because no worker is left to run the request with which
A would complete its transaction. This does not happen with Dedicated
Server (or with unlimited thread / process pools), because A can always
complete.
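
The same starvation can be reproduced in a few lines of Ruby. This is
only a sketch under stated assumptions: the pool has two workers,
resource X is modelled as a single token in a Queue (so it can be taken
and released by different threads), and all names (requests, x_token,
log) are hypothetical. It models the scenario, not Oracle or Sidekiq.

```ruby
require "thread"

POOL_SIZE = 2

requests = Queue.new # work waiting for a free pool worker
x_token  = Queue.new # resource X: whoever holds the token holds the lock
x_token << :x
log = Queue.new

workers = POOL_SIZE.times.map do
  Thread.new do
    while (request = requests.pop) != :stop
      request.call
    end
  end
end

# Connection A: its first request locks X and returns; the transaction
# stays open until a later request releases X.
requests << -> { x_token.pop; log << "A locked X" }
sleep 0.01 until log.size == 1

# Connections B and C need X and occupy every worker in the pool.
%w[B C].each do |name|
  requests << -> { x_token.pop; log << "#{name} got X"; x_token << :x }
end
sleep 0.01 until requests.empty? && workers.all? { |t| t.status == "sleep" }

# A's commit would release X, but no worker is free to run it.
requests << -> { x_token << :x; log << "A committed" }
sleep 0.1
starved = requests.size == 1 && workers.all? { |t| t.status == "sleep" }
puts "pool starved: #{starved}"

# Running A's commit outside the pool (the "dedicated" case) resolves it.
requests.pop.call
POOL_SIZE.times { requests << :stop }
workers.each(&:join)

entries = []
entries << log.pop until log.empty?
puts entries.inspect
```

Note that no single request is wrong here; the deadlock comes purely
from the pool limit, which is why it is easy to miss in a code review.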

[1]
http://docs.oracle.com/cd/B28359_01/server.111/b28310/manproc001.htm#ADMIN11166