[jruby-user] Request hangs in mysql driver for 15 minutes

We have an issue with our web app where occasionally a request will hang in
the mysql driver for 15 minutes until we get an idle thread timeout after
900 seconds and the request finally completes after 935 seconds. I don’t
have a thread dump for this as it’s intermittent, but a similar thing happens
to some worker tasks that we have that will lock up for around 30 minutes
after a long period of idle time. The traceback for these is as follows:

"main" prio=10 tid=0x0000000041b9e800 nid=0x5117 runnable [0x00007f256c22a000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:113)
    at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:160)
    at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:188)
    - locked <0x00007f24ed530e58> (a com.mysql.jdbc.util.ReadAheadInputStream)
    at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1910)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2304)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2803)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1573)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1665)
    at com.mysql.jdbc.Connection.execSQL(Connection.java:3170)
    - locked <0x00007f24ed5311f8> (a java.lang.Object)
    at com.mysql.jdbc.Connection.setAutoCommit(Connection.java:5273)
    - locked <0x00007f24ed5311f8> (a java.lang.Object)
    at jdbc_adapter.RubyJdbcConnection$2.call(RubyJdbcConnection.java:109)
    at jdbc_adapter.RubyJdbcConnection.withConnectionAndRetry(RubyJdbcConnection.java:1086)
    at jdbc_adapter.RubyJdbcConnection.begin(RubyJdbcConnection.java:107)

Looking at the MysqlAdmin, all the connections are sleeping. I believe this
is the same issue from this post:

http://www.ruby-forum.com/topic/199872

So I suspect the firewall is doing something with our connection but the
client isn’t getting a peer disconnect?

We had a similar issue with our redis connection where the read or write
would lock for a long time until finally resulting in an EBADF exception and
then things would work again. This was easier to work around as I just put
a Timeout block around the read/write access.
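
For readers who want to see what such a Timeout wrapper can look like, here is a minimal sketch. It is not the code from the redis workaround above; the helper name with_io_timeout is made up for illustration, and a pipe stands in for the stalled socket. Note that Ruby's Timeout interrupts whatever code is running in the block, so it is best kept around a single read or write.

```ruby
require 'timeout'

# Hypothetical helper (not from the thread): run a blocking I/O call and
# give up after `seconds` instead of waiting on a dead socket.
def with_io_timeout(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  :timed_out
end

# Demo on a pipe: nothing has been written yet, so the read blocks
# until the timeout fires.
reader, writer = IO.pipe
result = with_io_timeout(0.2) { reader.readpartial(16) }
puts result.inspect
```

The caller can then decide whether to reconnect or to raise when :timed_out comes back.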

Any suggestions on possible solutions or workarounds?

Thanks,
Brad


View this message in context:
http://old.nabble.com/Request-hangs-in-mysql-driver-for-15-minutes-tp29237070p29237070.html
Sent from the JRuby - User mailing list archive at Nabble.com.


To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

Hi Brad

On 22.07.2010 at 15:48, pardeeb wrote:

We have an issue with our web app where occasionally a request will hang in
the mysql driver for 15 minutes until we get an idle thread timeout after
900 seconds and the request finally completes after 935 seconds. I don’t
have a thread dump for this as its intermittent but a similar thing happens
to some worker tasks that we have that will lock up for around 30 minutes
after a long period of idle time. The traceback for these are as follows:

Wild guess: ensure that your firewall always REJECTs your TCP packets.
Dropping packets might be ‘more secure’, but it’s poison for any normal
application.

To illustrate this:
if your database software has crashed but the system is still running:
-> any packet (be it a SYN or a normal data packet) will cause the
database server to send a TCP RST, and your application server immediately
knows that something is not okay
if the whole system has crashed, or the system is no longer reachable:
-> any packet will time out, but you usually get an ICMP “destination
unreachable” a few seconds after the first try and your system knows that
something has gone wrong

==> in both cases the system doesn’t have to wait more than a few seconds

Now, if you have a firewall which believes that your TCP connection is
not valid anymore (for example because of broken idle settings; TCP itself
allows up to 24h idleness, but many firewalls consider connections dead
after a much shorter time), it might silently drop anything between your
application server and your database. This is bad, it’s bad style and it
is pretty ‘unnatural’ from a network point of view.

This usually only affects connections that are already established. You can
verify the problem by trying to open a new connection (for example with the
cli client) while a request is hanging. If new connections work without any
problems, the firewall is probably dropping the packets of the old, idle
connection.
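
The check described above can also be scripted. The following is only a sketch, assuming a reasonably modern Ruby where Socket.tcp accepts connect_timeout; the helper name fresh_connection_ok? and the host name in the usage note are placeholders, not anything from the thread.

```ruby
require 'socket'

# Hypothetical helper: true if a brand-new TCP connection to host:port
# can be opened within `timeout` seconds, false otherwise.
def fresh_connection_ok?(host, port, timeout = 5)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # Connection refused, host unreachable, timed out, or name lookup failed.
  false
end
```

Calling fresh_connection_ok?("db.example.com", 3306) while an existing connection is hanging (3306 is the default MySQL port, the host name is made up) and getting true back would point at the established connection, not the database, as the problem.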

Cheers
Reto


Hi Reto,

Thanks, I was on vacation last week so I wasn’t able to check into this
until today. I opened a ticket with the hosting company and repeated what
you said, but they claim that there is no idle timeout, so I ran the following
test using a db on the same subnet and against the production db on the
other DMZed subnet:

while true
  puts "#{Time.now} starting find"
  u = User.find(1)
  puts "#{Time.now} found #{u.inspect}"
  sleep 7200
end

With the DB on the same subnet, it works as expected:
Tue Aug 03 21:01:18 +0000 2010 starting find
Tue Aug 03 21:01:18 +0000 2010 found #<User id: 1>
Tue Aug 03 23:01:18 +0000 2010 starting find
Tue Aug 03 23:01:18 +0000 2010 found #<User id: 1>

With the DB on the other subnet, the request takes 935 seconds after 2 hours
of idle time:
Tue Aug 03 21:16:28 +0000 2010 starting find
Tue Aug 03 21:16:29 +0000 2010 found #<User id: 1>
Tue Aug 03 23:16:29 +0000 2010 starting find
Tue Aug 03 23:32:04 +0000 2010 found #<User id: 1>

Anyway, thanks again for the answer! Looks to me like you were dead on.
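
As a quick check of the 935-second figure, the two timestamps of the slow request in the output above can be subtracted directly. This is just a worked calculation on the logged values; the format string is an assumption about how those timestamps were printed.

```ruby
require 'time'

# Timestamp layout of the log lines above, e.g. "Tue Aug 03 23:16:29 +0000 2010".
LOG_FORMAT = "%a %b %d %H:%M:%S %z %Y"

started  = Time.strptime("Tue Aug 03 23:16:29 +0000 2010", LOG_FORMAT)
finished = Time.strptime("Tue Aug 03 23:32:04 +0000 2010", LOG_FORMAT)

# 15 minutes 35 seconds between "starting find" and "found".
elapsed = (finished - started).to_i
puts "slow find took #{elapsed} seconds"
```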


James,

Thanks for the additional information. It looks like he also wrote an
online article about that problem here:

I’m trying to find a mysql equivalent for dead connection detection. Also,
I have asked the hosting company if they could enable it on the firewall as
described here:

http://www.cisco.com/en/US/docs/security/asa/asa82/configuration/guide/conns_connlimits.html#wp1080752


Michael T. Nygard’s Release It! describes this very problem (Chapter 4; the
5am problem on page 39 in my edition).

Oracle has a dead connection detection feature. I don’t know if MySQL has
something similar.

Otherwise you can have fun with sysctl variables; see Ipsysctl tutorial
1.0.4, in particular TCP variables like tcp_retries2.
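
To make the tcp_retries2 hint concrete: on Linux the retransmission timer doubles after every unanswered retransmit, so the number of retries translates into a total wait. The sketch below assumes a Linux client with the kernel's usual bounds (200 ms minimum and 120 s maximum retransmission timeout) and the default tcp_retries2 of 15; the function name is made up. Under those assumptions the total is about 924.6 seconds, the figure the kernel documentation gives for the default, and it is close to the 935 seconds reported in this thread. That makes it a plausible explanation, not a confirmed one.

```ruby
# Assumed Linux bounds for the retransmission timeout (RTO), in seconds.
RTO_MIN = 0.2
RTO_MAX = 120.0

# Hypothetical helper: total time spent waiting before the kernel gives up
# on an unanswered segment, for a given tcp_retries2 value. The timer starts
# at RTO_MIN, doubles after each retransmit and is capped at RTO_MAX.
def retransmit_timeout(retries, rto_min = RTO_MIN, rto_max = RTO_MAX)
  (0..retries).inject(0.0) do |total, attempt|
    total + [rto_min * (2**attempt), rto_max].min
  end
end

puts format("tcp_retries2=15 -> about %.1f seconds", retransmit_timeout(15))
puts format("tcp_retries2=8  -> about %.1f seconds", retransmit_timeout(8))
```

Lowering tcp_retries2 shortens the hang but affects every TCP connection on the machine, so it is a blunt tool compared with fixing the firewall.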

Cheers,

James

I know there are ways to configure JDBC connection pools to do periodic
“keep alives” or “liveness checks” to cull bad connections or keep
them fresh. I’ve only ever used it on Oracle, but there’s probably a
way to configure it here too, if you’re using a JDBC pool (or perhaps
Rails’ pool needs something similar).

- Charlie
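
The “liveness check” idea can be sketched in plain Ruby. This is not the configuration of any real JDBC pool or of Rails’ pool; ValidatingPool, its options, and the ping method on the connection object are all hypothetical, and the “pool” holds a single connection to keep the example short. The point is only the pattern: if a connection has been idle for a while, test it with a bounded-time ping before handing it out, and replace it if the ping fails or hangs.

```ruby
require 'timeout'

# Hypothetical single-connection "pool" that validates idle connections.
# The block passed to `new` must return an object responding to `ping`.
class ValidatingPool
  def initialize(max_idle: 300, check_timeout: 5, &connect)
    @connect = connect
    @max_idle = max_idle            # seconds of idleness before we validate
    @check_timeout = check_timeout  # seconds to wait for the ping
    @conn = nil
    @last_used = nil
  end

  # Return a usable connection, reconnecting if the idle one looks dead.
  def checkout(now = Time.now)
    @conn = @connect.call if @conn.nil? || (stale?(now) && !alive?)
    @last_used = now
    @conn
  end

  private

  def stale?(now)
    now - @last_used > @max_idle
  end

  # A ping that raises, returns false, or hangs past the timeout counts as dead.
  def alive?
    Timeout.timeout(@check_timeout) { @conn.ping } ? true : false
  rescue StandardError
    false
  end
end
```

Keeping max_idle below the firewall’s idle limit means a silently dropped connection costs at most check_timeout seconds instead of a 15-minute hang.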

On Wed, Aug 4, 2010 at 10:47 AM, pardeeb [email protected] wrote:


I couldn’t figure out how to configure the pool to do “keep alives” so I
wrote the following (completely untested) monkey patch which I think would
accomplish what I need. Fortunately, it looks like the hosting company came
through and implemented DCD on the firewall so I won’t need to use it.

module IdleDbconnectionReconnect
  class << self
    # Hook the idle-time check into ActiveRecord::Base.connection.
    def install!
      base = ActiveRecord::Base
      base.extend(SingletonMethods)
      base.class_eval do
        class << self
          alias_method_chain :connection, :check_idle_time
        end
      end
    end
  end

  module SingletonMethods
    # Last-use timestamp (epoch seconds) per connection, keyed by object_id.
    @@idle_time = {}
    @@mutex = Mutex.new

    # Return the connection, reconnecting first if it has been idle
    # for more than 30 minutes.
    def connection_with_check_idle_time
      connection = connection_without_check_idle_time
      prev_time = curr_time = nil
      @@mutex.synchronize do
        prev_time = @@idle_time[connection.object_id]
        curr_time = @@idle_time[connection.object_id] = Time.now.to_i
      end
      if prev_time && (curr_time - prev_time) > 1800
        Rails.logger.info("Resetting connection #{connection.object_id}, it has been idle for #{curr_time - prev_time} seconds")
        connection.reconnect!
      end
      connection
    end
  end
end

IdleDbconnectionReconnect.install!

