Detect TCPSocket interruptions/disappearance

reynar · May 19, 2009, 12:31pm

Hi,

I am putting together a background daemon as part of a Rails app to
handle monitoring of our phone system. The phone system exposes a TCP
socket on which it delivers lines of text in real time corresponding
to events (new call, etc.).

I have put together a skeleton process to monitor this, shown below.
However I am not sure how to reliably detect when the phone system
goes away, say it is rebooted or similar.

In such a case the code simply hangs on the gets request within the
loop and never returns.

When the phone system is visible again the original connection is
either lost or timed-out, hence nothing is submitted for the daemon to
receive. Ideally I need to detect this and re-open a connection via
the daemon.

Is the gets method even the best method for this scenario?

I had considered having a separate thread monitoring the visibility of
the server and if it disappears signal to cycle the connection.
Although I’m sure there may be a better way…

Any thoughts much appreciated, Andrew.

begin

  TCPSocket::open("192.168.10.130", 4005) do |socket|

    $running = true

    Signal.trap("TERM") do
      $running = false
      socket.close
    end

    socket.puts "LOGIN,\n"
    if socket.gets == "LOGINOK\n"
      ActiveRecord::Base.logger.info "Successful Login, Start Monitor"
      while($running) do
        routing_message = socket.gets
        ActiveRecord::Base.logger.info DateTime.now.to_s + " " + routing_message
      end
      ActiveRecord::Base.logger.info "Stop Monitor"
    else
      ActiveRecord::Base.logger.info "Failed Login"
    end

  end

rescue *ErrorsOnSocket => err
  ActiveRecord::Base.logger.info "Socket Error, Retrying..."
  sleep 15
  retry
end

reynar · May 19, 2009, 3:46pm

Andrew E. wrote:

I have put together a skeleton process to monitor this, shown below.
However I am not sure how to reliably detect when the phone system
goes away, say it is rebooted or similar.

Try setting SO_KEEPALIVE on your socket.

It’s easier if you can send a periodic message to the far side and check
you get a response within a second or two. However, if you can’t do that
- i.e. you have to be an entirely passive listener - then SO_KEEPALIVE
sends periodic probes at the TCP layer. It may take a few minutes to
detect that the connection has gone, though.
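
For reference, here is a minimal sketch of turning SO_KEEPALIVE on from Ruby. The helper name `enable_keepalive` and the timer values are assumptions for illustration only. The TCP_KEEP* tuning options are platform-specific, so the sketch only sets the ones Ruby exposes on the running system. Without tuning, the OS defaults apply, and those can be far longer than a few minutes (Linux, for example, waits two hours of idle time before the first probe by default).

```ruby
require "socket"

# Enable TCP keepalive on an already-connected socket.
# idle/interval/count are illustrative values, not recommendations.
def enable_keepalive(socket, idle: 60, interval: 10, count: 3)
  socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)

  # Timer tuning is optional and not available everywhere, so each
  # option is set only if this platform defines the constant.
  { TCP_KEEPIDLE: idle, TCP_KEEPINTVL: interval, TCP_KEEPCNT: count }.each do |name, value|
    next unless Socket.const_defined?(name)
    socket.setsockopt(Socket::IPPROTO_TCP, Socket.const_get(name), value)
  end

  socket
end
```

In the daemon above this would probably be called once inside the TCPSocket::open block, before the login exchange. Once the probes go unanswered, the blocked gets should fail with an error such as Errno::ETIMEDOUT, so the rescue list would need to cover it.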

reynar · May 20, 2009, 2:42pm

In such a case the code simply hangs on the gets request within the
loop and never returns.

Odd. If the socket closes then gets should return nil.

So in this case the only way you know it closed is that…it goes away
without notifying you?
If that’s the case then some type of periodic liveness ping is the only
way, I think (as Brian noted).
-=r
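
One possible building block for such a liveness check is a read that gives up after a while instead of blocking forever, so the daemon gets a chance to ping or reconnect. This is only a sketch: the helper name `gets_with_timeout` and the `:timeout` return value are made up for illustration, and it assumes the phone system always sends complete, newline-terminated lines (a partial line would still leave gets blocked).

```ruby
require "socket"

# Read one line from the socket, or return :timeout if nothing
# becomes readable within `timeout` seconds.
def gets_with_timeout(socket, timeout)
  ready = IO.select([socket], nil, nil, timeout)
  return :timeout unless ready

  # nil here means the far end closed the connection cleanly.
  socket.gets
end
```

The monitor loop could then treat :timeout as "nothing heard for a while" and decide whether to send a probe or cycle the connection, and treat nil as a normal close.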

reynar · May 20, 2009, 3:04pm

Roger P. wrote:

In such a case the code simply hangs on the gets request within the
loop and never returns.

Odd. If the socket closes then gets should return nil

Only if the far-end closes the socket normally. If the machine is just
powered off or hard-resets, obviously no packet will be sent. And when
it comes back up, it won’t know that it used to have an open connection.

If you send data to it, it will respond with a RST. But this won’t
happen if you’re just passively listening.
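
A rough sketch of that active check: write something to the peer and treat a reset as a dead connection. The helper name `peer_alive?` and the probe payload are hypothetical, and whether the phone system tolerates unsolicited input is an assumption that would have to be checked against its protocol.

```ruby
require "socket"

# Send a small probe and report whether the write was accepted.
# A reset from the far side surfaces as one of the rescued errors.
def peer_alive?(socket, probe = "\n")
  socket.write(probe)
  true
rescue Errno::ECONNRESET, Errno::EPIPE, Errno::ETIMEDOUT, IOError
  false
end
```

Note that the first write after the far end has forgotten the connection usually succeeds locally; the error only shows up on a later write or read, once the RST has arrived. If the machine is still down, nothing answers at all, and the failure is only reported after TCP gives up retransmitting.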