OK, I have a rake task in a Rails app that connects over a WAN to a
remote Oracle server, through the Ruby OCI8 library, and replicates
some data down.
This task is expected to take about 1 second per item replicated. I
do it in lots of 1000 items. Occasionally the line hangs, or the
connection drops, or some part of hell gets too cold, and I lose the
link, at which point we get hanging processes waiting for failures.
To handle this I wrap the whole each block in a timeout block set to
2000 seconds.
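A minimal sketch of that wrapping, for reference. The method name `process_all` and the `sleep` standing in for the remote call are made up for illustration; the real work goes through OCI8. With pure-Ruby work like `sleep`, the timeout does interrupt the loop partway through:

```ruby
require 'timeout'

# Hypothetical stand-in for the replication loop: each item "takes"
# item_seconds of pure-Ruby sleep instead of a real OCI8 call.
def process_all(items, limit_seconds, item_seconds)
  done = []
  begin
    Timeout.timeout(limit_seconds) do
      items.each do |item|
        sleep(item_seconds)
        done << item # mark the item as completed
      end
    end
  rescue Timeout::Error
    # Anything not in `done` would be picked up by the next run.
  end
  done
end

finished = process_all([1, 2, 3, 4, 5], 0.25, 0.1)
puts "Completed #{finished.size} of 5 items before the timeout"
```

This only shows the intended behaviour when the block is running Ruby code; it says nothing about what happens while the process is blocked inside a native library call.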
Each replication is atomic, so as it completes one, it marks it done.
Any that don’t get marked completed get re-replicated. Replicating
the same thing twice shouldn’t be a problem (I am pretty sure I have
coded around that).
The problem is that it still hangs. I have two processes whose lock
files carry timestamps from 16 and 8 hours ago, far longer than 2000
seconds.
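To make the comparison concrete, here is a rough sketch of reading the age back out of a lock file line. The helper `lock_age_seconds` and the example dates are hypothetical; the only thing taken from the real task is the "<timestamp> - Started replication" line format it writes:

```ruby
require 'time'

# Hypothetical helper: parse the "<timestamp> - Started replication"
# line written into the lock file and return its age in seconds.
def lock_age_seconds(lock_line, now = Time.now)
  stamp = lock_line.split(' - ', 2).first
  now - Time.parse(stamp)
end

timeout_seconds = 2000

# Made-up example values: a lock written 8 hours before "now".
now  = Time.parse('2008-01-01 20:00:00 +0000')
line = '2008-01-01 12:00:00 +0000 - Started replication'
age  = lock_age_seconds(line, now)

puts "Lock is #{age.to_i} seconds old; timeout is #{timeout_seconds}"
# prints "Lock is 28800 seconds old; timeout is 2000"
```

Eight hours is 28800 seconds, so that process has outlived its timeout by more than a factor of ten.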
The rake task looks like:

    task :replicate => :environment do
      replicate
    end

    def replicate
      @lock_file = "#{RAILS_ROOT}/tmp/replication.lock"
      begin
        if File.exist?(@lock_file)
          puts "Lock file exists #{@lock_file}"
        else
          File.open(@lock_file, 'w') { |f| f.puts("#{Time.now} - Started replication") }
          @created_lock_file = true
          Replication.replicate!
        end
      ensure
        File.unlink(@lock_file) if @created_lock_file
      end
    end

The replication code looks like this:

    def Replication.do_replication
      Replication.timeout(2000, "Replication - Timeout error") do
        self.unreplicated.each do |replication|
          REPLICATION_LOG.info("#{Time.now} - Replicating id #{replication.id}")
          replication.replicate!
          REPLICATION_LOG.flush
        end
      end
    end

    def Replication.timeout(time, message)
      begin
        Timeout::timeout(time) do
          yield
        end
      rescue Timeout::Error => e
        REPLICATION_LOG.error("#{Time.now} - Timeout error - #{message}\n#{e}")
      end
    end

I can’t see how that is still running after 2000 seconds. Ideas anyone?
Mikel