Mechanize MySQL and threads - deadlock?

First of all: I’m still new to Ruby.

So pointing me to documentation or books is fine.

Use case:

Use mechanize to gather information. Because there are many pages I’d
like to run multiple threads each fetching pages. The fetched data
should be written to a MySQL database.

Can you point me to information telling me how to do this?

The failure looks like this now:

/pr/tasks/get_data_ruby/tasks.rb:364:in `join': deadlock detected (fatal)
from /pr/tasks/get_data_ruby/tasks.rb:364:in `block in run_tasks_wait'
from /pr/tasks/get_data_ruby/tasks.rb:364:in `each'
from /pr/tasks/get_data_ruby/tasks.rb:364:in `run_tasks_wait'
from get-data.rb:37:in `<mai

What is causing such deadlocks at all?

Details about my implementation:

Ruby version: ruby 1.9.1p378 (2010-01-10 revision 26273) [x86_64-linux]
sequel-3.8.0
mysqlplus-0.1.1

Because things always go wrong, I’d like to store state in the database so
the script can resume work where it failed.

To keep things simple I tried giving each thread its own agent and DB
connection:

def newDBConnection
  Sequel.connect(
    :adapter => 'mysql',
    :user => 'root',
    :host => 'localhost',
    :database => 'get_data',
    :password => 'XXX')
end

# share one agent and db connection per thread

class MyThread < Thread
  def agent
    if !@agent
      @agent = Mechanize.new
      @agent.max_history = 1
    end
    @agent
  end

  def db
    @dbCache ||= newDBConnection
  end
end
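
As an aside, the same one-agent-per-thread idea can be had without subclassing Thread, by keeping the object in Thread.current. This is only a sketch: FakeAgent is a made-up stand-in for Mechanize.new so the snippet runs without any gems, and thread_agent is a hypothetical helper name.

```ruby
# FakeAgent is a hypothetical stand-in for Mechanize.new.
class FakeAgent; end

# Lazily create one agent per thread and keep it in the thread's own storage.
def thread_agent
  Thread.current[:agent] ||= FakeAgent.new
end

# Each thread asks twice and gets the same object back both times,
# while different threads get different objects.
agents = (1..3).map { Thread.new { [thread_agent, thread_agent] } }.map(&:value)
puts agents.all? { |a, b| a.equal?(b) }
puts agents.map(&:first).uniq.size
```

The DB connection could be cached the same way under another key such as :db.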

Next I defined a task that reuses the DB connection and Mechanize agent
from the thread that is running the task:

class Task
  def run
    # override
    @thread = Thread.current
    task
  end

  def agent
    @agent ||= @thread.agent
  end

  def db
    @dbCache ||= @thread.db
  end
end

Next I wrote a simple function that takes a list of tasks and a thread
class (MyThread). It spawns parallel threads, each taking tasks from the
task list (a Queue). Any task may add more tasks to the queue.
The script should run until all tasks are done.

# t: class extending Thread
# tasks: type Queue.new
# parallel: num of threads used to run those tasks
def run_tasks_wait(t, tasks, parallel)
  working = 0
  threads = []

  # run 3 threads
  (1..parallel).each {|i|
    threads << t.new {
      firstTime = true
      while working > 0 || firstTime
        firstTime = false
        while task = tasks.pop
          working += 1
          $log.debug("starting task #{task.to_s}")
          $log.catchAndLog "caught exception in main worker thread" do
            task.run if !task.nil?
          end
          $log.debug("finished task #{task.to_s} threads-working: #{working}")
          working -= 1
        end
        # even if there is nothing left in queue keep thread running if
        # there is one thread running
        # this thread may push additional tasks to the queue
        sleep 1
      end
    } }
  # wait for threads
  threads.each {|t| t.join() }
end

Thanks for any pointers
Marc W.

# t: class extending Thread
# tasks: type Queue.new
# parallel: num of threads used to run those tasks
def run_tasks_wait(t, tasks, parallel)

Replacing the Queue by an Array seems to fix the issue.
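
A likely explanation, as far as I can tell: Queue#pop blocks while the queue is empty, so once the work runs out every worker sleeps inside pop while the main thread sleeps in join, and Ruby reports a deadlock because no thread is left that could wake the others. Array#pop returns nil on an empty array instead, which lets the inner loop end. Below is a minimal sketch that keeps the Queue but pops without blocking. The names run_tasks_nonblocking and next_task are made up for illustration, tasks are plain lambdas here rather than Task objects, and the counter is guarded by a Mutex because += across threads is not atomic.

```ruby
require 'thread'

# Hypothetical helper: next task, or nil if the queue is empty right now.
def next_task(tasks)
  tasks.pop(true) # non_block = true raises ThreadError instead of blocking
rescue ThreadError
  nil
end

# Hypothetical variant of run_tasks_wait; tasks are lambdas taking the queue.
def run_tasks_nonblocking(tasks, parallel)
  working = 0
  lock = Mutex.new
  threads = (1..parallel).map do
    Thread.new do
      loop do
        # Claim a task and count it as running in one step.
        task = lock.synchronize do
          t = next_task(tasks)
          working += 1 if t
          t
        end
        if task
          begin
            task.call(tasks)
          ensure
            lock.synchronize { working -= 1 }
          end
        elsif lock.synchronize { working.zero? && tasks.empty? }
          break # nothing queued and nobody left who could queue more
        else
          sleep 0.01 # a running task may still push new work
        end
      end
    end
  end
  threads.each(&:join)
end

tasks = Queue.new
done = Queue.new
# The first task pushes a follow-up task, as the tasks in the script may do.
tasks << lambda { |q| done << :first; q << lambda { |_q| done << :second } }
run_tasks_nonblocking(tasks, 3)
puts done.size
```

Workers exit only when the queue is empty and no task is running, so a task that is still in flight can keep adding work without the other threads giving up early.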

Marc
