How to execute time-consuming code

Hello all,

I have a screen-scraping application (it visits lots of sites, extracts
around 10k items, integrates the results, stores them in a DB, etc.). Now
I want to use a Rails application as a frontend to it: the user pushes a
button that triggers the screen-scraping app, and then views the results
(preferably asynchronously, but that doesn't really matter right now).

Questions:

  • Should the screen-scraping app live inside the Rails directory
    structure or somewhere else? It can be viewed as a standalone
    application: Rails just displays data from a DB and does not
    necessarily need to know who generated that data or how. OTOH, if
    communication is easier when the screen-scraping code is integrated
    into the Rails directory structure, I don't have a problem with that
    either. In that case, where should I put the screen-scraping code? (We
    are talking about, let's say, 50 classes altogether.)

  • How do I trigger the screen scraping in the background? That is, the
    user clicks a ‘start’ button, gets back a ‘screen scraping started’
    message, and can keep working with the web page instead of waiting for
    the result (which can take several hours in extreme cases).

Thanks,
Peter

We’ve solved similar problems the following way:

We have a table of long-running jobs and an independent process that
polls for and processes these jobs. When our Rails app needs to trigger a
job, it simply adds a row to the DB.
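
The jobs-table design above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not Tom's code: JobStore is an in-memory stand-in for the real database table (in a Rails app it would be a model backed by the DB), and the class names, status strings, and handler hash are hypothetical.

```ruby
# ASSUMPTION: JobStore stands in for a real `jobs` DB table so the
# sketch is self-contained; all names here are hypothetical.
Job = Struct.new(:id, :kind, :args, :status, :result)

class JobStore
  def initialize
    @rows = []
    @next_id = 1
    @lock = Mutex.new
  end

  # What the web app does when the user clicks 'start': add a row, return at once.
  def enqueue(kind, args = {})
    @lock.synchronize do
      job = Job.new(@next_id, kind, args, "pending", nil)
      @next_id += 1
      @rows << job
      job.id
    end
  end

  # Claim the oldest pending job under the lock (nil if there is none).
  def claim_next
    @lock.synchronize do
      job = @rows.find { |j| j.status == "pending" }
      job.status = "running" if job
      job
    end
  end

  def find(id)
    @lock.synchronize { @rows.find { |j| j.id == id } }
  end
end

# The independent processor: polls the store and runs whatever it finds.
class JobProcessor
  def initialize(store, handlers)
    @store = store
    @handlers = handlers # hypothetical mapping: job kind => callable
  end

  # Process at most one job; returns true if a job was handled.
  def work_off_one
    job = @store.claim_next
    return false unless job
    begin
      job.result = @handlers.fetch(job.kind).call(job.args)
      job.status = "done"
    rescue StandardError => e
      job.result = e.message
      job.status = "failed"
    end
    true
  end

  # Endless polling loop, as the separately started processor would run it.
  def run(poll_interval: 5)
    loop do
      sleep(poll_interval) unless work_off_one
    end
  end
end
```

With a real table you would want the claim step to be atomic as well (for example, a conditional UPDATE on the status column), so that two processor instances never pick up the same row.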

For our job-processing code, we still use the Rails application
framework, but we have new commands (in the script directory) to start
and stop the processor independently of the server. Both parts of the
application share as much code as possible (including, for the most
part, environment.rb and the database.yml config).

Tom

On May 22, 2006, at 8:17 AM, Peter S. wrote:

Questions:

  • How do I trigger the screen scraping in the background? That is, the
    user clicks a ‘start’ button, gets back a ‘screen scraping started’
    message, and can keep working with the web page instead of waiting for
    the result (which can take several hours in extreme cases).

Consider using BackgrounDRb [1] to handle long-running tasks. It
installs as a plugin, and all of its code resides within your Rails
app. It can also communicate with remote servers via DRb, so you can run
long-running tasks on another machine if you like.
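
The sketch below does not use BackgrounDRb's own API. It uses plain DRb from Ruby's standard library to show the underlying idea: a worker process (possibly on another machine) exposes an object, and the Rails side calls it, gets a job id back immediately, and can poll for the status later. ScrapeWorker, its methods, and the host/port in the comments are hypothetical.

```ruby
# Plain-DRb sketch (not BackgrounDRb's API). All names are hypothetical.
require "drb/drb"

class ScrapeWorker
  def initialize
    @jobs = {}
    @lock = Mutex.new
    @next_id = 0
  end

  # Start the work in a background thread and return a job id right away.
  def start_scrape(sites)
    id = @lock.synchronize { @next_id += 1 }
    set(id, "running")
    Thread.new do
      count = sites.size # placeholder for the real scraping code
      set(id, "done: #{count} sites")
    rescue StandardError => e
      set(id, "failed: #{e.message}")
    end
    id
  end

  def status(id)
    @lock.synchronize { @jobs[id] }
  end

  private

  def set(id, state)
    @lock.synchronize { @jobs[id] = state }
  end
end

# Worker process:
#   DRb.start_service("druby://0.0.0.0:9000", ScrapeWorker.new)
#   DRb.thread.join
#
# Rails side (e.g. the action behind the 'start' button):
#   worker = DRbObject.new_with_uri("druby://worker-host:9000")
#   job_id = worker.start_scrape(site_list)
#   worker.status(job_id)  # later, from a status page
```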

cr

[1] Ruby on Rails Blog / What is Ruby on Rails for?