Hello all,

I have a screen scraping application (it goes to a lot of sites, extracts 10k items, integrates the results, puts them into a DB, etc.). Now I want to use a Rails application as a frontend to this: the user can push a button which triggers the screen scraping app, and can then view the results (preferably asynchronously, but that does not really matter right now).

Questions:

- Should the screen scraping app reside in the Rails directory structure or somewhere else? It can be viewed as a standalone application: Rails just displays data from a DB and does not necessarily need to know who generated the DB or how. OTOH, if communication is easier when the screen scraping dir is integrated into the Rails directory structure, I don't have a problem doing that either. In that case, where should I put the screen scraping code? (We are talking about, let's say, 50 classes altogether.)

- How do I trigger the screen scraping in the background? I.e. the user clicks a 'start' button, gets back a 'screen scraping started' message and can keep working with the web page instead of waiting for the result (since that can last several hours in extreme cases).

Thanks,
Peter
on 2006-05-22 15:20
on 2006-05-22 18:28
We've solved similar problems the following way:

We have a table of long-running jobs, and an independent process that polls the table and processes these jobs. When our Rails app needs to trigger a job, it simply adds a row to the DB.

For our job processing code, we still use the Rails application framework, but have new commands (in the script directory) to start and stop the processor independently of the server. Both parts of the application share as much code as possible (including, for the most part, the environment.rb and database.yml config).

Tom
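To make the job-table idea concrete, here is a minimal sketch in plain Ruby. It is not Tom's actual code: JobTable, enqueue, claim_next and process_pending are hypothetical names, and the in-memory array only stands in for a real database table that both the Rails app and the processor would share.

```ruby
# Hypothetical stand-in for a database table of long-running jobs.
# In a real app this would be a table in the DB shared with Rails.
class JobTable
  Job = Struct.new(:id, :kind, :status, :result)

  def initialize
    @rows = []
    @lock = Mutex.new
  end

  # Called by the web app: add a row and return its id immediately.
  def enqueue(kind)
    @lock.synchronize do
      job = Job.new(@rows.size + 1, kind, "pending", nil)
      @rows << job
      job.id
    end
  end

  # Called by the processor: claim the oldest pending job, if any.
  def claim_next
    @lock.synchronize do
      job = @rows.find { |j| j.status == "pending" }
      job.status = "running" if job
      job
    end
  end

  def find(id)
    @lock.synchronize { @rows.find { |j| j.id == id } }
  end
end

# One polling pass: run the block for every pending job and
# return how many jobs were handled.
def process_pending(table)
  handled = 0
  while (job = table.claim_next)
    begin
      job.result = yield(job)
      job.status = "done"
    rescue StandardError => e
      job.result = e.message
      job.status = "failed"
    end
    handled += 1
  end
  handled
end
```

The standalone processor would then be little more than a loop that calls process_pending and sleeps for a few seconds between passes, while the controller action behind the 'start' button only calls enqueue and renders the 'started' message.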
on 2006-05-22 18:35
On May 22, 2006, at 8:17 AM, Peter Szinek wrote:

> Questions:
>
> - How to trigger the screen scraping in the background - i.e. the user
> clicks a 'start' button, gets back a 'screen scraping started' message
> and can work with the web page further, instead of waiting on the
> result (since that can last even several hours in extreme cases)

Consider using BackgrounDRb to handle long-running tasks. It installs as a plugin and has all the code residing within your Rails app. It can also communicate with remote servers via DRb, so you can do long-running tasks on another machine if you like.

cr http://brainspl.at/articles/2006/05/15/backgoundrb...
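For a rough idea of the underlying mechanism, here is a small sketch using Ruby's plain DRb library. This is not BackgrounDRb's API: ScrapeWorker, start and status are hypothetical names, the example URLs are placeholders, and the server and client run in one process only to keep the example self-contained.

```ruby
require "drb/drb"

# Hypothetical worker object exposed over DRb (not BackgrounDRb's API).
class ScrapeWorker
  def initialize
    @jobs = {}
    @lock = Mutex.new
  end

  # Start the work in a thread and return a job id right away.
  def start(sites)
    id = @lock.synchronize do
      next_id = @jobs.size + 1
      @jobs[next_id] = { status: "running", done: 0, total: sites.size }
      next_id
    end
    Thread.new do
      sites.each do |_site|
        # Real scraping of each site would happen here.
        @lock.synchronize { @jobs[id][:done] += 1 }
      end
      @lock.synchronize { @jobs[id][:status] = "finished" }
    end
    id
  end

  # Return a copy of the job's state so callers can poll progress.
  def status(id)
    @lock.synchronize { @jobs.fetch(id).dup }
  end
end

# Server side: port 0 asks the OS for any free port.
DRb.start_service("druby://localhost:0", ScrapeWorker.new)
WORKER_URI = DRb.uri

# Client side (for example a controller action) reaches the worker by URI.
worker = DRbObject.new_with_uri(WORKER_URI)
job_id = worker.start(["http://example.com/a", "http://example.com/b"])
```

The call to start returns as soon as the thread is launched, so the page can show the 'started' message and later poll worker.status(job_id) to display progress. In a real setup the server part would live in its own long-running process, possibly on another machine.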