Hi, I’m new.
I’m creating a little Rails app that will crawl the web on a regular
basis and then show the results.
The crawling will be scheduled, likely a cron job.
I can’t wrap my head around where to put my crawler. It doesn’t seem
to fit.
An example:
Model - News Story
Controllers - Grab a story from the DB, sort the stories, search the
stories, etc.
View - HTML News Story, RSS Story etc.
Then I have a news crawler that will crawl some feeds for new
stories and insert them into the DB. Where do I put it, and how do I
get cron to execute it?
Maybe put it in the NewsStoryController?
Do I make another file for cron to run, that contains something like
this:
nc = NewsStoryController.new
nc.crawl
If I do this, will I still have access to ActiveRecord the same as
usual?
I’m confused, and any guidance or feedback would be awesome!
Thanks,
Chris
On Oct 23, 2006, at 12:06 AM, Chris.Mohr wrote:
> nc.crawl
> If I do this will I still have access to the ActiveRecord the same as
> usual?
> I’m confused, and any guidance or feedback would be awesome!
I’d place it in a class of its own, in a file under lib/, and
use script/runner to invoke it.
Crawl would be a class method, so with the class name from your
example you would invoke it like this:
script/runner 'NewsStoryController.crawl'
or, in production
script/runner -e production 'NewsStoryController.crawl'
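To make the "class of its own under lib/" idea concrete, here is a rough sketch rather than a definitive implementation. The names NewsCrawler, MemoryStore, and the fetcher and store arguments are all assumptions made up for illustration. The fetcher stands in for whatever HTTP and feed-parsing code you end up with, and the store stands in for your NewsStory model. Because script/runner loads the Rails environment before evaluating the string you pass it, your ActiveRecord models would be available inside the class method as usual.

```ruby
# lib/news_crawler.rb (hypothetical file and class name)
class NewsCrawler
  # A class method, so it can be called without creating an instance,
  # which is what makes a script/runner one-liner possible.
  #
  # feed_urls: array of feed URLs to visit
  # fetcher:   callable taking a URL and returning an array of
  #            { :title => ..., :url => ... } hashes
  #            (assumed stand-in for real HTTP + feed parsing)
  # store:     object responding to exists?(url) and create(attrs)
  #            (assumed stand-in for the NewsStory model)
  #
  # Returns the number of stories that were newly stored.
  def self.crawl(feed_urls, fetcher, store)
    added = 0
    feed_urls.each do |feed_url|
      begin
        stories = fetcher.call(feed_url)
      rescue StandardError => e
        # One broken feed should not abort the whole scheduled run.
        warn "Skipping #{feed_url}: #{e.message}"
        next
      end

      stories.each do |story|
        # Skip stories that were already saved by an earlier run.
        next if store.exists?(story[:url])
        store.create(story)
        added += 1
      end
    end
    added
  end
end

# In-memory stand-in for the model, only so the sketch runs outside Rails.
class MemoryStore
  attr_reader :rows

  def initialize
    @rows = {}
  end

  def exists?(url)
    @rows.key?(url)
  end

  def create(attrs)
    @rows[attrs[:url]] = attrs
  end
end
```

In a real app the method would more likely take no arguments and talk to NewsStory directly, so that the string passed to script/runner stays a simple one-liner. The arguments are injected here only so the de-duplication logic can be tried without a database or network access.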
-- Tom M.
On Oct 23, 2006, at 2:06 AM, Chris.Mohr wrote:
> I’m creating a little rails app, that will crawl the web on a regular
> basis and then show the results.
> The crawling will be scheduled, likely a cron job.
> I can’t wrap my head around where to put my crawler. It doesn’t
> seem to fit.
An alternative to the suggestion already offered would be to use
BackgrounDRb [1] and to set up your crawler as a worker class in
lib/workers.
I’m using that to good effect in an application (not yet launched)
that makes heavy use of regular screen scraping.
James.
James S. : Web D.
Work : http://jystewart.net
Play : http://james.anthropiccollective.org