Design Dilemma - Please Help

chrismalek · October 23, 2006, 8:08am

Hi, I’m new.

I’m creating a little Rails app that will crawl the web on a regular
basis and then show the results.

The crawling will be scheduled, likely a cron job.

I can’t wrap my head around where to put my crawler. It doesn’t seem
to fit.

An example:
Model - News Story
Controllers - Grab a story from the DB, sort the stories, search the
stories, etc.
View - HTML News Story, RSS Story etc.

Then I have a news crawler that will crawl some feeds for new
stories and insert them into the DB. Where do I put it, and how do I
get cron to execute it?

Maybe put it in the NewsStoryController?

Do I make another file for cron to run, that contains something like
this:

nc = NewsStoryController.new
nc.crawl

If I do this, will I still have access to ActiveRecord the same as
usual?

I’m confused, and any guidance or feedback would be awesome!

Thanks,

Chris

chrismalek · October 23, 2006, 8:42am

On Oct 23, 2006, at 12:06 AM, Chris.Mohr wrote:

> nc.crawl
>
> If I do this will I still have access to the ActiveRecord the same as
> usual?
>
> I’m confused, and any guidance or feedback would be awesome!

I’d place it in a class of its own, in a file under lib/, and
use script/runner to invoke it.

crawl would be a class method, so you would invoke it like this:

script/runner 'NewsStoryController.crawl'

or, in production:

script/runner -e production 'NewsStoryController.crawl'
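To make the "class of its own under lib/" idea concrete, here is a rough sketch and not a definitive implementation. Every name in it is an assumption for illustration: the NewsCrawler class, the FEEDS list, the fetcher and store parameters, and the fetch_stories placeholder are not from the original posts, and NewsStory is assumed to be an ActiveRecord model with title and url columns. The sketch uses keyword arguments, so it assumes a current Ruby. Since script/runner loads the Rails environment before evaluating the code it is given, the models are available as usual.

```ruby
# lib/news_crawler.rb  (hypothetical file and class name)
# A plain Ruby class rather than a controller.
class NewsCrawler
  # Hypothetical feed list; replace with the real feed URLs.
  FEEDS = ["http://example.com/news.rss"].freeze

  # Crawls each feed and hands every new story to `store`.
  #   fetcher: callable taking a feed URL and returning an array of
  #            hashes like { title: "...", url: "..." }
  #   store:   callable that persists one story; the default assumes
  #            an ActiveRecord model named NewsStory exists in the app
  # Returns the number of stories passed to `store`.
  def self.crawl(feeds: FEEDS,
                 fetcher: method(:fetch_stories),
                 store: ->(story) { NewsStory.create(story) })
    seen  = {}
    saved = 0
    feeds.each do |feed_url|
      fetcher.call(feed_url).each do |story|
        next if seen[story[:url]]   # skip URLs already seen in this run
        seen[story[:url]] = true
        store.call(story)
        saved += 1
      end
    end
    saved
  end

  # Placeholder: downloading and parsing the feed is left out of this sketch.
  def self.fetch_stories(feed_url)
    raise NotImplementedError, "feed download and parsing not shown (#{feed_url})"
  end
end
```

Once fetch_stories is filled in, script/runner 'NewsCrawler.crawl' would run it with the defaults, and the cron entry only needs to call that command from the application directory. Passing the fetcher and store in as arguments also lets you try the crawl loop without touching the network or the database.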

--
-- Tom M.

chrismalek · October 23, 2006, 12:06pm

Awesome…thank you.

chrismalek · October 23, 2006, 4:29pm

On Oct 23, 2006, at 2:06 AM, Chris.Mohr wrote:

> I’m creating a little rails app, that will crawl the web on a regular
> basis and then show the results.
>
> The crawling will be scheduled, likely a cron job.
>
> I can’t wrap my head around where to put my crawler. It doesn’t
> seem to fit.

An alternative to the suggestion already offered would be to use
backgroundrb [1] and to set up your crawler as a worker class in

lib/workers

I’m using that to good effect in an application (not yet launched)
which makes heavy use of regular screen scraping.

James.

1: http://backgroundrb.rubyforge.org/

James S. : Web D.
Work : http://jystewart.net
Play : http://james.anthropiccollective.org
