Caching from screen scraping


#1

Hi all,

I need to do some screen scraping from my Rails app. Given an Ethernet
(MAC) address, I scrape results from an internal web page that returns
location and hostname. How can I cache the results of that screen
scraping so as to be polite to the scrapee? I would like to expire the
results daily. In Perl, I would use Cache::File. Can I use Rails caching
for this? What’s the best way? The screen scrape is used internally in
my app and not viewed directly.

Sincerely,
Jason E.


#2

Here are two ways you can do this:

  1. If you don’t mind the entire action being cached for the 24 hours,
    you can use the Rails Action cache, with the new Action Cache plugin
    that adds the ability to expire the cache on a timer.

In your action you can do this:

caches_action :my_action

def my_action
	response.time_to_live = 1.day
	# Do some screen scraping
end

  2. If caching the entire action doesn’t work for you, you can use Rails
    fragment caching. Check out the code for the Action Cache plugin to see
    what you can do. Here’s a summary:

    a. Build a small object that keeps the expiry time and the data to cache
    b. Use YAML to serialize this (YAML.dump and YAML.load)
    c. Add this to the fragment cache with a unique key (read_fragment and
       write_fragment)

If you build this yourself, it should be about 3 lines of code to save to
the cache, and about 4 lines to read and check whether the cache has
expired.
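
Here is a rough sketch of steps a to c, in case it helps. It is not the plugin's code: the ExpiringCache class and its method names are made up for illustration, and a plain Hash stands in for the fragment store so the example runs outside a controller. Inside a controller you would presumably replace the two Hash operations with read_fragment and write_fragment. The entry is stored as a plain Hash with string keys rather than a custom object, on the assumption that this keeps YAML.load happy on newer Ruby versions where loading is restricted to basic types by default.

```ruby
require "yaml"

# Hypothetical helper: wraps any Hash-like store with a time-based expiry.
class ExpiringCache
  def initialize(store = {}, ttl: 24 * 60 * 60) # ttl in seconds, one day by default
    @store = store
    @ttl = ttl
  end

  # Save: keep the expiry time next to the data and serialize both with YAML.
  def write(key, data, now: Time.now)
    entry = { "expires_at" => now.to_i + @ttl, "data" => data }
    @store[key] = YAML.dump(entry)
  end

  # Read: return the cached data, or nil if it is missing or has expired.
  def read(key, now: Time.now)
    raw = @store[key]
    return nil if raw.nil?
    entry = YAML.load(raw)
    return nil if now.to_i >= entry["expires_at"]
    entry["data"]
  end

  # Return the cached value if still fresh; otherwise run the block
  # (the actual scrape) and cache its result.
  def fetch(key, now: Time.now)
    cached = read(key, now: now)
    return cached unless cached.nil?
    fresh = yield
    write(key, fresh, now: now)
    fresh
  end
end
```

You would then call something like cache.fetch("scrape/#{mac}") { scrape(mac) }, where scrape is whatever method does the real request. The scrapee only gets hit when the entry is missing or more than a day old. Note that a nil result is never cached in this sketch, so a failed scrape is retried on the next call.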