Scrubyt scraper help

camis · October 1, 2010, 10:05pm

Hello all. I’m trying to build a simple web scraper to mine some data
off the Yellow Pages. Specifically, this link:

I’m scraping all the information that I need correctly. I’m very
pleased about that! However, I’m only able to scrape the first page. I
want my script to automatically go to the next page after the first one
has been scraped, and the next after that. Scrubyt’s “next_page”
function can do this, but it can only use a full URL. On this website,
however, the “Next” link at the bottom is a relative link. Is there any
way I might be able to grab the URL of the website and add the relative
link onto it, and then go to the next page? Or is there another way of
doing it? I really appreciate the help! Thanks so much.

My code is as follows:

require 'rubygems'
require 'scrubyt'

yellowpages_data = Scrubyt::Extractor.define do

  # Perform the action(s)
  fetch 'http://www.yellowpages.com/santa-barbara-ca/restaurants'

  # This part does the scraping
  listing "//div[@class='listing_content']" do
    name "Pascucci"
    #street "792 State St,"
    street "//span[@class='street-address']"
    city "//span[@class='locality']"
    state "//span[@class='region']"
    zip_code "//span[@class='postal-code']"
    phone "//span[@class='business-phone phone']"

    # This is the function I was talking about. It needs a full
    # link to work, but I only have a relative one!
    next_page "Next", :limit => 2
  end
end

puts yellowpages_data.to_xml.write($stdout, 1)
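
For the "grab the URL of the website and add the relative link onto it" idea, here is a minimal sketch using `URI.join` from Ruby's standard library. The helper name `absolute_url` and the sample `href` values are hypothetical (the real "Next" link's href isn't shown here), and it is only an assumption that the resulting full URL could then be fed back into the scraper.

```ruby
require 'uri'

# Hypothetical helper (not part of Scrubyt): resolve a relative href
# against the URL of the page it was found on.
def absolute_url(page_url, href)
  URI.join(page_url, href).to_s
end

base = 'http://www.yellowpages.com/santa-barbara-ca/restaurants'

# Assumed example hrefs; the site's actual "Next" href may differ.
puts absolute_url(base, '/santa-barbara-ca/restaurants?page=2')
# => http://www.yellowpages.com/santa-barbara-ca/restaurants?page=2
puts absolute_url(base, 'restaurants?page=2')
# => http://www.yellowpages.com/santa-barbara-ca/restaurants?page=2
```

`URI.join` handles both root-relative paths (starting with `/`) and paths relative to the current directory, so the helper doesn't need to know which kind the site uses.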