On Mon, Sep 6, 2010 at 5:01 PM, Ryan M. wrote:
Hi Jesús,
I’m looking to output the information to an .html document (using the
Rails framework) and I’m getting the following error: can’t convert
Fixnum into Array
Also, what I’m actually trying to do is scrape each of the websites
to see if it contains a specific URL, so I would need to pass in a list
of about 3-4 keywords for each of the domains.
So something like:
def index
  keywords = %w{accounts resources membership}
  sites = %w{http://www.google.com http://www.yahoo.com}
  links = []
  sites.each {|site| links.concat(scrape(site, keywords[]))}
end
def scrape(website,inputtext)
  require 'open-uri'
  require 'nokogiri'
  doc = Nokogiri::HTML(open(website))
  for sample in doc.xpath('//a')
    if sample.text == inputtext
      keywords = doc.xpath('//a')
    else
      keywords = "MISSING"
    end
  end
end
Thanks for your time.
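About the error itself: Array#concat only accepts arrays, and it raises this kind of TypeError when it is handed anything else. I can't tell from the snippet exactly where a number sneaks in, so treat the following as a small illustration of the error rather than a diagnosis. The values are made up, and newer Rubies word the message as "no implicit conversion of Integer into Array":

```ruby
links = []

# Array#concat accepts only arrays (or objects that respond to to_ary).
begin
  links.concat(3) # an Integer (a Fixnum in 2010-era Ruby), not an Array
rescue TypeError => e
  error_message = e.message
end

# Wrapping the value in Array() makes concat safe for scalars, nil and arrays.
links.concat(Array(3))          # Array(3)   => [3]
links.concat(Array(nil))        # Array(nil) => []
links.concat(Array(["a", "b"])) # arrays pass through unchanged
```

So whatever scrape returns, it has to be an array (or be wrapped with Array()) before it goes into concat.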
So you want to iterate twice: for each site, search for links whose
text contains each keyword? Do you also want to keep track of which
site and keyword each result came from? If so, I’d do something like:
def index
  keywords = %w{accounts resources membership}
  sites = %w{http://www.google.com http://www.yahoo.com}
  # the default block creates the inner hash the first time a site is used
  links_by_site = Hash.new {|h,k| h[k] = {}}
  sites.each do |site|
    keywords.each do |keyword|
      links_by_site[site][keyword] = scrape(site, keyword)
    end
  end
  links_by_site
end
def scrape(website,inputtext)
  require 'open-uri' # these could maybe go at the start of the script
  require 'nokogiri'
  regex = /#{inputtext}/
  links_that_match = []
  # on Ruby 3.0 and later this call has to be URI.open(website)
  doc = Nokogiri::HTML(open(website))
  doc.xpath('//a').each do |link|
    # keep every link whose text contains the keyword
    if regex =~ link.inner_text
      links_that_match << link.to_html
    end
  end
  links_that_match
end
Untested, but it should give you some ideas. The resulting hash will
look something like:
{"http://www.google.com" => {"accounts" => [], "resources" => []
…
}
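If you want to see the shape of that hash without hitting the network, here is a rough sketch with the scraping swapped out for canned data. The names fake_scrape, collect_links and pages, and the sample link texts, are all made up for illustration. It also uses Regexp.escape and a case-insensitive match, which is an assumption on my part and not what the code above does:

```ruby
# Stand-in for scrape: same idea (site, keyword) -> array of matches,
# but it reads link texts from the pages hash instead of fetching a page.
def fake_scrape(pages, site, keyword)
  # Regexp.escape keeps characters like "." in a keyword from acting as regex syntax.
  regex = /#{Regexp.escape(keyword)}/i
  pages.fetch(site, []).grep(regex)
end

def collect_links(pages, keywords)
  # The default block creates the inner hash the first time a site is seen.
  links_by_site = Hash.new { |hash, site| hash[site] = {} }
  pages.each_key do |site|
    keywords.each do |keyword|
      links_by_site[site][keyword] = fake_scrape(pages, site, keyword)
    end
  end
  links_by_site
end

# Hypothetical link texts, not what these sites actually contain.
pages = {
  "http://www.google.com" => ["My Accounts", "Help resources", "About"],
  "http://www.yahoo.com"  => ["Membership", "Mail"]
}

result = collect_links(pages, %w{accounts resources membership})
```

A keyword with no matching links ends up with an empty array, so in the view you can check result[site][keyword].empty? instead of comparing against a "MISSING" string.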
Jesus.