Forum: Ruby Nokogiri scraping multiple URLs

Posted by Barry Kavanagh (ninja2k)
on 2013-02-04 10:17
Attachment: asus_nokogiri.rb (772 Bytes)
I am new to Ruby and Nokogiri, so please excuse my lack of knowledge. Is
it possible to scrape more than one URL with Nokogiri and output the
results line by line to Excel?

My code for scraping a single URL is attached.
Posted by Joel Pearson (virtuoso)
on 2013-02-04 10:36
I haven't tested this code; it's just an example of the flow. You might
need to flatten the arrays or make a similar adjustment.

require 'nokogiri'
require 'open-uri'
require 'spreadsheet'

my_excel_output = []

# my_url1, css1, etc. are placeholders for your own URLs and CSS selectors.
my_array = [
  [my_url1, css1],
  [my_url2, css2]
]
my_array.each do |my_url, my_css|

  doc = Nokogiri::HTML(open(my_url))
  lines = doc.css(my_css).map(&:text)

  my_excel_output << lines

end

# Drop all the data into your spreadsheet.
Posted by Barry Kavanagh (ninja2k)
on 2013-02-04 13:07
Thank you for your reply. However, I am getting the following error:

C:/Users/barry/Desktop/tester4.rb:16:in `block in <main>': can't convert
Array into String (TypeError)
  from C:/Users/barry/Desktop/tester4.rb:11:in `each'
  from C:/Users/barry/Desktop/tester4.rb:11:in `<main>'
[Finished in 2.3s with exit code 1]


require 'nokogiri'
require 'open-uri'
require 'spreadsheet'

my_excel_output = []

my_array =[
["http://www.asus.com/Notebooks_Ultrabooks/ASUS_TAIC...,
'div#specifications div#spec-area ul.product-spec li'],
["http://www.asus.com/Notebooks_Ultrabooks/S56CA/ #specifications",
'div#specifications div#spec-area ul.product-spec li']
]
my_array.each do |my_url, my_css|

  doc = Nokogiri::HTML(open(my_url))
  lines = doc.css(my_css).map(&:text)

  'C:/Users/Barry/Desktop/output2.xls' << lines

end
Posted by Joel Pearson (virtuoso)
on 2013-02-04 14:05
You're trying to append the data you've scraped (an array) onto a string 
(your filename).
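
To make the failure concrete, here is a minimal sketch of the same mistake without any scraping. The helper name append_error and the sample values are made up for illustration; only the `<<` behaviour is the point. Newer Ruby versions word the message as "no implicit conversion of Array into String", but it is the same TypeError.

```ruby
# Hypothetical helper: tries `target << value` and returns the TypeError
# if one is raised, or nil if the append succeeds.
def append_error(target, value)
  target << value
  nil
rescue TypeError => e
  e
end

lines = ['first scraped line', 'second scraped line'] # made-up sample data

# A path in quotes is only a String, not an open file, so << expects
# another String (or an Integer codepoint) and rejects the Array.
error = append_error(String.new('output2.xls'), lines)
puts error.class

# Appending to an Array accepts any object, including another Array.
puts append_error([], lines).inspect
```

So the scraped lines need to go into an array (or straight into worksheet rows), and the filename should only appear where the workbook is actually written.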

The idea behind my example (of untested code) is that the array
"my_excel_output" will contain all the lines from all of your scraping.
Then, after the loop is complete, you can put that data into your Excel
worksheet all at once. You might need to call "my_excel_output.flatten!"
first if the arrays are nested too deep.
Your output code still needs to be run; I just didn't include it in my
answer because it would be repeating what you've already written.
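
The flow described above (collect everything in the loop, flatten, then write once) could be sketched roughly like this. The method names collect_lines and to_rows, the example.com URLs, and the sample strings are all hypothetical; the fetching step is passed in as a block so the sketch runs without network access or extra gems.

```ruby
# Visits each [url, css] pair, passes it to the block, and gathers
# whatever the block returns into one flat array of lines.
def collect_lines(targets)
  output = []
  targets.each do |url, css|
    output << yield(url, css) # the block returns an array of strings
  end
  output.flatten # each page added one nested array, so flatten them
end

# Turns the lines into one single-cell row per line: [[line1], [line2], ...]
def to_rows(lines)
  lines.map { |line| [line.strip] }
end

# Made-up stand-in for the real pages (assumption, not real ASUS data).
fake_pages = {
  'http://example.com/a' => ['  spec line 1  ', 'spec line 2'],
  'http://example.com/b' => ['spec line 3']
}
targets = fake_pages.keys.map { |url| [url, 'ul.product-spec li'] }

lines = collect_lines(targets) { |url, _css| fake_pages.fetch(url) }
rows = to_rows(lines)
rows.each { |row| puts row.inspect }

# With the real gems, the block would do the scraping, e.g.
#   Nokogiri::HTML(open(url)).css(css).map(&:text)
# and the rows could then be written in one go with the spreadsheet gem:
#   book  = Spreadsheet::Workbook.new
#   sheet = book.create_worksheet
#   rows.each_with_index { |row, i| sheet.row(i).concat(row) }
#   book.write('output2.xls')
```

Keeping the fetch step in a block means the loop and the flattening can be checked with sample data first, before pointing it at the real pages.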