Hi all,
I have written a script that opens each URL in a text file one by one,
parses the page, and finally saves the results to an Excel file.
When I run it on a text file with just a few URLs, it works
perfectly.
When I run it on a text file with many thousands of URLs, I get an
error ("in `initialize': getaddrinfo: Name or service not known
(SocketError)"). What might be causing the issue?
CODE:
```ruby
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'writeexcel'

# WriteExcel writes the older binary .xls format, not .xlsx
workbook = WriteExcel.new('parseresult.xls')
worksheet = workbook.add_worksheet
row = 0

File.foreach("websites.txt") do |line| # loop over the URLs in the text file
  searchablefile = Nokogiri::HTML(open(line)) # open each URL

  # the cell that follows the <td> holding the "Referentie" / "Status" label
  referentieid = searchablefile.at_xpath("//td/strong[contains(text(), 'Referentie')]/parent::*/following-sibling::*")
  status = searchablefile.at_xpath("//td/strong[contains(text(), 'Status')]/parent::*/following-sibling::*")

  worksheet.write(row, 1, referentieid.content) unless referentieid.nil?
  worksheet.write(row, 2, status.content) unless status.nil?

  row += 1 # next row for the next URL
end

workbook.close
```
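As posted, the loop has no error handling, so the first URL that raises (for example one whose host cannot be resolved) aborts the whole run and `workbook.close` is never reached. A minimal sketch of a more forgiving fetch loop, assuming one URL per line; `each_url`, `fetch_page` and `fetch_all` are hypothetical helper names, and it was written with current Ruby in mind rather than tested on 1.9.2. It strips the newline that `File.foreach` keeps on every line, skips blank lines, and records each failing URL with its exception instead of stopping.

```ruby
require 'open-uri'
require 'socket'
require 'timeout'
require 'uri'

# Yield each non-blank URL from a text file, one per line.
# File.foreach keeps the trailing "\n", so strip it before use.
def each_url(path)
  return enum_for(:each_url, path) unless block_given?
  File.foreach(path) do |line|
    url = line.strip
    yield url unless url.empty?
  end
end

# Download one page with open-uri; refuse anything that is not http(s).
def fetch_page(url)
  uri = URI.parse(url)
  raise ArgumentError, "not an http(s) URL: #{url.inspect}" unless uri.is_a?(URI::HTTP)
  uri.read
end

# Fetch every URL, collecting failures instead of letting one bad host
# (e.g. a getaddrinfo SocketError) abort the whole run.
# Returns [pages, failures]: url => body, and url => "ExceptionClass: message".
def fetch_all(urls, fetcher = method(:fetch_page))
  pages = {}
  failures = {}
  urls.each do |url|
    begin
      pages[url] = fetcher.call(url)
    rescue SocketError, SystemCallError, Timeout::Error,
           OpenURI::HTTPError, URI::InvalidURIError, ArgumentError => e
      failures[url] = "#{e.class}: #{e.message}"
    end
  end
  [pages, failures]
end
```

Usage, under the same assumptions: `pages, failures = fetch_all(each_url("websites.txt"))`, then parse each body with `Nokogiri::HTML(body)` and print `failures` to see exactly which lines of the file broke the original run.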
ERROR:
```
$ ruby directerubyparsewoningmarkt.rb
/home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/net/http.rb:644:in `initialize': getaddrinfo: Name or service not known (SocketError)
	from /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/net/http.rb:644:in `open'
	from /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/net/http.rb:644:in `block in connect'
	from /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/timeout.rb:44:in `timeout'
	from /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/timeout.rb:89:in `timeout'
	from /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/net/http.rb:644:in `connect'
	from /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/net/http.rb:637:in `do_start'
	from /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/net/http.rb:626:in `start'
	from /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:306:in `open_http'
	from /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:769:in `buffer_open'
	from /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:203:in `block in open_loop'
	from /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:201:in `catch'
	from /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:201:in `open_loop'
	from /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:146:in `open_uri'
	from /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:671:in `open'
	from /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:33:in `open'
	from directerubyparsewoningmarkt.rb:12:in `block in <main>'
	from directerubyparsewoningmarkt.rb:10:in `foreach'
	from directerubyparsewoningmarkt.rb:10:in `<main>'
```
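The trace shows the exception being raised while Net::HTTP opens the TCP connection (`connect` → `open` → `initialize`), with getaddrinfo reporting that it could not resolve the host name. A rough way to check the input file separately from the scraping is to resolve each URL's host on its own. This sketch uses only the standard library (`URI.parse`, `Socket.getaddrinfo`); `host_of`, `resolvable?` and `unresolvable_urls` are hypothetical names, and a host that fails here might still be a transient DNS problem rather than a bad line in the file.

```ruby
require 'socket'
require 'uri'

# Hostname of a URL, or nil if the line cannot be parsed or has no host.
def host_of(url)
  host = URI.parse(url.strip).host
  (host.nil? || host.empty?) ? nil : host
rescue URI::InvalidURIError
  nil
end

# True if the system resolver returns at least one address for `host`.
# Socket.getaddrinfo raises SocketError on the same kind of failure
# ("Name or service not known") seen in the trace.
def resolvable?(host)
  !Socket.getaddrinfo(host, nil).empty?
rescue SocketError
  false
end

# Every URL whose host is missing, unparseable, or does not resolve.
def unresolvable_urls(urls)
  urls.reject do |url|
    host = host_of(url)
    host && resolvable?(host)
  end
end
```

Usage, under the same assumptions: `puts unresolvable_urls(File.foreach("websites.txt").map(&:strip).reject(&:empty?))` lists the suspect lines without downloading any pages.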
Thanks a bunch.