I want to check which pages have forms and which ones don't, reading the
URLs from a file. I'm using the number of forms on each page to make that
determination. But when I get to the 13th URL in the file I get an error
and the script exits. Does anyone know why?
require 'rubygems'
require 'mechanize'

f = File.open("eliminate.txt")
noformfile = File.new("noform.txt", "w+")
formfile = File.new("form.txt", "w+")
agent = WWW::Mechanize.new
begin
  # readline raises EOFError once the URL list is exhausted
  while (line = f.readline)
    page = agent.get(line)
    forms = page.forms
    if forms.size > 0
      formfile.puts line
    else
      noformfile.puts line
    end
  end
rescue EOFError
  puts "error"
end
Chuck D. wrote:
This is the error message I’m getting:
c:/ruby/lib/ruby/1.8/net/protocol.rb:133:in `sysread': end of file reached (EOFError)
        from c:/ruby/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill'
        from c:/ruby/lib/ruby/1.8/timeout.rb:56:in `timeout'
        from c:/ruby/lib/ruby/1.8/timeout.rb:76:in `timeout'
        from c:/ruby/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill'
        from c:/ruby/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
        from c:/ruby/lib/ruby/1.8/net/protocol.rb:126:in `readline'
        from c:/ruby/lib/ruby/1.8/net/http.rb:2017:in `read_status_line'
        from c:/ruby/lib/ruby/1.8/net/http.rb:2006:in `read_new'
        from c:/ruby/lib/ruby/1.8/net/http.rb:1047:in `request'
        from c:/ruby/lib/ruby/gems/1.8/gems/mechanize-0.6.10/lib/mechanize.rb:514:in `fetch_page'
        from c:/ruby/lib/ruby/gems/1.8/gems/mechanize-0.6.10/lib/mechanize.rb:600:in `fetch_page'
        from c:/ruby/lib/ruby/gems/1.8/gems/mechanize-0.6.10/lib/mechanize.rb:185:in `get'
        from ciscoScrape.rb:120
        from ciscoScrape.rb:118:in `each'
        from ciscoScrape.rb:118
On 9/20/07, Chuck D. wrote:
> I want to check which pages have forms and which ones don't, reading the
> URLs from a file. I'm using the number of forms on each page to make that
> determination. But when I get to the 13th URL in the file I get an error
> and the script exits. Does anyone know why?
The error means mechanize could not read the web page. Find out whether
it's really the 13th URL that fails, no matter what order they are in, or
whether some particular URL causes the problem (find the offending URL and
try it on its own).
If it's a particular URL, try accessing the page from a browser.
Otherwise, it might be a problem with mechanize and/or Net::HTTP, or
anything that they use.
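One way to tell "always the 13th request" apart from "always this particular URL" is to fetch each URL inside its own rescue and record what failed, instead of letting the first exception end the run. The following is only a sketch: `find_failures` is a hypothetical helper (not part of mechanize), the example.com URLs are placeholders, and the block passed to it stands in for the real fetch.

```ruby
# Sketch: run the fetch for each URL on its own and collect the failures.
# `find_failures` is a hypothetical helper; the block does the actual fetch.
def find_failures(urls)
  failures = []
  urls.each_with_index do |url, index|
    begin
      yield url
    rescue EOFError, IOError, SystemCallError => e
      # Record the position too, so a failure tied to the request count
      # can be told apart from one tied to a specific URL.
      failures << [index + 1, url, e.class.name]
    end
  end
  failures
end

# Stand-in fetcher for demonstration. With mechanize the block body would
# be something like `agent.get(url)`.
urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
failures = find_failures(urls) do |url|
  raise EOFError, "end of file reached" if url.end_with?("/b")
end
failures.each { |pos, url, err| puts "#{pos}: #{url} (#{err})" }
```

Running the same list in a different order and comparing the recorded positions and URLs should show which of the two cases applies.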
Finally, a few changes/enhancements, not related to your problem:
File.open("eliminate.txt") do |f|
  noformfile = File.new("noform.txt", "w+")
  formfile = File.new("form.txt", "w+")
  agent = WWW::Mechanize.new
  f.each do |line|
    page = agent.get(line)
    forms = page.forms
    if forms.size > 0
      formfile.puts line
    else
      noformfile.puts line
    end
  end
end
Jano S. wrote:
> The error means mechanize could not read the web page. Find out whether
> it's really the 13th URL that fails, no matter what order they are in, or
> whether some particular URL causes the problem (find the offending URL and
> try it on its own).
> If it's a particular URL, try accessing the page from a browser.
> Otherwise, it might be a problem with mechanize and/or Net::HTTP, or
> anything that they use.
This is the error message I'm getting. It's not related to the 13th URL;
it's more like a buffer overflow problem. It will crash on anyone's PC.
c:/ruby/lib/ruby/1.8/net/protocol.rb:133:in `sysread': end of file reached (EOFError)
        from c:/ruby/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill'
        from c:/ruby/lib/ruby/1.8/timeout.rb:56:in `timeout'
        from c:/ruby/lib/ruby/1.8/timeout.rb:76:in `timeout'
        from c:/ruby/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill'
        from c:/ruby/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
        from c:/ruby/lib/ruby/1.8/net/protocol.rb:126:in `readline'
        from c:/ruby/lib/ruby/1.8/net/http.rb:2017:in `read_status_line'
        from c:/ruby/lib/ruby/1.8/net/http.rb:2006:in `read_new'
        from c:/ruby/lib/ruby/1.8/net/http.rb:1047:in `request'
        from c:/ruby/lib/ruby/gems/1.8/gems/mechanize-0.6.10/lib/mechanize.rb:514:in `fetch_page'
        from c:/ruby/lib/ruby/gems/1.8/gems/mechanize-0.6.10/lib/mechanize.rb:600:in `fetch_page'
        from c:/ruby/lib/ruby/gems/1.8/gems/mechanize-0.6.10/lib/mechanize.rb:185:in `get'
        from ciscoScrape.rb:120
        from ciscoScrape.rb:118:in `each'
        from ciscoScrape.rb:118
On 9/21/07, Chuck D. wrote:
> This is the error message I'm getting. It's not related to the 13th URL;
> it's more like a buffer overflow problem. It will crash on anyone's PC.
OK, it seems to be a long-standing problem. Google some words from the
trace, e.g. sysread end of file reached EOFError, and you will find posts
from 2005… My quick search hasn't revealed any solution, though. You
might be luckier.
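The thread does not arrive at a fix. The trace does show the EOFError being raised in read_status_line, i.e. the connection ended before any response line arrived. If the failure turns out to be intermittent, one workaround to try is simply repeating the request a few times before giving up. Whether that helps in this case is an assumption, and `with_retries` below is a hypothetical helper, not something mechanize provides.

```ruby
# Sketch: retry a block a limited number of times when it raises EOFError.
# `with_retries`, `max_attempts` and `pause` are hypothetical names.
def with_retries(max_attempts = 3, pause = 0)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue EOFError
    # Give up and re-raise once the attempt budget is used.
    raise if attempts >= max_attempts
    sleep pause
    retry
  end
end

# Stand-in for a flaky fetch: fails twice, then succeeds. With mechanize
# the block body would be something like `agent.get(url)`.
calls = 0
result = with_retries(3) do |attempt|
  calls += 1
  raise EOFError, "end of file reached" if attempt < 3
  "fetched on attempt #{attempt}"
end
puts result
```

If every attempt fails, the last EOFError is re-raised, so a URL that fails consistently still surfaces as an error rather than being silently skipped.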