Disclaimer: I'm not a developer; in fact, I know very little Ruby.
I had this script made for me by Endax, LLC. I have run into problems
with the script and they are unable/unwilling to help resolve the issue.
The script was working fine for a while, but now when I run it I get the
following error in the Ruby editor:
ruby realtor_scraper.rb
realtor_scraper.rb:54:in `initialize': can't convert nil into String (TypeError)
from realtor_scraper.rb:54:in `open'
from realtor_scraper.rb:54
Exit code: 1
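The TypeError above can be reproduced in isolation. This is a minimal sketch, assuming the failing line is the `File.open(ARGV[1], "w")` call in the script and that the script was started without arguments, as in the command shown. The helper name `try_open` is hypothetical, and newer Ruby versions word the message differently ("no implicit conversion of nil into String").

```ruby
# Sketch: File.open raises TypeError when the path is nil, which is what
# ARGV[1] evaluates to when fewer than two arguments are passed.
# try_open is a hypothetical helper used only for this illustration.
def try_open(path)
  File.open(path, "w") { |f| f.puts "test" }
  :ok
rescue TypeError => e
  e.class
end

# Indexing past the end of an array gives nil, just like ARGV[1]
# when the script is run with no command-line arguments.
missing_arg = [][1]
open_result = try_open(missing_arg)
puts open_result
```

If that assumption holds, running the script with both paths, for example `ruby realtor_scraper.rb zipcodes.txt result.txt` (where `zipcodes.txt` is a hypothetical name for the zip code file), would give `ARGV[0]` and `ARGV[1]` real values.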
Here is the script (©2007 OnlineSigns.com)… Any input you can give me
or direction you can send me in would be greatly appreciated! Basically
Ruby can’t open the ‘result.txt’ file to save the results…
require 'rubygems'
require 'mechanize'
=begin
- realtor_scraper.rb
- Scrapes email addresses from the realtor.org find a member section
- See workflow section at the end of script for detail
- (c)2007 onlinesigns.com
=end
#Set up the agent
agent = WWW::Mechanize.new
#Ignore these lines if not using a proxy
#agent.set_proxy('192.168.85.2', '80')
#agent.read_timeout=10000
#agent.user_agent_alias = 'Mac Safari'
page = agent.get("m1.realtor")
login_form = page.forms.with.name("__TraceLogin").first
if login_form == nil
puts "The Realtor site is not available at the moment!!!"
else
#1. Login to Realtor site
login_form.Username = "154440"
login_form.Password = "jake"
login_form.PWORD = "jake"
main_page = agent.submit(login_form)
link = main_page.links.text("Find a Member")
if link==nil
puts "This account can't login to Realtor site!!!"
else
#2. Go to search page
search_page = agent.click link
check_avai = search_page.body.include? "Server is temporarily unavailable"
if check_avai
puts "Can't load Search page. Server is temporarily unavailable!!!"
exit 1
end
link = search_page.frames.with.name("Viewport").first
if link==nil
puts "Can't load the search frame page!!!"
else
frame_search = agent.click link
#Check the frame page that was just loaded, not the outer search page
check_avai = frame_search.body.include? "Server is temporarily unavailable"
if check_avai
puts "Can't load Search frame. Server is temporarily unavailable!!!"
exit 2
end
search_form = frame_search.forms.with.name("searchMemberForm").first
#3. Open result file (path in ARGV[1], the second command-line argument) to prepare to save data
File.open(ARGV[1],"w") do |file1|
#4. Read zip codes from zipcode text file (path in ARGV[0]; each zipcode on 1 line in that file)
File.open(ARGV[0],"r") do |file2|
#5. With each zip code do
while line=file2.gets
# i) Enter zipcode in search form
puts "Processing on zip code: #{line.chomp}"
search_form['zip'] = line.chomp # strip the trailing newline read by gets
search_form['educCerts'] = ''
# ii) Submit search form
result = agent.submit(search_form,
search_form.buttons.with.name("bottomSearch").first)
# iii) Open all 'name' links in result page
result.links.with.href(/.*action=getDomesticMember.*/).each do |link|
begin
agent.transact do
#puts "#{link.text}:#{link.href}"
detail_page=agent.click link
#Check the detail page that was just loaded, not the outer search page
check_avai = detail_page.body.include? "Server is temporarily unavailable"
if check_avai
puts "Can't process on email of #{link.text}. Server is temporarily unavailable!!!"
exit 3
end
# iv) Collect the email address in detail page
detail_page.links.with.href(/mailto:.*/).each do |ml|
# v) Output email address in the output file
puts ml.text
file1.puts "#{ml.text}"
file1.flush
end
end
#The server needs a delay of about 5 seconds before processing the next request;
#if not, it may go down as in a DoS (denial of service) attack.
#You can check and increase/decrease the delay time here:
sleep 25
rescue => e
puts "#{e.class}: #{e.message}"
end
end
end
end
end
end
end
end
=begin
Workflow
- Login to Realtor site
- Go to search page
- Open result file to prepare to save data
- Read zip codes from zipcode text file (each zipcode on 1 line in that file)
- With each zip code do
i) Enter zipcode in search form
ii) Submit search form
iii) Open all ‘name’ links in result page
iv) Collect the email address in detail page
v) Output email address in the output file
=end
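
The file-handling part of the workflow above (open the result file, read one zip code per line, write the collected addresses) can be sketched on its own, without the Realtor site or Mechanize. This is only an illustration under stated assumptions: the helpers `usage_error` and `process_zip_codes` are hypothetical and not part of the original script, and the block passed in stands in for the real search and detail-page lookup.

```ruby
require "tempfile"

# Hypothetical helper: returns a usage message when either path is
# missing from the argument list, otherwise nil.
def usage_error(argv)
  return nil if argv.length >= 2
  "Usage: ruby realtor_scraper.rb <zipcode_file> <result_file>"
end

# Hypothetical helper: reads one zip code per line from zip_path and
# writes every value the block returns for that zip code to result_path.
def process_zip_codes(zip_path, result_path)
  File.open(result_path, "w") do |out|
    File.foreach(zip_path) do |line|
      zip = line.chomp           # drop the trailing newline
      next if zip.empty?         # skip blank lines
      yield(zip).each { |email| out.puts email }
    end
  end
end

# Demo with temporary files and a stand-in lookup instead of the real site.
zips = Tempfile.new("zips")
zips.write("90210\n10001\n")
zips.close
results = Tempfile.new("results")
results.close

process_zip_codes(zips.path, results.path) { |zip| ["agent@#{zip}.example"] }
written = File.read(results.path).split("\n")
no_args_message = usage_error([])
```

A check like `usage_error(ARGV)` placed before the first `File.open` would turn a missing argument into a readable usage message instead of a TypeError from inside `File.open`.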