Mechanize newbie

Okay, Ruby in general newbie, but I did the whole shovell project for
RoR, so I felt I was getting somewhere…

I am fooling around trying to make a spider (scraper?) to pull content
off the Forum I read all the time so that I can read it offline.

It seemed like mechanize is exactly what I want. But I try this:

require 'rubygems'; require 'mechanize'

agent = WWW::Mechanize.new
page = agent.get('http://dapo.org/forums/index.php')

pp page

puts "\n\n trying to login... \n\n"

# Fill out the login form

form = page.forms.first
form.vb_login_username = "username"
form.vb_login_md5password = "password"
form.do = "login"
form.s = ""

page = agent.submit(form)

pp page

# pull down a thread

page = agent.get('http://dapo.org/forums/archive/index.php?t-2293.html')

pp page

And it doesn’t log in (blank page for that last get). Clues?

Thanks,
--Colin

Hi Colin, I can’t tell you how to do it in mechanize, but I can say
that what you are trying to do is super easy in Watir:
http://openqa.org/watir

Watir (Web Application Testing In Ruby) is primarily used for driving
browser-based test automation, but it has a wonderful API that makes
what you describe very easy. Originally the only choice of browser to
drive was IE, but now the FireWatir and SafariWatir projects are
getting strong as well.

Best of luck, whatever solution you go with,
Jeff

Colin

This line here

form.vb_login_md5password = "password"

shows that the form's password field expects an md5password. Are you
creating an MD5 hash of your password before supplying it to
mechanize? If not, I would assume that is the problem.
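
For what it's worth, here is a minimal sketch of hashing the password
first. It assumes the forum wants the hex MD5 digest of the plain-text
password in that field (the field name suggests so, but I have not
checked the site's login script). The helper name md5_password is made
up for illustration:

```ruby
require 'digest/md5'

# Hypothetical helper: returns the 32-character hex MD5 digest of a
# plain-text password. Assumes the login form wants exactly this value.
def md5_password(plain)
  Digest::MD5.hexdigest(plain)
end

hashed = md5_password("password")
puts hashed

# In the script above, the assignment would then become (untested
# against the real forum):
#   form.vb_login_md5password = md5_password("password")
```

Mechanize does not run the page's JavaScript, so any hashing the
browser would normally do on submit has to be done by hand like this.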

Nathan,

You are correct. I finally figured that part out (with some help from
someone who wrote the same sort of thing in .NET).

Thanks,
--Colin