Scraping HTML behind a log-in

csitol · May 17, 2010, 7:16pm

I’ve used libraries such as Hpricot to scrape HTML from public sites.
Does anyone know how to get at data behind a user log-in? I’m trying to
automate a service that periodically grabs data for a user. You could
think of this as analogous to what mint.com does when it aggregates user
data in the financial sector.

csitol · May 17, 2010, 7:27pm

On Mon, May 17, 2010 at 7:16 PM, Phil Mcdonnell wrote:

I’ve used libraries, such as hpricot, to scrape html from public sites.
Does anyone know how to get at data behind a user log-in? I’m trying to
automate a service that periodically grabs data for a user. You could
think of this as analogous to what mint.com does with aggregating user
data in the financial sector.

You can try Mechanize. I’ve used it to log in to a wiki and edit pages.
I think it was integrated with Hpricot at one point, or maybe they switched
to Nokogiri. Either way, you have an HTML parser at your disposal.

http://mechanize.rubyforge.org/mechanize/

Checking this, I see it’s Nokogiri.
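
Something along these lines might work as a starting point. It is only a
rough sketch: the helper name fetch_behind_login, the URLs, the form field
names and the CSS selector in the usage comment are all made up for
illustration, and it assumes the login form is the first form on the page
and that the site keeps the session in a cookie. Real sites often differ
(hidden tokens, JavaScript logins, several forms), so adjust as needed.

```ruby
# Sketch only: every URL, field name, and selector here is hypothetical.

# Builds a real Mechanize agent. The gem is loaded lazily so the helper
# below can also be called with any object that behaves like an agent.
def default_agent
  require 'mechanize' # assumes the gem is installed: gem install mechanize
  Mechanize.new
end

# Logs in through the first form on login_url, then fetches data_url with
# the same agent, so cookies set at login are sent with the second request.
def fetch_behind_login(login_url:, data_url:, username:, password:,
                       user_field: 'username', pass_field: 'password',
                       agent: default_agent)
  login_page = agent.get(login_url)

  # Assumption: the login form is the first form on the page.
  form = login_page.forms.first
  raise "no form found at #{login_url}" if form.nil?

  form[user_field] = username
  form[pass_field] = password
  form.submit # the agent keeps whatever cookies the response sets

  agent.get(data_url)
end

# Usage (needs the mechanize gem and network access):
#   page = fetch_behind_login(login_url: 'https://example.com/login',
#                             data_url:  'https://example.com/account',
#                             username:  'me', password: 'secret')
#   page.search('table.balances td').each { |cell| puts cell.text.strip }
```

The page you get back can be searched with Nokogiri-style CSS or XPath
selectors via page.search, so the parsing side should feel much like what
you already do with Hpricot.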

Jesus.