How to parse a rendered HTML page

parkurm · February 2, 2010, 2:38pm

I’m trying to write a website that extracts all images from a given
webpage. I initially tried to get all image links by looking for <img>
tags with the Nokogiri HTML parser, and it works well with webpages
that don’t use JavaScript.

Some pages use JavaScript to render the view, and with Nokogiri I only
get the raw HTML as served, before it’s rendered.

How can I get the page after it has been rendered by JavaScript?

parkurm · February 3, 2010, 10:41am

You could try http://scrubyt.org/index.html, which is like Mechanize
but supports JavaScript.

parkurm · February 2, 2010, 3:42pm

Webrat + Selenium would be one way.
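
As a rough illustration of the Selenium route (not a definitive
implementation), the sketch below drives a real browser through the
selenium-webdriver gem so that the page's JavaScript runs before the
<img> elements are read. The helper names rendered_image_urls and
absolute_image_urls, the Firefox driver, and the 10-second wait are all
assumptions made for this example; it also assumes the gem and a
matching browser driver are installed.

```ruby
require "uri"

# Hypothetical helper: resolve each src against the page URL and drop
# nil, blank, and duplicate entries.
def absolute_image_urls(page_url, srcs)
  srcs.compact.map(&:strip).reject(&:empty?).map do |src|
    URI.join(page_url, src).to_s
  end.uniq
end

# Hypothetical helper: load the page in a real browser so its JavaScript
# runs, then collect the src of every <img> present in the rendered DOM.
def rendered_image_urls(page_url, wait_seconds: 10)
  require "selenium-webdriver" # third-party gem, loaded only when needed

  driver = Selenium::WebDriver.for(:firefox) # assumes Firefox + geckodriver
  begin
    driver.get(page_url)

    # Wait until at least one <img> exists; script-built pages often add
    # their images some time after the initial load.
    wait = Selenium::WebDriver::Wait.new(timeout: wait_seconds)
    wait.until { driver.find_elements(tag_name: "img").any? }

    srcs = driver.find_elements(tag_name: "img").map { |img| img.attribute("src") }
    absolute_image_urls(page_url, srcs)
  rescue Selenium::WebDriver::Error::TimeoutError
    [] # no <img> showed up within the wait window
  ensure
    driver.quit
  end
end
```

Calling rendered_image_urls with a page URL should return an array of
image URLs as strings. If you would rather keep the existing Nokogiri
code, driver.page_source returns the browser's current DOM serialized
as HTML, which could be handed to Nokogiri instead of the raw response.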

Best,
--
Marnen Laibow-Koser
http://www.marnen.org