Hey all
I’m experimenting with writing a scraper at the moment and have hit a
major hump.
Part of the DOM is added by JavaScript after the page has loaded.
This means that when I make a request, the HTML response I receive
doesn’t accurately represent the page.
Here’s a simplified example:
require "net/http"
@http_obj = Net::HTTP.new("targetdomain.com")
response = @http_obj.request_get("/")
page_data = response.body  # raw HTML as served, before any JavaScript runs
page_data doesn’t contain all of the HTML that is actually shown in the browser.
Is there any library or gem that could simulate the browser
updating the DOM with the JavaScript, or any other way I could approach
this short of decoding the obfuscated JavaScript file?
Thanks in advance
Gav
Gavin M. wrote:
Is there any library or gem that could simulate the browser
updating the DOM with the JavaScript, or any other way I could approach
this short of decoding the obfuscated JavaScript file?
Try Selenium or some other remote browser control.
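To make the Selenium suggestion a bit more concrete, here is a rough sketch using the selenium-webdriver gem. The rendered_html helper, the "#content" selector and the timeout values are placeholders made up for illustration, not anything taken from the target site. The idea is simply to let a real browser run the JavaScript, wait until a script-inserted element appears, and then read the updated DOM back.

```ruby
# Hypothetical helper: returns the page HTML after JavaScript has had a
# chance to modify the DOM.
#
# `driver` is assumed to behave like a Selenium::WebDriver driver, i.e. it
# responds to #get, #find_elements and #page_source.
# `css` is a selector for an element you know is inserted by the script.
def rendered_html(driver, url, css:, timeout: 10, interval: 0.5)
  driver.get(url)
  deadline = Time.now + timeout

  # Poll until the script-inserted element shows up or the time runs out.
  until driver.find_elements(css: css).any?
    raise "timed out waiting for #{css}" if Time.now > deadline
    sleep interval
  end

  driver.page_source
end

# Usage (needs the selenium-webdriver gem and a locally installed browser;
# "#content" is a placeholder selector):
#
#   require "selenium-webdriver"
#   driver = Selenium::WebDriver.for :firefox
#   begin
#     page_data = rendered_html(driver, "http://targetdomain.com/", css: "#content")
#   ensure
#     driver.quit
#   end
```

Unlike the Net::HTTP response body, page_source here should reflect the DOM after the browser has executed the page's scripts, so the obfuscated JavaScript never has to be decoded by hand.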
Best,
Marnen Laibow-Koser
http://www.marnen.org