I am trying to write an application that uses screen scraping. I first
have to fetch a page and scrape it; then, using the link to a frame in
that page, I have to fetch that frame and finally scrape it for the
relevant information. When I make the first request in the browser, it
shows a “Please wait while loading…” message, and that message is what I
get back from the Nokogiri open call. The browser shows the proper page
once loading is complete. How do I make the open call in my code wait so
that I get the proper response? I am using Nokogiri for scraping.
Thanks.
On May 19, 2011, at 9:14 PM, renu mehta wrote:
Thanks.
One more thought about this: are you sure that the page isn’t using
some JavaScript lazy-loading technique? “Please wait…” might be the
actual source, and thus all that a crawler can fetch.
Walter
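One way to check this is to look at the raw source the server returns and see what is in it besides the placeholder. The following is only a rough sketch: `loading_clues` is a hypothetical helper, the "Please wait" marker is taken from the message quoted above, and the regular expressions are crude string checks rather than a real HTML parse (the same checks could be done with Nokogiri selectors).

```ruby
# Hypothetical helper: reports clues about how a placeholder page
# might be loading its real content.
def loading_clues(html, marker: "Please wait")
  {
    # true if the placeholder text is part of the raw source itself
    placeholder: html.include?(marker),
    # number of <script> tags; a non-zero count hints at JavaScript loading
    scripts: html.scan(/<script\b/i).size,
    # the <meta http-equiv="refresh"> tag, if any (nil otherwise)
    meta_refresh: html[/<meta[^>]+http-equiv=["']?refresh["']?[^>]*>/i],
    # src attributes of any <frame> or <iframe> elements
    frame_srcs: html.scan(/<i?frame\b[^>]*\bsrc=["']([^"']+)["']/i).flatten
  }
end
```

The `html` argument could be the body returned by `Net::HTTP.get(URI(url))`. If the placeholder is present along with scripts but no frame or refresh target, the real content is probably filled in by JavaScript, and waiting longer on the open call will not help.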
On May 19, 2011, at 9:14 PM, renu mehta wrote:
Thanks.
Maybe you could do this in two passes. First, do a traditional
download of the page source, similar to what wget does in recursive
mode. Then use a queue or similar to go through the downloaded pages and
let Nokogiri at each one (deleting the temp file when done). Someone
else may have a better suggestion, and more in-depth knowledge of the
remote open options, but this might be an approach to pursue.
Walter
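The two-pass idea could be sketched roughly as follows. This is an illustration under assumptions, not a tested crawler: `download_pages` and `process_pages` are hypothetical names, the default fetcher uses Ruby's standard `Net::HTTP.get`, and the parsing step is left to the caller's block so that Nokogiri can be plugged in there.

```ruby
require "net/http"
require "uri"
require "tempfile"

# Pass 1: save each page's raw source to a temp file and queue the file.
# `fetcher` is injectable (an assumption of this sketch) so the code can
# be tried without touching the network.
def download_pages(urls, fetcher: ->(url) { Net::HTTP.get(URI(url)) })
  urls.map do |url|
    file = Tempfile.new(["page", ".html"])
    file.binmode                    # write the bytes exactly as received
    file.write(fetcher.call(url))
    file.close                      # close, but keep the file on disk
    [url, file]
  end
end

# Pass 2: go through the queue, hand each saved page to the block,
# and delete the temp file when done with it.
def process_pages(queue)
  queue.map do |url, file|
    begin
      yield url, File.read(file.path)
    ensure
      file.unlink                   # remove the temp file even on error
    end
  end
end
```

With Nokogiri loaded, the second pass might look like `process_pages(queue) { |url, html| Nokogiri::HTML(html).css("frame, iframe").map { |f| f["src"] } }`, which collects the frame links to download next. Note that this only helps if the server eventually returns the real page source; it does not run any JavaScript.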