Update page with results as they are scraped


#1

Hi,

I'm creating a screen-scraper app that takes a user's search term,
scrapes several third-party sites and returns the aggregated results
after a few seconds.

Rather than wait for all the sites to be scraped before showing the
results, I want to show the results as they are scraped, i.e. when one
site's results have been scraped and prepared, they are shown
immediately while the next set is being fetched.

Initially I thought I'd just make a scraper function like so:

def scrape_sites
  sites.each do |a_site|
    a_sites_results = scrape_and_parse_this_site(a_site)
    yield a_sites_results
  end
end

Then in my controller's JS response I could take the results and render
them, but then I found out that you can't make multiple render calls in
one go.
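
For reference, a block-based scraper like the one above does hand over
results one site at a time when called from plain Ruby; the limitation is
only that a single controller response cannot be rendered in pieces. A
minimal sketch, where FAKE_SITES and the body of
scrape_and_parse_this_site are made-up stand-ins for the real scraping
and the sites are passed in as an argument for illustration:

```ruby
# Hypothetical stand-in data: site name => canned results.
FAKE_SITES = {
  "site_a" => ["a1", "a2"],
  "site_b" => ["b1"]
}

# Stand-in for the real scraper; a real one would make HTTP requests.
def scrape_and_parse_this_site(a_site)
  FAKE_SITES.fetch(a_site)
end

# Yields each site's results as soon as that site has been processed.
def scrape_sites(sites)
  sites.each do |a_site|
    yield a_site, scrape_and_parse_this_site(a_site)
  end
end

seen = []
scrape_sites(FAKE_SITES.keys) do |site, results|
  seen << [site, results] # called once per site, in order
end
```

The block runs once per site, so whatever consumes the results (here the
seen array) can act on each batch before the next site is scraped.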

How do I go about doing this? Does this behaviour have a name, like
"live search"?


#2

I've found that this pattern is called Multi-Stage Download. It's used
by kayak.com, which is famous for its user interface. That site makes
finding cheap flights actually enjoyable.

Here are some general background links I found.

This discusses the pattern in general:

http://ajaxpatterns.org/Multi-Stage_Download#Code_Example

And this is a review of the user interface for Kayak:

http://konigi.com/podcast/kayak-com

But I still don't know where to start looking for a Rails implementation
how-to.

Where should I start looking? (I've googled for ages and can't find
anything appropriate.)


#3

On Apr 8, 12:10 pm, Adam A. wrote:

Given that this is all client side, what would a Rails implementation
be? (Maybe the odd helper function, but the lack of them certainly
shouldn't stop you getting started.)

As far as your particular case goes, all you need is something on the
page that every so often pings your controller to say 'do you have any
more data for me?'. It would be wise to push the actual scraping into a
separate process.

Fred
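
The split suggested above (scraping done elsewhere, the page asking for
whatever is new) can be sketched in plain Ruby. This is only an
illustration under assumptions: a Thread stands in for the separate
process, and ResultStore, fake_scrape and the site names are invented for
the example.

```ruby
require "thread"

# Thread-safe store that a background scraper appends to and a
# polling request reads from. Names here are illustrative only.
class ResultStore
  def initialize
    @mutex = Mutex.new
    @results = []
  end

  # Called by the background worker when one site is finished.
  def add(site, items)
    @mutex.synchronize { @results << { :site => site, :items => items } }
  end

  # Called by the polling request: everything from position offset on.
  def since(offset)
    @mutex.synchronize { @results[offset..-1] || [] }
  end
end

# Stand-in for real scraping work.
def fake_scrape(site)
  ["#{site}-result"]
end

store = ResultStore.new
worker = Thread.new do
  %w[site_a site_b site_c].each { |s| store.add(s, fake_scrape(s)) }
end

worker.join # only so the example is deterministic; a real app would not block
first_poll  = store.since(0)
second_poll = store.since(first_poll.size)
```

Each poll passes the number of results it has already received, so a
repeat request returns only what arrived in the meantime (here nothing,
since the worker had already finished).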


#4

pining the controller?

should have read

PINGING the controller.

sorry


#5

Thanks Frederick,

Can you (or anybody reading this) provide me with some helper names or
subject names that I should be investigating re: pining the controller?

Is this a common Rails thing?

Thanks so far for your help.


#6

On 8 Apr 2009, at 14:28, Adam A. wrote:

To make things clear, pinging just means (in this context) making a
request every so often. If you're using Prototype, PeriodicalExecuter
is a good way to go (periodically_call_remote is a helper for that).

Fred


#7

I do something similar in one of my apps. The way I handle it is to use
periodically_call_remote every 6 seconds to hit a controller action
which looks at the DB to see if any new records were added for that user
since the last time it checked. If there are, it does a
render :update do |page| block and calls page.insert_html to put the new
record on top.

Naturally you can increase or decrease the time between checks, but I
figure 5-10 seconds should be good for anyone.

For my specific app, if I was scraping 5 items specifically, the
periodically_call_remote function would turn itself off after I returned
5 records.
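
The "anything new since the last check, and stop after N records" logic
described above might look roughly like this in plain Ruby. It is a
sketch under assumptions, not the code from that app: RECORDS is an
in-memory stand-in for the database, and poll, EXPECTED_TOTAL and the
last_seen_id cursor are names invented for the example.

```ruby
# In-memory stand-in for the database table of scraped records.
RECORDS = [
  { :id => 1, :title => "first" },
  { :id => 2, :title => "second" },
  { :id => 3, :title => "third" }
]

EXPECTED_TOTAL = 3 # hypothetical: how many records the scrape produces

# What a polling action might compute on each request: the records newer
# than the client's cursor, the new cursor, and whether polling can stop.
def poll(records, last_seen_id, expected_total)
  fresh = records.select { |r| r[:id] > last_seen_id }
  newest_id = fresh.map { |r| r[:id] }.max || last_seen_id
  delivered = records.count { |r| r[:id] <= newest_id }
  { :fresh => fresh, :last_seen_id => newest_id, :done => delivered >= expected_total }
end

first  = poll(RECORDS.first(2), 0, EXPECTED_TOTAL) # only two records exist yet
second = poll(RECORDS, first[:last_seen_id], EXPECTED_TOTAL)
```

In a controller the fresh records would be what gets inserted into the
page, and the done flag would be the signal for the page to stop its
periodic requests.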


#8

Excellent, I'll go away and look into those methods. Many thanks!


#9

Jack B. wrote:

Hi Jack,

How do you turn off a periodically_call_remote? Also, is it possible to
call periodically_call_remote on some action?

Thanks,
Sudhindra