Given a user's search query, my app goes off and scrapes a few sites and provides the results to the user. The user can also choose to filter these results further by category, age, etc., and the page is updated via Ajax without refreshing. The result items are not static: except for its title, the information for an item changes every 2 hours, so there's no point in caching the data in a database. Given that I want to allow filtering of the results, how should I go about storing them after scraping? There will be at most about 1000 results, each comprising about 300 characters. Can I just store them in a @@results variable? How do I overcome the wiping of the data while in development mode? I'm new to Rails. I've also read about sessions, memcache, etc., but I'm not really sure whether they're what's needed for this situation. Can anyone help?
on 2009-04-05 07:33
on 2009-04-05 10:38
On Sun, Apr 5, 2009 at 5:33 AM, Adam A. <firstname.lastname@example.org> wrote:
> 1000 results each comprising about 300chars.

Why not write the results to a file? You could write the raw (pre-scraped) data to a file and re-scrape it, or you could save the data structure in some format (YAML is an option here).

Andrew T.
http://ramblingsonrails.com
http://www.linkedin.com/in/andrewtimberlake
"I have never let my schooling interfere with my education" - Mark Twain
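The YAML option mentioned above could look roughly like the sketch below. It assumes the scraped results are plain hashes; the field names, the sample data, and the file path are hypothetical and only there for illustration.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical scraped results: plain hashes with string keys, so the YAML
# round-trips without any custom classes.
results = [
  { "title" => "Blue bicycle", "category" => "sports",    "age" => 3 },
  { "title" => "Oak desk",     "category" => "furniture", "age" => 10 }
]

# Hypothetical file location; one file per search query would be an option.
path = File.join(Dir.tmpdir, "scrape_results.yml")

# Save the whole structure in one call...
File.write(path, results.to_yaml)

# ...and restore it later, e.g. on a follow-up filter request.
restored = YAML.load_file(path)

puts restored.length
puts restored.first["title"]
```

Because the data lives in a file rather than in a class variable, it would also survive the class reloading that happens in development mode.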
on 2009-04-05 11:25
You can easily create a table and stick the results in as rows. In Rails, SQLite is easy enough; if your site is bigger, you can use DB2. If it's like most sites, you make a "result" table that is associated with a user table.
on 2009-04-05 12:31
Hi, thanks for your replies. My main concern is performance. The data is not scraped in advance; it's scraped on demand by my users. They submit a search query, which I then run on several sites, scrape their results, and aggregate them for the user. My site is basically a meta search engine.

Storing results in a DB:
pros: I get to use MySQL find conditions when the user wants to filter the results even more.
cons: I'll only be storing these results temporarily. As soon as the user does a new search, they're gone forever. I don't know the performance hit of storing 1000 results, each consisting of several fields, in a DB. Is a DB still a wise choice?

Using YAML:
pros: not sure, but hey, I like using it!
cons: no MySQL conditions, so I'd have to create my own methods.

Does the above change anything?
on 2009-04-05 12:57
On Sun, Apr 5, 2009 at 10:31 AM, Adam A. <email@example.com> wrote:
> pros: i get to use msql find conditions when the user wants to filter
> cons: no msql conditions so id have to create my own methods
>
> does the above change anything?

The benefit of YAML is that once you've scraped the data, you probably already have a structure in place which can easily be saved and restored. You could combine the two by storing the YAML in the database. From a performance perspective, consider caching the results of the scraping for at least some period of time so that you don't have to scrape on every search (unless the source websites change VERY frequently).

Andrew T.
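A minimal sketch of caching scrape results "for at least some period of time" might look like this. The ScrapeCache class is hypothetical (it is not a Rails API), and the in-memory hash is only a stand-in for a database row that holds the YAML plus a timestamp.

```ruby
# Minimal time-limited cache keyed by search query (a sketch, not a Rails API).
class ScrapeCache
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @store = {} # query => [stored_at, results]
  end

  # Returns the cached results for the query. If there is no entry, or the
  # entry is older than the TTL, runs the block (the scrape) and caches
  # whatever it returns.
  def fetch(query, now = Time.now)
    stored_at, results = @store[query]
    return results if stored_at && now - stored_at < @ttl

    fresh = yield
    @store[query] = [now, fresh]
    fresh
  end
end

cache = ScrapeCache.new(2 * 60 * 60) # two hours, as in the original question
scrapes = 0
t0 = Time.at(0)

# The block stands in for the real scraping code.
first  = cache.fetch("bikes", t0)                 { scrapes += 1; ["scrape #{scrapes}"] }
second = cache.fetch("bikes", t0 + 60 * 60)       { scrapes += 1; ["scrape #{scrapes}"] }
third  = cache.fetch("bikes", t0 + 3 * 60 * 60)   { scrapes += 1; ["scrape #{scrapes}"] }

puts scrapes
```

Passing the current time in as an argument is just a convenience for trying the expiry logic without waiting; real code would rely on the default.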
on 2009-04-05 14:17
Thanks, Andrew, for clearing up the doubts I had about using YAML. I will cache the results in a DB for around 2 hours, then. I'm now wondering how this will affect the performance of filtering. My guess is that when a user selects some filters on the results screen, these get passed as params back to the controller's index action. Logic there will determine that it's a request to filter existing results, access the cache in the DB, and grab the YAML. It will then use YAML to turn the data back into the relevant objects and use Enumerable's find_all method to filter the results. Do you think that approach is OK, or is there a better way of doing it? Many thanks once again. You have been a great help.
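The filtering step described above could be sketched as follows. The field names, the sample data, and the filter_results helper are hypothetical; the cached YAML is built inline here, whereas in the app it would be read from the database.

```ruby
require "yaml"

# Hypothetical cached blob, as it might come out of a text column.
cached_yaml = [
  { "title" => "Blue bicycle",  "category" => "sports",    "age" => 3 },
  { "title" => "Oak desk",      "category" => "furniture", "age" => 10 },
  { "title" => "Tennis racket", "category" => "sports",    "age" => 1 }
].to_yaml

# Keeps only the items that match every filter. Values are compared as
# strings because request params arrive as strings.
def filter_results(results, filters)
  results.find_all do |item|
    filters.all? { |field, wanted| item[field].to_s == wanted.to_s }
  end
end

results = YAML.load(cached_yaml)

sports     = filter_results(results, { "category" => "sports" })
new_sports = filter_results(results, { "category" => "sports", "age" => "1" })

puts sports.map { |item| item["title"] }.inspect
puts new_sports.map { |item| item["title"] }.inspect
```

With about 1000 small items, a linear scan like this on each filter request is likely to be cheap compared with the scraping itself.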
on 2009-04-05 17:28
On Sun, Apr 5, 2009 at 12:17 PM, Adam A. <firstname.lastname@example.org> wrote:
> the cache in the db and grab the yaml. Then use yaml to turn the info

Sounds good to me. I always focus on getting the job done in the simplest way possible first, then work on optimisation if I see a bottleneck. Your biggest problem is likely to be fetching all the other sites for scraping, which caching will hopefully help with.

Andrew T.
on 2009-04-05 18:02
Excellent thanks once again Andrew! Appreciate your advice.
on 2009-04-06 16:36
on 2009-04-06 17:27
on 2009-04-07 07:52
Here's your problem in Rails: your web server is "single" threaded, so while you're scraping, it's not doing anything else, and you will need more Mongrels to take care of the other users. Generally you scale by having more threads and CPUs working on the problem. The database is probably not going to be your bottleneck for a while; it's more the style. Why don't I train you a bit? We can do a screen share/Skype session.
on 2009-04-10 02:38
Hi Glennwest, sorry for the late reply. I'd be up for chatting over Skype if you are. Let me know either here or via a message. Thank you for your kind offer! Adam.