Scraping

Hi!
I’m new to web scraping, but I have used Rubyful Soup in the past. I just
wanted to know how I can scrape a particular item from a page instead of
scraping the entire page. E.g., having got the URL for a person’s profile,
can I get only the name of the person scraped?
Please help.
Regards,
Swanand.

On 2/1/07, swanand deodhar [email protected] wrote:

Hi!
I’m new to web scraping, but I have used Rubyful Soup in the past. I just
wanted to know how I can scrape a particular item from a page instead of
scraping the entire page. E.g., having got the URL for a person’s profile,
can I get only the name of the person scraped?
Please help.
Regards,
Swanand.

Use Hpricot:
http://code.whytheluckystiff.net/hpricot/


Zack C.
http://depixelate.com

On Nov 17, 8:03 am, venkat [email protected] wrote:

Is there a library/framework for scraping (web)?

I have a few scrapers written but would like to see if there are any
libraries available. I don’t mean Mechanize and Hpricot or any other
parsers for (X)HTML.

If you don’t mean those, what do you mean?

You can always simply fetch the raw page source and run regexps on it.
Is that more what you mean?
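
For what it's worth, the fetch-and-regexp approach might look something like the sketch below, using only Ruby's standard library. The `<h1 class="name">` markup, the `NAME_PATTERN` constant, and the `extract_name` and `fetch_page` helpers are all made up for illustration; the real page will need its own pattern. It also shows pulling out a single item (a person's name) rather than processing the whole page.

```ruby
require "net/http"
require "uri"

# Hypothetical pattern: assumes the profile page marks up the name as
# <h1 class="name">...</h1>. Adjust it to match the real page's markup.
NAME_PATTERN = %r{<h1[^>]*class="name"[^>]*>(.*?)</h1>}mi

# Return the first name found in the HTML string, or nil if nothing matches.
def extract_name(html)
  match = NAME_PATTERN.match(html)
  return nil unless match
  # Drop any nested tags, then collapse runs of whitespace.
  match[1].gsub(/<[^>]+>/, "").gsub(/\s+/, " ").strip
end

# Fetch the raw page source for a URL.
# Bare-bones on purpose: no redirect handling, timeouts, or error checks.
def fetch_page(url)
  Net::HTTP.get(URI.parse(url))
end

# Try the extraction on an inline sample so no network access is needed.
sample = '<html><body><h1 class="name">Jane <b>Doe</b></h1><p>Bio</p></body></html>'
puts extract_name(sample)
```

With a real profile URL you would call `extract_name(fetch_page(url))`. A regexp like this tends to break as soon as the markup changes, which is the usual argument for a proper HTML parser instead.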

You might also like scrAPI
http://rubyforge.org/projects/scrapi/

-Daniel Brumbaugh K.

On Nov 16, 9:36 am, venkat <venkat@> wrote:

TIA

-Venkat

You can check out SWExplorerAutomation (SWEA) from http://webius.net. SWEA
separates UI element binding from the automation script. This makes
SWEA automation scripts more resilient to UI changes and
dramatically decreases the time needed for script maintenance.

Phrogz wrote:

On Nov 17, 8:03 am, venkat [email protected] wrote:

Is there a library/framework for scraping (web)?

Yeah.

I wrote a little article on this about a year ago, and I almost fell off
the chair when it was referenced in ‘Learning R.’ from O’Reilly.
It describes different web scraping possibilities in Ruby:

http://www.rubyrailways.com/data-extraction-for-web-20-screen-scraping-in-rubyrails

Since then I have written a web scraping framework, scRUBYt!. Judging by the
gem download stats (nearly 8000), it’s very popular. It’s also very actively
developed and … well, enough self-advertisement; please read the
rubyrailways article and decide for yourself :)

Cheers,
Peter


http://www.rubyrailways.com
http://scrubyt.org