Forum: Ruby on Rails [ANN] scRUBYt! 0.2.0 - WWW::Mechanize and Hpricot on steroids

Peter S. (Guest)
on 2007-02-06 01:41
(Received via mailing list)

I am pleased to announce the first public release of scRUBYt!, a simple
to learn and use, yet very powerful web extraction framework written in
Ruby. Details follow from the README:

scRUBYt! - WWW::Mechanize and Hpricot on steroids

Navigate through the Web and extract, query, transform and save relevant
data from the Web pages of your interest with a concise and easy-to-use
DSL.

Do you think that Mechanize and Hpricot are powerful libraries? You’re
right, they are indeed - hats off to their authors: without these libs
scRUBYt! could not exist! I had been wondering whether their
functionality could be enhanced even further - so I took these two
powerful ingredients, threw in a handful of smart heuristics, wrapped
them in a chunky DSL coating and sprinkled the whole thing with lots of
convention over configuration(tm) goodies - and … enter scRUBYt!
Decide for yourself.

Wait… why do we need one more web-scraping toolkit?

After all, we have Hpricot, and Rubyful Soup, and Mechanize, and scrAPI,
and ARIEL and scrapes and … Well, because scRUBYt! is different. It has
an entirely different philosophy, underlying techniques, theoretical
background, use cases, todo list, real-life scenarios etc. - in short,
it is meant for different situations and different requirements than
the tools mentioned above.

If you need something quick and/or would like to have maximal control
over the scraping process, I recommend Hpricot. Mechanize shines when it
comes to interaction with Web pages. Since scRUBYt! operates on XPaths,
sometimes you will choose scrAPI because CSS selectors will better suit
your needs. The list goes on and on, boiling down to the good old
mantra: use the right tool for the right job!

I hope there will also be times when you will want to experiment with
Pandora’s box and reach for the power of scRUBYt! :-)

Sounds fine - show me an example!

Let’s apply the "show don’t tell" principle. Okay, here we go:
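ebay_data = Scrubyt::Extractor.define do

   # Navigate: load the start page, search for 'ipod', narrow the results
   fetch ''
   fill_textfield 'satitle', 'ipod'
   click_link 'Apple iPod'

   # Describe one record by example; items that look alike are extracted
   record do
     price '$71.99'
   end
   # Follow the 'Next >' link across result pages (limit: 5)
   next_page 'Next >', :limit => 5
end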


output (excerpt):

   <item_name>APPLE IPOD NANO 4GB - PINK - MP3 PLAYER</item_name>
   <item_name>NEW APPLE IPOD NANO 4GB PINK MP3 PLAYER</item_name>
   <!-- another 200+ results -->

This was a relatively beginner-level example (scRUBYt! knows a lot more
than this and there are much more complicated extractors than the one
above) - yet it did a lot of things automagically. First of all, it
automatically loaded the page of interest (by going to the start page,
searching for ipods and narrowing down the results by clicking on
‘Apple iPod’), then it extracted all the items that looked like the
specified example (which, by the way, also described what the output
structure should look like) - on the first 5 result pages. Not so bad
for about 10 lines of code, eh?
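
To give a feeling for what "extracting all the items that look like the
specified example" can mean, here is a toy sketch in plain Ruby. It is
not how scRUBYt! is implemented (scRUBYt! works on real pages through
Hpricot and XPaths); the Node struct, the helper names and the tiny
sample tree are all made up for illustration. The idea: find the node
that holds the example text, remember the tag path leading to it, then
collect every node reachable through the same tag path.

```ruby
# Toy "extract by example" sketch. NOT scRUBYt!'s real implementation:
# the Node struct, the helper names and the sample tree are hypothetical.
Node = Struct.new(:tag, :text, :children)

def node(tag, text = nil, children = [])
  Node.new(tag, text, children)
end

# Tag names from the root down to the first node whose text equals
# the example, or nil when the example is not in the tree.
def path_to_example(root, example, trail = [])
  trail += [root.tag]
  return trail if root.text == example
  root.children.each do |child|
    found = path_to_example(child, example, trail)
    return found if found
  end
  nil
end

# Texts of all nodes reachable from root by following the tag path.
def texts_at(root, path)
  return [] unless root.tag == path.first
  return [root.text].compact if path.size == 1
  root.children.flat_map { |child| texts_at(child, path.drop(1)) }
end

# Generalize from one example to every node on the same tag path.
def extract_like(root, example)
  path = path_to_example(root, example)
  path ? texts_at(root, path) : []
end

# A made-up result page holding the two item names from the output above.
page = node('table', nil, [
  node('tr', nil, [node('td', 'APPLE IPOD NANO 4GB - PINK - MP3 PLAYER')]),
  node('tr', nil, [node('td', 'NEW APPLE IPOD NANO 4GB PINK MP3 PLAYER')])
])

items = extract_like(page, 'APPLE IPOD NANO 4GB - PINK - MP3 PLAYER')
puts items
```

Only one item was given as the example, yet both rows come back, since
they sit on the same table/tr/td path. A real extractor has to cope
with much messier markup, which is where the heuristics come in.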

OK, OK, I believe you, what should I do?

Check out the online README; there, scroll to the on-line version of
this section (OK, OK, I believe you, what should I do?) - there are
plenty of links to get you started.


